Google Cloud Data Pipeline Patterns

Here are the deck and screencast demos from my talks for YOWNights in Australia on Google Cloud Platform services and data pipeline patterns.

The screencast demos are linked in the slide deck (starting after slide 4) and show the following GCP services:

  • GCS – Google Cloud Storage
  • GCE – Google Compute Engine or VMs (Linux, Windows and SQL Server)
  • BigQuery – Managed data warehouse
  • Cloud Spanner – Managed scalable RDBMS – Beta release at the time of this recording
  • Cloud Vision API – Machine Learning (Vision API)

The deck also covers architecture patterns that use GCP services for common data pipeline workload scenarios, including Data Warehouse, Time Series, IoT and Bioinformatics.  They are taken from Google’s reference architectures – found here.

Scaling Galaxy on GCP

Below are the slides from my talk for the GAME 2017 conference on ‘Scaling Galaxy for GCP’, to be delivered in Feb 2017 at the University of Melbourne, Australia.  Galaxy is a bioinformatics tool used for genomic research.  A sample screen from Galaxy is shown below.

Screen Shot 2017-01-26 at 6.27.56 PM.png

In this talk, I will show demos and patterns for scaling the Galaxy tool (and also for creating bioinformatics research data pipelines in general) via the Google Cloud Platform.

Patterns include the following:

  • Scaling UP using GCE virtual machines
  • Scaling OUT using GKE container clusters
  • Advanced scaling using combinations of GCP services, such as Google’s new Genomics API, along with using BigQuery to analyze variants and more.

Core GCP services used are shown below.

Screen Shot 2017-01-26 at 7.10.26 PM.png

My particular area of interest is applying the results of genomic sequencing to personalized medicine, specifically cancer genomics. This field works with the totality of DNA sequence and gene expression differences between (cancer) tumor cells and normal host cells.

Building any type of genomics pipeline is true big data work, with EACH whole genome sequencing result producing 2.9 billion base pairs (T, A, G, C).
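To put that number in perspective, here is a rough back-of-the-envelope sketch in Python. The 2-bits-per-base packing, the one-character-per-base text size and the 30x coverage figure are illustrative assumptions, not measurements from a real pipeline; actual sequencing output files carry quality scores and metadata and are typically much larger.

```python
# Back-of-the-envelope sizing for one whole genome.
# Assumptions (illustrative only): 2.9 billion base pairs, 2 bits per base
# when packed, 1 byte per base as plain text, and 30x sequencing coverage.

BASE_PAIRS = 2_900_000_000
BITS_PER_BASE = 2  # four letters (T, A, G, C) fit in 2 bits


def packed_size_bytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed for a fixed-width packed encoding, rounded up."""
    return (base_pairs * bits_per_base + 7) // 8


def text_size_bytes(base_pairs: int) -> int:
    """Bytes needed when each base is stored as one ASCII character."""
    return base_pairs


def sequenced_bases(base_pairs: int, coverage: int) -> int:
    """Total bases read when every position is sequenced `coverage` times."""
    return base_pairs * coverage


if __name__ == "__main__":
    print(f"Packed (2 bits/base): {packed_size_bytes(BASE_PAIRS) / 1e9:.3f} GB")
    print(f"Plain text (1 byte/base): {text_size_bytes(BASE_PAIRS) / 1e9:.3f} GB")
    print(f"Bases read at 30x coverage: {sequenced_bases(BASE_PAIRS, 30):,}")
```

Even the most compact case is hundreds of megabytes per genome before any quality data, alignments or variant calls are added, which is why the scale-out patterns above matter.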

Screen Shot 2017-01-27 at 10.56.30 PM.png

Google has an interesting genomic browser (shown above) that you can use on the reference genomic data that they host on GCP.

SQL Server on Google Cloud Platform

screen-shot-2016-11-15-at-9-57-49-am

I recently tried out running SQL Server 2016 on a Google Cloud Platform Windows-based Virtual Machine (GCE – Google Compute Engine Service).  This is a quick way to try out new features of the latest version of SQL Server.  In this case, I wanted to test out the in-database R services (SQL Server R Services).

Although you can certainly ‘click’ in the GCP console to start an instance of SQL Server on GCE, you may want to script this activity using Google’s gcloud tool.  To that end, I created a simple script to do this.  I also added a script to enable and test the in-database R feature.  Here’s a link to my GitHub Repo.
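For a feel of what such scripting involves, here is a minimal Python sketch. It is not the script from the repo. The instance name, zone, machine type, disk size and the `sql-std-2016-win-2016` image family are assumptions to check against the current image list; the T-SQL assumes the R Services (In-Database) feature is already installed on the instance.

```python
import shlex


def build_create_command(
    name: str,
    zone: str = "us-central1-a",                  # assumed zone
    machine_type: str = "n1-standard-4",          # assumed machine type
    image_family: str = "sql-std-2016-win-2016",  # assumed image family
    image_project: str = "windows-sql-cloud",     # project hosting SQL Server images
    boot_disk_size: str = "100GB",                # assumed disk size
) -> list[str]:
    """Build (but do not run) a gcloud command that creates a SQL Server VM."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--image-family={image_family}",
        f"--image-project={image_project}",
        f"--boot-disk-size={boot_disk_size}",
    ]


# T-SQL to enable external (R) scripts. On SQL Server 2016 the SQL Server
# service needs a restart afterwards for the setting to take effect.
ENABLE_R_SQL = """
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
"""

# T-SQL smoke test: pass one row through R and return it unchanged.
TEST_R_SQL = """
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- InputDataSet;',
    @input_data_1 = N'SELECT 1 AS hello'
WITH RESULT SETS (([hello] INT NOT NULL));
"""

if __name__ == "__main__":
    # Print the command for review; run it yourself (or via subprocess.run).
    print(shlex.join(build_create_command("sql2016-r-test")))
    print(ENABLE_R_SQL)
    print(TEST_R_SQL)
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; the two SQL snippets can be pasted into a query window on the VM once it is up.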

What do you think?  Interested in trying this out?  Let me know how it goes for you.

#happyExploring

AWS, GCP, Azure Consoles Graded

I work with all three major public cloud vendors for various clients.  I find it interesting to observe the differences in their approaches to the design (and subsequent usability) of their web consoles.

AWS

The AWS console reflects the state of their services (and their market share).  It is consistent, clean and very usable.  It loads very fast on the browsers I use (Chrome mostly). The main page shows exactly the information I need (and no more). Interestingly, it does NOT show any of my security information by default on the main page.  Services are organized in a logical way, and service icons ‘make sense’ in color, type and size. The ability to add service shortcuts at the top improves usability.  Also, surfacing resource groups on the first page is great, as this is a feature I use often.

I would like to see my total AWS spend per region per account on this page as well.

Grade A

AWS.png

GCP

The GCP console recently had a major overhaul and the results are very positive.  The improvement over the previous version is significant.  GCP uses the concept of one or more GCP projects as containers for billing and a set of GCP service instances.  I do find this convenient because I can easily see my total project costs.  I also like the ‘Billing’ widget on the first page.

Although the list of services available in GCP is easy to find by clicking the ‘hamburger’ (three white lines) menu in the upper left, I do find that this method of showing all possible services confuses some customers (particularly those who are moving over from AWS or adding GCP alongside it).

One feature I particularly like is the integrated command line tool (gcloud) console.  It’s fast, usable and works great!

Although I can’t think of how to do this (I am a UX consumer – rather than designer!), I’d like to see a more intuitive way to see all of the currently enabled GCP services (and all possible services) shown in the main console window.

Grade B+

GCP-1.png

Azure

Azure uses two consoles, a ‘classic’ and a ‘current’ console. For the purposes of this review, I am including the ‘current’ console only. As you can see in the image below, not all of the tiles render in my browser (Chrome). I’ve tweeted about this bug a couple of times, but haven’t seen any improvement.

Azure uses the concept of subscriptions as containers for services and billing. I find the layout of this portal confusing and unintuitive. That, coupled with the fact that the main page renders slowly and usually fails to render correctly, is very frustrating to me.

Also, the default listing of service types (which is some subset of the actual services available – some items are services, others are category names for groups of services) is, once again, unintuitive and generally irritating to me.  What does ‘classic’ mean? Is it good, not good, should I use it, etc…?

Also the odd sizing of the tiles (too much blank space) is not helpful.

Generally, this ‘new’ Azure portal does not show the increasingly competitive set of Azure services in a positive way to me and my customers.

Grade D

Azure.png

I am interested in your opinion. Do you use any or all of these cloud consoles?  If so, how do you find them?  What works well for you?  What doesn’t? What do you wish would be added and/or removed for improved usability?

Happy Programming (in the cloud)!

Trying out CloudBerry Lab Explorer Pro with Google Cloud Storage and Nearline

Tooling matters – particularly in the new-to-many-customers cloud world.  To that end, I’ve been using cloud storage management tools from CloudBerry Lab with several Enterprise customers and made a quick screencast demo of their Storage Explorer Pro (in this case for GCP – Google Cloud Storage and Nearline).

In addition to GCP, CloudBerry Lab makes cloud storage products which work with AWS, Azure, and more.