Cloud, Uncategorized

2019 Work & Talks

This year my team and I have been working with bioinformatics customers in Australia, the US and the UK. See my GitHub and Slides.com accounts (linked here) for more detail. I have also written several technical articles on Medium.

I have now created 30 courses in the LinkedIn Learning / Lynda.com library, on topics including Cloud and Big Data. Over 4 million students have watched these courses to date.

I’ve begun work on a book, ‘Visualizing Cloud Systems’, and am delivering talks on this subject in the US and in Europe. I am currently in Berlin, Germany, working with these clients remotely.

Also notable in 2019: I have moved to Minneapolis, MN.

AWS, Big Data, Cloud

Use AWS? Try the og-aws

What is the og-aws? It’s a new kind of book (really a booklet), crowd-sourced and published on GitHub. ‘OG’ stands for open guide. The idea is that people who use AWS, but are NOT employees of AWS, have created a curated crib sheet with links to the stuff you really need to know. It is organized by category (such as ‘high availability’ or ‘billing’) or by service (e.g. EC2, S3) and is well-indexed, so that you can quickly scan and get the USEFUL answer that you need.

The guide also calls out common ‘mistakes’ or ‘gotchas’ that come up when using one or more AWS services.

There is an associated Slack for the og-aws; click the link at the top of the README.md page in the GitHub repo to join. In the Slack there are active discussions about how best to use AWS services. The editors of the og-aws (including me) also welcome additional community contributions via GitHub pull requests, and have written a short guide to contributing, linked here.

All in all, this guide is useful, timely and FREE, so head over to GitHub to check out the og-aws, linked here.

Cloud, google

Bioinformatics Code Samples

As I’ve started working with cloud big data in the cancer genomics (bioinformatics) vertical, I’ve ‘collected’ my notes, code and work in a GitHub repo.

General information (terms, file types and so on) sits at the top level of the repo. Below that, I have organized tools and libraries (such as Galaxy and Hail) by folder. I’ve also included sample code where I’ve had time to test it.

Samples and information are presented for either the AWS or the GCP cloud.

Big Data, Cloud, google, Uncategorized

GCP Data Pipeline Patterns

Here are the deck and screencast demos from my talks for YOWNights in Australia on Google Cloud Platform services and data pipeline patterns.

The screencast demos are linked in the slide deck (starting after slide 4) and show the following GCP services:

  • GCS – Google Cloud Storage
  • GCE – Google Compute Engine or VMs (Linux, Windows and SQL Server)
  • BigQuery – Managed data warehouse
  • Cloud Spanner – Managed scalable RDBMS – Beta release at the time of this recording
  • Cloud Vision API – Machine Learning (Vision API)

The deck also covers GCP architecture patterns for common data pipeline workload scenarios: Data Warehouse, Time Series, IoT and Bioinformatics. They are taken from Google’s reference architectures, found here.