Below are the slides from my talk for the GAME 2017 conference on ‘Scaling Galaxy for GCP’ to be delivered in Feb 2017 at the University of Melbourne, Australia. Galaxy is a bioinformatics tool used for genomic research. A sample screen from Galaxy is shown below.
In this talk, I will show demos and patterns for scaling Galaxy, and for building bioinformatics research data pipelines more generally, on the Google Cloud Platform (GCP).
Patterns include the following:
- Scaling UP using GCE virtual machines
- Scaling OUT using GKE container clusters
- Advanced scaling using combinations of GCP services, such as Google’s new Genomics API together with BigQuery for analyzing variants. The core GCP services used are shown below.
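To make the BigQuery pattern a little more concrete, here is a minimal sketch that builds a Standard SQL query counting variants per chromosome. It assumes the variants have already been exported to a BigQuery table with one row per variant and a `reference_name` column for the chromosome; the table name and the helper function are hypothetical, not part of any Google API.

```python
# Minimal sketch: build a BigQuery Standard SQL query that counts variants
# per chromosome. ASSUMPTIONS: one row per variant, a `reference_name`
# column holding the chromosome, and a hypothetical table name.


def variants_per_chromosome_sql(table: str) -> str:
    """Return SQL that counts rows (variants) grouped by chromosome."""
    return (
        "SELECT reference_name, COUNT(1) AS variant_count\n"
        f"FROM `{table}`\n"
        "GROUP BY reference_name\n"
        "ORDER BY reference_name"
    )


if __name__ == "__main__":
    # Hypothetical project.dataset.table name, for illustration only.
    print(variants_per_chromosome_sql("my-project.genomics.variants"))
```

With the `google-cloud-bigquery` client library installed and credentials configured, a string like this could be passed to `bigquery.Client().query(sql).result()` to run it. Keeping the SQL generation separate from execution makes the query easy to inspect and test without touching a cloud project.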
My particular area of interest is applying the results of genomic sequencing to personalized medicine, specifically cancer genomics: using the full set of DNA sequence and gene expression differences between tumor cells and normal host cells.
Building any type of genomics pipeline is true big data work: EACH whole genome sequencing result covers about 2.9 billion base pairs, each position being one of A, C, G or T.
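A quick back-of-envelope calculation shows why this is big data. The sketch below assumes 2 bits per base (enough to encode A, C, G, T) and a hypothetical 30x sequencing coverage; both numbers are illustrative assumptions, not figures from the talk.

```python
# Back-of-envelope sizing for one genome.
# ASSUMPTIONS: 2 bits per base (A, C, G, T) and a hypothetical 30x coverage.

GENOME_BASES = 2_900_000_000  # figure quoted in the text


def packed_bytes(bases: int, bits_per_base: int = 2) -> int:
    """Bytes needed to store `bases` nucleotides at `bits_per_base` each."""
    return bases * bits_per_base // 8


def raw_read_bases(bases: int, coverage: int = 30) -> int:
    """Total sequenced bases when each position is read `coverage` times."""
    return bases * coverage


if __name__ == "__main__":
    print(f"Packed genome: {packed_bytes(GENOME_BASES) / 1e9:.3f} GB")
    print(f"Raw bases at 30x coverage: {raw_read_bases(GENOME_BASES):,}")
```

Even tightly packed, a single genome is roughly 0.7 GB, and the raw reads at 30x coverage amount to tens of billions of bases. Real FASTQ or BAM files also store per-base quality scores and metadata, so they are larger still.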
Google has an interesting genomics browser (shown above) that you can use to explore the reference genomic data it hosts on GCP.