Cloud, Uncategorized

2019 Work & Talks

This year my team and I have been working with bioinformatics customers in Australia, US and UK. See my GitHub and accounts (linked here) for more detail. I have also written several technical articles on Medium.

There are now 30 courses in the Linked In Learning / library of my creation – topics are Cloud, Big Data and more. Over 4 million students have watched these courses to date.

I’ve begun work on a book ‘Visualizing Cloud Systems’ and am in the process of delivering talks on this subject in the US and in Europe. Currently in Berlin, Germany working with these clients remotely.

Also notable in 2019, is that I have moved to Minneapolis, MN.

Big Data, Cloud, google, Uncategorized

GCP Data Pipeline Patterns

Here’s the deck and screencast demos from my talks for YOWNights in Australia on Google Cloud Platform services and data pipeline patterns.

The screencast demos are linked in the slide deck (starting after slide 4) and show the following GCP services:

  • GCS – Google Cloud Storage
  • GCE  – Google Compute Engine or VMs (Linux, Windows and SQL Server)
  • BigQuery – Managed data warehouse
  • Cloud Spanner – Managed scalable RDBMS – Beta release at this time of this recording
  • Cloud Vision API – Machine Learning (Vision API)

The architecture patterns for GCP services for common data pipeline workload scenarios, include the following: Data Warehouse, Time Series, IoT and Bioinformatics.  They are taken from Google’s reference architectures – found here.

Cloud, google, SQL Server 2012, SQL Server 2014, SQL Server 2016, Uncategorized

SQL Server on GCP

I recently tried out running SQL Server 2016 on a Google Cloud Platform Windows-based Virtual Machine (GCE – Google Compute Engine Service).

This is a quick way to try out new features of the latest version of SQL Server.  In this case, I wanted to test out the R (language)-in database services.

Although you can certainly ‘click’ in the GCP console to start an instance of SQL Server on GCE,  you may want to script activity (for use with the Google gcloud tool).  To that end, I created a simple script to do this.  Also I added a script to enable and test the R-in database feature.  Here’s a link to my GitHub Repo.

What do you think?  Interested to try this out?  Let me know how it goes for you.




Using Redis with SQL Server

When is SQL Server ‘not enough’ and/or ‘too expensive’ for your application performance scalability needs?  Caching is a popular mechanism used to accelerate response times and help applications scale while reducing the load on RDBMS systems, and save on resources.

Redis is a widely used in-memory NoSQL database, which can be used as a database, cache and message broker. This blog post explains how to get started with Microsoft SQL Server and enterprise grade Redis from Redis Labs.

Application caching is a technique used to speed up application response times, to help applications scale and to keep critical data highly available in the event of system failure. It works by placing frequently needed data close to the application, most often placing this data in memory using a NoSQL database such as Redis for the fastest application access.

Redis is a fast, lightweight and powerful open source NoSQL database.  Redis is written in C and it compiles into extremely efficient machine code which requires very little overhead. It runs entirely in-memory, and is optimized to deliver more than 1.5M ops/sec with less than 1 ms latency with a single standard server. This allows for really fast application response times, increases throughput, reduces database traffic, with the lowest spend! This approach also reduces the number of RDBMS licenses you need.

Redis is used as a database, a message broker, or very commonly as a caching layer because it enables true statelessness for an application’s processes, while reducing duplication of data or requests to external data sources. Although entire data sets are most commonly served from RAM, Redis also can be enabled to support data persistence and backups. With Redis Labs Enterprise Cluster, you get a natively distributed architecture, that scales as the application workload scales and reduces your database workload.

Redis Labs is both the open source home and commercial provider of Redis. Redis Labs products provide an additional technology layer that encapsulates open-source Redis. This layer provides an enhanced deployment architecture for enterprises. They offer two enterprise Redis products:

  • Redis Labs Enterprise Cluster  – downloadable software, for on premises deployments or in the environment of your choice
  • Redis Cloud – fully managed service for cloud-based deployments in any of these public cloud providers – Azure, AWS, Google Cloud Platform, Softlayer, Heroku, Open Shift, Cloud Foundry and more…

Redis Labs offers 24/7 access to top-notch Redis experts with enterprise-class support. There’s zero downtime and zero performance impact while scaling Redis up or down. Redis Labs technology includes constant monitoring and rebalancing of shards to meet throughput goals, optimizations to ensure consistent high performance and automation for low operational overhead while ensuring high availability and scalability.

As an example, I’ll show how to use Redis Cloud as a application cache for a web application that uses SQL Server as it’s primary database.  Although the caching paradigm is a well-known application design pattern, I find that many of my customers are not familiar with using external NoSQL databases, such as Redis, to address this need. Shown below is an example architecture and also a link to a screencast with a demo of the application and even more information about using Redis Cloud.  The business scenario is to cache user login information.

Application Architecture shown below.



NOTE: This design pattern could be used by applications which are hosted on premises, by using Redis Labs Enterprise Cluster for the caching layer. In this case, the Redis Labs Enterprise Cluster would be installed locally, and would be used with an application that runs in house which uses an on premises SQL Server.

The simplest way to start is to quickly setup a demo of Redis Labs using the Redis Cloud evaluation cluster using the Redis Labs website.  From there you’ll want to add Redis Cloud caching to your applications.  Shown below is a the Redis Labs portal which lets you get up and running with a RedisCloud instance with just a few clicks.  There is a free tier for development and evaluation.


Next, as an example, I’ve set up a node.js web application which uses Azure SQL as a it’s primary database.  This website includes a user login functionality.  The user information is cached using Redis Cloud.  Complete application source code is on Github at this link –

In this demo, a simple node.js website uses Azure SQL for the user login information. The application login screen is shown below.


Also shown below is the structure (column names and datatypes) of the Azure SQL User Table.


The application uses Windows Azure website hosting along with Azure SQL for the database tables.  It is a standard node.js application.  In addition to the standard setup, I’ve added a call to the Redis Labs’ Redis Cloud instance which I created earlier (in this case the Redis Cloud instance is running on Windows Azure).

Why don’t we just use Azure cache? The answer is simple. Azure cache is not persistent and does not come with built-in high availability at the same level as Redis cloud. Redis Cloud runs in Azure, so if your application is hosted in Azure, then you don’t have to worry about latencies between Azure and other environments. Operationally, scaling Redis Cloud is also a lot simpler since it happens behind the scenes, without you needing to select larger instances or having to change your application in any way.

In this demo, the use case is to retrieve the user login credentials from cache (Redis Cloud) rather than from Azure SQL.  The business reason for this is to accelerate the website login performance.  The data types used are simple strings presented as a username and password.  This is obviously a POC application, just meant to demonstrate core functionality and is no way shape or form anywhere a production quality application,

To set up the demo, configure the ‘env’ file to use your values and rename it to ‘.env’.  Run the ‘’ in the azure-scripts folder to execute the setup scripts for the Azure artifacts (website and SQL database).  See detailed setup instructions at the end of this article.  If you’d prefer to create the Azure website and Azure SQL instance by clicking on the Azure portal, you can do that as an alternative to running the setup scripts that I have supplied for your setup.

To test the demo, create a new user on the webpage, and then login with the new user account.  Log out and log back in with that new user account.  You’ll then be able to see the traffic on the Redis Labs website for your Redis Cloud cluster.  An example is shown below.


The code that makes this call to a Redis Labs session store possible is straightforward. It is found in the /services/ folder of the GitHub project in the ‘redis-cloud.js’ file.  I’ve detailed the key lines and provided a copy of this code below.

— Lines 1-4 setup the objects you’ll need.
— Line 6 creates the ‘createSessionStore’ method.
— Lines 8-11 setup and configure a Redis Labs, Redis Cloud client.
— Lines 13-15 setup the call to connect to the Redis Cloud instance.
— Lines 17-19 authenticate the connection with your server password.
— Lines 24-29, we set up the Redis Cloud instance to be used for session store for our node.js application.


The users.routes.js’ & users.service.js’ files in the /users folder of the sample makes the actual calls to the session store for the username and password information (stored as key/value pairs), which has been set to use the Redis Cloud instance in our demo. Since we have already set Redis as our session store, any calls to session in our node.js code will access Redis.  Shown below on lines 28-33 from ‘users.routes.js’ is the call to get the session key.


This demo application demonstrates how quick and easy it is to add Redis Cloud to your SQL Server-backed web application for the user / password caching use case using the standard session pattern.


Redis provides pre-built data structures including strings, lists, sets, sorted sets with range queries, hashes, hyperloglogs, bitmaps and geospatial indexes with radius queries. Its ability to allow data objects to be stored in their original format, data can be processed on the database level rather than the application level. Using this architecture, an application can retrieve only discrete elements from the object, as opposed to other key / value data stores that require the application to retrieve an entire object value (blob), then de-serialize it and parse it in order to get the desired part. This enables blazingly fast access and analysis on stored data.

Redis’ true value is providing advanced data structures and operations to application developers. “Intelligent caching” is more than leveraging these data structures via the GET / SET operations; it’s the act of exploiting their unique properties to efficiently and optimally manipulate data. Two examples of intelligent caching are its commands that modify the data in the server and its ability to execute embedded Lua scripts.

Every operation in Redis is atomic, ensuring the integrity of the cached data and provides a consistent view to the processes sharing it. Additionally, Redis cache contributes to an application’s availability. External data sources can experience failures resulting in degraded or terminated service. During these outages, the cache can still serve data to the application, retaining availability. Redis cache has data persistence, in-memory cross-region / data center / cloud replication, instant automatic failover, backups, and disaster recovery.

A cache’s main purpose is to reduce the time needed to access data stored outside of the application’s main memory space, reducing bottlenecked performance and freeing up application resources for other uses.

Redis is a distributed shared cache providing very fast performance, a seamlessly scalable architecture and extremely low-latency data delivery. When you have very complex programming problems and/or very demanding data processing tasks, Redis’ data structures provide simple commands executed within the data store – not on the application-level – that results in cleaner, more elegant code with fewer lines, faster execution time, better application performance, and better CPU, I/O and network utilization.

Redis cache is a great alternative caching solution vs. caching available in SQL Server because the SQL Server’s main memory – RAM – is finite, sometimes volatile, normally non-sharable and comparatively expensive. Redis cache’s ability to store its data on disk allows for greater scalability, durability, concurrency, and cost-effectiveness. Redis cache is a great replacement for session storage such as for applications running on web farms or on Microsoft Azure where maintaining state is difficult.

When you have multiple data structures in various formats and sizes and need a flexible schema-less design, Redis data structures allows you to easily scale your database as if it was a simple key / value data store. Several of Redis’ commands operate on multiple keys. Multi-key operations provide better overall performance because there’s substantially less communication and administration. It is binary safe, and any single element in Redis can range in size from 0 bytes to 0.5 GB.  Also if complex data-structure is used, such as hash, list, set, (this limitation refers to a per element), then the entire object can be much larger.

Redis use cases include real-time analytics, high speed transactions, gaming, online advertising, IoT, message queues, high speed data ingest, session storage, in-app social functionality, in-database analytics, application job management, time-series processing, geo-searching and high-speed caching.

Redis Labs products, (Redis Cloud for cloud-based applications or Redis Enterprise for on premise applications) are simple, powerful and a natural choice for application caching with SQL Server.

–Source code for caching demo –
–Comparison of Commercial Redis offerings —
–SQL Server caching mechanisms –

Redis Labs RedisCloud as a user caching store for a node.js app with SQL Azure.  NOTE: Instructions are for OSX

  1. Prerequisites
    • install node.js
    • install azure cli
    • use a text editor (Visual Studio Code, Sublime, etc…)
  2. Redis Cloud
    • go to and sign up
    • login to and click ‘New Redis Subscription’
    • next to ‘Cloud’ click the drop down, select ‘Azure/west-us’ and 30MB/Free
    • go to ‘My Resources’ >’Manage Resources’, wait for the green checkmark
    • note your Redis Cloud endpoint address and Redis password
    • fill in these values on the Redis variables in your ‘env’ file
  3. Azure Setup
    • fill in your desired values in your ‘azure-scripts/redis-lab-demo-sql-server/parameters.json’ file
    • open ‘azure-scripts/redis-lab-demo-sql-server/’
    • run ‘azure-scripts/redis-lab-demo-sql-server/’ from a bash shell
    • connect to your SQL Azure instance with a client (i.e. Navicat), run scripts
      • run ‘user-login.sql’ and ‘user-status.sql’ to create tables
    • fill in your SQL values in your ‘env’ file
  4. Test the results
    • update ‘redis-cloud.js’ line 25 to use your value for the ‘secret’
    • rename the ‘env’ file to ‘.env’
    • run ‘npm install’ to install node dependencies on your local machine (if you wish to test locally before deplying to Azure)
    • test with localhost
    • get your ip address and set a firewall rule in azure to test remotely
    • run the commands in ‘azure-scripts/’
    • push your code to azure using the following commands
      • git add .
      • git commit – ‘{your commit message}’
      • git push azure master
    • test via your azure website endpoint (using a browser)

4 Million TPS GCP with Aerospike

Aerospike Whitesands Tool
I did some work with the Aerospike team and some other partners (@dchaley and @jamesrcounts) to validate Aerospike performance benchmarks on the Google Cloud using GCE instances.

In addition to blogging about the relatively simple 6-step process of setting up a 20-node cluster to get this mind-boggling performance, my team also wrote some scripts so that you can easily replicate our work.

Also, I recorded a screencast about Aerospike which includes a live demo of the performance benchmark – guess what? we actually got an even HIGHER benchmark tonight – between 5 and 6 MILLION TPS for read-only workloads. We also added a test for a mixed workload – 50% read/50% write. We got over 1 MILLION TPS for both the reads and the writes using the same size cluster – BAM!


Redshift Data Warehouse w/Matillion

Building a Data Warehouse on AWS
I led a great team at this year’s AWS re:Invent conference in building a workshop for attendees. We took on the daunting task of creating courseware for teams of students to build an end-to-end data warehouse in just two hours. Happily, all teams were successful!

So, how did we do it? We used AWS:Marketplace partners to ‘speed up’ our time-to-value. Specifically, we used Matillion ETL for Redshift to load and transform our data. Then we used Tableau to create a dashboard.

Want to know more?

I’ve posted our session notes / setup on slideshare for you to review.
Also, I’ve posted a setup guide on GitHub. This includes AWS cli commands for you to use if you wish to duplicate this exercise yourself.

Also, I’m part of a new site that AWS launched to help you to understand exactly what selected AWS:Marketplace Big Data partners have to offers. Here you’ll find interviews with technical leads from these companies, where we discuss what exactly their product is and does, architectural patterns, common use case and also customer success stories. Content is targeted at technical architects.

How do you use AWS Redshift? Which AWS:Marketplace Big Data partners have you explored? I’d love to hear from you in the comments section below.