I did some work with the Aerospike team and some other partners (@dchaley and @jamesrcounts) to validate Aerospike performance benchmarks on the Google Cloud using GCE instances.
In addition to blogging about the relatively simple 6-step process of setting up a 20-node cluster to get this mind-boggling performance, my team also wrote some scripts so that you can easily replicate our work.
Also, I recorded a screencast about Aerospike which includes a live demo of the performance benchmark. Guess what? We actually got an even HIGHER benchmark tonight – between 5 and 6 MILLION TPS for read-only workloads. We also added a test for a mixed workload – 50% read/50% write. We got over 1 MILLION TPS for both the reads and the writes using the same size cluster – BAM!
I led a great team at this year’s AWS re:Invent conference in building a workshop for attendees. We took on the daunting task of creating courseware for teams of students to build an end-to-end data warehouse in just two hours. Happily, all teams were successful!
So, how did we do it? We used AWS:Marketplace partners to ‘speed up’ our time-to-value. Specifically, we used Matillion ETL for Redshift to load and transform our data. Then we used Tableau to create a dashboard.
Want to know more?
I’ve posted our session notes / setup on slideshare for you to review.
Also, I’ve posted a setup guide on GitHub. This includes AWS cli commands for you to use if you wish to duplicate this exercise yourself.
Also, I’m part of a new site that AWS launched to help you understand exactly what selected AWS:Marketplace Big Data partners have to offer. Here you’ll find interviews with technical leads from these companies, where we discuss what exactly their product is and does, architectural patterns, common use cases and customer success stories. Content is targeted at technical architects.
How do you use AWS Redshift? Which AWS:Marketplace Big Data partners have you explored? I’d love to hear from you in the comments section below.
I am speaking at and attending the massive AWS re:Invent conference this week in Las Vegas along with over 19,000 other people.
Among the activities here, I was interviewed for Silicon Angle/theCube. Here’s my interview…
We often get asked, ‘What are the influences for TKP courseware?’ TKP courseware includes TKPJava, TKPSmallBasic and new courseware around Data Science and IoT concepts.
In addition to the work of the TKP team that has created TKPJava courseware, the team is inspired by many other influences. These influences are varied and many (and listed below). In particular, the ideas in this book inspire many of our lesson concepts:
Tooling matters – particularly in the new-to-many-customers cloud world. To that end, I’ve been using cloud storage management tools from CloudBerry Lab with several Enterprise customers and made a quick screencast demo of their Storage Explorer Pro (in this case for GCP – Google Cloud Storage and Nearline).
In addition to GCP, CloudBerry Lab makes cloud storage products which work with AWS, Azure, and more.
In this whitepaper, I take a look at the various options for Hadoop Streaming. These include Apache Storm, Apache Spark Streaming and Apache Samza. I also examine commercial alternatives, such as Data Torrent. I cover implementation details of streaming, including the types of streaming and the capabilities of the libraries and products included.
You can read this whitepaper online or download it via the included Slideshare link.
Here’s a whitepaper I wrote on the ‘state of Machine Learning’. It includes information about implementation via various cloud-based ML services (AWS, Azure, IBM) as well as category information (for architects). You are welcome to read this whitepaper online or to download it if you prefer (linked to Slideshare source).