How to: Installing AerospikeDB on Google Compute Engine

Showing Aerospike at BigDataCampLA

Recently, I’ve been doing some work with AerospikeDB.  It is a super-fast, in-memory NoSQL database.  I gave a presentation at the recent BigDataCampLA on ‘Bleeding Edge Databases’ and included Aerospike because of its impressive benchmarks, such as 1 Million TPS (read-only workload) PER SERVER and 40K TPS (read-write) on that same server.  Here’s the live presentation; I also recorded a screencast of it.

In this blog post, I’ll detail how you can get started with the FREE community edition of AerospikeDB.  Again I’ll use Google Compute Engine as my platform of choice, due to its speed, ease of use and low cost for testing.  You’ll note from the screenshot below that you can install the community edition on your own server, or on other clouds (such as AWS) as well.  I am writing this post because, prior to it, Aerospike didn’t have directions available for getting set up on GCE.

Aerospike Community Edition

Here’s a top level list of what you’ll need to do (below, I’ll detail each step) – I did the whole process start-to-finish in < 30 minutes.

    • Set up a Google Cloud project with Google Compute Engine (VM) API access
    • Spin up and configure a GCE instance
    • Install the Aerospike Community Edition, which runs on up to 2-nodes and can use up to 200 GB for your testing purposes
    • Run your tests and (optionally) add other nodes

Next I’ll drill into each of the steps listed above, in more detail and with sample commands from the Google Cloud test that I did.

Step One – Set up a Google Cloud project with Google Compute Engine access

If you are new to the Google Cloud, you’ll need the Google Cloud SDK for the command line utilities used to install and connect to your cloud-hosted virtual machine.  There is a version of the SDK for Linux/Mac and also for Windows.

For this tutorial, I will be using Mac. There are only two steps to using the SDK:
a) From Terminal run

curl https://sdk.cloud.google.com | bash

b) Restart Terminal and run the command below.  A browser window will open; click on your gmail account, then click the ‘accept’ button, and the login will complete in the terminal window

gcloud auth login

GCloud Authorization

If you already have a Google Cloud Project, then you can proceed to Step Two.  If you do not yet have a Google Cloud Project, then you will need to go to the Google Developer’s Console and create a new Project by clicking on the red ‘Create Project’ button at the top of the console.

Create Project
Note: Projects are containers for billing on the Google Cloud.  They can contain one or more (1:M) instances of each authorized service – in our case, that would be 1:M instances of Google Compute Engine virtual machines.

To enable access to the GCE API in your project, click on the name of the project in Google Developer Console, then click on ‘APIS & AUTH’>’APIs’>Google Compute Engine “OFF” to turn the service availability to “ON”.  The button should turn green to indicate the service is available.

You will also have to enable billing under the ‘Billing & Settings’ section of the project.  Because you are reading this blog post, you can apply for $500 USD in Google Cloud usage credit at this URL – use code “gde-in” when you apply for the credit.

Google Cloud Usage Credit

For completeness: there are many other types of cloud services available, such as Google App Engine, Google BigQuery and many more.  Those services are not directly related to the topic of this article, so I’ll just link to more information in the Google Cloud developer documentation here.

Step Two – Spin up and configure a GCE instance

Note: All of the steps I describe below could be performed in the Terminal via the gcloud command line tools (‘gcloud compute’ in this case); for simplicity, I will detail the steps using the web console.  Alternatively, here is a link to creating a GCE instance using those tools.

From within your project in Google Developers Console, click on your Project Name. From the project console page, click on ‘COMPUTE’ menu on the left side to expand it.  Next click on ‘COMPUTE ENGINE’>VM Instances.

Then click on the red ‘New Instance’ button at the top of the page to open the instance configuration page shown below.  Here’s a quick summary of the values I selected: ZONE: US-Central1-b; MACHINE TYPE: n1-standard-1 (1 vCPU, 3.8 GB memory); IMAGE: Debian-7-wheezy-v20140606.

Other notes: You could use a g1-small instance type if you’d prefer; the minimum machine requirements for the community edition of Aerospike are 1 GB RAM and 1 vCPU.  You could also use Red Hat or CentOS for the image; however, my directions are specific to Debian 7 Linux.
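For reference, here’s a rough command line equivalent of the instance configuration above, using the gcloud compute tools mentioned earlier.  The instance name is just an example, and image aliases and flag names vary by SDK version, so treat this as a sketch rather than an exact recipe:

# Create a Debian 7 instance matching the web console settings above.
gcloud compute instances create aerospike-1 \
  --zone us-central1-b \
  --machine-type n1-standard-1 \
  --image debian-7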

GCE Configuration for Aerospike test

Click the blue ‘Create’ button to start your instance.  After the instance is available (takes less than a minute in my experience!), then you will see it listed in the project console window (COMPUTE ENGINE>VM Instance).  You can now test connectivity to your instance by clicking on the ‘SSH’ button to the right of the instance.

To test connectivity using SSH, open Terminal, use the ‘gcloud auth login’ command as described previously, and then paste in the gcutil command shown in the console; an example is shown below.

Testing Connectivity to GCE
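If you’d rather skip copying commands from the console, the SDK also has an SSH wrapper; a rough one-liner (using my example instance name from the sketch above) would be:

# SSH into the instance via the SDK (it handles keys for you).
gcloud compute ssh aerospike-1 --zone us-central1-b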

The last configuration step for GCE is to set up a firewall rule.  You’ll want to do this so that you can use the Aerospike (web-based) management console.  To create this rule, do the following in the Google Developers Console for your project:  Click on COMPUTE>COMPUTE ENGINE>Networks>’default’>Firewall Rules>Create New.  Then add a new firewall rule with these settings: Name: amc; Source Ranges: 0.0.0.0/0; Allowed Protocols or Ports: tcp:8081.
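If you prefer the command line, here’s a sketch of a roughly equivalent rule via gcloud, assuming you’re using the ‘default’ network:

# Open tcp:8081 to the world for the Aerospike Management Console.
gcloud compute firewall-rules create amc \
  --allow tcp:8081 \
  --source-ranges 0.0.0.0/0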

Step Three – Install the Aerospike Community Edition

To start, I set up a test Aerospike server with a single node.  There are a few required steps to do this, plus a couple of optional ones that I found made my test of Aerospike more interesting:

a) Connect to GCE via SSH
b) Download the Aerospike Community Edition
c) Extract and install the download (which is the server software plus command line tools)
d) Start the service and test inserting data
e) Install the Node.js client (optional)
f) Install the web-based management console

Notes: Be sure to run the scripts below as sudo.  Also, my install instructions are based on downloading the version of Aerospike Database Server 3.2.9 that is designed to run on DEBIAN 7.

Here is a Bash script to automate this process:

#!/bin/bash
# Download and install the Aerospike community server package for Debian 7.
sudo apt-get -y install wget
wget -qO server.tgz "http://www.aerospike.com/community_downloads/3.2.9/aerospike-community-server-3.2.9-debian7.tgz"
tar xzf server.tgz
sudo dpkg -i aerospike*/aerospike*

sudo /etc/init.d/aerospike start #start aerospike now

To verify correct functioning of the server:

sudo /etc/init.d/aerospike status

I next used the included command line tool to further verify that the server was working properly by inserting, retrieving and then deleting some values.  The command line tool is called ‘cli’ and is found at /usr/bin/cli.  Here are some sample test commands that I used:

cli -o set -n test -s "test_set" -k first_key -b bin_x -v "first value"
cli -o set -n test -s "test_set" -k first_key -b bin_y -v "second value"
cli -o set -n test -s "test_set" -k first_key -b bin_z -v "third value"

Next I retrieved these values with the following command:

cli -o get -n test -s "test_set" -k first_key
> {'bin_z': 'third value', 'bin_y': 'second value', 'bin_x': 'first value'}

Last I deleted the key:

cli -o delete -n test -s "test_set" -k first_key
cli -o get -n test -s "test_set" -k first_key
> no data retrieved
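Since cli is an ordinary shell command, it’s easy to script.  Here’s a small sketch I could have used to load a batch of test records; the key names and values are made up for illustration, but the flags are the same ones demonstrated above:

#!/bin/bash
# Insert ten test records into namespace 'test', set 'test_set'.
for i in $(seq 1 10); do
  cli -o set -n test -s "test_set" -k "key_${i}" -b bin_x -v "value ${i}"
done

# Spot-check one of the records.
cli -o get -n test -s "test_set" -k key_5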

Tip: (Current Aerospike version: 3.2.9) After an instance reboot, the Aerospike service may fail to start, most likely because the /var/run directory is cleared at boot.  To fix it, try recreating the Aerospike directory in /var/run:

sudo mkdir /var/run/aerospike
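If you want that fix to survive future reboots, one blunt Debian 7 era approach (my own workaround, not from Aerospike’s docs) is to recreate the directory from /etc/rc.local:

# Add the mkdir just before the final 'exit 0' in /etc/rc.local,
# so the runtime directory exists on every boot.
sudo sed -i '/^exit 0/i mkdir -p /var/run/aerospike' /etc/rc.local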

Optional Step — Client Setup (Node.js)

Next I set up a simple Node.js client on the same instance as the server.  The process is as follows: install Node.js and the Node Package Manager (NPM), and then install the Node.js Aerospike client package.

Note: There are a number of Aerospike clients available for different languages.  These are outside the scope of this document.  For more information go here: http://www.aerospike.com/aerospike-3-client-sdk/

This script automates this process:

#!/bin/bash
CWD=$(pwd)
# python-software-properties provides add-apt-repository on Debian 7.
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:chris-lea/node.js # requires human interaction: 'PRESS ENTER'
sudo apt-get install -y software-properties-common
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get install -y python g++ make nodejs
curl https://www.npmjs.org/install.js | sudo sh
sudo npm install aerospike -g # -g installs to /usr/lib, current dir otherwise

cd ${CWD}

You will need to acknowledge the addition of the Node.js repository to your software repositories list. Once this completes, navigate to the examples directory:

cd /usr/lib/node_modules/aerospike/examples

Install the prerequisite packages:

sudo npm install ../
sudo npm update

These examples insert dummy data to a specified location in a similar fashion to the cli tool.

node put.js -n test -s "test_set" first_key
OK - { ns: 'test', set: 'test_set', key: 'first_key' }
put: 3ms

node get.js -n test -s "test_set" first_key
OK - { ns: 'test', set: 'test_set', key: 'first_key' } { ttl: 9827, gen: 2 } { bin_z: 'third value', i: 123, s: 'abc', arr: [ 1, 2, 3 ], map: { num: 3, str: 'g3', buff: <SlowBuffer 0a 0b 0c> }, b: <SlowBuffer 0a 0b 0c>, b2: <SlowBuffer 0a 0b 0c> }
get: 3ms

node remove.js -n test -s "test_set" first_key
OK - { ns: 'test', set: 'test_set', key: 'first_key' }
remove: 7ms

Additionally there is a benchmarking tool I used to get a rough idea of the transactions per second available from my instance:

cd /usr/lib/node_modules/aerospike/benchmarks
npm install
node inspect.js

Management Console Setup

The Aerospike Management Console (AMC) is a web-based monitoring tool that reports all kinds of status information about your Aerospike deployment, whether it’s a single instance or a large multi-datacenter cluster. To install the AMC, I used the following script as superuser (e.g. sudo script.sh).

#!/bin/bash
# Install the AMC's Python dependencies, then the AMC Debian package itself.
apt-get -y install python-pip python-dev ansible
pip install markupsafe paramiko ecdsa pycrypto
wget -qO amc.deb 'http://aerospike.com/amc/3.3.1/aerospike-management-console-3.3.1.all.x86_64.deb'
dpkg -i amc.deb

sudo /etc/init.d/amc start #start amc now

Once deployed, I pointed my browser to port 8081 of the instance.  There will be a dialog asking for the hostname and port of an Aerospike instance.  Since I installed the server on the same instance as the AMC, I just used localhost and port 3000.

Aerospike Console
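Before (or instead of) opening a browser, you can sanity-check from the instance itself that the AMC is listening; any HTTP response line here means the service is up:

# Check that something is answering HTTP on port 8081.
curl -sI http://localhost:8081/ | head -n 1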

Step Four – Run tests and (optionally) add other nodes

As mentioned, you can test Aerospike on up to 2 nodes.  The next step in my testing was to add a second server node.  Here are the steps I took.

First I added a firewall rule for TCP ports 3000-3004, using the same process (i.e. in the Google Developers Console) described previously.  Get to the ‘Create new firewall rule’ panel: Compute> Compute Engine> Networks> ‘default’> Firewall Rules> Create New.  Configure the rule by changing these values: Name: aerospike; Source Ranges: 0.0.0.0/0; Allowed Protocols or Ports: tcp:3000-3004.
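As with the AMC rule earlier, here’s a sketch of the gcloud equivalent, again assuming the ‘default’ network:

# Open the Aerospike service, mesh and fabric ports between nodes.
gcloud compute firewall-rules create aerospike \
  --allow tcp:3000-3004 \
  --source-ranges 0.0.0.0/0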

Next I opened the Aerospike configuration file located at /etc/aerospike/aerospike.conf. Inside the ‘network’ section is a stanza called ‘heartbeat’ that looks like the following:

heartbeat {
    mode multicast
    address 239.1.99.222
    port 9918

    # To use unicast-mesh heartbeats, comment out the 3 lines above and
    # use the following 4 lines instead.
    # mode mesh
    # port 3002
    # mesh-address 10.0.0.48
    # mesh-port 3002

    interval 150
    timeout 10
}

I commented out the first three lines inside this section and uncommented the four lines starting with ‘mode mesh’. I then replaced the IP address after mesh-address with the IP of my other node.  Next I saved my changes and restarted the aerospike service:

sudo /etc/init.d/aerospike restart

Next I repeated these changes on my second server instance, setting the mesh-address for this server to the IP address of the first server instance.  Each server instance only needs to know about one other instance to join the Aerospike cluster; everything else is handled automatically. To verify that the cluster was working correctly, I checked the log file for ‘CLUSTER SIZE = 2’ like this:

sudo cat /var/log/aerospike/aerospike.log | grep CLUSTER
May 14 2014 23:42:48 GMT: INFO (partition): (fabric/partition.c::2876) CLUSTER SIZE = 2

Tip: If you are testing this out yourself, ensure that your instances can communicate with each other over the default ports 3000-3004.  To test connectivity, use telnet, for example: ‘telnet <remote ip> <port>’

Conclusions

In conclusion, I find Aerospike to be a superior performer among databases in its category.  I am curious about your experience with databases of this type (i.e. in-memory NoSQL).  Which vendors are you working with now?  What has been your experience?  Which type of setup works best for you – on premise (bare metal or virtualized) or in the cloud?  If in the cloud, which vendor’s cloud?

Also on the horizon: I am exploring up-and-coming lightweight application virtualization technologies, such as Docker.  Are you working with anything like this?  I will be posting more on using Docker with NoSQL and NewSQL databases over the next couple of months.

Posted in Agile, Big Data, Cloud, google, noSQL

Bleeding Edge Databases – Aerospike, Algebraix and Google Big Query

I gave a talk called ‘Bleeding Edge Databases’ at this weekend’s BigDataCampLA.  Several attendees asked me to record the talk, so I did (and will link that post below).

Keynoting at BigDataCampLA

Here’s my favorite tweet from the event.

I am regularly invited to preview many new database technologies, so you may be wondering why I chose these three solutions to highlight.  What all three share is independently benchmarked, noticeably better performance in their particular areas.  I also take usability into account.

Aerospike – 1 Million TPS (read-only workload) PER SERVER and 40K TPS (read-write) on that same server. Very good integration with client tools and libraries.

1M TPS Aerospike

AlgebraixData – 1 Billion Triples on ONE NODE and substantially better query response times than any competitor on the core benchmark queries for RDF databases.  Their core engine, which is optimized using patented mathematical algorithms, interests me because of the solid performance benchmarks they’ve been able to achieve.

The Math of Algebraix Data

Google Big Query – 750 Million Rows in 10 seconds, solidly ‘productized’ with usable integration points both in and out, reduced pricing and increased streaming – also nearly ANSI SQL-like queryability.

TPC-H Benchmark for Big Query

My screencast includes demos of the first two products using Google Compute Engine (VMs on the Google Cloud).  I chose a Linux GCE instance for Aerospike and a Windows GCE instance for AlgebraixData.  I prefer to use the Google Cloud for this type of testing for a few reasons:

1) Fast, easy VM spin-up
2) Generous free tier, clear pricing information (i.e. no surprises)
3) As a GDE (Google Developer Expert), I have usage credits for the Google Cloud beyond the typical allotment, so I rarely encounter any fees for quick POC testing

If you are interested in trying out these databases, the instructions on how to get access are in my screencast – enjoy!

Posted in Big Data, Cloud, google

Intro to the Google Cloud for Developers

Here’s the deck from my talk at #Techorama – enjoy.

Posted in Cloud, google

AWS for the Database Professional

Updated my talk ‘AWS for the Database Professional’ – lots of screencast demos – enjoy!

Posted in AWS, Cloud

Code Sample for D&B Business Verification API published

D&B Business Verification Service

I’ve been doing work with Dun & Bradstreet (I am also a D&B MVP); to that end, I wrote a code sample for working with their Business Verification service API in C# and published it to GitHub.

D&B’s Business Verification service is useful for many of my clients: it takes partial (and potentially duplicate) customer information and returns cleansed, completed, validated company information.  Their unique identifier, the DUNS number, also helps to identify duplicates.

My code sample is in C# and accesses their endpoint, which is hosted in the Windows Azure Marketplace. D&B has a number of other data services in addition to this one; beyond the Azure Marketplace, data services are also available at endpoints via D&B’s own D&B Direct site.

If you don’t need to code a custom solution, the D&B Business Verification service is also available via the Windows Office store (as an Excel 2013 add-in) or via Excel Power Query.

D&B Business Verification Add-in for Excel

Enjoy my code sample and screencast (below) – happy programming!

Posted in Azure, Big Data

First Look – Windows Virtual Machine on the Google Cloud

Yes, you read that right.  Google announced last week at their cloud event that their IaaS service, Google Compute Engine, would begin to offer ‘premium operating systems’ – including Windows.  After getting approval from Google to join their limited preview, I tried this functionality out, and here are the results…

Posted in Cloud, google, Microsoft

First Look – SQL Server 2014 RTM on Windows Azure

It’s a SQL Server kind of day.  First, congratulations to the team at Microsoft for releasing SQL Server 2014.  Also, thanks again to Microsoft for awarding me their MVP award for my SQL Server community education activities for the second year in a row.

As promised, the Azure team made a set of Azure Virtual Machine images with SQL Server 2014 RTM available today as well.

SQL Server 2014 RTM images on Windows Azure

I took the standard edition out for a spin and recorded the results in a short screencast (embedded below) – enjoy!

Posted in Azure, Cloud, Microsoft, SQL Server 2014