Recently, I’ve been doing some work with AerospikeDB, a very fast in-memory NoSQL database. I gave a presentation at the recent BigDataCampLA on ‘Bleeding Edge Databases’ and included it because of its impressive benchmarks, such as 1 million TPS (read-only workload) per server and 40K TPS (read-write) on that same server. Here’s the live presentation; I also recorded a screencast of it.
In this blog post, I’ll detail how you can get started with the FREE community edition of AerospikeDB. Again I’ll use Google Compute Engine (GCE) as my platform of choice, due to its speed, ease of use and low cost for testing. You’ll note from the screenshot below that you can also install the community edition on your own server or on other clouds (such as AWS). I am writing this post because Aerospike had not published directions for getting set up on GCE.
I did the whole process, start to finish, in under 30 minutes. Here’s a top-level list of what you’ll need to do; I detail each step below:
- Set up a Google Cloud project with Google Compute Engine (VM) API access
- Spin up and configure a GCE instance
- Install the Aerospike Community Edition, which runs on up to two nodes and can use up to 200 GB for your testing purposes
- Run your tests and (optionally) add other nodes
Next I’ll drill into each of the steps listed above. I’ll go into more detail and will provide sample commands for the Google Cloud test that I did.
Step One – Set up a Google Cloud project with Google Compute Engine access
If you are new to Google Cloud, you’ll need to install the Google Cloud SDK, which provides the command-line utilities used to connect to your cloud-hosted virtual machine. There is a version of the SDK for Linux/Mac and another for Windows.
For this tutorial, I will be using a Mac. There are only two steps to using the SDK:
a) From Terminal run
curl https://sdk.cloud.google.com | bash
b) Restart Terminal, then run the command below. A browser window will open; click on your Gmail account, then click the ‘accept’ button, and the login will complete in the terminal window.
gcloud auth login
If you already have a Google Cloud Project, then you can proceed to Step Two. If you do not yet have one, go to the Google Developers Console and create a new Project by clicking on the red ‘Create Project’ button at the top of the console.
Note: Projects are containers for billing on the Google Cloud. They can contain 1:M instances of each authorized service – in our case that would be 1:M instances of Google Compute Engine Virtual Machines.
To enable access to the GCE API in your project, click on the name of the project in the Google Developers Console, then click on ‘APIS & AUTH’ > ‘APIs’ and toggle the Google Compute Engine button from “OFF” to “ON”. The button should turn green to indicate the service is available.
You will also have to enable billing under the ‘Billing & Settings’ section of the project. Because you are reading this blog post, you can apply for $500 USD in Google Cloud usage credit at this URL; use code “gde-in” when you apply for the credit.
For completeness, there are many other types of cloud services available, such as Google App Engine, Google BigQuery and many more. Those services are not directly related to the topic of this article, so I’ll just link more information from the Google Cloud Developer documentation here.
Step Two – Spin up and configure a GCE instance
Note: All of the steps I describe below could be performed in the Terminal via the gcloud command line tools (‘gcloud compute’ in this case). For simplicity, I will detail the steps using the web console. Alternatively, here is a link to creating a GCE instance using those tools.
From within your project in the Google Developers Console, click on your Project Name. From the project console page, click on the ‘COMPUTE’ menu on the left side to expand it. Next click on ‘COMPUTE ENGINE’ > VM Instances.
Then click on the red ‘New Instance’ button at the top of the page to open the web page with the instance information as shown below. Here’s a quick summary of the values I selected: ZONE: US-Central1-b; MACHINE TYPE: n1-standard-1 (1 vCPU, 3.8 GB memory); IMAGE: Debian-7-wheezy-v20140606.
Other notes: You could use a g1-small instance type if you’d prefer; the minimum machine requirements for the community edition of Aerospike are 1 GB RAM and 1 vCPU. You could also use Red Hat or CentOS for the image; however, my directions are specific to Debian 7 Linux.
Click the blue ‘Create’ button to start your instance. After the instance is available (takes less than a minute in my experience!), then you will see it listed in the project console window (COMPUTE ENGINE>VM Instance). You can now test connectivity to your instance by clicking on the ‘SSH’ button to the right of the instance.
To test connectivity using SSH, open Terminal, use the ‘gcloud auth login’ command as described previously, then paste the gcutil command into the terminal. An example is shown below.
The last configuration step for GCE is to set up a firewall rule. You’ll want to do this so that you can use the Aerospike (web-based) management console. To create this rule, do the following in the Google Developers Console for your project: click on COMPUTE > COMPUTE ENGINE > Networks > ‘default’ > Firewall Rules > Create New. Then add a new firewall rule with these settings: Name: AMC; Source Ranges: 0.0.0.0/0; Allowed Protocols or Ports: tcp:8081
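As noted at the start of this step, the same rule can also be created from the command line. Here is a minimal sketch, assuming the ‘gcloud compute firewall-rules create’ command with its --network, --source-ranges and --allow flags. The names create_fw_rule and DRY_RUN are my own hypothetical helpers, and I lowercase the rule name to amc because GCE resource names must be lowercase. By default the helper only prints the command so that you can review it before running anything.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Cloud SDK): builds the gcloud
# command for one firewall rule on the 'default' network.
# With DRY_RUN=1 (the default) the command is printed, not executed.
create_fw_rule() {
  local name="$1" ports="$2"
  # Reuse the positional parameters to hold the command words safely.
  set -- gcloud compute firewall-rules create "$name" \
    --network default \
    --source-ranges 0.0.0.0/0 \
    --allow "tcp:${ports}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the command for the management console rule described above.
create_fw_rule amc 8081
```

To actually create the rule, run the helper with DRY_RUN=0 after ‘gcloud auth login’ has completed and your project is selected.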
Step Three – Install the Aerospike Community Edition
To start, I set up a test Aerospike server with a single node. To do this there are three required steps. I have added a couple of optional steps as well, since I found that they made my test of Aerospike more interesting:
a) Connect to GCE via SSH
b) Download the Aerospike Community Edition
c) Extract and install the download (which is the server software plus command line tools)
d) Start the service and test inserting data
e) Install the Node.js client (optional)
f) Install the web-based management console
Notes: Be sure to run the scripts below with sudo. Also, my install instructions are based on downloading the version of Aerospike Database Server 3.2.9 that is designed to run on Debian 7.
Here is a Bash script to automate this process:
#!/bin/bash
sudo apt-get -y install wget
wget -qO server.tgz "http://www.aerospike.com/community_downloads/3.2.9/aerospike-community-server-3.2.9-debian7.tgz"
tar xzf server.tgz
sudo dpkg -i aerospike*/aerospike*
sudo /etc/init.d/aerospike start #start aerospike now
To verify correct functioning of the server:
sudo /etc/init.d/aerospike status
I next used the included command line tool to further verify that the server was working properly by inserting, retrieving and then deleting some values. The command line tool is called ‘cli’ and is found at /usr/bin/cli. Here are some sample test commands that I used:
cli -o set -n test -s "test_set" -k first_key -b bin_x -v "first value"
cli -o set -n test -s "test_set" -k first_key -b bin_y -v "second value"
cli -o set -n test -s "test_set" -k first_key -b bin_z -v "third value"
Next I retrieved these values with the following command:
cli -o get -n test -s "test_set" -k first_key
> {'bin_z': 'third value', 'bin_y': 'second value', 'bin_x': 'first value'}
Last I deleted the key:
cli -o delete -n test -s "test_set" -k first_key
cli -o get -n test -s "test_set" -k first_key
> no data retrieved
Tip: (Current Aerospike version: 3.2.9) After an instance reboot, the Aerospike service may fail to start. To fix this, try creating an Aerospike directory in /var/run:
sudo mkdir /var/run/aerospike
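If you want to apply this fix automatically before each start, a small idempotent helper can do it. This is only a sketch under my own assumption that the failure comes from the directory disappearing when /var/run is cleared at boot; ensure_run_dir is a hypothetical name, not something shipped with Aerospike.

```shell
#!/bin/bash
# Hypothetical helper: create the run directory only if it is missing.
# Defaults to the path from the tip above; pass another path to override.
ensure_run_dir() {
  local dir="${1:-/var/run/aerospike}"
  # mkdir -p also succeeds when parent directories are missing.
  [ -d "$dir" ] || mkdir -p "$dir"
}
```

Called as root before ‘/etc/init.d/aerospike start’, it does nothing when the directory already exists, so it is safe to run on every boot.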
Optional Step — Client Setup (Node.js)
Next I set up a simple Node.js client on the same instance as the server. The process is as follows: install Node.js and the Node Package Manager (NPM), and then install the Node.js Aerospike client package.
Note: There are a number of Aerospike clients available for different languages. These are outside the scope of this document. For more information go here: http://www.aerospike.com/aerospike-3-client-sdk/
This script automates this process:
#!/bin/bash
CWD=$(pwd)
sudo apt-get install -y python-software-properties
sudo add-apt-repository ppa:chris-lea/node.js # requires human interaction: 'PRESS ENTER'
sudo apt-get install -y software-properties-common
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get install -y python g++ make nodejs
curl https://www.npmjs.org/install.js | sudo sh
sudo npm install aerospike -g # -g installs to /usr/lib, current dir otherwise
cd "${CWD}"
You will need to acknowledge the addition of the Node.js repository to your software repositories list. Once this completes, navigate to the examples directory:
cd /usr/lib/node_modules/aerospike/examples
Install the prerequisite packages:
sudo npm install ../
sudo npm update
These examples insert dummy data to a specified location in a similar fashion to the cli tool.
node put.js -n test -s "test_set" first_key
OK – { ns: 'test', set: 'test_set', key: 'first_key' }
put: 3ms
node get.js -n test -s "test_set" first_key
OK – { ns: 'test', set: 'test_set', key: 'first_key' } { ttl: 9827, gen: 2 } { bin_z: 'third value', i: 123, s: 'abc', arr: [ 1, 2, 3 ],map: { num: 3, str: 'g3', buff: <SlowBuffer 0a 0b 0c> }, b: <SlowBuffer 0a 0b 0c>,b2: <SlowBuffer 0a 0b 0c> }
get: 3ms
node remove.js -n test -s "test_set" first_key
OK – { ns: 'test', set: 'test_set', key: 'first_key' }
remove: 7ms
Additionally there is a benchmarking tool I used to get a rough idea of the transactions per second available from my instance:
cd /usr/lib/node_modules/aerospike/benchmarks
npm install
node inspect.js
Management Console Setup
The Aerospike Management Console (AMC) is a web-based monitoring tool that reports all kinds of status information about your Aerospike deployment, whether it’s a single instance or a large multi-datacenter cluster. To install the AMC I used the following script as superuser (e.g. sudo script.sh).
#!/bin/bash
apt-get -y install python-pip python-dev ansible
pip install markupsafe paramiko ecdsa pycrypto
wget -qO amc.deb 'http://aerospike.com/amc/3.3.1/aerospike-management-console-3.3.1.all.x86_64.deb'
dpkg -i amc.deb
sudo /etc/init.d/amc start #start amc now
Once it was deployed, I pointed my browser to port 8081 of the instance. A dialog asks for the hostname and port of an Aerospike instance. Since I installed the server on the same instance as the AMC, I just used localhost and port 3000.
Step Four – Run tests and (optionally) add other nodes
As mentioned, you can test Aerospike on up to 2 nodes. The next step I took in testing was to add another server node. Here are the steps I took to do this.
First I added a firewall rule for TCP ports 3000-3004. I did this using the same process (i.e. in the Google Developers Console) described previously. Get to the ‘Create new firewall rule’ panel: Compute > Compute Engine > Networks > ‘default’ > Firewall Rules > Create New. Configure the rule by changing these values: Name: aerospike; Source Ranges: 0.0.0.0/0; Allowed Protocols or Ports: tcp:3000-3004
Next I opened the Aerospike configuration file located at /etc/aerospike/aerospike.conf. Inside the ‘network’ section is a section called ‘heartbeat’ that looks like the following:
heartbeat {
mode multicast
address 239.1.99.222
port 9918
# To use unicast-mesh heartbeats, comment out the 3 lines above and
# use the following 4 lines instead.
#mode mesh
#port 3002
#mesh-address 10.0.0.48
#mesh-port 3002
interval 150
timeout 10
}
I commented out the first three lines inside this section and then uncommented the four lines starting with mode mesh. I then replaced the IP address after mesh-address with the IP of my other node. Next I saved my changes and restarted the aerospike service:
sudo /etc/init.d/aerospike restart
Next I repeated these changes on my second server instance, setting the mesh-address for this server to the IP address of the first server instance. Each server instance only needs to know about one other server instance to connect to my Aerospike cluster. Everything else is handled automatically. To verify that the cluster is working correctly I checked the log file for ‘CLUSTER SIZE = 2’ like this:
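If you would rather script the heartbeat edit than make it by hand, it can be sketched with sed. This is a rough sketch that assumes the heartbeat block looks exactly like the default shown above (multicast port 9918, commented-out mesh lines using port 3002). The name enable_mesh is a hypothetical helper of mine; it reads a config on stdin and writes the changed config to stdout, so nothing is modified in place.

```shell
#!/bin/bash
# Hypothetical helper: switch the heartbeat block from multicast to mesh.
# Usage: enable_mesh <peer-ip> < aerospike.conf > aerospike.conf.new
enable_mesh() {
  local peer_ip="$1"
  # The address range limits every substitution to the heartbeat block,
  # so 'address' and 'port' lines in other sections are left untouched.
  sed -E "/heartbeat[[:space:]]*\{/,/^[[:space:]]*\}/ {
    s/^([[:space:]]*)(mode multicast)/\1#\2/
    s/^([[:space:]]*)(address [0-9.]+)/\1#\2/
    s/^([[:space:]]*)(port 9918)/\1#\2/
    s/^([[:space:]]*)#(mode mesh)/\1\2/
    s/^([[:space:]]*)#(port 3002)/\1\2/
    s/^([[:space:]]*)#mesh-address .*/\1mesh-address ${peer_ip}/
    s/^([[:space:]]*)#(mesh-port 3002)/\1\2/
  }"
}
```

I would write the result to a new file, compare it with the original using diff, and only then copy it over /etc/aerospike/aerospike.conf and restart the service. The peer IP you pass in is the internal address of your other node.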
sudo cat /var/log/aerospike/aerospike.log | grep CLUSTER
May 14 2014 23:42:48 GMT: INFO (partition): (fabric/partition.c::2876) CLUSTER SIZE = 2
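For scripted checks, the most recent cluster size can be pulled out of the log rather than read by eye. This is a small sketch that assumes the log line format shown above; latest_cluster_size is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read log lines on stdin and print the most
# recently reported cluster size (nothing if no such line exists).
latest_cluster_size() {
  grep -o 'CLUSTER SIZE = [0-9][0-9]*' | tail -n 1 | awk '{ print $4 }'
}

# Demo with the sample log line from this post; prints 2.
echo 'May 14 2014 23:42:48 GMT: INFO (partition): (fabric/partition.c::2876) CLUSTER SIZE = 2' \
  | latest_cluster_size
```

On a real node, you would pipe the log into it instead, for example: sudo cat /var/log/aerospike/aerospike.log | latest_cluster_size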
Tip: If you are testing this out yourself, ensure that your instances can communicate with each other over the default ports 3000-3004. To test connectivity you can use telnet, for example: ‘telnet <remote ip> <port>’
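If telnet is not installed, the same check can be sketched with bash’s built-in /dev/tcp pseudo-device. This assumes bash and the coreutils ‘timeout’ command are available on the instance; port_open and check_cluster_ports are hypothetical helper names of mine.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 2 seconds. Uses bash's /dev/tcp, so no telnet is needed.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Check each of the default Aerospike ports (3000-3004) on one host.
# Prints one line per port and returns non-zero if any port is closed.
check_cluster_ports() {
  local host="$1" port status=0
  for port in 3000 3001 3002 3003 3004; do
    if port_open "$host" "$port"; then
      echo "${host}:${port} open"
    else
      echo "${host}:${port} closed"
      status=1
    fi
  done
  return "$status"
}
```

Run it from one node against the internal IP of the other, for example: check_cluster_ports <remote ip>. A port is reported as closed both when a firewall blocks it and when nothing is listening on it.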
Conclusions
In conclusion, I find Aerospike to be a superior performer in its category. I am curious about your experience with databases of this type (i.e. in-memory NoSQL). Which vendors are you working with now? What has been your experience? Which type of setup works best for you: on premises (bare metal or virtualized) or in the cloud? If in the cloud, which vendor’s cloud?
Also on the horizon, I am exploring up-and-coming light-weight application virtualization technologies, such as Docker. Are you working with anything like this? I will be posting more on using Docker with NoSQL and NewSQL databases over the next couple of months.