Agile, Technical Conference, WiT

Speed Abstract Writing

A great teacher taught me a technique to use when I felt ‘blocked’ trying to write an essay.  I have used and refined this technique for years in my professional writing, particularly as I write and submit abstracts to conferences as a potential speaker. People often tell me ‘you write so fast!’.  Since I am now teaching my daughter this technique as she writes a number of essays for college applications, I thought I’d share it here.

A side benefit of doing this is that when you learn to ‘submit faster’, you become more inclined to apply more frequently.  This applies, of course, not only to college applications for high school seniors, but to many types of written submissions throughout your life.

Because I’ve learned to make the cost of writing lower (by using this time-saving process), I tend to write more in general and also to submit to more technical conferences as a potential speaker. More submissions result in more acceptances (more rejections too) but that’s a topic for another blog post.

I hope you find this process useful!

#happyWriting


Speed Essay / Abstract Writing

Take a one-hour block of time and…

  • Set a timer for 10 minutes
  • Write out the first question
  • Write the first bullet point
  • Write a sentence using the bullet point
  • Read the sentence out loud
  • List the next bullet point
  • Write a sentence using the next bullet point
  • Continue until time runs out or done

NOTE: Take a break after every 10 minutes for 5 minutes – get up and move around

When done writing out all bullets into sentences

  • Put the sentences in a logical order
  • Write words (or new sentences) to connect the existing sentences
  • Read each new paragraph out loud, update as needed
  • Write a concluding sentence for your essay
  • Read the entire essay out loud
  • Check the word count (for limits)
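The word-count check is easy to script.  Here’s a minimal sketch, assuming your draft lives in a plain-text file (essay.txt and the 650-word limit are stand-ins for your own file and your application’s actual limit):

```shell
# Compare a draft's word count against a submission limit.
printf 'This is a six word draft.\n' > essay.txt   # stand-in for your real draft

WORDS=$(wc -w < essay.txt)
LIMIT=650

echo "word count: $WORDS (limit: $LIMIT)"
if [ "$WORDS" -gt "$LIMIT" ]; then
  echo "over the limit - keep trimming"
fi
```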

Sleep on it

  • Read the entire essay out loud
  • Update as needed
  • Input the essay into the application
  • Submit it, verify that your submission was accepted
Agile, Big Data, Cloud, google, noSQL

Install AerospikeDB on GCE

Showing Aerospike at BigDataCampLA

Recently, I’ve been doing some work with AerospikeDB.  It is a super-fast in-memory NoSQL database.  I gave a presentation at the recent BigDataCampLA on ‘Bleeding Edge Databases’ and included it because of impressive benchmarks, such as 1 million TPS (read-only workload) PER SERVER and 40K TPS (read-write) on that same server.  Here’s the live presentation; I also recorded a screencast of it.

In this blog post, I’ll detail how you can get started with the FREE community edition of AerospikeDB.  Again I’ll use Google Compute Engine as my platform of choice, due to its speed, ease of use and inexpensive cost for testing.  You’ll note from the screenshot below that you can install the community edition on your own server, or on other clouds (such as AWS) as well.  I am writing this blog post because directions for getting set up on GCE weren’t available from Aerospike prior to this post.

Aerospike Community Edition

Here’s a top level list of what you’ll need to do (below, I’ll detail each step) – I did the whole process start-to-finish in < 30 minutes.

    • Set up a Google Cloud project with Google Compute Engine (VM) API access
    • Spin up and configure a GCE instance
    • Install the Aerospike Community Edition, which runs on up to 2 nodes and can use up to 200 GB for your testing purposes
    • Run your tests and (optionally) add other nodes

Next I’ll drill into each of the steps listed above.  I’ll go into more detail and will provide sample commands for the Google Cloud test that I did.

Step One – Set up a Google Cloud project with Google Compute Engine access

If you are new to the Google Cloud, you’ll need the Google Cloud SDK, which provides the command line utilities used to install software on, and connect to, your cloud-hosted virtual machine.  There is a version of the SDK for Linux/Mac and also for Windows.

For this tutorial, I will be using Mac. There are only two steps to using the SDK:
a) From Terminal run

curl https://sdk.cloud.google.com | bash

b) Restart Terminal and run the command below.  A browser window will open; click on your Gmail account, click the ‘accept’ button, and the login will complete in the terminal window:

gcloud auth login

GCloud Authorization

If you already have a Google Cloud Project, then you can proceed to Step Two.  If you do not yet have a Google Cloud Project, then you will need to go to the Google Developer’s Console and create a new Project by clicking on the red ‘Create Project’ button at the top of the console.

Create Project

Note: Projects are containers for billing on the Google Cloud.  They can contain 1:M instances of each authorized service – in our case, 1:M instances of Google Compute Engine Virtual Machines.

To enable access to the GCE API in your project, click on the name of the project in Google Developer Console, then click on ‘APIS & AUTH’>’APIs’>Google Compute Engine “OFF” to turn the service availability to “ON”.  The button should turn green to indicate the service is available.

You will also have to enable billing under the ‘Billing & Settings’ section of the project.  Because you are reading this blog post, you can apply for $500 USD in Google Cloud Usage Credit at this URL – use code “gde-in” when you apply for the credit.

Google Cloud Usage Credit

To be complete, there are many other types of cloud services available, such as Google App Engine, Google BigQuery and many more.  Those services are not directly related to the topic of this article, so I’ll just link to more information in the Google Cloud Developer documentation here.

Step Two – Spin up and configure a GCE instance

Note: All of the steps I describe below could be performed in the Terminal via the gcloud command line tools (‘gcloud compute’ in this case); for simplicity, I will detail the steps using the web console.  Alternatively, here is a link to creating a GCE instance using those tools.

From within your project in Google Developers Console, click on your Project Name. From the project console page, click on ‘COMPUTE’ menu on the left side to expand it.  Next click on ‘COMPUTE ENGINE’>VM Instances.

Then click on the red ‘New Instance’ button at the top of the page to open the instance-configuration page shown below.  Here’s a quick summary of the values I selected: ZONE: us-central1-b; MACHINE TYPE: n1-standard-1 (1 vCPU, 3.8 GB memory); IMAGE: debian-7-wheezy-v20140606.

Other notes: You could use a g1-small instance type if you’d prefer; the minimum machine requirements for the community edition of Aerospike are 1 GB RAM and 1 vCPU. You could also use a Red Hat or CentOS image, however my directions are specific to Debian 7 Linux.
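For reference, the same instance could be created from the Terminal instead of the web console.  The sketch below only assembles and prints the gcloud command so you can review it first; the instance name is hypothetical, and the exact flag spellings have changed across SDK releases, so check ‘gcloud compute instances create --help’ before running it.

```shell
# Build a gcloud command matching the web-console choices above.
NAME="aerospike-test-1"                  # hypothetical instance name
ZONE="us-central1-b"
MACHINE_TYPE="n1-standard-1"             # 1 vCPU, 3.8 GB memory
IMAGE="debian-7-wheezy-v20140606"

CMD="gcloud compute instances create $NAME --zone $ZONE --machine-type $MACHINE_TYPE --image $IMAGE"
echo "$CMD"      # printed, not executed - run it yourself once reviewed
```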

GCE Configuration for Aerospike test

Click the blue ‘Create’ button to start your instance.  After the instance is available (takes less than a minute in my experience!), then you will see it listed in the project console window (COMPUTE ENGINE>VM Instance).  You can now test connectivity to your instance by clicking on the ‘SSH’ button to the right of the instance.

To test connectivity using SSH, open Terminal, run the ‘gcloud auth login’ command as described previously, then paste the gcutil command into the terminal; an example is shown below.

Testing Connectivity to GCE

The last configuration step for GCE is to set up a firewall rule.  You’ll want to do this so that you can use the Aerospike (web-based) management console.  To create this rule, do the following in the Google Developers Console for your project:  Click on COMPUTE>COMPUTE ENGINE>Networks>’default’>Firewall Rules>Create New.  Then add a new firewall rule with these settings: Name: AMC; Source Ranges: 0.0.0.0/0; Allowed Protocols or Ports: tcp:8081
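The same firewall rule can be expressed with the gcloud CLI.  Again, this is just a sketch that prints the command for review (the rule name is arbitrary and the flags may differ by SDK version):

```shell
# Firewall rule mirroring the console settings above (AMC on tcp:8081).
RULE="gcloud compute firewall-rules create amc --allow tcp:8081 --source-ranges 0.0.0.0/0"
echo "$RULE"   # printed, not executed
```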

Step Three – Install the Aerospike Community Edition

To start, I set up a test Aerospike server with a single node.  There are a few required steps to do this, plus a couple of optional steps that I found made my test of Aerospike more interesting:

a) Connect to GCE via SSH
b) Download the Aerospike Community Edition
c) Extract and install the download (which is the server software plus command line tools)
d) start the service and test inserting data
e) install the node.js client (optional)
f) install the web-based management console

Notes: Be sure to run the scripts below with sudo.  Also, my install instructions are based on downloading the version of Aerospike Database Server 3.2.9 that is designed to run on Debian 7.

Here is a Bash script to automate this process:

#!/bin/bash
sudo apt-get -y install wget
wget -qO server.tgz "http://www.aerospike.com/community_downloads/3.2.9/aerospike-community-server-3.2.9-debian7.tgz"
tar xzf server.tgz
sudo dpkg -i aerospike*/aerospike*

sudo /etc/init.d/aerospike start #start aerospike now

To verify correct functioning of the server:

sudo /etc/init.d/aerospike status

I next used the included command line tool to further verify that the server was working properly by inserting, retrieving and then deleting some values from the server.  The command line tool is called ‘cli’ and is found in /usr/bin/cli.  Here are some sample test commands that I used:

cli -o set -n test -s "test_set" -k first_key -b bin_x -v "first value"
cli -o set -n test -s "test_set" -k first_key -b bin_y -v "second value"
cli -o set -n test -s "test_set" -k first_key -b bin_z -v "third value"

Next I retrieved these values with the following command:

cli -o get -n test -s "test_set" -k first_key
> {'bin_z': 'third value', 'bin_y': 'second value', 'bin_x': 'first value'}

Last I deleted the key:

cli -o delete -n test -s "test_set" -k first_key
cli -o get -n test -s "test_set" -k first_key
> no data retrieved

Tip: (Current Aerospike version: 3.2.9) After an instance reboot, the Aerospike service may fail to start.  To fix this, try creating an Aerospike directory in /var/run:

sudo mkdir /var/run/aerospike

Optional Step — Client Setup (Node.js)

Next I set up a simple Node.js client on the same instance as the server. The process is as follows: install Node.js and the Node Package Manager (npm), and then install the Node.js Aerospike client package.

Note: There are a number of Aerospike clients available for different languages.  These are outside the scope of this document.  For more information go here: http://www.aerospike.com/aerospike-3-client-sdk/

This script automates this process:

#!/bin/bash
CWD=$(pwd)
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:chris-lea/node.js # requires human interaction: 'PRESS ENTER'
sudo apt-get install -y software-properties-common
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get install -y python g++ make nodejs
curl https://www.npmjs.org/install.js | sudo sh
sudo npm install aerospike -g # -g installs to /usr/lib, current dir otherwise

cd ${CWD}

You will need to acknowledge the addition of the Node.js repository to your software repositories list. Once this completes, navigate to the examples directory:

cd /usr/lib/node_modules/aerospike/examples

Install the prerequisite packages:

sudo npm install ../
sudo npm update

These examples insert dummy data at a specified key, in similar fashion to the cli tool.

node put.js -n test -s "test_set" first_key
OK – { ns: 'test', set: 'test_set', key: 'first_key' }
put: 3ms

node get.js -n test -s "test_set" first_key
OK – { ns: 'test', set: 'test_set', key: 'first_key' } { ttl: 9827, gen: 2 } { bin_z: 'third value', i: 123, s: 'abc', arr: [ 1, 2, 3 ], map: { num: 3, str: 'g3', buff: <SlowBuffer 0a 0b 0c> }, b: <SlowBuffer 0a 0b 0c>, b2: <SlowBuffer 0a 0b 0c> }
get: 3ms

node remove.js -n test -s "test_set" first_key
OK – { ns: 'test', set: 'test_set', key: 'first_key' }
remove: 7ms

Additionally there is a benchmarking tool I used to get a rough idea of the transactions per second available from my instance:

cd /usr/lib/node_modules/aerospike/benchmarks
npm install
node inspect.js

Management Console Setup

The Aerospike Management Console (AMC) is a web-based monitoring tool that reports all kinds of status information about your Aerospike deployment, whether it’s a single instance or a large multi-datacenter cluster. To install the AMC I used the following script as superuser (e.g. sudo script.sh).

#!/bin/bash
apt-get -y install python-pip python-dev ansible
pip install markupsafe paramiko ecdsa pycrypto
wget -qO amc.deb 'http://aerospike.com/amc/3.3.1/aerospike-management-console-3.3.1.all.x86_64.deb'
dpkg -i amc.deb

sudo /etc/init.d/amc start #start amc now

Once deployed, I pointed my browser to port 8081 of the instance.  There will be a dialog asking for the hostname and port of an Aerospike instance.  Since I installed the server on the same instance as the AMC, I just used localhost and port 3000.

Aerospike Console

Step Four – Run tests and (optionally) add other nodes

As mentioned, you can test Aerospike on up to 2 nodes.  The next step I took in testing was to add another server node.  Here are the steps I took to do this.

First I added a firewall rule for TCP ports 3000-3004.  I did this using the same process (i.e. in the Google Developers Console) described previously.  Get to the ‘Create new firewall rule’ panel: Compute> Compute Engine> Networks> ‘default’> Firewall Rules> Create New.  Configure the rule by changing these values: Name: aerospike; Source Ranges: 0.0.0.0/0; Allowed Protocols or Ports: tcp:3000-3004

Next I opened the Aerospike configuration file located at /etc/aerospike/aerospike.conf. Inside the ‘network’ section is a section called ‘heartbeat’ that looks like the following:

heartbeat {
    mode multicast
    address 239.1.99.222
    port 9918

    # To use unicast-mesh heartbeats, comment out the 3 lines above and
    # use the following 4 lines instead.
    #mode mesh
    #port 3002
    #mesh-address 10.0.0.48
    #mesh-port 3002

    interval 150
    timeout 10
}

I commented out the first three lines inside this section and then uncommented the four lines starting with mode mesh. I then replaced the IP address after mesh-address with the IP of my other node.  Next I saved my changes and restarted the aerospike service:

sudo /etc/init.d/aerospike restart
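If you are scripting your cluster setup, this hand edit can be automated with sed.  The sketch below works on a local copy of the stanza rather than the live config file, and its patterns assume unindented lines with the exact comment layout shown – adjust them to match your actual file.  MESH_IP is a hypothetical internal IP for your other node.

```shell
# Rewrite the heartbeat settings from multicast to mesh in a config copy.
MESH_IP="10.240.0.2"        # hypothetical internal IP of the other node
CONF=aerospike.conf.test    # a copy - not /etc/aerospike/aerospike.conf

cat > "$CONF" <<'EOF'
mode multicast
address 239.1.99.222
port 9918
#mode mesh
#port 3002
#mesh-address 10.0.0.48
#mesh-port 3002
EOF

# Comment out the three multicast lines...
sed -i -e 's/^mode multicast/#mode multicast/' \
       -e 's/^address /#address /' \
       -e 's/^port 9918/#port 9918/' "$CONF"

# ...then uncomment the mesh lines, substituting the peer's address.
sed -i -e 's/^#mode mesh/mode mesh/' \
       -e 's/^#port 3002/port 3002/' \
       -e "s/^#mesh-address .*/mesh-address $MESH_IP/" \
       -e 's/^#mesh-port 3002/mesh-port 3002/' "$CONF"
```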

Next I repeated these changes on my second server instance, setting the mesh-address for this server to the IP address of the first server instance.  Each server instance only needs to know about one other server instance to join the Aerospike cluster; everything else is handled automatically. To verify that the cluster was working correctly, I checked the log file for ‘CLUSTER SIZE = 2’ like this:

sudo grep CLUSTER /var/log/aerospike/aerospike.log
May 14 2014 23:42:48 GMT: INFO (partition): (fabric/partition.c::2876) CLUSTER SIZE = 2

Tip: If you are testing this out yourself, ensure that your instances can communicate with each other over the default ports 3000-3004.  To test connectivity use telnet for example: ‘telnet <remote ip> <port>’
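If telnet isn’t installed on your instances, bash can make the same check by itself using its built-in /dev/tcp pseudo-device.  A small sketch (localhost and port 3000 here stand in for your remote node’s IP and port):

```shell
# Return success (exit 0) if a TCP connection to host:port can be opened.
check_port() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if check_port 127.0.0.1 3000; then
  echo "port 3000 reachable"
else
  echo "port 3000 closed or filtered"
fi
```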

Conclusions

In conclusion, I find Aerospike to be a superior-performing database in its category.  I am curious about your experience with databases of this type (i.e. in-memory NoSQL).  Which vendors are you working with now?  What has been your experience?  Which type of setup works best for you – on-premise (bare metal or virtualized) or in the cloud?  If in the cloud, which vendor’s cloud?

Also on the horizon: I am exploring up-and-coming lightweight application virtualization technologies, such as Docker.  Are you working with anything like this?  I will be posting more on using Docker with NoSQL and NewSQL databases over the next couple of months.

Agile, Technical Conference

ApprovalTests at Agile 2012

Woody Zuill and I are presenting on the open source unit testing library ApprovalTests at the Agile 2012 conference this week.  Below are the slides.  The presentation will also be recorded and I will link that recording here after it’s posted.

Are you using ApprovalTests?  How’s it going? Let me know in the comments on this blog.  Also, if you are a .NET developer, there is a new release of ApprovalTests for .NET (v.20) with a bunch of new features.

Happy testing!

Agile, AWS, Azure, Big Data, Cloud, Data Science, google, Hadoop, Microsoft, noSQL, Technical Conference

My SoCalCodeCamp decks – Hadoop, ApprovalTests, BigData and more

I am going to be one busy lady on Saturday, June 23 at SoCalCodeCamp at UCSD in San Diego.  Here’s the schedule.  I am presenting 5 different talks there, all on Saturday.  Here are the decks and sessions:

1) Harnessing the good intentions of others – increasing contributions to open source projects.  Deck TBD – here’s a video talking about the session (which we will also present in July at OSCON 2012)

2) Intro to SketchUp – presented by my 13 year old daughter Samantha (I am just there to smile and say ‘that’s my girl!’)

3) Better Unit Testing with ApprovalTests – presenting with Woody Zuill.  Will also be presented at Agile 2012 (the national Agile conference in August 2012, on the Testing track)

4) Intro to Hadoop on Azure – an article is coming in next month’s MSDN Magazine (publishes July 25, 2012) as well

5) BigData Panel – state of the data industry hosted by Stacey Broadwell.

Agile, Technical Conference

Real-world Entity Framework

Here’s the deck and sample code from our talk ‘Real-world Entity Framework’ at DevTeach Vancouver, BC.  In this talk we show the mechanics of working with database-first EF, using an order entry system.  Our code examples include both read and CRUD operations.  We also talk about lazy vs. eager loading.  We show patterns (using Loaders), which make your code more readable, reusable and testable too.

Sample code includes sample database set up and sample data and completed demo code files – get everything here.

Agile, Big Data, Cloud, noSQL

MongoDB for the .NET Developer

I’ve been spending some time taking a closer look at MongoDB, in conjunction with a consulting job that I’ve been working on for a start-up.  The developers had been unhappy with their experience trying to develop on an RDBMS for the cloud, and they asked me to propose alternatives.  While NoSQL is NOT the be-all and end-all for all start-ups, there are certain types of data that lend themselves well to this model. To that end, I’ve created a presentation which I’ll be sharing at technical events this year.