
Hadoop on Azure – JavaScript MapReduce using AWS S3 data

What? Use a Microsoft Azure product (Hadoop on Azure) to run a MapReduce job (using JavaScript) on data stored on AWS S3? Seems like a great blog topic for April 1, doesn’t it? Enjoy the video.

Here’s the original blog post by Microsoft’s Denny Lee, which inspired me to try this out.
Also, in case you are wondering, here is the JavaScript source code (from the Samples section of the Hadoop on Azure beta site) that runs the ‘WordCount’ MapReduce job.

var map = function (key, value, context) {
    // Split the input line on any non-letter character.
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        // Skip the empty strings left between adjacent separators.
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    // Add up the counts emitted for this word.
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};
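
If you want to see what the sample does without a cluster, here is a minimal sketch that runs the same map and reduce functions locally. The runLocally helper is hypothetical and is not part of Hadoop on Azure. It assumes only what the sample itself uses: a context object with a write method, and a values iterator with hasNext and next. The real service handles input splitting, shuffling, and output for you, so treat this as an illustration of the data flow, not of the platform's API.

```javascript
// The sample's map and reduce, repeated so this sketch runs on its own.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical local harness (an assumption, not the Hadoop on Azure API).
// It mimics the three phases: map, group by key, reduce.
function runLocally(lines) {
    // Map phase: collect every emitted value under its key.
    var grouped = Object.create(null);
    var mapContext = {
        write: function (k, v) {
            if (!(k in grouped)) {
                grouped[k] = [];
            }
            grouped[k].push(v);
        }
    };
    lines.forEach(function (line, offset) {
        map(offset, line, mapContext);
    });

    // Reduce phase: hand each key an iterator over its grouped values.
    var result = {};
    var reduceContext = {
        write: function (k, v) { result[k] = v; }
    };
    Object.keys(grouped).sort().forEach(function (k) {
        var vals = grouped[k];
        var pos = 0;
        var iterator = {
            hasNext: function () { return pos < vals.length; },
            next: function () { return vals[pos++]; }
        };
        reduce(k, iterator, reduceContext);
    });
    return result;
}

var counts = runLocally(["The quick brown fox", "the lazy dog and THE fox"]);
console.log(counts);
```

Because map lowercases every word, "The", "the", and "THE" all land under the same key before reduce adds them up.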
