What? Use a Microsoft Azure product (Hadoop on Azure) to run a MapReduce job (using JavaScript) on data stored on AWS S3? Seems like a great blog topic for April 1, doesn’t it? Enjoy the video.
Here’s the original blog post from Microsoft’s Denny Lee, which inspired me to try this out.
Also, in case you are wondering, here is the source code (from the Samples section of the Hadoop on Azure beta site), in JavaScript, for the ‘WordCount’ MapReduce job:
// Map: split each input line on non-letter characters and emit (word, 1) for every word.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

// Reduce: sum the counts emitted for each word and write the total.
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next());
    }
    context.write(key, sum);
};
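And, for context, here is roughly how the job gets kicked off from the Interactive JavaScript console on the Hadoop on Azure portal. This is only a sketch based on the fluent Pig-style API the console samples use; the S3 bucket, file name, and output path below are placeholders, and you’d first register your AWS credentials in the cluster’s settings so the s3n:// address resolves.

// Sketch only: assumes WordCount.js (the map/reduce above) has already been
// uploaded to the cluster, and that the s3n:// path points at your own bucket.
pig.from("s3n://my-bucket/davinci.txt")
   .mapReduce("WordCount.js", "word, count:long")
   .orderBy("count DESC")
   .take(10)
   .to("WordCountOutput")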