Big Data, noSQL, SQL Server 2012

MongoDB MapReduce vs. SQL Server group by – Which is faster?

I decided to try out a well-written sample on the MongoVue site to compare just how MapReduce with MongoDB vs. good old T-SQL group by really work.  I had to make a couple of tweaks to what they had written because:

1) There was one error in the MapReduce code example.  I point this out in the screencast.

2) I wanted to use SQL Server rather than MySQL for the RDBMS test.

I made two short screencasts showing exactly how I proceeded.  For completeness on the SQL Server side I did two comparisons.  The first one was over a heap table.  Next I took the recommendations of the query optimizer, which included to create not only a clustered index, but also a non-clustered index with included columns for the lat/long values.  Below is the 5 minute video showing the detail.

Then I ran the MapReduce job as written in the example on their blog using the MongoVue interface.  It took me a bit of fiddling around to get the sample into the MongoVue interface.  I have a tip if you plan to try to get this to work — work with the output console (named ‘Learn Shell’) at the bottom of the MongoVue interface to verify that you’ve entered the MapReduce code correctly and into the correct section of the GUI interface.  Finally the ‘In & Out’ section of the MapReduce interface in MongoVue wasn’t very well explained on the MongoVue site.  Take a look at my screencast to see how I chose to work with that section.

It was interesting to note a couple of things through this process:

1) The T-SQL execution proved to be the fastest solution for this problem for a data set of this size, even BEFORE I did any optimization.  The T-SQL query (group by) query on the view against the heap table ran in 5 seconds on the 37,000+ records.  After optimization (adding the indexes to the base table and recreating the view), that same query ran in 3 seconds.

2) The MapReduce took 11 seconds to run on my instance of MongoDB. My instance has no sharding and no replicas.

If you’d like to try this out as well, I’ve zipped the source data and sample code and made it available for download here.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s