Big Data, Data Science, Microsoft

Looking at Slideshare data with Excel PowerBI

I’ll admit it, I am really hooked on PowerBI in Excel 2013 – in particular, I find many uses for PowerQuery (shown below).


To that end, I was playing around with my Slideshare analytics data in PowerBI and learned a thing or two.  Below is a screenshot of my data as I was filtering and shaping it in PowerQuery.  Note the transformation steps listed on the right side of the screen.

Using PowerQuery to filter and shape Slideshare data


If you are a scripting kind of person, here’s a look at part of the transformation script generated by PowerQuery.

PowerQuery transformation script

I also made a short screencast, showing the process I used in more detail.

In case you don’t want to watch the video, here’s what I found – database people tweet!

Database people love to tweet!


Cloud, Data Science, Microsoft, SQL Server 2012

Using D&B Company Cleanse Match with PowerBI and SSIS

I’ve been doing some work with the Dun & Bradstreet Company Cleanse & Match offer in the Windows Azure Marketplace.  A common data quality scenario I encounter is the need to create more complete customer (company) records for purposes such as marketing and collections.

Shown below is a sample of how this offer works.  You can see that three records with different types and amounts of information have been combined into one completed record.  You may also note that the D&B D-U-N-S number has been associated with the identified company.

D&B Cleanse&Match

This D&B offer brings three important things to the table:

1) The D&B business database is global, comprehensive and verified.
2) D&B uses a proprietary, powerful and configurable cleanse/match algorithm to correct, complete and de-duplicate records.
3) D&B offers flexibility in terms of integration with Microsoft APIs and Tools.
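To make the "complete and de-duplicate" idea in point 2 a little more concrete, here is a toy Python sketch. It is not D&B's proprietary algorithm, just an illustration under simple assumptions: records are matched on a crudely normalized company name, and the first non-empty value seen for each field wins. The function names (`normalize`, `merge_records`) and the sample rows are made up for this example.

```python
from collections import defaultdict

# Hypothetical list of suffixes to ignore when matching names
IGNORED_SUFFIXES = {"inc", "corp", "co", "llc", "ltd"}

def normalize(name):
    """Crude match key: lowercase, drop punctuation and common suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    words = [w for w in cleaned.split() if w not in IGNORED_SUFFIXES]
    return " ".join(words)

def merge_records(records):
    """Group records by match key; keep the first non-empty value per field."""
    merged = defaultdict(dict)
    for record in records:
        key = normalize(record["name"])
        for field, value in record.items():
            if value and field not in merged[key]:
                merged[key][field] = value
    return list(merged.values())

# Made-up sample rows, not real D&B data
partial = [
    {"name": "Contoso Inc.", "city": "Austin", "phone": ""},
    {"name": "CONTOSO", "city": "", "phone": "555-0100"},
    {"name": "Contoso, Inc", "state": "TX"},
]
completed = merge_records(partial)
```

Three partial rows collapse into one fuller record here. A real cleanse/match service also corrects values against a verified reference database, which this sketch does not attempt.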

I’ve already blogged (and screencasted) about the integration between D&B Cleanse & Match and SQL Server 2012 Data Quality Services.  In today’s blog, I’ll include information about working with SQL Server 2012 SSIS and about integration with PowerBI.

Note: In addition to the D&B Cleanse & Match offer, D&B has several other offers in the Windows Azure Marketplace and also in the Windows Azure Store.  For example, here is a screenshot of the integration of their ‘Business Insight’ offer from within the Windows Azure Store (in the Windows Azure portal).

D&B Business Insight

If you are at the SQL PASS Summit this week, be sure to stop by the D&B booth to get access to the preview SSIS component and to learn more about their many offers on the Windows Azure Marketplace.  They are also running some fun contests (for cash!) at the show.

Below is a series of screencasts which show the integration between D&B Cleanse & Match and Microsoft products in greater detail.  First is the SSIS component demo.

Next is a series of three demos that walk through a detailed use case (creating a rich customer contact list for a growing business in a particular industry), using public data, PowerBI and D&B data and algorithms to produce a complete, validated, useful prospect list.

In part one, I use PowerQuery to shape public (US Census) data.

In part two, I use PowerQuery and D&B data to create a targeted company contact list, with the attributes I value for this scenario (such as ‘green-certified’) and those that I’ve identified based on my earlier data research (such as which US states I want to focus on).

In part three, I again use PowerQuery and D&B data to further enrich the prospect list, by adding actual contact information (names, email addresses and phone numbers) to produce an actionable prospect list for my marketing team. I also show the new Data Gateway (Eldorado) from Office 365.
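The filtering step in part two can be pictured with a small Python sketch. This is only an illustration of the idea, not the PowerQuery steps from the screencast: the function name `build_target_list`, the field names and the sample rows are all hypothetical.

```python
def build_target_list(companies, focus_states, required_attribute):
    """Keep companies in the chosen states that carry the required attribute."""
    return [
        company for company in companies
        if company["state"] in focus_states
        and required_attribute in company["attributes"]
    ]

# Made-up rows for illustration, not real D&B data
companies = [
    {"name": "Fabrikam", "state": "CA", "attributes": {"green-certified"}},
    {"name": "Northwind", "state": "CA", "attributes": set()},
    {"name": "Tailspin", "state": "NY", "attributes": {"green-certified"}},
]
targets = build_target_list(companies, {"CA", "WA"}, "green-certified")
```

Only the company that is both in a focus state and green-certified survives the filter. Part three would then enrich those rows with contact details.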

Are you interested in learning more?  Check out this information page.

Big Data, Cloud, Data Science, Microsoft, SQL Server 2012

Working with DnB Company Cleanse Match Data

DnB Cleanse Match

I’ve been working on some data cleansing projects lately, and to that end I’ve tried out the DnB Company Cleanse Match dataset in the Windows Azure Marketplace.  This dataset allows you to get more complete information about companies and to combine duplicate records.  Shown below is a screenshot which illustrates what you can do with this service.

DnB Company Cleanse Match

To try it out, you can email DnB for a promo code (send mail to ‘‘).  You can use this service in a few different ways: with Excel (via PowerQuery, or with any other tool that can consume OData feeds), with SQL Server 2012 Data Quality Services, or programmatically, by downloading the C# proxy class from the Azure DataMarket (available after you subscribe to the service) and coding against the API.

I’ve made two screencasts to show how this works.  First, here’s the screencast on Power Query / API.

Second, here’s the screencast using the dataset with SQL Server 2012 DQS.

Also, here’s the stub code for the API:

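// Requires: using System; using System.Linq; using System.Net;
// DnB.DnBContainer comes from the proxy class downloaded after subscribing.
string USER_ID = "<windows live id user id>";
string ACCT_KEY = "<your key>";
var ROOT_URI = "";  // set this to the service root URI before running; new Uri("") throws
var serviceClient = new DnB.DnBContainer(new Uri(ROOT_URI));
serviceClient.Credentials = new NetworkCredential(USER_ID, ACCT_KEY);

// Ask the service for suggested company details for "Dell" in "TX", "US"
var l =
    (from d in serviceClient.SuggestCompanyDetails
         ("Dell", null, null, null, "TX", null, "US", null, 3, 0)
     select d);

// Print the D-U-N-S number of each suggested match
foreach (var a in l)
    Console.WriteLine("Result " + a.DunsNumber);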


AWS, Cloud, Data Science

Understanding AWS Pricing

AWS Console

Because I get asked so regularly, I made a deck and screencast to help you understand AWS pricing and get the best value from it.  Here is a list of useful tools for understanding AWS pricing:

1) AWS Free Tier information – here
2) AWS Pricing Calculator – here
3) About AWS billing – here
4) RightScale Plan for Cloud cross-cloud pricing calculator – here

Here’s the deck

Here’s the screencast

Feel free to share tips and information that you have for understanding AWS pricing in the comments section of this blog post as well.


Big Data, Data Science, facebook

Visualizing Facebook Birthday greetings using Power Query

To make sense of the many wonderful birthday greetings I got on Facebook yesterday, I tried out Power Query for Excel so that I could visualize the locations from which the greetings came.  Other tools I used were LINQPad and the Facebook Developer’s Graph API Explorer tool.

PowerQuery with Facebook Data

Below are a screenshot of the results and a screencast.

Are you using Power Query?  Share your feedback here.

AWS, Azure, Big Data, Cloud, Data Science, Hadoop, Microsoft

New YouTube Series – Hadoop MapReduce Fundamentals

Hadoop MapReduce

I’ve been working with Hadoop MapReduce in several formats over the past couple of years.  I decided to pull together my experience and record this as a free, multi-part screencast series on YouTube.

The course consists of 5 screencasts, each 30 to 50 minutes long.  Each part tackles some aspect of Hadoop MapReduce, from a basic conceptual understanding to the most common tuning processes.  Throughout the series, I’ve included screencast demos using a variety of vendor distributions of Hadoop, including Cloudera CDH4, Windows Azure HDInsight, AWS Elastic MapReduce and more.
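For readers new to the model, here is a minimal sketch of the map, shuffle and reduce phases using the classic word count example. It runs in a single Python process and only mimics the shape of a MapReduce job. It is not Hadoop code and is not taken from the course demos, and the function names are my own for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in sequence over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data big ideas", "data science"])
```

In a real cluster the mappers and reducers run in parallel on different nodes, and the framework handles the shuffle. The per-record logic you write keeps roughly this shape.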

Below is the first module of the course.

Here is a link to the entire PowerPoint deck.

Here is a link to the course demo files.