I’ve been working on some data cleansing projects lately, and to that end I’ve tried out the DnB Company Cleanse Match dataset in the Windows Azure Marketplace. This dataset lets you get more complete information about companies and combine duplicate records. The screenshot below illustrates what you can do with this service.
To try it out, you can email DnB for a promo code (send mail to DNB_MS_Partnership_CoreTeam@DNB.com). You can use this service in a few different ways: with Excel (Power Query, or any other tool that supports consuming OData feeds), with SQL Server 2012 Data Quality Services, or programmatically, by downloading the C# proxy class from the Azure DataMarket (available after you subscribe to the service) and coding against the API.
I’ve made two screencasts to show how this works. First, here’s the screencast on Power Query / API.
Second, here’s the screencast using the dataset with SQL Server 2012 DQS.
Also, here’s the stub code for the API:
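using System;
using System.Linq;
using System.Net;

class Program
{
    static void Main()
    {
        // Marketplace credentials: your Windows Live ID and your account key.
        string USER_ID = "<windows live id user id>";
        string ACCT_KEY = "<your key>";
        var ROOT_URI = "https://api.datamarket.azure.com/DNB/DQSCompanyMatch/v1/";

        // DnB.DnBContainer comes from the proxy class downloaded from the DataMarket.
        var serviceClient = new DnB.DnBContainer(new Uri(ROOT_URI));
        serviceClient.Credentials = new NetworkCredential(USER_ID, ACCT_KEY);

        // Query suggested matches for "Dell" (TX, US).
        var l = (from d in serviceClient.SuggestCompanyDetails("Dell", null, null, null, "TX", null, "US", null, 3, 0)
                 select d);

        foreach (var a in l)
        {
            Console.WriteLine("Result " + a.DunsNumber);
        }
        Console.ReadKey();
    }
}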