A new beta release this week from Microsoft is Data Explorer. There are two flavors of this light-weight ETL tool – desktop application or cloud-based tool. Shown below is the header part of both editions.
The team released both CTP versions of the product along with a number of samples to get you started.
What is Data Explorer?
Data Explorer is a data mashup tool. To use it, you identify and configure various data sources, clean the data, merge it together and then package it up for consumption by whatever client you are using, i.e. PowerPivot, Excel, etc… The opening ‘pick data’ screen is shown below.
After you add data, then you ‘clean it’ as shown below. The tools (or menus) are context-sensitive, i.e. when you click on different areas or data types on the mashup, the appropriate tools appear after you click the ‘more tools’ button on the top right of the interface.
How do you work with DE?
After you download and install Data Explorer locally, you can also launch it from the start menu or from inside of Excel 2010. Shown below is the update to the Excel toolbar which includes two buttons, one to open and one to import data into Data Explorer.
In this beta release, importing data is restricted to importing data from an existing DE mashup, as shown in the screen below.
The provider name is Montego (the product code name) as shown below.
Of course when you are working in either the desktop client or the web client for DE, you can save and optionally publish your mashup. The publish screen is shown below.
What are the scenarios to use Data Explorer?
The idea here is to expose classic ETL (Extract, Transform and Load) technologies to a non-technical user. The interface is designed to look like an Office ribbon. As mentioned, the interface is also context-sensitive, i.e. where you click or what you are working on shows a particular to that item, an example list of tools (for a tabular object) is shown below.
The user can mashup data from all of the types of sources included, i.e. relational, RSS, SharePoint, files, etc… and then clean it up so that it is more meaningful. Then the result can be packaged and sent for consumption by any type of data viewer that can consume one of the output formats produced, for example OData reader, PowerPivot, Excel, etc…
What features are interesting?
The flexibility of working with the mashed-up data seems useful to me. You can import, export, publish, snapshot and extend your mashups. Also I am happy to see a pretty broad variety of data sources well-organized into the three categories – Connect (Data), Files (Files) and Input (Direct Copy or just type info). I think these support the business analyst user well. I tried out many of the input formats and they all seemed to work just fine – SQL Server Data, SQL Azure data, Windows Azure Data Market data, RSS feed, website data, CSV files, direct text input and formula-generated input.
The UX, exposing only those clean-up tools that are appropriate seems pretty well-designed and intuitive to me.
In addition to clicking to explore and/or clean up data, there is an included Data Explorer formula-language (similar in format to Excel formula language). You can type directly in the formula toolbar as shown below.
Also here is a blog post with more info on how formulas work in Data Explorer. Shown below is an example of generating a list of numbers using the List.Numbers formula.
Another part of the UX that I really like is the ability to click on any phase of the data transformation pipeline (as highlighted in the screen shot below) and to be able to view the data when it was in that state.
I also tried out the statistical sampling functionality. These tools are shown below. Admittedly these tools are very light-weight, however for what is there, the tools are pretty discoverable and easy-to-understand.
Another thing I tried out is the Merge functionality. Shown below, it helps the user to ‘join’ data in a simple interface.
Your published feed can be accessed via a number of formats, such as CSV, Excel, a DE mashup or an OData feed – show below is sample published mashup output.
What seems to be missing?
I would like to see the ability to branch in the data transformation flow. I am not sure if the team has this in mind yet, but it seems like a natural enhancement.
What seems to be poorly implemented?
Although the web-edition seems to work ok in IE, I experienced much crashing when trying to use it in Google Chrome. The product team verified that they are “focusing on IE functionality at this point in the release cycle.”
What are the next steps?
The DE team published a bunch of scenarios to try out on their team blog – here. This is a first beta release and I look forward to the maturing of this interesting product. I’d be interested to hear from others who are trying it out too. What are your thoughts?