Today at the Strata conference, the Stanford Visualization Group debuted DataWrangler, a Web-based visual tool for cleaning up messy data. According to its website, “Wrangler allows interactive transformation of messy, real-world data into the data tables analysis tools expect.” Data can be exported as CSV, TSV, or JSON.
Data wranglers can use the tool with the group’s data visualization tool Protovis, or with tools such as Excel, R and Tableau.
The group has released a paper explaining how the tool works. Joseph M. Hellerstein explains the origins of the project in a blog post:
Another thing I often hear is that a large fraction of the time spent by analysts — some say the majority of time — involves data preparation and cleaning: transforming formats, rearranging nesting structures, removing outliers, and so on. (If you think this is easy, you’ve never had a stack of ad hoc Excel spreadsheets to load into a stat package or database!)
Putting these together, something is very wrong: high-powered people are wasting most of their time doing low-function work. And the challenge of improving this state of affairs has fallen in the cracks between the analysts and computer scientists.
DataWrangler will compete with Google Refine, which we covered here.