OpenRefine (ex-Google Refine) is a powerful tool for working with messy data, cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.
Please note that since October 2nd, 2012, Google no longer actively supports this project, which has been rebranded as OpenRefine. Project development, documentation and promotion are now fully supported by volunteers. Find out more about the history of OpenRefine and how you can help the community.
Using OpenRefine - The Book
Using OpenRefine, by Ruben Verborgh and Max De Wilde, offers a great introduction for anyone with little experience with OpenRefine. Organized as recipes with hands-on examples, the book covers the following topics:
- Import data in various formats
- Explore datasets in a matter of seconds
- Apply basic and advanced cell transformations
- Deal with cells that contain multiple values
- Create instantaneous links between datasets
- Filter and partition your data easily with regular expressions
- Use named-entity extraction on full-text fields to automatically identify topics
- Perform advanced data operations with the General Refine Expression Language
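To make one of these topics concrete, here is a minimal sketch of what "dealing with cells that contain multiple values" means. It is written in plain Python and is not OpenRefine's own implementation or API. The function name `split_multi_valued`, the sample rows and the `;` separator are all assumptions chosen for illustration.

```python
def split_multi_valued(rows, column, separator=";"):
    """Return new rows in which `column` holds a single value per row.

    Hypothetical helper for illustration only; not part of OpenRefine.
    """
    result = []
    for row in rows:
        # Split the cell and drop surrounding whitespace and empty pieces.
        parts = [p.strip() for p in str(row[column]).split(separator)]
        parts = [p for p in parts if p]
        if not parts:
            # Keep rows whose cell is empty instead of silently dropping them.
            result.append(dict(row))
            continue
        for part in parts:
            new_row = dict(row)  # copy so the input rows stay untouched
            new_row[column] = part
            result.append(new_row)
    return result


# Made-up sample data: one record lists two authors in a single cell.
rows = [
    {"title": "Book A", "authors": "Ann; Bob"},
    {"title": "Book B", "authors": "Cy"},
]
flat = split_multi_valued(rows, "authors")
```

After the call, `flat` holds one row per author, with the other columns repeated on each row.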
Introduction to OpenRefine
1. Explore Data
OpenRefine can help you explore large data sets with ease. You can find out more about this functionality by watching the video below and going through these articles.
2. Clean and Transform Data
3. Reconcile / Match
OpenRefine can be used to link and extend your dataset with various web services. Some services, like Freebase, also allow OpenRefine to upload your cleaned data to the central database. A growing list of extensions and plugins, with their sources, is available on the wiki.