Using OpenRefine: a manual

Ruben Verborgh

“How do I get started?” is the question we received most during our hands-on workshops on data cleaning and enhancing. OpenRefine is a very powerful tool in the hands of a skilled user, but how do you become one?

There is a wiki, several screencasts, and a list of helpful resources. However, until recently, no complete OpenRefine manual existed, so you had to collect documentation from different sources if you wanted to master OpenRefine.

This is why we've written an OpenRefine manual called Using OpenRefine that leads you from your first steps to all advanced OpenRefine topics.

Using the entire dataset of the Powerhouse Museum, it lets you experience OpenRefine techniques in a hands-on way, starting from creating a project and inspecting data and gradually evolving towards complex operations. Rather than being a one-directional text, this book offers detailed recipes you can pick whenever you need them.

In particular, you'll learn about these topics in Using OpenRefine:

  • importing data in various formats
  • exploring datasets in a matter of seconds
  • applying basic and advanced cell transformations
  • dealing with cells that contain multiple values
  • creating instantaneous links between datasets
  • filtering and partitioning your data easily with regular expressions
  • using named-entity extraction on full-text fields to automatically identify topics
  • performing advanced data operations with the General Refine Expression Language

Get started with OpenRefine right away, for free

Download the entire second chapter of the book for free to learn about sorting, facets, filters, duplicates, and more. It's the fastest way to get up to speed with OpenRefine. If you also want to learn about advanced transformations and about connecting your data to the Linked Data cloud, buy the paperback or e-book today!

—Ruben and Max
authors of Using OpenRefine