
OpenRefine Usage

OpenRefine is a free, open source power tool for working with messy data and improving it: cleaning it, transforming it from one format into another, and extending it with web services and external data. Requiring no knowledge of a programming or query language, it lets users find and fix inconsistencies interactively, match their data to external databases, pull additional data from these, and perform many other useful operations. The resulting workflows can be extracted and applied to other datasets.

OpenRefine is downloaded on average 15,500 times per month and received over 800 academic citations in 2023.

Our User Community

OpenRefine is used by many communities and industries due to its user-friendly interface and flexibility.

  1. Journalists and Media Professionals use OpenRefine to clean and prepare data for investigative reporting, analysis, and visualization in news stories.
  2. GLAM (Galleries, Libraries, Archives, and Museums) institutions use OpenRefine to clean and enhance catalog records related to artworks and cultural heritage artifacts.
  3. Wikipedians and Wikimedia Contributors: OpenRefine is a popular tool within the Wikipedia community, enabling users to manage and improve structured data on Wikimedia projects like Wikidata and Wikimedia Commons.
  4. Scientists and Researchers across various scientific disciplines, including social, natural, and health sciences, use OpenRefine to clean, transform, and organize research data.
  5. Data Analysts and Scientists leverage OpenRefine to preprocess and clean data, ensuring high data quality before analysis.
  6. Educators and Trainers: OpenRefine is integrated into educational curricula and workshops, allowing educators to teach students data wrangling and cleaning skills effectively.

The graphic below shows which communities our users identified with most, based on our 2022 user survey. Please note that each user may identify with multiple communities.

Academic Citations

OpenRefine is used by many academics in their research and cited in their publications. OpenRefine is also available on Zenodo with the DOI-10.5281 if you intend to cite it. The table below tracks the number of citations per year, based on searching for the following terms on Google Scholar:

Forum Statistics

In November 2022, we moved from email lists hosted by Google Groups to a Discourse forum.

As of March 20, 2024, over the last 12 months:

  • 244 new users signed up on our forum for a total of 470 users.
  • 324 topics were created for a total of 1,700 messages.

Contribution Statistics

The statistics below only track activity on our primary GitHub repository, which includes code, design, and translation contributions. Our documentation is available in a separate repository.

As of March 20, 2024, over the last 12 months we had:

The following graphic represents a three-month rolling average of the number of active contributors on the main repository (see footnote 1). You can also review the GitHub pulse and GitHub traffic pages for real-time insights into the activity on our main repository.
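For readers who want to reproduce a similar figure on their own clone, here is a minimal Python sketch of one way a three-month rolling average of active contributors could be computed. It is not the project's actual script. It assumes the input is one line per commit in the form `YYYY-MM|Author Name`, which could be produced with something like `git log --all --pretty="%ad|%an" --date=format:%Y-%m`. It also assumes that "active" means at least one commit in the month, and that the rolling value is the mean of the current and two preceding monthly counts. The function names and the sample authors are hypothetical.

```python
from collections import defaultdict


def monthly_contributors(log_lines):
    """Map 'YYYY-MM' to the number of distinct author names seen in that month.

    Each input line is assumed to look like 'YYYY-MM|Author Name'.
    """
    authors_by_month = defaultdict(set)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first '|' only, so author names may contain '|'.
        month, _, author = line.partition("|")
        authors_by_month[month].add(author)
    # Sort by month so later steps see the months in chronological order.
    return {month: len(authors) for month, authors in sorted(authors_by_month.items())}


def rolling_average(counts, window=3):
    """Average each month's count with the preceding ones, once a full window exists.

    Assumption: every month in the range has at least one commit, so that
    consecutive entries in `counts` are consecutive calendar months.
    """
    months = list(counts)
    values = [counts[m] for m in months]
    result = {}
    for i in range(window - 1, len(values)):
        result[months[i]] = sum(values[i - window + 1 : i + 1]) / window
    return result


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        "2024-01|alice",
        "2024-01|bob",
        "2024-01|alice",
        "2024-02|alice",
        "2024-03|bob",
        "2024-03|carol",
        "2024-03|dave",
    ]
    counts = monthly_contributors(sample)
    print(counts)
    print(rolling_average(counts))
```

Like the command in the footnote, this counts distinct author names, so a contributor who commits under two different names is counted twice.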

Footnotes

  1. We count the number of contributors using the following command: git log --all --pretty="%an" | sort | uniq | wc -l