OpenRefine Usage
OpenRefine is a free, open source power tool for working with messy data and improving it: cleaning it, transforming it from one format into another, and extending it with web services and external data. Requiring no knowledge of a programming or query language, it lets users find and fix inconsistencies interactively, match their data to external databases, pull additional data from these, and perform many other useful operations. The resulting workflows can be extracted and applied to other datasets.
OpenRefine is downloaded on average 15,500 times per month and received over 800 academic citations in 2023.
Our User Community
OpenRefine is used by many communities and industries due to its user-friendly interface and flexibility.
- Journalists and Media Professionals use OpenRefine to clean and prepare data for investigative reporting, analysis, and visualization in news stories.
- GLAM (Galleries, Libraries, Archives, and Museums) institutions use OpenRefine to clean and enhance catalog records related to artworks and cultural heritage artifacts.
- Wikipedians and Wikimedia Contributors: OpenRefine is a popular tool within the Wikipedia community, enabling users to manage and improve structured data on Wikimedia projects like Wikidata and Wikimedia Commons.
- Scientists and Researchers across various scientific disciplines, including social, natural, and health sciences, use OpenRefine to clean, transform, and organize research data.
- Data Analysts and Scientists leverage OpenRefine to preprocess and clean data, ensuring high data quality before analysis.
- Educators and Trainers: OpenRefine is integrated into educational curricula and workshops, allowing educators to teach students data wrangling and cleaning skills effectively.
The graphic below shows which communities our users identified with most, based on our 2022 user survey. Please note that each user may identify with multiple communities.
Academic Citations
OpenRefine is used by many academics in their research and cited in their publications. OpenRefine is also available on Zenodo with the DOI-10.5281 if you intend to cite it. The table below tracks the number of citations per year, based on searching the following terms on Google Scholar:
(*) 2024 data are up to December 3rd, 2024.
Forum Statistics
In November 2022, we moved from email lists hosted by Google Groups to a Discourse forum.
As of December 3rd, 2024, over the last 12 months:
- 271 new users signed up [1] on our forum, for a total of 637 users [2].
- 364 topics [3] were created, for a total of 1,900 messages [4].
Contribution Statistics
The statistics below only track activity on our primary GitHub repository, which includes code, design, and translation contributions. Our documentation is available in a separate repository.
As of December 3rd, 2024, over the last 12 months we had:
- 33 active GitHub contributors;
- 241 issues created and 192 closed;
- 223 PRs merged (excluding those created by Dependabot).
The following graphic represents the average number of active contributors to the main repository each year [5]. You can also review the GitHub pulse and GitHub traffic pages for real-time insights into activity on our main repository.
(*) 2024 data are up to December 3rd, 2024.
Footnotes
- [5] We count the number of contributors using the following command:
  git log --all --pretty="%an" | sort | uniq | wc -l
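The command above counts distinct author names across the whole history. As a rough sketch of how a per-year breakdown might be derived, the helper below groups distinct author names by year. The function name `contributors_per_year` and its tab-separated "YEAR, AUTHOR" input format are assumptions made for illustration, not the project's actual method, and "active contributor" may be defined differently in the graphic.

```shell
#!/bin/sh
# Hypothetical helper (not from the OpenRefine project): reads lines of the
# form "YEAR<TAB>AUTHOR" on stdin and prints "YEAR<TAB>COUNT", where COUNT is
# the number of distinct author names seen for that year.
contributors_per_year() {
    # 1. Drop duplicate (year, author) pairs; sorting also groups by year.
    # 2. Keep only the year column.
    # 3. Count how many distinct authors remain for each year.
    LC_ALL=C sort -u | cut -f1 | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Assumed usage inside a clone of the repository:
#   git log --all --date=format:%Y --pretty='%ad%x09%an' | contributors_per_year
```

Like the original command, this identifies contributors by author name only, so one person committing under two names is counted twice.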