OpenRefine's 2022 user survey: the results are in!

Every two years, OpenRefine holds an extensive survey among its users. Our fifth edition was live in April-May 2022. No fewer than 207 people participated, which breaks our 2020 record of 178 responses.

This year’s survey was a bit more extensive than the previous ones (in 2012, 2014, 2018 and 2020): this time, we also included questions related to support and communications in the community. For the first time, we also asked you to give the software a general score. On average, survey respondents gave OpenRefine a solid 8 out of 10! This makes the team very happy, and of course (with your help) we hope to improve this score even more over time.

Now, on to more details.

Who are you?

In our 2020 survey, librarians (37.64%) were by far the largest group of respondents. This year, we proactively reached out to many of OpenRefine’s typical user communities, which resulted in a more even distribution of the sectors and communities that our survey respondents came from. Based on answers that proved popular in previous surveys, we added a few extra options to the question “Which field(s), discipline(s) or community/ies do you most identify with?”, and respondents could provide more than one answer. Today, librarians are still the largest group of OpenRefine users (15.1%), followed by cultural sector professionals (11.6%), Linked Open Data / semantic web aficionadas (11.2%), researchers (10.1%) and data scientists (9.5%).

The vast majority of people use OpenRefine professionally, but 14.9% of our users do indicate that they mainly use the software in their free time. Unsurprisingly, many of them indicate that they are active in the Wikimedia or OpenStreetMap communities.

For the first time, we asked in which language(s) you use OpenRefine (both its interface and the datasets you interact with). English is dominant with 64.1%, followed by French (9.7%), German (8.1%), Spanish (6.5%) and various other languages (3.2%).

2022 survey respondents use OpenRefine a bit more often than we saw in previous editions.

And you are also increasingly becoming OpenRefine ‘veterans’, with a solid 66.7% saying that you have used the software for more than two years.

You are rating your OpenRefine skill level a bit higher than in the past, too.

You are using OpenRefine for roughly the same purposes as two years ago. We added “data imports from other resources” as a new option, and that’s indeed frequently done. Analyzing existing datasets has become more popular (50.2% now), and preparing datasets before visualization in other applications is done less often (22.16%) than in 2020. Reconciliation (55.1%) also keeps (slowly) growing as a typical activity inside OpenRefine.

What does your OpenRefine installation look like?

53.4% of respondents usually update OpenRefine to its current stable release; an additional 23.8% use an earlier version of OpenRefine 3.x.

As we expected, most people (85%) work with a local version of OpenRefine. However, nearly 10% use, or even run, a version via cloud hosting. We are aware that our users are interested in this, and are curious to see whether this number increases over time.

Which plug-in(s) or extension(s) do you use in OpenRefine, if any? We have revamped this question a bit compared to previous years, because the OpenRefine extension ecosystem is constantly changing. While more than 50% of users are either unaware of the existence of extensions, or don’t consciously use any, the Wikidata extension (installed by default in OpenRefine) is used by no less than 26.9% of respondents. Other popular extensions are the RDF extension (8%), VIB-Bits (4%), and GeoJSON export (3.4%).

As mentioned above, reconciliation is slowly becoming more and more popular in OpenRefine. The Wikidata reconciliation service (shipped in OpenRefine by default) is quite dominant, used by 49.1% of survey respondents. VIAF (15.4%), the Getty vocabularies, and in-house reconciliation services (8.6%) follow in popularity. Under “Other”, we see a few new responses from our community of users in the biodiversity domain: Bionomia and the GBIF taxonomy.

How do you perceive OpenRefine?

Which features make you choose OpenRefine over other tools? Many people mention that they appreciate OpenRefine’s GUI, price (free!), flexibility, power, reconciliation features, and relative ease of use.

Which tool(s) would you use if OpenRefine would not be available to you? Excel is a winner here, and quite a few people also mention Python and R. QuickStatements would be an important alternative for Wikimedia and Wikibase users. One user mentions they would “sob the entire time”, which we of course want to prevent.

How do you describe OpenRefine to someone else? Many of the descriptions involve the words “data”, “powerful” and “cleaning”, and we very much appreciate phrases like “spreadsheets on steroids”, “the Ferrari of spreadsheets”, “a librarian’s dream”, “swiss army knife”, or simply “magic”.

We received many feature requests as answers to the prompt “It would be awesome if OpenRefine…”. Quite a few of these major requests are very familiar to us, and also mentioned on our roadmap.

  • Support for large datasets with many columns and/or rows (3 requests). Good news: we are working towards this goal for OpenRefine’s major new 4.x release.
  • A better UX (3 requests), which makes it easier for newcomers to use OpenRefine (3 requests).
  • A free online instance of OpenRefine / hosted OpenRefine (3 requests).
  • Multi-user support in OpenRefine (2 requests).
  • Better Python support (3 requests).
  • Allow working with R for syntax-based work (2 requests).
  • More ‘point and click’ functions to replace GREL (2 requests).
  • Some simple data visualizations (2 requests), including the possibility to plot georeferenced data on a map.
  • Easier import from external datasets via APIs (2 requests), and easier reconciliation with them.
  • A feature to add new rows (2 requests).
  • Better developed, more explicit and more detailed notifications and warnings (2 requests).

Some feature requests are specific to Wikimedia, Wikibase and Wikidata support:

Several requests relate to reconciliation services:

  • Faster and more powerful reconciliation
  • Reconciliation against a SPARQL query
  • More reconciliation services
  • Fewer abandoned extensions and reconciliation services (cleanup of inactive and deprecated ones)

Some more requests relate to the usability and ease of use of OpenRefine:

  • Drag and drop for columns
  • Keyboard accessible GUI
  • Dark mode
  • A language that is more accessible than regex
  • Auto-update when new versions become available

Some requests relate to the way in which OpenRefine works with, and stores, files:

  • More transparent way to store files
  • Integration in OpenOffice/LibreOffice
  • Dynamic links with Google Sheets

Requests related to exporting data:

  • Improved workflow handling, including import/export and multi-project history
  • Preserve hierarchical structure of a dataset and export it too
  • Upload data directly into database
  • Better encoding of diverse characters during export

And finally:

  • More clustering algorithms
  • Improved record mode
  • Parse JSON or XML automatically
  • API calls beyond GET
  • Make OpenRefine suited for georeferencing
  • More training (including in underrepresented contexts)

Communication, help and support

37.1% of survey respondents were unaware that OpenRefine has a user mailing list; 25.2% are subscribed to it. You can find, and subscribe to, the mailing list here: https://groups.google.com/g/openrefine

We asked whether respondents would like to communicate with other OpenRefine users online and, if so, which channel(s) they would prefer. An online forum, such as GitHub Discussions or Discourse (such as the one recently set up by the OpenStreetMap community), is preferred, but our current mailing list is also appreciated. Slack, which is heavily used in many professional contexts, comes third. Good news: we are indeed investigating whether an online forum, in addition to OpenRefine’s user mailing list, would be useful and maintainable.

How you want to help OpenRefine

Finally, OpenRefine’s 2022 survey included a question exploring in which ways the community would be willing to help the project. Many individuals and several institutions are interested in donating money, and quite a few people have indicated that they would be willing to translate OpenRefine’s interface or participate in one of its committees. We thank everyone who expressed interest in this, and will follow up via email where relevant.

If you want to help translate OpenRefine’s interface into your language, you can get started right away! We offer translation via the web-based Weblate tool.

Many thanks to everyone who has completed the survey, and we wish you happy refining!
