GREL functions
Besides the reference guide below, OpenRefine's GitHub wiki has a page with many examples and recipes of frequently-used GREL functions.
Reading this reference
For the reference below, the function is given in full-length notation and the in-text examples are written in dot notation. Shorthands are used to indicate the kind of data type used in each function: s for string, b for boolean, n for number, d for date, a for array, p for a regex pattern, and o for object (meaning any data type), as well as “null” and “error” data types.
If a function can take more than one kind of data as input or can output more than one kind of data, that is indicated with more than one letter (as with “s or a”) or with o for object, meaning it can take any type of data (string, boolean, date, number, etc.).
We also use shorthands for substring (“sub”) and separator string (“sep”). Optional arguments will say “(optional)”.
In places where OpenRefine will accept a string (s) or a regex pattern (p), you can supply a string by putting it in quotes. If you wish to use any regex notation, wrap the pattern in forward slashes.
Boolean functions
and(b1, b2, ...)
Uses the logical operator AND on two or more booleans to output a boolean. Evaluates multiple statements into booleans, then returns true if all of the statements are true. For example, (1 < 3).and(1 < 0)
returns false because one condition is true and one is false.
or(b1, b2, ...)
Uses the logical operator OR on two or more booleans to output a boolean. For example, (1 < 3).or(1 > 7)
returns true because at least one of the conditions (the first one) is true.
not(b)
Uses the logical operator NOT on a boolean to output a boolean. For example, not(1 > 7)
returns true because 1 > 7 itself is false.
xor(b1, b2, ...)
Uses the logical operator XOR (exclusive-or) on two or more booleans to output a boolean. Evaluates multiple statements, then returns true if only one of them is true. For example, (1 < 3).xor(1 < 7)
returns false because more than one of the conditions is true.
String functions
length(s)
Returns the length of string s as a number.
levenshteinDistance(s1, s2)
Returns an integer indicating how many single-character changes you need to make to turn one string into another. This includes adding characters, removing characters, and changing characters to match between the two strings. For example, levenshteinDistance("New York", "newyork")
will return 1 and levenshteinDistance("M. Makeba", "Miriam Makeba")
will return 5.
Note: Before calculating the Levenshtein distance, the input strings are preprocessed by trimming spaces, removing punctuation and control characters, collapsing consecutive whitespace, and converting them to lowercase.
toString(o, string format (optional))
Takes any value type (string, number, date, boolean, error, null) and gives a string version of that value.
You can use toString() to convert numbers to strings with rounding, using an optional string format. For example, if you applied the expression value.toString("%.0f")
to a column:
Input | Output |
---|---|
3.2 | 3 |
0.8 | 1 |
0.15 | 0 |
100.0 | 100 |
You can also convert dates to strings, using date parsing syntax built into OpenRefine (see the toDate() function for details). For example, value.toString("MMM-dd-yyyy")
would convert the date value [2024-10-15T00:00:00Z] to “Oct-15-2024”.
Note: In OpenRefine, using toString() on a null cell outputs the string “null”.
Testing string characteristics
startsWith(s, sub)
Returns a boolean indicating whether s starts with sub. For example, "food".startsWith("foo")
returns true, whereas "food".startsWith("bar")
returns false.
endsWith(s, sub)
Returns a boolean indicating whether s ends with sub. For example, "food".endsWith("ood")
returns true, whereas "food".endsWith("odd")
returns false.
contains(s, sub or p)
Returns a boolean indicating whether s contains sub, which is either a substring or a regex pattern. For example, "food".contains("oo")
returns true whereas "food".contains("ee")
returns false.
You can search for a regular expression by wrapping it in forward slashes rather than quotes: "rose is a rose".contains(/\s+/)
returns true. startsWith() and endsWith() can only take strings, while contains() can take a regex pattern, so you can use contains() to look for beginning and ending string patterns.
Basic string modification
Case conversion
toLowercase(s)
Returns string s converted to all lowercase characters.
toUppercase(s)
Returns string s converted to all uppercase characters.
toTitlecase(s, delimiters (optional))
Returns string s converted into titlecase: a capital letter starting each word, and the rest of the letters lowercase. For example, "Once upon a midnight DREARY".toTitlecase()
returns the string “Once Upon A Midnight Dreary”.
Optionally, a string of delimiter characters used to separate words. The first character and the first non-delimiter character after a delimiter will be captialized. For example, "jean-michel jarre".toTitlecase(" -")
returns "Jean-Michel Jarre".
Trimming
trim(s)
Returns a copy of the string s with leading and trailing whitespace removed. For example, " island ".trim()
returns the string “island”. Identical to strip().
strip(s)
Returns a copy of the string s with leading and trailing whitespace removed. For example, " island ".strip()
returns the string “island”. Identical to trim().
chomp(s, sep)
Returns a copy of string s with the string sep removed from the end if s ends with sep; otherwise, just returns s. For example, "barely".chomp("ly")
and "bare".chomp("ly")
both return the string “bare”.
Substring
substring(s, n from, n to (optional))
Returns the substring of s starting from character index from, and up to (excluding) character index to. If the to argument is omitted, substring will output to the end of s. For example, "profound".substring(3)
returns the string “found”, and "profound".substring(2, 4)
returns the string “of”.
Remember that character indices start from zero. A negative character index counts from the end of the string. For example, "profound".substring(0, -1)
returns the string “profoun”.
slice(s, n from, n to (optional))
Identical to substring() in relation to strings. Also works with arrays; see Array functions section.
get(s, n from, n to (optional))
Similar to substring() when used in relation to strings, but when using get
in the case that the second argument n to
is omitted a single character will be returned. get()
also works with named fields, for example to retrieve a JSON property or OpenRefine variable. Also works with arrays; see Array functions section.
Find and replace
indexOf(s, sub)
Returns the first character index of sub as it first occurs in s; or, returns -1 if s does not contain sub. For example, "internationalization".indexOf("nation")
returns 5, whereas "internationalization".indexOf("world")
returns -1.
lastIndexOf(s, sub)
Returns the first character index of sub as it last occurs in s; or, returns -1 if s does not contain sub. For example, "parallel".lastIndexOf("a")
returns 3 (pointing at the second “a”).
replace(s, s or p find, s replace)
Returns the string obtained by replacing the find string with the replace string in the inputted string. For example, "The cow jumps over the moon and moos".replace("oo", "ee")
returns the string “The cow jumps over the meen and mees”. Find can be a regex pattern. For example, "The cow jumps over the moon and moos".replace(/\s+/, "_")
will return “The_cow_jumps_over_the_moon_and_moos”.
You cannot find or replace nulls with this, as null is not a string. You can instead:
- Facet by null and then bulk-edit them to a string, or
- Transform the column with an expression such as
if(value==null,"new",value)
.
replaceChars(s, s find, s replace)
Returns the string obtained by replacing a character in s, identified by find, with the corresponding character identified in replace. For example, "Téxt thát was optícálly recógnízéd".replaceChars("