GREL functions

Besides the reference guide below, OpenRefine's GitHub wiki has a page with many examples and recipes of frequently-used GREL functions.

Reading this reference

For the reference below, the function is given in full-length notation and the in-text examples are written in dot notation. Shorthands are used to indicate the kind of data type used in each function: s for string, b for boolean, n for number, d for date, a for array, p for a regex pattern, and o for object (meaning any data type), as well as “null” and “error” data types.

If a function can take more than one kind of data as input or can output more than one kind of data, that is indicated with more than one letter (as with “s or a”) or with o for object, meaning it can take any type of data (string, boolean, date, number, etc.).

We also use shorthands for substring (“sub”) and separator string (“sep”). Optional arguments will say “(optional)”.

In places where OpenRefine will accept a string (s) or a regex pattern (p), you can supply a string by putting it in quotes. If you wish to use any regex notation, wrap the pattern in forward slashes.

Boolean functions

and(b1, b2, ...)

Uses the logical operator AND on two or more booleans to output a boolean. Evaluates multiple statements into booleans, then returns true if all of the statements are true. For example, (1 < 3).and(1 < 0) returns false because one condition is true and one is false.

or(b1, b2, ...)

Uses the logical operator OR on two or more booleans to output a boolean. For example, (1 < 3).or(1 > 7) returns true because at least one of the conditions (the first one) is true.

not(b)

Uses the logical operator NOT on a boolean to output a boolean. For example, not(1 > 7) returns true because 1 > 7 itself is false.

xor(b1, b2, ...)

Uses the logical operator XOR (exclusive-or) on two or more booleans to output a boolean. Evaluates multiple statements, then returns true if only one of them is true. For example, (1 < 3).xor(1 < 7) returns false because more than one of the conditions is true.

String functions

length(s)

Returns the length of string s as a number.

levenshteinDistance(s1, s2)

Returns an integer indicating how many single-character changes you need to make to turn one string into another. This includes adding characters, removing characters, and changing characters to match between the two strings. For example, levenshteinDistance("New York", "newyork") will return 1 and levenshteinDistance("M. Makeba", "Miriam Makeba") will return 5.

Note: Before calculating the Levenshtein distance, the input strings are preprocessed by trimming spaces, removing punctuation and control characters, collapsing consecutive whitespace, and converting them to lowercase.

toString(o, string format (optional))

Takes any value type (string, number, date, boolean, error, null) and gives a string version of that value.

You can use toString() to convert numbers to strings with rounding, using an optional string format. For example, if you applied the expression value.toString("%.0f") to a column:

Input	Output
3.2	3
0.8	1
0.15	0
100.0	100

You can also convert dates to strings, using date parsing syntax built into OpenRefine (see the toDate() function for details). For example, value.toString("MMM-dd-yyyy") would convert the date value [2024-10-15T00:00:00Z] to “Oct-15-2024”.

Note: In OpenRefine, using toString() on a null cell outputs the string “null”.

info

Date/times without timezone info were interpreted as local up until May 2018 when OpenRefine 3.0 was released, at which point they were switched from local to UTC. See issue #6009

Testing string characteristics

startsWith(s, sub)

Returns a boolean indicating whether s starts with sub. For example, "food".startsWith("foo") returns true, whereas "food".startsWith("bar") returns false.

endsWith(s, sub)

Returns a boolean indicating whether s ends with sub. For example, "food".endsWith("ood") returns true, whereas "food".endsWith("odd") returns false.

contains(s, sub or p)

Returns a boolean indicating whether s contains sub, which is either a substring or a regex pattern. For example, "food".contains("oo") returns true whereas "food".contains("ee") returns false.

You can search for a regular expression by wrapping it in forward slashes rather than quotes: "rose is a rose".contains(/\s+/) returns true. startsWith() and endsWith() can only take strings, while contains() can take a regex pattern, so you can use contains() to look for beginning and ending string patterns.

Basic string modification

Case conversion

toLowercase(s)

Returns string s converted to all lowercase characters.

toUppercase(s)

Returns string s converted to all uppercase characters.

toTitlecase(s, delimiters (optional))

Returns string s converted into titlecase: a capital letter starting each word, and the rest of the letters lowercase. For example, "Once upon a midnight DREARY".toTitlecase() returns the string “Once Upon A Midnight Dreary”.

Optionally, a string of delimiter characters used to separate words. The first character and the first non-delimiter character after a delimiter will be captialized. For example, "jean-michel jarre".toTitlecase(" -") returns "Jean-Michel Jarre".

Trimming

trim(s)

Returns a copy of the string s with leading and trailing whitespace removed. For example, " island ".trim() returns the string “island”. Identical to strip().

strip(s)

Returns a copy of the string s with leading and trailing whitespace removed. For example, " island ".strip() returns the string “island”. Identical to trim().

chomp(s, sep)

Returns a copy of string s with the string sep removed from the end if s ends with sep; otherwise, just returns s. For example, "barely".chomp("ly") and "bare".chomp("ly") both return the string “bare”.

Substring

substring(s, n from, n to (optional))

Returns the substring of s starting from character index from, and up to (excluding) character index to. If the to argument is omitted, substring will output to the end of s. For example, "profound".substring(3) returns the string “found”, and "profound".substring(2, 4) returns the string “of”.

Remember that character indices start from zero. A negative character index counts from the end of the string. For example, "profound".substring(0, -1) returns the string “profoun”.

slice(s, n from, n to (optional))

Identical to substring() in relation to strings. Also works with arrays; see Array functions section.

get(s, n from, n to (optional))

Similar to substring() when used in relation to strings, but when using get in the case that the second argument n to is omitted a single character will be returned. get() also works with named fields, for example to retrieve a JSON property or OpenRefine variable. Also works with arrays; see Array functions section.

Find and replace

indexOf(s, sub)

Returns the first character index of sub as it first occurs in s; or, returns -1 if s does not contain sub. For example, "internationalization".indexOf("nation") returns 5, whereas "internationalization".indexOf("world") returns -1.

lastIndexOf(s, sub)

Returns the first character index of sub as it last occurs in s; or, returns -1 if s does not contain sub. For example, "parallel".lastIndexOf("a") returns 3 (pointing at the second “a”).

replace(s, s or p find, s replace)

Returns the string obtained by replacing the find string with the replace string in the inputted string. For example, "The cow jumps over the moon and moos".replace("oo", "ee") returns the string “The cow jumps over the meen and mees”. Find can be a regex pattern. For example, "The cow jumps over the moon and moos".replace(/\s+/, "_") will return “The_cow_jumps_over_the_moon_and_moos”.

You cannot find or replace nulls with this, as null is not a string. You can instead:

Facet by null and then bulk-edit them to a string, or
Transform the column with an expression such as if(value==null,"new",value).

replaceChars(s, s find, s replace)

Returns the string obtained by replacing a character in s, identified by find, with the corresponding character identified in replace. For example, "Téxt thát was optícálly recógnízéd".replaceChars("áéíóú", "aeiou") returns the string “Text that was optically recognized”. You cannot use this to replace a single character with more than one character.

replaceEach(s, a find, a replace)

Returns the string obtained by replacing each element in s of a find array with the corresponding element of a replace array, sequentially. For example, "The cow jumps over the moon and moos".replaceEach(["th", "moo"], ["ex", "mee"]) returns the string “The cow jumps over exe meen and mees”.

The length of the find array must be the same as the length of the replace array. If the length of the find array is greater than the length of the replace array, the last element of the replace array is used for all remaining elements in the find array.

This function is available since OpenRefine 3.6.

find(s, sub or p)

Outputs an array of all consecutive substrings inside string s that match the substring or regex pattern p. For example, "abeadsabmoloei".find(/[aeio]+/) would result in the array [ "a", "ea", "a", "o", "oei" ].

You can supply a substring instead of p, by putting it in quotes, and OpenRefine will compile it into a regex pattern. Anytime you supply quotes, OpenRefine interprets the contents as a string, not regex. If you wish to use any regex notation, wrap the pattern in forward slashes.

tip

match(s, p)

Attempts to match the string s in its entirety against the regex pattern p and, if the pattern is found, outputs an array of all capturing groups (found in order). For example, "230.22398, 12.3480".match(/.*(\d\d\d\d)/) returns an array of 1 substring: [ "3480" ]. It does not find 2239 as the first sequence with four digits, because the regex indicates the four digits must come at the end of the string.

You will need to convert the array to a string to store it in a cell, with a function such as toString(). An empty array [] is returned when there is no match to the desired substrings. A null is output when the entire regex does not match.

Remember to enclose your regex in forward slashes, and to escape characters and use parentheses as needed. Parentheses denote a desired substring (capturing group); for example, “.*(\d\d\d\d)” would return an array containing a single value, while “(.*)(\d\d\d\d)” would return two. So, if you are looking for a desired substring anywhere within a string, use the syntax value.match(/.*(desired-substring-regex).*/).

For example, if value is “hello 123456 goodbye”, the following would occur:

Expression	Result
`value.match(/\d{6}/)`	null (does not match the full string)
`value.match(/.\d{6}./)`	[ ] (no indicated substring)
`value.match(/.(\d{6})./)`	[ "123456" ] (array with one value)
`value.match(/(.)(\d{6})(.)/)`	[ "hello ", "123456", " goodbye" ] (array with three values)

tip

String parsing and splitting

toNumber(s)

Returns a string converted to a number. Will attempt to convert other formats into a string, then into a number. If the value is already a number, it will return the number.

split(s, s or p sep, b preserveTokens (optional))

Returns the array of strings obtained by splitting s by sep. The separator can be either a string or a regex pattern. For example, "fire, water, earth, air".split(",") returns an array of 4 strings: [ "fire", " water", " earth", " air" ]. Note that the space characters are retained but the separator is removed. If you include “true” for the preserveTokens boolean, empty segments are preserved.

splitByLengths(s, n1, n2, ...)

Returns the array of strings obtained by splitting s into substrings with the given lengths. For example, "internationalization".splitByLengths(5, 6, 3) returns an array of 3 strings: [ "inter", "nation", "ali" ]. Excess characters are discarded from the output array.

Like other functions that return an array, it also allows array slicing on the returned array. In that case, it returns the array consisting of a subset of elements between i1 and (i2 – 1). For example,

Expression	Result
`"internationalization".splitByLengths(5, 6, 3)[0,3]`	Returns an array of 3 strings: [ "inter", "nation", “ali” .
`"internationalization".splitByLengths(5, 6, 3)[0,2]`	Returns an array of 2 strings: [ "inter", "nation" ]
`"internationalization".splitByLengths(5, 6, 3)[1,3]`	Returns an array of 2 string: [ "nation", “ali” ]
`"internationalization".splitByLengths(5, 6, 3)[1]`	Returns string at position 1: "nation"

smartSplit(s, s or p sep (optional))

Returns the array of strings obtained by splitting s by sep, or by guessing either tab or comma separation if there is no sep given. Handles quotes properly and understands cancelled characters. The separator can be either a string or a regex pattern. For example, value.smartSplit("\n") will split at a carriage return or a new-line character.

Note: value.escape('javascript') is useful for previewing unprintable characters prior to using smartSplit().

splitByCharType(s)

Returns an array of strings obtained by splitting s into groups of consecutive characters each time the characters change Unicode categories. For example, "HenryCTaylor".splitByCharType() will result in an array of [ "H", "enry", "CT", "aylor" ]. It is useful for separating letters and numbers: "BE1A3E".splitByCharType() will result in [ "BE", "1", "A", "3", "E" ].

partition(s, s or p fragment, b omitFragment (optional))

Returns an array of strings [ a, fragment, z ] where a is the substring within s before the first occurrence of fragment, and z is the substring after fragment. Fragment can be a string or a regex. For example, "internationalization".partition("nation") returns 3 strings: [ "inter", "nation", "alization" ]. If s does not contain fragment, it returns an array of [ s, "", "" ] (the original unpartitioned string, and two empty strings).

If the omitFragment boolean is true, for example with "internationalization".partition("nation", true), the fragment is not returned. The output is [ "inter", "alization" ].

As an example of using a regex for the fragment, the expression "abcdefgh".partition(/c.e/) will output ["ab", "cde", "fgh" ].

rpartition(s, s or p fragment, b omitFragment (optional))

Returns an array of strings [ a, fragment, z ] where a is the substring within s before the last occurrence of fragment, and z is the substring after the last instance of fragment. (Rpartition means “reverse partition.”) For example, "parallel".rpartition("a") returns 3 strings: [ "par", "a", "llel" ]. Otherwise works identically to partition() above.

Encoding and hashing

diff(s1, s2, s timeUnit (optional))

Takes two strings and compares them, returning a string. Returns the remainder of s2 starting with the first character where they differ. For example, "cacti".diff("cactus") returns "us". Also works with dates; see Date functions.

escape(s, s mode)

Escapes s in the given escaping mode. The mode can be one of: "html", "xml", "csv", "url", "javascript". Note that quotes are required around your mode. See the recipes for examples of escaping and unescaping.

unescape(s, s mode)

Unescapes s in the given escaping mode. The mode can be one of: "html", "xml", "csv", "url", "javascript". Note that quotes are required around your mode. See the recipes for examples of escaping and unescaping.

encode(s, s encoding)

Encodes the string, s in the specified encoding. The encoding can be one of: "base16", "base32", "base32hex", "base64", "base64url". For example, encode("abc", "base64") returns "YWJj".

This function is available since OpenRefine 3.6.

decode(s, s encoding)

Decodes the string, s in the specified encoding. The encoding can be one of: "base16", "base32", "base32hex", "base64", "base64url". For example, decode("YWJj", "base64") returns "abc".

This function is available since OpenRefine 3.6.

md5(o)

Returns the MD5 hash of an object. If fed something other than a string (array, number, date, etc.), md5() will convert it to a string and deliver the hash of the string. For example, "internationalization".md5() will return 2c55a1626e31b4e373ceedaa9adc12a3.

sha1(o)

Returns the SHA-1 hash of an object. If fed something other than a string (array, number, date, etc.), sha1() will convert it to a string and deliver the hash of the string. For example, "internationalization".sha1() will return cd05286ee0ff8a830dbdc0c24f1cb68b83b0ef36.

phonetic(s, s encoding)

Returns a phonetic encoding of a string, based on an available phonetic algorithm. See the section on phonetic clustering for more information. Can be one of the following supported phonetic methods: metaphone, doublemetaphone, metaphone3, soundex, cologne-phonetic, daitch-mokotoff, beider-morse. Quotes are required around your encoding method. For example, "Ruth Prawer Jhabvala".phonetic("metaphone") outputs the string “R0PRWRJHBFL”.

reinterpret(s, s encoderTarget, s encoderSource)

Returns s reinterpreted through the given character encoders. You must supply one of the supported encodings for each of the original source and the target output. Note that quotes are required around your character encoder.

When an OpenRefine project is started, data is imported and interpreted. A specific character encoding is identified or manually selected at that time (such as UTF-8). You can reinterpret a column into another specificed encoding using this function. This function may not fix your data; it may be better to use this in conjunction with new projects to test the interpretation, and pre-format your data as needed.

fingerprint(s)

Returns the fingerprint of s, a string that is the first step in fingerprint clustering methods: it will trim whitespaces, convert all characters to lowercase, remove punctuation, sort words alphabetically, etc. For example, "Ruth Prawer Jhabvala".fingerprint() outputs the string “jhabvala prawer ruth”.

ngram(s, n)

Returns an array of the word n-grams of s. That is, it lists all the possible consecutive combinations of n words in the string. For example, "Ruth Prawer Jhabvala".ngram(2) would output the array [ "Ruth Prawer", "Prawer Jhabvala" ]. A word n-gram of 1 simply lists all the words in original order; an n-gram larger than the number of words in the string will only return the original string inside an array (e.g. "Ruth Prawer Jhabvala".ngram(4) would simply return ["Ruth Prawer Jhabvala"]).

ngramFingerprint(s, n)

Returns the n-gram fingerprint of s. For example, "banana".ngram(2) would output “anbana”, after first generating the 2-grams “ba an na an na”, removing duplicates, and sorting them alphabetically.

unicode(s)

Returns an array of strings describing each character of s in their full unicode notation. For example, "Bernice Rubens".unicode() outputs [ 66, 101, 114, 110, 105, 99, 101, 32, 82, 117, 98, 101, 110, 115 ].

unicodeType(s)

Returns an array of strings describing each character of s by their unicode type. For example, "Bernice Rubens".unicodeType() outputs [ "uppercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "space separator", "uppercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter" ].

Translating

detectLanguage(s)

Returns a string locale describing the language of s, with support for 71 languages as stated here. For example, "Hello, world!".detectLanguage() outputs "en".

This function is available since OpenRefine 3.6.

Format-based functions (JSON, HTML, XML)

jsonize(o)

Quotes a value as a JSON literal value.

parseJson(s)

Parses a string as JSON. get() can then be used with parseJson(): for example, parseJson(" { 'a' : 1 } ").get("a") returns 1.

For example, from the following JSON array in value, we want to get all instances of “keywords” having the same object string name of “text”, and combine them, using the forEach() function to iterate over the array.

    {
       "status":"OK",
       "url":"",
       "language":"english",
       "keywords":[
          {
             "text":"York en route",
             "relevance":"0.974363"
          },
          {
             "text":"Anthony Eden",
             "relevance":"0.814394"
          },
          {
             "text":"President Eisenhower",
             "relevance":"0.700189"
          }
       ]
    }

The GREL expression forEach(value.parseJson().keywords,v,v.text).join(":::") will output “York en route:::Anthony Eden:::President Eisenhower”.

Jsoup XML and HTML parsing

parseHtml(s)

Given a cell full of HTML-formatted text, parseHtml() simplifies HTML tags (such as by removing “ /” at the end of self-closing tags), closes any unclosed tags, and inserts linebreaks and indents for cleaner code. You cannot pass parseHtml() a URL, but you can pre-fetch HTML with the Add column by fetching URLs menu option.

A cell cannot store the output of parseHtml() unless you convert it with toString(): for example, value.parseHtml().toString().

When parseHtml() simplifies HTML, it can sometimes introduce errors. When closing tags, it makes its best guesses based on line breaks, indentation, and the presence of other tags. You may need to manually check the results.

You can then extract or select() which portions of the HTML document you need for further splitting, partitioning, etc. An example of extracting all table rows from a div using parseHtml().select() together is described more in depth at StrippingHTML.

parseXml(s)

Given a cell full of XML-formatted text, parseXml() returns a full XML document and adds any missing closing tags. You can then extract or select() which portions of the XML document you need for further splitting, partitioning, etc. Functions the same way as parseHtml() is described above.

select(s, element)

Returns an array of all the desired elements from an HTML or XML document, if the element exists. Elements are identified using the Jsoup selector syntax. For example, value.parseHtml().select("img.portrait")[0] would return the entirety of the first “img” tag with the “portrait” class found in the parsed HTML inside value. Returns an empty array if no matching element is found. Use with toString() to capture the results in a cell. A tutorial of select() is shown in StrippingHTML.

You can use select() more than once:

value.parseHtml().select("div#content")[0].select("tr").toString()

htmlAttr(s, element)

Returns a string from an attribute on an HTML element. Use it in conjunction with parseHtml() as in the following example: value.parseHtml().select("a.email")[0].htmlAttr("href") would retrieve the email address attached to a link with the “email” class.

xmlAttr(s, element)

Returns a string from an attribute on an XML element. Functions the same way htmlAttr() is described above. Use it in conjunction with parseXml().

htmlText(element)

Returns a string of the text from within an HTML element (including all child elements), removing HTML tags and line breaks inside the string. Use it in conjunction with parseHtml() and select() to provide an element, as in the following example: value.parseHtml().select("div.footer")[0].htmlText().

xmlText(element)

Returns a string of the text from within an XML element (including all child elements). Functions the same way htmlText() is described above. Use it in conjunction with parseXml() and select() to provide an element.

wholeText(element)

Selects the (unencoded) text of an element and its children, including any new lines and spaces, and returns a string of unencoded, un-normalized text. Use it in conjunction with parseHtml() and select() to provide an element as in the following example: value.parseHtml().select("div.footer")[0].wholeText().

This function is available since OpenRefine 3.5.

innerHtml(element)

Returns the inner HTML of an HTML element. This will include text and children elements within the element selected. Use it in conjunction with parseHtml() and select() to provide an element.

innerXml(element)

Returns the inner XML elements of an XML element. Does not return the text directly inside your chosen XML element - only the contents of its children. To select the direct text, use ownText(). To select both, use xmlText(). Use it in conjunction with parseXml() and select() to provide an element.

ownText(element)

Returns the text directly inside the selected XML or HTML element only, ignoring text inside children elements (for this, use innerXml()). Use it in conjunction with a parser and select() to provide an element.

parent(element)

Returns the parent node or null if no parent. Use it in conjunction with parseHtml() and select() to provide an element.

This function is available since OpenRefine 3.6.

URI parsing

parseUri(s)

Given a valid URI string (for example: https://www.openrefine.org:80/documentation#download?format=xml&os=mac), parseUri() returns a JSON object with the following properties:

scheme: The scheme of the URI, e.g. http
host: the host of the URI (e.g. www.openrefine.org)
port: the port of the URI (e.g. 80)
path: the path of the URI (e.g. /documentation)
query: the query of the URI (e.g. format=xml&os=mac)
authority: the authority of the URI (e.g. www.openrefine.org:80)
fragment: the fragment of the URI (e.g. download)
query_params: the query of the URI as an object (e.g. {format: "xml", os: "mac"})

This function is available since OpenRefine 3.6.

Array functions

length(a)

Returns the size of an array, meaning the number of objects inside it. Arrays can be empty, in which case length() will return 0.

slice(a, n from, n to (optional))

Returns a sub-array of a given array, from the first index provided and up to and excluding the optional last index provided. Remember that array objects are indexed starting at 0. If the to value is omitted, it is understood to be the end of the array. For example, [0, 1, 2, 3, 4].slice(1, 3) returns [ 1, 2 ], and [ 0, 1, 2, 3, 4].slice(2) returns [ 2, 3, 4 ]. Also works with strings; see String functions.

get(a, n from, n to (optional))

Returns a sub-array of a given array, from the first index provided and up to and excluding the optional last index provided. Remember that array objects are indexed starting at 0.

If the to value is omitted, only one array item is returned, as a string, instead of a sub-array. To return a sub-array from one index to the end, you can set the to argument to a very high number such as value.get(2,999) or you can use something like with(value,a,a.get(1,a.length())) to count the length of each array.

Also works with strings; see String functions.

inArray(a, s)

Returns true if the array contains the desired string, and false otherwise. Will not convert data types; for example, [ 1, 2, 3, 4 ].inArray("3") will return false.

reverse(a)

Reverses the array. For example, [ 0, 1, 2, 3].reverse() returns the array [ 3, 2, 1, 0 ].

sort(a)

Sorts the array in ascending order. Sorting is case-sensitive, uppercase first and lowercase second. For example, [ "al", "Joe", "Bob", "jim" ].sort() returns the array [ "Bob", "Joe", "al", "jim" ].

sum(a)

Return the sum of the numbers in the array. For example, [ 2, 1, 0, 3 ].sum() returns 6.

join(a, sep)

Joins the items in the array with sep, and returns it all as a string. For example, [ "and", "or", "not" ].join("/") returns the string “and/or/not”.

uniques(a)

Returns the array with duplicates removed. Case-sensitive. For example, [ "al", "Joe", "Bob", "Joe", "Al", "Bob" ].uniques() returns the array [ "Joe", "al", "Al", "Bob" ].

As of OpenRefine 3.4.1, uniques() reorders the array items it returns; in 3.4 beta 644 and onwards, it preserves the original order (in this case, [ "al", "Joe", "Bob", "Al" ]).

zip(a, b, …)

Combines multiple arrays into one array of arrays, where each sub-array contains elements at the same index from the input arrays. Truncates to the shortest input array. For example, [1, 2, 3].zip(['A', 'B', 'C']) returns [ [1, 'A'], [2, 'B'], [3, 'C'] ].

Date functions

now()

Returns the date object according to your system clock. For example, now() returns [date 2020-11-26T15:53:00Z] and now.toString() returns "Nov 26, 2020".

toDate(o, b monthFirst, s format1, s format2, ...)

Returns the inputted object converted to a date object. Without arguments, it returns the ISO 8601 extended format. With arguments, you can control the output format:

monthFirst: set false if the date is formatted with the day before the month.
formatN: attempt to parse the date using an ordered list of possible formats. Supply formats based on the SimpleDateFormat syntax (and see the table below for a handy reference).

For example, you can parse a column containing dates in different formats, such as cells with “Nov-09” and “11/09”, using value.toDate('MM/yy','MMM-yy').toString('yyyy-MM') and both will output “2009-11”. For another example, “1/4/2012 13:30:00” can be parsed into a date using value.toDate('d/M/y H:m:s'). If parsing a date with text components in a language other than your system language you can specify a language code as the format1 argument. For example, a French language date such as "10 janvier 2023" could be parsed with value.toDate('fr','dd MMM yyyy').

Letter	Date or Time Component	Presentation	Examples
G	Era designator	Text	AD
u	Year	Year	1996; 96
y	year-of-era	year	1996; 96
M/L	Month in year	Number/text	7; 07; Jul; July; J
Q/q	quarter-of-year	number/text	3; 03; Q3; 3rd quarter
Y	week-based-year	year	1996;96
w	Week in year	Number	27
W	Week in month	Number	2
D	Day in year	Number	189
d	Day in month	Number	10
F	Day of week in month	Number	2
E	Day name in week	Text	Tuesday; Tue
e/c	localized day-of-week	number/text	2; 02; tue; tuesday; T
a	AM/PM marker	Text	PM
H	Hour in day (0-23)	Number	0
k	Hour in day (1-24)	Number	24
K	Hour in AM/PM (0-11)	Number	0
h	Hour in AM/PM (1-12)	Number	12
m	Minute in hour	Number	30
s	Second in minute	Number	55
S	Millisecond	Number	978
n	Nanosecond	Number	789000
z	Time zone	General time zone	Pacific Standard Time; PST; GMT-08:00
Z	Time zone	RFC 822 time zone	-0800
X	Time zone	ISO 8601 time zone	-08; -0800; -08:00

diff(d1, d2, s timeUnit)

Given two dates, returns a number indicating the difference in a given time unit (see the table below). For example, diff(("Nov-11".toDate('MMM-yy')), ("Nov-09".toDate('MMM-yy')), "weeks") will return 104, for 104 weeks, or two years. The later date should go first. If the output is negative, invert d1 and d2.

Also works with strings; see diff() in string functions.

inc(d, n, s timeUnit)

Returns a date changed by the given amount in the given unit of time (see the table below). The default unit is “hour”. A positive value increases the date, and a negative value moves it back in time. For example, if you want to move a date backwards by two months, use value.inc(-2,"month").

datePart(d, s timeUnit)

Returns part of a date. The data type returned depends on the unit (see the table below).

OpenRefine supports the following values for timeUnit:

Unit	Date part returned	Returned data type	Example using [date 2014-03-14T05:30:04.000789000Z] as value
years, year	Year	Number	value.datePart("years") → 2014
months, month	Month	Number	value.datePart("months") → 2
weeks, week, w	Week of the month	Number	value.datePart("weeks") → 3
days, day, d	Day of the month	Number	value.datePart("days") → 14
weekday	Day of the week	String	value.datePart("weekday") → Friday
hours, hour, h	Hour	Number	value.datePart("hours") → 5
minutes, minute, min	Minute	Number	value.datePart("minutes") → 30
seconds, sec, s	Seconds	Number	value.datePart("seconds") → 04
milliseconds, ms, S	Millseconds	Number	value.datePart("milliseconds") → 789
nanos, nano, n	Nanoseconds	Number	value.datePart("n") → 789000
time	Milliseconds between input and the Unix Epoch	Number	value.datePart("time") → 1394775004000

timeSinceUnixEpochToDate(duration, scale)

Converts a time as measured by the duration since the Unix Epoch (1970-01-01) to a date object. The second parameter indicates the unit of the duration, and can be "second", "millisecond" or "microsecond". If the unit is not provided, it is assumed to be "second".

This function is available since OpenRefine 3.6.

Math functions

For integer division and precision, you can use simple evaluations such as 1 / 2, which is equivalent to floor(1/2) - that is, it returns only whole number results. If either operand is a floating point number, they both get promoted to floating point and a floating point result is returned. You can use 1 / 2.0 or 1.0 / 2 or 1.0 * x / y (if you're working with variables of unknown contents).

caution

Some of these math functions don't recognize integers when supplied as the first argument in dot notation (e.g., 5.cos() simply returns 5 instead of the expected result). To ensure operations are successful, always wrap the first argument in brackets, such as (value).cos().

Function	Use	Example
`abs(n)`	Returns the absolute value of a number.	`abs(-6)` returns 6.
`acos(n)`	Returns the arc cosine of an angle, in the range 0 through PI.	`acos(0.345)` returns 1.218557541697832.
`asin(n)`	Returns the arc sine of an angle in the range of -PI/2 through PI/2.	`asin(0.345)` returns 0.35223878509706474.
`atan(n)`	Returns the arc tangent of an angle in the range of -PI/2 through PI/2.	`atan(0.345)` returns 0.3322135507465967.
`atan2(n1, n2)`	Converts rectangular coordinates (n1, n2) to polar (r, theta). Returns number theta.	`atan2(0.345,0.6)` returns 0.5218342798144103.
`ceil(n)`	Returns the ceiling of a number.	`3.7.ceil()` returns 4 and `-3.7.ceil()` returns -3.
`combin(n1, n2)`	Returns the number of combinations for n2 elements as divided into n1.	`combin(20,2)` returns 190.
`cos(n)`	Returns the trigonometric cosine of a value.	`cos(5)` returns 0.28366218546322625.
`cosh(n)`	Returns the hyperbolic cosine of a value.	`cosh(5)` returns 74.20994852478785.
`degrees(n)`	Converts an angle from radians to degrees.	`degrees(5)` returns 286.4788975654116.
`even(n)`	Rounds the number up to the nearest even integer.	`even(5)` returns 6.
`exp(n)`	Returns e raised to the power of n.	`exp(5)` returns 148.4131591025766.
`fact(n)`	Returns the factorial of a number, starting from 1.	`fact(5)` returns 120.
`factn(n1, n2)`	Returns the factorial of n1, starting from n2.	`factn(10,3)` returns 280.
`floor(n)`	Returns the floor of a number.	`3.7.floor()` returns 3 and `-3.7.floor()` returns -4.
`gcd(n1, n2)`	Returns the greatest common denominator of two numbers.	`gcd(95,135)` returns 5.
`lcm(n1, n2)`	Returns the least common multiple of two numbers.	`lcm(95,135)` returns 2565.
`ln(n)`	Returns the natural logarithm of n.	`ln(5)` returns 1.6094379124341003.
`log(n)`	Returns the base 10 logarithm of n.	`log(5)` returns 0.6989700043360189.
`max(n1, n2)`	Returns the larger of two numbers.	`max(3,10)` returns 10.
`min(n1, n2)`	Returns the smaller of two numbers.	`min(3,10)` returns 3.
`mod(n1, n2)`	Returns n1 modulus n2. Note: `value.mod(9)` will work, whereas `74.mod(9)` will not work.	`mod(74, 9)` returns 2.
`multinomial(n1, n2 …(optional))`	Calculates the multinomial of one number or a series of numbers.	`multinomial(2,3)` returns 10.
`odd(n)`	Rounds the number up to the nearest odd integer.	`odd(10)` returns 11.
`pow(n1, n2)`	Returns n1 raised to the power of n2. Note: value.pow(3)`will work, whereas`2.pow(3)` will not work.	`pow(2, 3)` returns 8 (2 cubed) and `pow(3, 2)` returns 9 (3 squared). The square root of any numeric value can be called with `value.pow(0.5)`.
`quotient(n1, n2)`	Returns the integer portion of a division (truncated, not rounded), when supplied with a numerator and denominator.	`quotient(9,2)` returns 4.
`radians(n)`	Converts an angle in degrees to radians.	`radians(10)` returns 0.17453292519943295.
`random(n lowerBound, n upperBound)`	Returns a random integer in the interval between the lower and upper bounds (inclusively). Will output a different random number in each cell in a column. If no arguments are provided, returns a number in the range `0.0 <= x < 1.0`
`round(n)`	Rounds a number to the nearest integer.	`3.7.round()` returns 4 and `-3.7.round()` returns -4.
`sin(n)`	Returns the trigonometric sine of an angle.	`sin(10)` returns -0.5440211108893698.
`sinh(n)`	Returns the hyperbolic sine of an angle.	`sinh(10)` returns 11013.232874703393.
`sum(a)`	Sums the numbers in an array. Ignores non-number items. Returns 0 if the array does not contain numbers.	`sum([ 10, 2, three ])` returns 12.
`tan(n)`	Returns the trigonometric tangent of an angle.	`tan(10)` returns 0.6483608274590866.
`tanh(n)`	Returns the hyperbolic tangent of a value.	`tanh(10)` returns 0.9999999958776927.

Other functions

type(o)

Returns a string with the data type of o, such as undefined, string, number, boolean, etc. For example, a Transform operation using value.type() will convert all cells in a column to strings of their data types.

facetCount(choiceValue, s facetExpression, s columnName)

Returns the facet count corresponding to the given choice value, by looking for the facetExpression in the choiceValue in columnName. For example, to create facet counts for the following table, we could generate a new column based on “Gift” and enter in value.facetCount("value", "Gift"). This would add the column we've named “Count”:

Gift	Recipient	Price	Count
lamp	Mary	20	1
clock	John	57	2
watch	Amit	80	1
clock	Claire	62	2

The facet expression, wrapped in quotes, can be useful to manipulate the inputted values before counting. For example, you could do a textual cleanup using fingerprint(): (value.fingerprint()).facetCount(value.fingerprint(),"Gift").

hasField(o, s name)

Returns a boolean indicating whether o has a member field called name. For example, cell.recon.hasField("match") will return false if a reconciliation match hasn’t been selected yet, or true if it has. You cannot chain your desired fields: for example, cell.hasField("recon.match") will return false even if the above expression returns true).

coalesce(o1, o2, o3, ...)

Returns the first non-null from a series of objects. For example, coalesce(value, "") would return an empty string “” if value was null, but otherwise return value.

cross(cell, s projectName (optional), s columnName (optional))

Returns an array of zero or more rows in the project projectName for which the cells in their column columnName have the same content as the cell in your chosen column. For example, if two projects contained matching names, and you wanted to pull addresses for people by their names from a project called “People” you would apply the following expression to your column of names:

cell.cross("People","Name")[0].cells["Address"].value

This would match your current column to the “Name” column in “People” and, using those matches, pull the respective “Address” value into your current project.

You may need to do some data preparation with cross(), such as using trim() on your key columns or deduplicating values.

The first argument will be interpreted as cell.value if set to cell. If you omit projectName and columnName, they will default to the current project and index column (number 0).

Recipes and more examples for using cross() can be found on our wiki.

Reading this reference​

Boolean functions​

and(b1, b2, ...)​

or(b1, b2, ...)​

not(b)​

xor(b1, b2, ...)​

String functions​

length(s)​

levenshteinDistance(s1, s2)​

toString(o, string format (optional))​

Testing string characteristics​

startsWith(s, sub)​

endsWith(s, sub)​

contains(s, sub or p)​

Basic string modification​

Case conversion​

toLowercase(s)​

toUppercase(s)​

toTitlecase(s, delimiters (optional))​

Trimming​

trim(s)​

strip(s)​

chomp(s, sep)​

Substring​

substring(s, n from, n to (optional))​

slice(s, n from, n to (optional))​

get(s, n from, n to (optional))​

Find and replace​

indexOf(s, sub)​

lastIndexOf(s, sub)​

replace(s, s or p find, s replace)​

replaceChars(s, s find, s replace)​

replaceEach(s, a find, a replace)​

find(s, sub or p)​

match(s, p)​

String parsing and splitting​

toNumber(s)​

split(s, s or p sep, b preserveTokens (optional))​

splitByLengths(s, n1, n2, ...)​

smartSplit(s, s or p sep (optional))​

splitByCharType(s)​

partition(s, s or p fragment, b omitFragment (optional))​

rpartition(s, s or p fragment, b omitFragment (optional))​

Encoding and hashing​

diff(s1, s2, s timeUnit (optional))​

escape(s, s mode)​

unescape(s, s mode)​

encode(s, s encoding)​

decode(s, s encoding)​

md5(o)​

sha1(o)​

phonetic(s, s encoding)​

reinterpret(s, s encoderTarget, s encoderSource)​

fingerprint(s)​

ngram(s, n)​

ngramFingerprint(s, n)​

unicode(s)​

unicodeType(s)​

Translating​

detectLanguage(s)​

Format-based functions (JSON, HTML, XML)​

jsonize(o)​

parseJson(s)​

Jsoup XML and HTML parsing​

parseHtml(s)​

parseXml(s)​

select(s, element)​

htmlAttr(s, element)​

xmlAttr(s, element)​

htmlText(element)​

xmlText(element)​

wholeText(element)​

innerHtml(element)​

innerXml(element)​

ownText(element)​

parent(element)​

URI parsing​

parseUri(s)​

Array functions​

length(a)​

Reading this reference

Boolean functions

and(b1, b2, ...)

or(b1, b2, ...)

not(b)

xor(b1, b2, ...)

String functions

length(s)

levenshteinDistance(s1, s2)

toString(o, string format (optional))

Testing string characteristics

startsWith(s, sub)

endsWith(s, sub)

contains(s, sub or p)

Basic string modification

Case conversion

toLowercase(s)

toUppercase(s)

toTitlecase(s, delimiters (optional))

Trimming

trim(s)

strip(s)

chomp(s, sep)

Substring

substring(s, n from, n to (optional))

slice(s, n from, n to (optional))

get(s, n from, n to (optional))

Find and replace

indexOf(s, sub)

lastIndexOf(s, sub)

replace(s, s or p find, s replace)

replaceChars(s, s find, s replace)

replaceEach(s, a find, a replace)

find(s, sub or p)

match(s, p)

String parsing and splitting

toNumber(s)

split(s, s or p sep, b preserveTokens (optional))

splitByLengths(s, n1, n2, ...)

smartSplit(s, s or p sep (optional))

splitByCharType(s)

partition(s, s or p fragment, b omitFragment (optional))

rpartition(s, s or p fragment, b omitFragment (optional))

Encoding and hashing

diff(s1, s2, s timeUnit (optional))

escape(s, s mode)

unescape(s, s mode)

encode(s, s encoding)

decode(s, s encoding)

md5(o)

sha1(o)

phonetic(s, s encoding)

reinterpret(s, s encoderTarget, s encoderSource)

fingerprint(s)

ngram(s, n)

ngramFingerprint(s, n)

unicode(s)

unicodeType(s)

Translating

detectLanguage(s)

Format-based functions (JSON, HTML, XML)

jsonize(o)

parseJson(s)

Jsoup XML and HTML parsing

parseHtml(s)

parseXml(s)

select(s, element)

htmlAttr(s, element)

xmlAttr(s, element)

htmlText(element)

xmlText(element)

wholeText(element)

innerHtml(element)

innerXml(element)

ownText(element)

parent(element)

URI parsing

parseUri(s)

Array functions

length(a)