Grails Cookbook - A collection of tutorials and examples

Groovy String Tokenize Examples

The String class in Groovy has the tokenize() method as a convenient way to tokenize a String or character sequence and return the result as a List of String. By default the String or character sequence is split on whitespace; a different set of delimiters can be passed as a parameter. Below are some examples of how to use Groovy String's tokenize() method.

Groovy String Tokenize()

When the tokenize() method is invoked with no parameter, it is assumed that the String will be broken using whitespace as the delimiter. Example:
def sampleText = "The quick brown fox"
println sampleText.tokenize()

The sample above will break the String using whitespace to separate items. Hence we get four items. This will print:

[The, quick, brown, fox]

Spaces, tabs, and line breaks all count as whitespace. The code below will yield the same output because \t (tab) and \n (newline) are valid whitespace characters:

def sampleText = "The quick\tbrown\nfox"
println sampleText.tokenize()

Extra whitespace will not affect the result:
def sampleText = "The quick \n\n\n brown         fox"
println sampleText.tokenize()

This gives the same four items:

[The, quick, brown, fox]
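
The whitespace behaviour described above can be checked with a few assertions. This is only a minimal sketch: the variable names padded and tokens are illustrative, and the checks on empty and whitespace-only Strings go slightly beyond the examples shown so far.

```groovy
// Default tokenize(): any run of spaces, tabs, or line breaks separates items
def padded = "  The quick \n\n\n brown\t\tfox  "
def tokens = padded.tokenize()

// Leading, trailing, and repeated whitespace never produce empty items
assert tokens == ["The", "quick", "brown", "fox"]
assert tokens.size() == 4

// A String holding only whitespace (or nothing at all) gives an empty List
assert "   \t\n".tokenize() == []
assert "".tokenize() == []

println tokens
```

Because empty items are never produced, the number of items depends only on the words themselves and not on how much whitespace sits between them.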

Groovy String Tokenize() Result Type

The result of the Groovy String's tokenize method is of type java.util.List. Hence the code:
def sampleText = "The quick brown fox"
println sampleText.tokenize() instanceof String[]
println sampleText.tokenize() instanceof List
will print false for the first line because the result is not an array of String, and true for the second because it is actually an instance of a List.
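
Since the result is a List, the usual Groovy collection methods can be applied to it directly. The sketch below is illustrative only: the names tokens, shouted, and asArray are made up for this example, and the explicit coercion to String[] is just one way to get an array when one is really needed.

```groovy
def sampleText = "The quick brown fox"
def tokens = sampleText.tokenize()

// The usual List operations are available on the result
assert tokens.size() == 4
assert tokens[0] == "The"
assert tokens.last() == "fox"
assert tokens.every { it instanceof String }

// Collection methods such as collect() and join() also work
def shouted = tokens.collect { it.toUpperCase() }
assert shouted == ["THE", "QUICK", "BROWN", "FOX"]
assert tokens.join("-") == "The-quick-brown-fox"

// If an array is really needed, the List can be coerced explicitly
def asArray = tokens as String[]
assert asArray instanceof String[]
```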

Groovy String Tokenize() with Delimiter

If we don't want to use whitespace, we can pass the delimiter to use to tokenize the String. For example:
def sampleText = "no-one-leaves"
println sampleText.tokenize("-")

The code will print:

[no, one, leaves]

It will only use the given delimiter and treat whitespace as part of the resulting items. For example:

def sampleText = "a no-a one-a leaves"
println sampleText.tokenize("-")

This will not use space as a delimiter, only the dash. Hence, we get the items:

[a no, a one, a leaves]

If the delimiter is more than one character, it treats each character as a delimiter. For example:

def sampleText = "pen pineapple apple pen"
println sampleText.tokenize(" p")

By passing space and p, the String is tokenized either by space or the letter p. It will not look for exactly a space followed by a p; an occurrence of either character will delimit the String. Hence the output will be:
[en, inea, le, a, le, en]
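
The same rule is handy when a String mixes several separator characters. The following is a small sketch with made-up sample data (the record String and the names record and parts are hypothetical), showing that each character of the argument acts as its own delimiter and that consecutive delimiters do not yield empty items.

```groovy
// Each character of the argument is a delimiter on its own
def record = "id=42;name=fox,color=brown"
def parts = record.tokenize(";,")
assert parts == ["id=42", "name=fox", "color=brown"]

// Adding "=" to the delimiter set splits keys and values apart as well
assert record.tokenize(";,=") == ["id", "42", "name", "fox", "color", "brown"]

// Consecutive delimiters are skipped, so no empty items appear
assert "a,,b,;c".tokenize(",;") == ["a", "b", "c"]

println parts
```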

Groovy String Tokenize() and Regular Expression

Groovy String's tokenize() method does not work with regular expressions. For example:

// Attempt to tokenize by a single digit

def sampleText = "A1B23C456D"
println sampleText.tokenize(/\d/)
The regular expression means any digit, but tokenize() does not interpret its argument as a pattern. It treats the argument as the literal delimiter characters \ and d, and neither of them appears in the String, so nothing is separated. The result will be:

[A1B23C456D]

You may want to look at Groovy String's split() method instead if you want to use regular expressions.
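
For comparison, here is a brief sketch of how split() handles the same input. It assumes the standard String.split(String regex) method, which returns a String[] rather than a List; the variable names are illustrative.

```groovy
def sampleText = "A1B23C456D"

// tokenize() sees the delimiter characters '\' and 'd', which never occur here
assert sampleText.tokenize(/\d/) == ["A1B23C456D"]

// split() takes a regular expression and returns a String[]
def byDigit = sampleText.split(/\d/)
assert byDigit instanceof String[]
def byDigitList = byDigit as List
assert byDigitList == ["A", "B", "", "C", "", "", "D"]

// Matching one or more digits avoids the empty items
def byDigitsList = sampleText.split(/\d+/) as List
assert byDigitsList == ["A", "B", "C", "D"]
```

Unlike tokenize(), split() keeps the empty items that fall between adjacent delimiters, which is why the single-digit pattern produces empty Strings between 2 and 3 and between 4, 5, and 6.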