Split with regex
String.prototype.split(separator) accepts not only a fixed string but
also a regex as separator. This makes it a powerful tool to tokenize
structured text.
'uno, due,tre quattro'.split(/[,\s]+/);
// ["uno", "due", "tre", "quattro"]
The regex /[,\s]+/ matches one or more commas or whitespace characters, so split treats any run of them as a single delimiter.
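To make the difference from a fixed-string separator concrete, here is a minimal sketch that splits the same input both ways. The variable names are illustrative only.

```javascript
const input = 'uno, due,tre quattro';

// Fixed-string separator: splits only on the exact character ",".
// Spaces stay attached to the pieces, and "tre quattro" is not separated.
const byString = input.split(',');
console.log(byString); // ["uno", " due", "tre quattro"]

// Regex separator: any run of commas and/or whitespace counts as one delimiter.
const byRegex = input.split(/[,\s]+/);
console.log(byRegex); // ["uno", "due", "tre", "quattro"]
```

The string version needs extra trimming and a second pass for the space-separated words, while the regex version handles the mixed delimiters in one call.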
Typical cases
- Permissive CSV: text.split(/\s*,\s*/) to handle spaces around commas.
- Naive tokenizer: text.split(/\s+/) to extract words.
- Keep the separator: if the regex contains capturing groups, the content of the groups is included in the result array.
'a=1; b=2; c=3'.split(/(;)\s*/);
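// ["a=1", ";", "b=2", ";", "c=3"]
Without the () group the semicolon would disappear. With (;) you keep it in the result as its own element. The \s* sits outside the group, so the spaces after each semicolon are still discarded.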
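The first two typical cases can be sketched as follows. The sample strings are hypothetical and chosen only for illustration. The sketch also shows one thing to watch for with the naive tokenizer: leading or trailing whitespace produces empty strings at the ends of the array unless the input is trimmed first.

```javascript
// Hypothetical sample inputs, chosen only for illustration.
const csvLine = 'red ,green,  blue , yellow';
const sentence = '  the quick   brown\tfox ';

// Permissive CSV: optional whitespace around each comma is part of the separator.
const fields = csvLine.split(/\s*,\s*/);
console.log(fields); // ["red", "green", "blue", "yellow"]

// Naive tokenizer: trim first, then split on runs of whitespace.
const words = sentence.trim().split(/\s+/);
console.log(words); // ["the", "quick", "brown", "fox"]

// Without trim(), the whitespace at both ends matches the separator,
// so the result starts and ends with an empty string.
const untrimmed = sentence.split(/\s+/);
console.log(untrimmed); // ["", "the", "quick", "brown", "fox", ""]
```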
Preserving separators in split operations
If you place split separators inside capturing parentheses, the output of String.prototype.split will include the separators themselves as elements in the final array, instead of discarding them.
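As a small sketch of this rule, the example below splits a hypothetical phrase on the keywords "and" and "or". It contrasts a capturing group, which keeps the keyword, with a non-capturing group (?:...), which groups the alternatives without keeping anything.

```javascript
// Hypothetical input, chosen only for illustration.
const text = 'red and green or blue';

// Capturing group: the matched keyword is inserted into the result.
// The \s+ on either side is outside the group, so the spaces are discarded.
const kept = text.split(/\s+(and|or)\s+/);
console.log(kept); // ["red", "and", "green", "or", "blue"]

// Non-capturing group: same separator, but nothing is captured,
// so the keywords are dropped like any ordinary separator.
const dropped = text.split(/\s+(?:and|or)\s+/);
console.log(dropped); // ["red", "green", "blue"]
```

Only the text inside the capturing parentheses is kept, so you can choose exactly which part of a separator survives the split.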
Try it
Write a regex that finds every permissive CSV separator: a comma with optional spaces around it. You could then pass it to split to tokenize the list.
Hint: use \s* before and after the comma to absorb any optional spaces.
Review exercise
Write a regex that finds every separator made of one or more spaces or semicolons. Passed to split, it would tokenize the text into words.
Hint: combine \s and ; in a character class [\s;] with the + quantifier.
Additional challenge
Write a regex to use in a split that separates numbers while keeping math operators `+`, `-`, `*`, `/` as elements of the array.
Hint: enclose the character class of math operators in capturing parentheses to preserve them in the split array.