eLearner.app
Module 5 · Lesson 4 of 4 · 20/32 in the course · ~12 min

Unicode property escapes

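In JavaScript, the classes \w and \d only cover ASCII (\w is [A-Za-z0-9_], \d is [0-9]), so they are not enough for Italian, French, Greek or emoji text; \s is the exception, since it already matches Unicode whitespace such as the no-break space. Since ES2018, JavaScript offers property escapes \p{...} with the u (Unicode) flag: semantic classes based on the Unicode properties of characters.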

Code
Pattern: \p{L}+        (with flag u)
Sample:  Ciao caffè über 世界
         ^^^^ ^^^^^ ^^^^ ^^^^
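A minimal JavaScript sketch of the pattern above, run on the same sample string with String.prototype.match (the g flag collects every match, the u flag enables \p{...}):

```javascript
// \p{L}+ with the u flag: runs of letters from any script.
const sample = "Ciao caffè über 世界";
const words = sample.match(/\p{L}+/gu);

console.log(words); // [ 'Ciao', 'caffè', 'über', '世界' ]
```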

\p{L} means "any Unicode letter": it includes accented letters, Chinese ideographs, Cyrillic, Greek… letters from every script. The most common escapes are:

  • \p{L} -- letter (of any alphabet).
  • \p{N} -- number (ASCII digits 0-9, Arabic-Indic and Devanagari digits, Roman numeral characters such as Ⅻ…).
  • \p{P} -- punctuation.
  • \p{S} -- symbol (mathematical, currency, emoji…).
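  • \p{Z} -- separator: spaces (including the no-break space) and the line/paragraph separators; tab and newline are control characters, not \p{Z}.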
  • \p{Script=Latin} -- specifically the Latin alphabet.
  • \p{Script=Greek} -- the Greek alphabet; the same form works for other scripts (Script=Cyrillic, Script=Han, …).
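To check which class a character falls into, here is a small sketch. The helper category() and the sample characters are hypothetical, chosen for illustration; each pattern is anchored with ^ and $ so it must match exactly one code point.

```javascript
// Hypothetical helper: which general-category escape does one character match?
const categories = [
  ["L", /^\p{L}$/u], // letter
  ["N", /^\p{N}$/u], // number
  ["P", /^\p{P}$/u], // punctuation
  ["S", /^\p{S}$/u], // symbol
  ["Z", /^\p{Z}$/u], // separator
];

function category(ch) {
  for (const [name, re] of categories) {
    if (re.test(ch)) return name;
  }
  return "other"; // e.g. "\n" and "\t", which are control characters (Cc)
}

console.log(category("é"));  // L  (accented Latin letter)
console.log(category("Ж"));  // L  (Cyrillic)
console.log(category("٣"));  // N  (Arabic-Indic digit three)
console.log(category("Ⅻ"));  // N  (Roman numeral twelve, U+216B)
console.log(category("!"));  // P
console.log(category("€"));  // S  (currency symbol)
console.log(category("😀")); // S  (emoji; the u flag treats it as one code point)
console.log(category(" "));  // Z
console.log(category("\n")); // other

// Script properties narrow a letter down to one writing system.
console.log(/^\p{Script=Latin}$/u.test("é")); // true
console.log(/^\p{Script=Greek}$/u.test("λ")); // true
console.log(/^\p{Script=Greek}$/u.test("é")); // false
```

Unicode general categories do not overlap, so the order of the checks in category() does not change the result.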

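Each escape also has a negated form written with an uppercase P: \P{L} matches any character that is not a letter, \P{N} any character that is not a number, and so on.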
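A quick sketch of the negated form on a made-up string: \P{L} is handy for stripping, or collecting, everything that is not a letter.

```javascript
// \P{L} (uppercase P) matches every code point that is NOT a letter.
const noisy = "città, 2024!";
const lettersOnly = noisy.replace(/\P{L}/gu, "");
const nonLetterRuns = noisy.match(/\P{L}+/gu);

console.log(lettersOnly);   // città
console.log(nonLetterRuns); // [ ', 2024!' ]
```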

Difference from \w and \d

Code
\w matches [A-Za-z0-9_]              -- ASCII only, so "caffè" breaks at the è
[\p{L}\p{N}_]  with flag u           -- also matches accented letters and other scripts

For a robust parser of Italian text, prefer \p{L} over \w: words such as città, perché and andrò then match whole, instead of being cut off at the accented letter.
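A side-by-side sketch on a made-up Italian sample, comparing \w+ with the Unicode-aware class [\p{L}\p{N}_]+. It assumes the text is precomposed (NFC); in decomposed text an accent is a separate combining mark (\p{M}), which this class does not include.

```javascript
const italian = "caffè über_2 città";

// \w is [A-Za-z0-9_]: an accented letter is not a word character,
// so it truncates or splits the word.
const asciiWords = italian.match(/\w+/g);

// Unicode-aware alternative: letters, numbers and underscore from any script.
const unicodeWords = italian.match(/[\p{L}\p{N}_]+/gu);

console.log(asciiWords);   // [ 'caff', 'ber_2', 'citt' ]
console.log(unicodeWords); // [ 'caffè', 'über_2', 'città' ]
```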

Unicode properties and browser compatibility

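Unicode properties such as \p{L} (letters) or \p{Script=Latin} extend character classes to international alphabets. Property escapes were added in ES2018, and in JavaScript they work only with the u flag (or the v flag, added in ES2024). Without either flag the engine does not complain: \p is read as an escaped literal "p", so /\p{L}/ silently matches the text "p{L}" instead of a letter. With the flag, an unknown property name such as \p{Letterz} is a SyntaxError, and so is any \p{...} in an engine that predates ES2018.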
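The sketch below illustrates both behaviours, plus a simple feature test. supportsPropertyEscapes() is a hypothetical helper name; it relies only on new RegExp throwing a SyntaxError for a pattern the engine cannot parse.

```javascript
// Without u (or v), \p is an escaped "p": the pattern means the literal text "p{L}".
const legacy = new RegExp("\\p{L}");
console.log(legacy.test("é"));    // false
console.log(legacy.test("p{L}")); // true

// With u, the same source is a property escape.
console.log(/\p{L}/u.test("é"));  // true

// In Unicode mode, an unknown property name is rejected.
try {
  new RegExp("\\p{Letterz}", "u");
} catch (err) {
  console.log(err instanceof SyntaxError); // true
}

// Hypothetical feature test for engines that predate ES2018 property escapes.
function supportsPropertyEscapes() {
  try {
    new RegExp("\\p{L}", "u");
    return true;
  } catch {
    return false;
  }
}
console.log(supportsPropertyEscapes()); // true on an engine with ES2018 support
```

Because the unflagged pattern fails silently rather than loudly, a forgotten u flag shows up as a regex that simply never matches ordinary text.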

Try it

Exercise#regex.m5.l4.e1

Find every word, including those with accents (città, perché, è, …). Use the property escape \p{L} with the u flag.

Hint

Replace \w+ with \p{L}+ and add the u flag (in addition to g).


Review exercise

Exercise#regex.m5.l4.e2

Find every Unicode symbol (currencies, math, emoji) in the text, excluding letters and digits.

Hint

\p{S} matches the Unicode Symbol category. Remember the u flag.


Additional challenge

Exercise#regex.m5.l4.e3

Find all words made of Cyrillic letters using \p{Script=Cyrillic}.

Hint

Use \p{Script=Cyrillic} with the + quantifier and the u flag.
