eLearner.app
Module 5 · Lesson 4 of 4 · 20/32 in course · ~12 min

Unicode property escapes

In JavaScript the classes \w and \d are ASCII-only, which is not enough for Italian, French, Greek, or emoji text (\s already covers Unicode whitespace). Modern JavaScript, with the u (Unicode) flag, offers property escapes \p{...}: semantic classes based on the Unicode properties of characters.

Code
Pattern: \p{L}+        (with flag u)
Sample:  Ciao caffè über 世界
         ^^^^ ^^^^^ ^^^^ ^^^^
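
The same match can be reproduced in a few lines. This is a minimal sketch: the sample sentence is written with its accented characters, and the variable names are illustrative.

```javascript
// Sample sentence with precomposed accented letters and two CJK characters.
const sample = "Ciao caffè über 世界";

// g = find every match; u = Unicode mode, which enables \p{...}.
const letters = /\p{L}+/gu;

// Each run of letters becomes one match, whatever the script.
const words = sample.match(letters);

console.log(words); // [ 'Ciao', 'caffè', 'über', '世界' ]
```

Only the spaces are left out, because a space is a separator (\p{Z}) rather than a letter.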

\p{L} = "any letter (Unicode)": it includes accented letters, Chinese ideographs, Cyrillic, Greek, and the letters of every other script. The most common properties:

  • \p{L}: letter (of any alphabet).
  • \p{N}: number (Western Arabic digits, Roman numerals, Indic digits…).
  • \p{P}: punctuation.
  • \p{S}: symbol (mathematical, currency, many emoji…).
  • \p{Z}: space/separator.
  • \p{Script=Latin}: specifically the Latin alphabet.
  • \p{Script=Greek}: the Greek alphabet. And so on for other scripts.

Each property also has a negated version with an uppercase P: \P{L}, \P{N}, …
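
As a rough illustration of the categories listed above, the sketch below runs several of them against made-up strings. The strings and variable names are assumptions chosen for the example, not part of the lesson.

```javascript
// Illustrative strings; none of these names come from the lesson.
const price = "Prezzo: 42 € (IVA inclusa) ≈ 45 $";

const numbers = price.match(/\p{N}+/gu);     // runs of digits
const symbols = price.match(/\p{S}/gu);      // currency and math symbols
const punctuation = price.match(/\p{P}/gu);  // colon and parentheses

// Script properties restrict the match to one writing system.
const greek = "alpha αβγ beta".match(/\p{Script=Greek}+/gu);

// Uppercase \P negates: runs of characters that are NOT letters.
const nonLetters = "città, perché!".match(/\P{L}+/gu);

console.log(numbers);     // [ '42', '45' ]
console.log(symbols);     // [ '€', '≈', '$' ]
console.log(punctuation); // [ ':', '(', ')' ]
console.log(greek);       // [ 'αβγ' ]
console.log(nonLetters);  // [ ', ', '!' ]
```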

Difference from \w and \d

Code
\w matches [A-Za-z0-9_]               -- ASCII only, stops before the è in "caffè"
[\p{L}\p{N}_]  with flag u            -- includes accented characters

For a robust parser of Italian text, prefer \p{L} over \w: città, perché, and andrò then match as whole words.
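
The difference is easy to see side by side. A minimal sketch, assuming the three Italian words are written with precomposed accented letters; the variable names are illustrative.

```javascript
const italian = "città perché andrò";

// \w is ASCII-only, so each word is cut off at its accented letter.
const asciiWords = italian.match(/\w+/g);

// The Unicode-aware equivalent of \w: letters, numbers, underscore.
const unicodeWords = italian.match(/[\p{L}\p{N}_]+/gu);

console.log(asciiWords);   // [ 'citt', 'perch', 'andr' ]
console.log(unicodeWords); // [ 'città', 'perché', 'andrò' ]
```

If the input may contain decomposed accents (a base letter followed by a combining mark), normalizing it first with String.prototype.normalize("NFC") is one way to keep these matches whole.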

Unicode properties and browser compatibility

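Unicode properties like \p{L} (letters) or \p{Script=Latin} extend classes to international alphabets. They were standardized in ES2018, and in JavaScript they only work with the u (or v) flag. Without the flag the engine does not raise an error: \p is read as a literal p, so the pattern silently looks for the text "p{L}" instead. A syntax error is thrown when the flag is set and the property name is invalid.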
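
The effect of the flag can be checked directly. The sketch below shows what standard engines do with and without u; the input string and the property name NotAProperty are made up for the demonstration.

```javascript
// Made-up input that happens to contain the literal text "p{L}".
const input = "caffè p{L}";

// Without u, \p is just a literal "p" and the braces are literal too,
// so the pattern matches the characters p{L} and nothing else.
const noFlag = input.match(/\p{L}/g);

// With u, \p{L} is a property escape and matches runs of letters.
const withFlag = input.match(/\p{L}+/gu);

// With u, an unknown property name is rejected when the regex is built.
let rejected = false;
try {
  new RegExp("\\p{NotAProperty}", "u"); // hypothetical, invalid name
} catch (err) {
  rejected = err instanceof SyntaxError;
}

console.log(noFlag);   // [ 'p{L}' ]
console.log(withFlag); // [ 'caffè', 'p', 'L' ]
console.log(rejected); // true
```

Because the flagless version fails silently, a pattern that "finds nothing" on accented text is often just missing the u flag.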

Try it

Exercise #regex.m5.l4.e1

Find every word, including those with accents (città, perché, è…). Use the property escape \p{L} with the u flag.

Hint:

Replace \w+ with \p{L}+ and add the u flag (in addition to g).


Review exercise

Exercise #regex.m5.l4.e2

Find every Unicode symbol (currencies, math, emoji) in the text, excluding letters and digits.

Hint:

\p{S} matches the Unicode Symbol category. Remember the u flag.


Additional challenge

Exercise #regex.m5.l4.e3

Find all words consisting of Cyrillic letters using \p{Script=Cyrillic}.

Hint:

Use \p{Script=Cyrillic} with the + quantifier and the u flag.
