Module 8 · Lesson 4 of 4 · 32/36 in the course · ~12 min

The re module: regular expressions in Python

The re module provides regular expression support in Python. If you have already taken the Regex course on this site, the syntax will be familiar; this lesson focuses on the Python API.

Raw string: `r"..."`

Regexes use many backslashes (\d, \b, \s). Always write them as raw strings so you don't have to double them up:

Python

import re
# raw: \d is a single token
pattern = r"\d+"
# NOT raw: you would have to write "\\d+". A bare "\d" is an invalid escape
# (SyntaxWarning since Python 3.12), and "\b" silently becomes a backspace character.
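
To make the difference concrete, here is a minimal sketch comparing raw and non-raw spellings. The sample text and the variable names `with_raw` and `without_raw` are illustrative assumptions, not part of the lesson.

```python
import re

# The raw spelling and the doubled-backslash spelling are the same string.
assert r"\d+" == "\\d+"

# In a normal string "\b" is the backspace character (1 character);
# in a raw string it stays backslash + "b" (2 characters), the regex word boundary.
assert len("\b") == 1
assert len(r"\b") == 2

text = "cat concatenate cat"
with_raw = re.findall(r"\bcat\b", text)    # whole words only
without_raw = re.findall("\bcat\b", text)  # searches for backspace + "cat" + backspace

print(with_raw)     # ['cat', 'cat']
print(without_raw)  # []
```

The non-raw version raises no error; it simply never matches, which is why this kind of bug is easy to miss.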

`re.search`: find the first occurrence

Python

import re
m = re.search(r"\d+", "ho 42 mele e 17 pere")
m.group()    # '42'
m.start()    # 3
m.end()      # 5

re.search returns a Match object if the pattern is found, None otherwise. The idiomatic pattern tests the result before using it:

Python

import re
testo = "ho 42 mele"               # sample input
if m := re.search(r"\d+", testo):  # walrus operator, Python 3.8+
    print(m.group())               # 42
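
The None check matters because calling `.group()` on None raises AttributeError. As a sketch, the same idea can be wrapped in a small helper; `first_int` is a hypothetical name chosen for this example.

```python
import re
from typing import Optional


def first_int(text: str) -> Optional[int]:
    """Return the first run of digits in text as an int, or None if there is none."""
    m = re.search(r"\d+", text)
    if m is None:
        # No match: re.search returned None, so there is no Match to read from.
        return None
    return int(m.group())


print(first_int("ho 42 mele e 17 pere"))  # 42
print(first_int("nessun numero"))         # None
```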

`re.match`: only at the start of the string

Python

re.match(r"\d+", "42 anni")     # match!
re.match(r"\d+", "ho 42 anni")  # None (doesn't start with a digit)

To search anywhere in the string, prefer re.search.
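
Note that re.match only anchors the start: the rest of the string may contain anything. The sketch below contrasts it with re.fullmatch, a related function in the same module that requires the whole string to match. The variable names are assumptions made for the example.

```python
import re

s = "42 anni"

prefix = re.match(r"\d+", s)       # matches '42'; the trailing " anni" is ignored
whole = re.fullmatch(r"\d+", s)    # None: the whole string is not made of digits
exact = re.fullmatch(r"\d+", "42") # matches: the entire string is digits

# re.search with an explicit ^ anchor behaves like re.match here.
anchored = re.search(r"^\d+", "ho 42 anni")  # None

print(prefix.group())  # 42
print(whole)           # None
print(exact.group())   # 42
print(anchored)        # None
```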

`re.findall`: all occurrences

Python

re.findall(r"\d+", "ho 42 mele, 17 pere, 3 banane")
# ['42', '17', '3']

If the pattern has capture groups, findall returns the groups instead of the whole match: a list of strings for a single group, a list of tuples for several groups:

Python

re.findall(r"(\w+)@(\w+)", "ada@x bob@y")
# [('ada', 'x'), ('bob', 'y')]
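
The following sketch puts the variants side by side on the same input, including non-capturing groups `(?:...)` for when you need grouping but still want the whole match. The variable names are illustrative.

```python
import re

text = "ada@x bob@y"

no_group = re.findall(r"\w+@\w+", text)               # whole matches
one_group = re.findall(r"(\w+)@\w+", text)            # only the single group
two_groups = re.findall(r"(\w+)@(\w+)", text)         # one tuple per match
non_capturing = re.findall(r"(?:\w+)@(?:\w+)", text)  # grouping without capturing

print(no_group)       # ['ada@x', 'bob@y']
print(one_group)      # ['ada', 'bob']
print(two_groups)     # [('ada', 'x'), ('bob', 'y')]
print(non_capturing)  # ['ada@x', 'bob@y']
```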

`re.sub`: substitution

Python

re.sub(r"\d+", "###", "ho 42 mele e 17 pere")
# 'ho ### mele e ### pere'

The replacement can also be a function, which receives the Match object and returns the replacement string:

Python

re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "ho 42 mele")
# 'ho 84 mele'
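
Two other re.sub features are worth a quick sketch: the `count` argument, which limits how many matches are replaced, and backreferences such as `\1` in the replacement string, which reuse captured groups. The inputs reuse strings from earlier in the lesson; the variable names are assumptions.

```python
import re

text = "ho 42 mele e 17 pere"

# count=1 replaces only the first match and leaves the rest untouched.
first_only = re.sub(r"\d+", "###", text, count=1)

# \1 and \2 in a raw replacement string refer to the first and second capture group.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "ada@x bob@y")

print(first_only)  # ho ### mele e 17 pere
print(swapped)     # x@ada y@bob
```

Writing the replacement as a raw string matters here too: without the `r` prefix, `"\1"` would be read as an octal escape rather than a group reference.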

Named groups

Python

m = re.search(r"(?P<anno>\d{4})-(?P<mese>\d{2})-(?P<giorno>\d{2})", "2025-01-15")
m.group("anno")    # '2025'
m.groupdict()      # {'anno': '2025', 'mese': '01', 'giorno': '15'}
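
Since groupdict() returns a plain dictionary, the captured fields are easy to hand to other code. Here is a minimal sketch that turns the match into a `datetime.date`; the helper `parse_date` and the constant `DATE_RE` are hypothetical names introduced for this example.

```python
import re
from datetime import date

# Same pattern as above, stored in a constant for reuse (name is an assumption).
DATE_RE = r"(?P<anno>\d{4})-(?P<mese>\d{2})-(?P<giorno>\d{2})"


def parse_date(text):
    """Return the first YYYY-MM-DD date found in text, or None if there is none."""
    m = re.search(DATE_RE, text)
    if m is None:
        return None
    # groupdict() maps each group name to its matched string; convert them to int.
    parts = {name: int(value) for name, value in m.groupdict().items()}
    return date(parts["anno"], parts["mese"], parts["giorno"])


print(parse_date("consegna 2025-01-15"))  # 2025-01-15
print(parse_date("nessuna data"))         # None
```

The pattern only checks the shape of the date, so a string like "2025-13-40" still matches and `date(...)` would then raise ValueError.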

Useful flags

Python

testo = "riga uno\nriga due"                    # sample input
re.search(r"ciao", "CIAO", re.IGNORECASE)       # case-insensitive
re.findall(r"^.+$", testo, re.MULTILINE)        # ^/$ match at every line: ['riga uno', 'riga due']
re.search(r"a.b", "a\nb", re.DOTALL)            # . matches newline too
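
Flags can be combined with the `|` operator. The sketch below, on an invented sample log, finds lines starting with "errore" in any casing and shows what happens when re.MULTILINE is left out.

```python
import re

# Invented sample text with three lines.
testo = "Errore: disco pieno\ninfo: ok\nERRORE: rete assente"

# Both flags: ignore case, and let ^/$ match at the start/end of every line.
lines = re.findall(r"^errore.*$", testo, re.IGNORECASE | re.MULTILINE)

# Without MULTILINE, $ only matches at the end of the whole string,
# so the first line cannot satisfy the pattern and nothing is found.
no_multiline = re.findall(r"^errore.*$", testo, re.IGNORECASE)

print(lines)         # ['Errore: disco pieno', 'ERRORE: rete assente']
print(no_multiline)  # []
```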

`re.finditer`: lazy iteration over matches

re.findall builds the complete list of results in memory before returning. When you process matches in a very large text, prefer re.finditer(), which returns an iterator that yields Match objects one at a time, so only the current match needs to be held in memory.
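
A minimal sketch of the loop, on a short sample string for readability. Because each item is a Match object, you also get positions, which re.findall does not provide. The variable names are illustrative.

```python
import re

text = "ho 42 mele, 17 pere, 3 banane"

total = 0
positions = []
for m in re.finditer(r"\d+", text):
    # Each m is a Match object, produced only when the loop asks for it.
    total += int(m.group())
    positions.append(m.span())  # (start, end) of this match

print(total)      # 62
print(positions)  # [(3, 5), (12, 14), (21, 22)]
```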

Try it

Given `text = 'ordine #123, totale €45.50, ricevuta #456'`, extract ALL the order numbers (the digits after #) into `orders` as a list of strings. Evaluate `orders`.

Hint:

re.findall(r"#(\d+)", text): the capture group keeps only the digits.

Review exercise

Given `s = 'tel: 06-12345678 e tel: 02-9876'`, replace all phone numbers (format `\d+-\d+`) with the string 'XXX' and assign the result to `masked`. Evaluate `masked`.

Hint:

re.sub(r"\d+-\d+", "XXX", s)

Additional challenge

Import the `re` module. Extract all numbers consisting of one or more digits from the string `log_line = "Error 404 in 15ms"`. Store the list in `numbers` and evaluate it.

Hint:

Use re.findall(r'\d+', log_line) to extract all numbers as a list of strings.
