Module lesson (4/4)
The re module: regular expressions in Python
The re module exposes regular expressions in Python. If you already took
the Regex course on this site, you will find all the syntax familiar; here
we look at the APIs.
Raw string: r"..."
Regexes use many backslashes (\d, \b, \s). Always write them as
raw strings so you don't have to double them up:
import re
# raw: \d is a single token
pattern = r"\d+"
# NOT raw: you would have to write "\\d+"; a bare "\d" is an invalid escape that
# Python keeps as backslash + d but warns about (SyntaxWarning since Python 3.12)
re.search: find the first occurrence
import re
m = re.search(r"\d+", "ho 42 mele e 17 pere")
m.group() # '42'
m.start() # 3
m.end() # 5
Returns a Match object if found, None otherwise. Idiomatic pattern:
if m := re.search(r"\d+", testo):
    print(m.group())
re.match: only at the start of the string
re.match(r"\d+", "42 anni") # match!
re.match(r"\d+", "ho 42 anni") # None (doesn't start with a digit)
To search anywhere in the string, prefer re.search.
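The difference between the two calls can be sketched with a pair of small helpers. The names `leading_number` and `first_number` are hypothetical, chosen only for this illustration; both handle the None case explicitly instead of calling `.group()` on a failed match.

```python
import re

def leading_number(text):
    """Return the digits at the very start of text, or None (uses re.match)."""
    m = re.match(r"\d+", text)
    return m.group() if m else None

def first_number(text):
    """Return the first run of digits anywhere in text, or None (uses re.search)."""
    m = re.search(r"\d+", text)
    return m.group() if m else None

print(leading_number("42 anni"))     # 42
print(leading_number("ho 42 anni"))  # None
print(first_number("ho 42 anni"))    # 42
```

Calling `.group()` directly on the result of a failed `re.match` raises AttributeError, which is why both helpers test the Match object first.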
re.findall: all occurrences
re.findall(r"\d+", "ho 42 mele, 17 pere, 3 banane")
# ['42', '17', '3']
If the pattern has capture groups, findall returns the groups, not the whole match (a list of tuples when there is more than one group):
re.findall(r"(\w+)@(\w+)", "ada@x bob@y")
# [('ada', 'x'), ('bob', 'y')]
re.sub: substitution
re.sub(r"\d+", "###", "ho 42 mele e 17 pere")
# 'ho ### mele e ### pere'
It also works with a replacement function (which receives the Match):
re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "ho 42 mele")
# 'ho 84 mele'
Named groups
m = re.search(r"(?P<anno>\d{4})-(?P<mese>\d{2})-(?P<giorno>\d{2})", "2025-01-15")
m.group("anno") # '2025'
m.groupdict() # {'anno': '2025', 'mese': '01', 'giorno': '15'}
Useful flags
re.search(r"ciao", "CIAO", re.IGNORECASE) # case-insensitive
re.findall(r"^.+$", testo, re.MULTILINE) # ^ and $ match at the start/end of every line
re.search(r"a.b", "a\nb", re.DOTALL) # . matches newline too
re.finditer for memory efficiency
When you process matches in a very large text, re.findall builds a list holding every match in memory at once. Prefer re.finditer(), which returns a lazy iterator that yields Match objects one at a time, so the full list of results is never built.
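A minimal sketch of the iterator in use, reusing the sample sentence from the findall section. Because finditer yields Match objects rather than strings, the position of each match is available as well. The variable names `text` and `total` are only illustrative.

```python
import re

text = "ho 42 mele, 17 pere, 3 banane"

# Each iteration produces one Match object; nothing is collected into a list.
for m in re.finditer(r"\d+", text):
    print(m.group(), m.start(), m.end())

# The iterator can feed a generator expression directly.
total = sum(int(m.group()) for m in re.finditer(r"\d+", text))
print(total)  # 62
```

On a string this short the memory difference is negligible; the pattern pays off when the input is large or when you stop iterating early.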
Try it
Given `text = 'ordine #123, totale €45.50, ricevuta #456'`, extract ALL the order numbers (the digits after #) into `orders` as a list of strings. Evaluate `orders`.
Hint:
re.findall(r"#(\d+)", text), with a capture group for the digits only.
Review exercise
Given `s = 'tel: 06-12345678 e tel: 02-9876'`, replace all phone numbers (format `\d+-\d+`) with the string 'XXX' and assign the result to `masked`. Evaluate `masked`.
Hint:
re.sub(r"\d+-\d+", "XXX", s)
Additional challenge
Import the `re` module. Extract all numbers consisting of one or more digits from the string `log_line = "Error 404 in 15ms"`. Store the list in `numbers` and evaluate it.
Hint:
Use re.findall(r'\d+', log_line) to extract all numbers as a list of strings.