re module: regex in Python
The re module exposes regular expressions in Python. If you already took
the Regex course on this site, you will find all the syntax familiar; here
we look at the APIs.
Raw string: r"..."
Regexes use many backslashes (\d, \b, \s). Always write them as
raw strings so you don't have to double them up:
import re
# raw: \d is a single token
pattern = r"\d+"
# NOT raw: you would have to write "\\d+". An unescaped "\d" triggers an
# invalid-escape warning, and "\b" silently becomes a backspace character.
re.search: find the first occurrence
import re
m = re.search(r"\d+", "ho 42 mele e 17 pere")
m.group() # '42'
m.start() # 3
m.end() # 5
re.search returns a Match object if a match is found, None otherwise. Idiomatic pattern (Python 3.8+):
if m := re.search(r"\d+", testo):
    print(m.group())
re.match: only at the start of the string
re.match(r"\d+", "42 anni") # match!
re.match(r"\d+", "ho 42 anni") # None (doesn't start with a digit)
To search anywhere in the string, prefer re.search.
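The difference between the two calls can be checked side by side. The following is a minimal sketch: the sample string and the variable names are illustrative and not part of the lesson. It also shows re.fullmatch, a related standard-library function not covered above, which succeeds only when the pattern covers the whole string.

```python
import re

sample = "ho 42 anni"  # illustrative input, same string as the example above

# re.match anchors at position 0, so the leading "ho " makes it fail.
at_start = re.match(r"\d+", sample)

# re.search scans forward and stops at the first match anywhere.
anywhere = re.search(r"\d+", sample)

# re.fullmatch requires the pattern to cover the entire string.
whole_ok = re.fullmatch(r"\d+", "42")
whole_bad = re.fullmatch(r"\d+", "42 anni")

print(at_start)          # None
print(anywhere.group())  # 42
print(whole_ok.group())  # 42
print(whole_bad)         # None
```

All three functions return either a Match object or None, so the same `if m := ...` check works with each of them.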
re.findall: all occurrences
re.findall(r"\d+", "ho 42 mele, 17 pere, 3 banane")
# ['42', '17', '3']
If the pattern has capture groups, findall returns the groups instead of the whole match (a list of strings for one group, a list of tuples for several):
re.findall(r"(\w+)@(\w+)", "ada@x bob@y")
# [('ada', 'x'), ('bob', 'y')]
re.sub: substitution
re.sub(r"\d+", "###", "ho 42 mele e 17 pere")
# 'ho ### mele e ### pere'
The replacement can also be a function, which receives the Match object and returns the replacement string:
re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "ho 42 mele")
# 'ho 84 mele'
Named groups
m = re.search(r"(?P<anno>\d{4})-(?P<mese>\d{2})-(?P<giorno>\d{2})", "2025-01-15")
m.group("anno") # '2025'
m.groupdict() # {'anno': '2025', 'mese': '01', 'giorno': '15'}
Useful flags
re.search(r"ciao", "CIAO", re.IGNORECASE) # case-insensitive
re.findall(r"^.+$", testo, re.MULTILINE) # ^/$ match every line
re.search(r"a.b", "a\nb", re.DOTALL) # . matches newline too
re.finditer for memory efficiency
When you process matches in a very large text, re.findall builds a list that holds every match in memory at once. Prefer re.finditer, which returns a lazy iterator that yields Match objects one at a time, so only the current match needs to be kept.
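As a minimal sketch of the lazy approach, the loop below walks the matches one at a time with re.finditer and keeps only a running total, so no list of matches is ever built. The sample text and the variable names are assumptions made for illustration.

```python
import re

# Illustrative input; in practice this would be a much larger text.
text = "ho 42 mele, 17 pere, 3 banane"

total = 0      # running sum of all numbers seen so far
last_end = -1  # end offset of the most recent match

# Each iteration receives one Match object; nothing is accumulated.
for m in re.finditer(r"\d+", text):
    total += int(m.group())
    last_end = m.end()

print(total)     # 62
print(last_end)  # 22
```

Because each item is a full Match object, methods such as start(), end() and groupdict() remain available inside the loop, which the plain strings returned by findall do not offer.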
Try it
Given `text = 'ordine #123, totale €45.50, ricevuta #456'`, extract ALL the order numbers (the digits after #) into `orders` as a list of strings. Evaluate `orders`.
Hint: re.findall(r"#(\d+)", text) uses a capture group so that only the digits are returned.
Review exercise
Given `s = 'tel: 06-12345678 e tel: 02-9876'`, replace all phone numbers (format \d+-\d+) with the string 'XXX' and assign the result to `masked`. Evaluate `masked`.
Hint: re.sub(r"\d+-\d+", "XXX", s)
Additional challenge
Import the `re` module. Extract all numbers consisting of one or more digits from the string `log_line = "Error 404 in 15ms"`. Store the list in `numbers` and evaluate it.
Hint: use re.findall(r'\d+', log_line) to extract all numbers as a list of strings.