eLearner.app
Module 8 · Lesson 4 of 4 · 32/36 in the course · ~12 min

re module: regex in Python

Python's standard re module provides regular expressions. If you have already taken the Regex course on this site, the syntax will be familiar; this lesson focuses on the module's functions.

Raw string: r"..."

Regex patterns are full of backslashes (\d, \b, \s). Write them as raw strings (r"...") so that Python passes every backslash to the regex engine unchanged instead of treating it as a string escape:

Python
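import re
# raw string: the backslash reaches the regex engine as-is, so \d means "a digit"
pattern = r"\d+"
# without r you must write "\\d+": in a normal string "\d" is an invalid escape
# (SyntaxWarning since Python 3.12) and "\b" silently becomes a backspace character
pattern = "\\d+"  # same pattern as r"\d+", just harder to read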
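The difference is most dangerous with \b: in a normal string literal "\b" is the backspace character, so the regex engine never sees a word boundary and the pattern quietly matches nothing. A small check on a made-up sample string:

```python
import re

text = "cat catalog"
# "\bcat\b" is really "\x08cat\x08": it looks for literal backspace characters
print(re.findall("\bcat\b", text))   # []
# r"\bcat\b" keeps the backslashes, so \b is a word boundary; "catalog" is skipped
print(re.findall(r"\bcat\b", text))  # ['cat']
# the raw string and the doubled-backslash string are the same pattern
print(r"\d+" == "\\d+")              # True
```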

re.search: find the first occurrence

Python
import re
m = re.search(r"\d+", "ho 42 mele e 17 pere")
m.group()    # '42'
m.start()    # 3
m.end()      # 5

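re.search returns a Match object for the first match, or None if nothing matches. Because None is falsy, the idiomatic pattern binds and tests the result in one step with the walrus operator := (Python 3.8+):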

Python
import re
testo = "ho 42 mele e 17 pere"
if m := re.search(r"\d+", testo):  # walrus operator, Python 3.8+
    print(m.group())  # 42
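Without the walrus operator, check for None explicitly before using the result: calling .group() on None raises AttributeError. A minimal sketch, where first_number is a hypothetical helper written only for this example:

```python
import re
from typing import Optional


def first_number(text: str) -> Optional[int]:
    """Return the first run of digits in text as an int, or None if there is none."""
    m = re.search(r"\d+", text)
    if m is None:
        # m.group() here would raise AttributeError: 'NoneType' object has no attribute 'group'
        return None
    return int(m.group())


print(first_number("ho 42 mele e 17 pere"))  # 42
print(first_number("no digits here"))        # None
```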

re.match: only at the start of the string

Python
import re
re.match(r"\d+", "42 anni")     # match!
re.match(r"\d+", "ho 42 anni")  # None (doesn't start with a digit)

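re.match anchors only at the start: the match does not have to reach the end of the string. To search anywhere in the string, use re.search.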
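A quick side-by-side on the same made-up string: without re.MULTILINE, re.match behaves like re.search with a leading ^, and the text after the match does not matter.

```python
import re

s = "ho 42 anni"
print(re.match(r"\d+", s))                   # None: the string starts with "h"
print(re.search(r"\d+", s).group())          # 42
print(re.search(r"^\d+", s))                 # None: ^ anchors search at the start, like match
print(re.match(r"\d+", "42 anni").group())   # 42: " anni" after the match is fine
```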

re.findall: all occurrences

Python
import re
re.findall(r"\d+", "ho 42 mele, 17 pere, 3 banane")
# ['42', '17', '3']

If the pattern contains capture groups, findall returns the groups instead of the whole match: a list of strings for one group, a list of tuples for several:

Python
import re
re.findall(r"(\w+)@(\w+)", "ada@x bob@y")
# [('ada', 'x'), ('bob', 'y')]
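This rule bites when a group is only meant to hold an alternation. In the sketch below (sample strings made up), one capturing group yields a flat list with just that group, and the non-capturing group (?:...) brings back the whole matches:

```python
import re

# a single capture group: a flat list containing only the group
print(re.findall(r"(\w+)@\w+", "ada@x bob@y"))  # ['ada', 'bob']

weights = "3kg 250g"
# the group around kg|g is capturing, so findall returns only the unit
print(re.findall(r"\d+(kg|g)", weights))        # ['kg', 'g']
# (?:...) groups without capturing, so findall returns whole matches
print(re.findall(r"\d+(?:kg|g)", weights))      # ['3kg', '250g']
```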

re.sub: substitution

Python
import re
re.sub(r"\d+", "###", "ho 42 mele e 17 pere")
# 'ho ### mele e ### pere'
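A few more re.sub variations, sketched on the same sample string: the count keyword limits how many replacements are made (pass it by keyword; positional count is deprecated since Python 3.13), re.subn also returns the number of substitutions, and a raw replacement string can refer back to numbered groups with \1, \2 and so on.

```python
import re

s = "ho 42 mele e 17 pere"
# count=1: replace only the first match
print(re.sub(r"\d+", "###", s, count=1))  # ho ### mele e 17 pere
# subn returns a tuple (new_string, number_of_substitutions)
print(re.subn(r"\d+", "###", s))          # ('ho ### mele e ### pere', 2)
# \3/\2/\1 reorders the three numbered groups
print(re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2025-01-15"))  # 15/01/2025
```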

The replacement can also be a function: re.sub calls it with each Match object and inserts the string it returns:

Python
import re
re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "ho 42 mele")
# 'ho 84 mele'
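A replacement function is handy when the new text depends on a lookup. A sketch with a hypothetical template and dictionary (values, template and fill are names made up for this example); unknown placeholders are left untouched by returning the whole match:

```python
import re

# hypothetical data, for illustration only
values = {"name": "Ada", "city": "London"}
template = "Hi {name}, welcome to {city}! {unknown} stays."


def fill(m: re.Match) -> str:
    key = m.group(1)
    # fall back to the full match, e.g. "{unknown}", if the key is missing
    return values.get(key, m.group(0))


result = re.sub(r"\{(\w+)\}", fill, template)
print(result)  # Hi Ada, welcome to London! {unknown} stays.
```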

Named groups

Python
import re
m = re.search(r"(?P<anno>\d{4})-(?P<mese>\d{2})-(?P<giorno>\d{2})", "2025-01-15")
m.group("anno")    # '2025'
m.groupdict()      # {'anno': '2025', 'mese': '01', 'giorno': '15'}
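Named groups work in other places too: a Match can be indexed by group name, group() accepts several names and returns a tuple, and in re.sub a replacement string can refer to a named group with \g<name>. A sketch reusing the date pattern above (DATE is a constant name chosen for this example, the input string is made up):

```python
import re

DATE = r"(?P<anno>\d{4})-(?P<mese>\d{2})-(?P<giorno>\d{2})"
m = re.search(DATE, "2025-01-15")
print(m["anno"])                  # 2025, same as m.group("anno")
print(m.group("mese", "giorno"))  # ('01', '15')
# \g<name> in the replacement refers to a named group
print(re.sub(DATE, r"\g<giorno>/\g<mese>/\g<anno>", "due: 2025-01-15"))  # due: 15/01/2025
```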

Useful flags

Python
import re
testo = "first line\nsecond line"
re.search(r"ciao", "CIAO", re.IGNORECASE)       # case-insensitive: matches 'CIAO'
re.findall(r"^.+$", testo, re.MULTILINE)        # ^/$ match at every line: ['first line', 'second line']
re.search(r"a.b", "a\nb", re.DOTALL)            # . matches newline too
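Flags can be combined with |, or written inline at the very start of the pattern ((?i) for IGNORECASE, (?m) for MULTILINE, (?s) for DOTALL). A sketch on a made-up sample text:

```python
import re

text = "CIAO\nciao\nhello"
# no flags: ^ and $ only match at the start/end of the whole string
print(re.findall(r"^ciao$", text))                                # []
print(re.findall(r"^ciao$", text, re.IGNORECASE | re.MULTILINE))  # ['CIAO', 'ciao']
# inline flags: i = IGNORECASE, m = MULTILINE
print(re.findall(r"(?im)^ciao$", text))                           # ['CIAO', 'ciao']
# without DOTALL, . does not match a newline
print(re.search(r"a.b", "a\nb"))                                  # None
```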

re.finditer: iterate over matches lazily

When a large text contains many matches, re.findall builds a list holding all of them at once. re.finditer returns an iterator that yields one Match object at a time instead, so memory use does not grow with the number of matches (the text itself is still in memory). Each Match also carries its position (start(), end(), span()).
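A minimal sketch of the lazy approach on the findall example string: the loop gets both the text and the position of each match, and a generator expression aggregates the values without building an intermediate list.

```python
import re

text = "ho 42 mele, 17 pere, 3 banane"
for m in re.finditer(r"\d+", text):
    print(m.group(), m.span())
# 42 (3, 5)
# 17 (12, 14)
# 3 (21, 22)

# sum the numbers without ever holding all matches in a list
total = sum(int(m.group()) for m in re.finditer(r"\d+", text))
print(total)  # 62
```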

Try it

Exercise python.m8.l4.e1

Given `text = 'ordine #123, totale €45.50, ricevuta #456'`, extract ALL the order numbers (the digits after #) into `orders` as a list of strings. Evaluate `orders`.

Hint:

re.findall(r"#(\d+)", text) — capture group for digits only.


Review exercise

Exercise python.m8.l4.e2

Given `s = 'tel: 06-12345678 e tel: 02-9876'`, replace every phone number (pattern `\d+-\d+`) with the string 'XXX' and assign the result to `masked`. Evaluate `masked`.

Hint:

re.sub(r"\d+-\d+", "XXX", s)


Additional challenge

Exercise python.m8.l4.e3

Import the `re` module. Extract all numbers consisting of one or more digits from the string `log_line = "Error 404 in 15ms"`. Store the list in `numbers` and evaluate it.

Hint:

Use re.findall(r'\d+', log_line) to extract all numbers as a list of strings.
