re module: regex in Python

The re module exposes regular expressions in Python. If you already took the Regex course on this site, you will find all the syntax familiar; here we look at the APIs.

Raw string: `r"..."`

Regexes use many backslashes (\d, \b, \s). Always write them as raw strings so you don't have to double them up:

Python

import re
# raw: \d is a single token
pattern = r"\d+"
# NOT raw: you would have to write "\\d+". A bare "\d" is an unrecognized escape
# (Python keeps the backslash but warns), and "\b" silently becomes a backspace character.
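As a quick check of why the raw prefix matters, the sketch below compares the two spellings on `\b`, the case where forgetting `r` silently changes the meaning. The sample strings are made up for illustration.

```python
import re

# In a normal string "\b" is ONE character (backspace);
# in a raw string it is TWO characters: a backslash and the letter b.
plain = "\bmele\b"
raw = r"\bmele\b"
plain_len = len(plain)   # 6
raw_len = len(raw)       # 8

text = "ho 42 mele"
found_raw = re.search(raw, text)      # \b is a word boundary: matches "mele"
found_plain = re.search(plain, text)  # looks for literal backspace characters: None

print(plain_len, raw_len)
print(found_raw.group(), found_plain)
```

The non-raw version raises no error and no warning, it simply never matches, which is why the raw prefix is worth using by default.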

`re.search`: find the first occurrence

Python

import re
m = re.search(r"\d+", "ho 42 mele e 17 pere")
m.group()    # '42'
m.start()    # 3
m.end()      # 5

re.search returns a Match object if the pattern is found, None otherwise. An idiomatic pattern (the := operator requires Python 3.8 or later):

Python

if m := re.search(r"\d+", testo):
    print(m.group())
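The same None check can be wrapped in a small helper. This is a minimal sketch: `first_number` is a hypothetical name chosen for illustration, not part of the re module.

```python
import re

def first_number(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    if m := re.search(r"\d+", text):
        return int(m.group())
    return None  # re.search found nothing, so there is no Match to read

print(first_number("ho 42 mele e 17 pere"))  # 42
print(first_number("nessun numero"))         # None
```

Calling `m.group()` directly on the result of a failed search would raise AttributeError, because None has no `group` method; the `if` guards against that.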

`re.match`: only at the start of the string

Python

re.match(r"\d+", "42 anni")     # match!
re.match(r"\d+", "ho 42 anni")  # None (doesn't start with a digit)

To search anywhere in the string, use re.search.
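One detail that is easy to miss: re.match anchors only the start, not the end. The sketch below contrasts it with re.fullmatch, a function this lesson does not otherwise cover, which requires the pattern to span the whole string.

```python
import re

s = "42 anni"

at_start = re.match(r"\d+", s)      # anchored at the start only: matches '42'
whole = re.fullmatch(r"\d+", s)     # must cover the entire string: None

# With default flags, re.search with an explicit ^ behaves like re.match.
anchored = re.search(r"^\d+", "ho 42 anni")   # None

print(at_start.group(), whole, anchored)
```

For validating an entire input (for example "is this string only digits?"), re.fullmatch is usually the safer choice of the two.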

`re.findall`: all occurrences

Python

re.findall(r"\d+", "ho 42 mele, 17 pere, 3 banane")
# ['42', '17', '3']

If the pattern has capture groups, findall returns the groups, not the whole match:

Python

re.findall(r"(\w+)@(\w+)", "ada@x bob@y")
# [('ada', 'x'), ('bob', 'y')]
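The shape of the result therefore depends on how many capture groups the pattern has. A short comparison on the same input, including the non-capturing form `(?:...)` for when you need grouping but still want the whole match:

```python
import re

text = "ada@x bob@y"

# No groups: each item is the whole match.
whole = re.findall(r"\w+@\w+", text)             # ['ada@x', 'bob@y']

# Exactly one group: each item is that group's text (a string, not a tuple).
users = re.findall(r"(\w+)@\w+", text)           # ['ada', 'bob']

# Non-capturing groups do not change what findall returns.
grouped = re.findall(r"(?:\w+)@(?:\w+)", text)   # ['ada@x', 'bob@y']

print(whole, users, grouped)
```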

`re.sub`: substitution

Python

re.sub(r"\d+", "###", "ho 42 mele e 17 pere")
# 'ho ### mele e ### pere'

It also works with a replacement function (which receives the Match):

Python

re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "ho 42 mele")
# 'ho 84 mele'
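A few other re.sub features may be useful alongside the replacement function. The sketch below shows the `count` argument, backreferences in the replacement string, and re.subn; the input strings are reused from the examples above.

```python
import re

text = "ho 42 mele e 17 pere"

# count=1 replaces only the first match.
first_only = re.sub(r"\d+", "###", text, count=1)
# 'ho ### mele e 17 pere'

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "ada@x bob@y")
# 'x@ada y@bob'

# re.subn returns the new string together with the number of replacements.
result, n = re.subn(r"\d+", "###", text)
# ('ho ### mele e ### pere', 2)

print(first_only, swapped, result, n)
```

Note that the replacement string is written as a raw string too, for the same reason as the pattern: `"\1"` without the prefix would be an octal escape rather than a backreference.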

Named groups

Python

m = re.search(r"(?P<anno>\d{4})-(?P<mese>\d{2})-(?P<giorno>\d{2})", "2025-01-15")
m.group("anno")    # '2025'
m.groupdict()      # {'anno': '2025', 'mese': '01', 'giorno': '15'}
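Because groupdict() returns an ordinary dict, it is easy to feed named groups into other code. A minimal sketch, assuming you want a `datetime.date` out of the text; `DATE_RE` and `parse_date` are hypothetical names, and re.compile is used only to give the pattern a reusable name.

```python
import re
from datetime import date

# Same pattern as above, compiled once so it can be reused.
DATE_RE = re.compile(r"(?P<anno>\d{4})-(?P<mese>\d{2})-(?P<giorno>\d{2})")

def parse_date(text):
    """Return the first YYYY-MM-DD date found in text, or None."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    # Convert every captured string to int, keyed by group name.
    parts = {name: int(value) for name, value in m.groupdict().items()}
    return date(parts["anno"], parts["mese"], parts["giorno"])

parsed = parse_date("consegna 2025-01-15")
print(parsed)  # 2025-01-15
```

The pattern only checks the shape of the date, so a string like "2025-13-40" would still match and `date(...)` would raise ValueError; real code would need to handle that.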

Useful flags

Python

re.search(r"ciao", "CIAO", re.IGNORECASE)       # case-insensitive
re.findall(r"^.+$", testo, re.MULTILINE)        # ^/$ match every line
re.search(r"a.b", "a\nb", re.DOTALL)            # . matches newline too
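Flags can be combined with the `|` operator. The sketch below runs the flags on a made-up three-line `testo` so the effect of each one is visible.

```python
import re

testo = "Ciao Ada\nciao Bob\nsalve Eva"

# MULTILINE: ^ and $ match at the start and end of every line.
lines = re.findall(r"^.+$", testo, re.MULTILINE)
# ['Ciao Ada', 'ciao Bob', 'salve Eva']

# Two flags at once: case-insensitive AND line-by-line anchors.
greetings = re.findall(r"^ciao \w+$", testo, re.IGNORECASE | re.MULTILINE)
# ['Ciao Ada', 'ciao Bob']

# DOTALL: without it, . does not match a newline.
no_flag = re.search(r"a.b", "a\nb")               # None
with_flag = re.search(r"a.b", "a\nb", re.DOTALL)  # match

print(lines, greetings, no_flag, with_flag is not None)
```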

`re.finditer`: lazy iteration over matches

re.findall builds a list holding every matched string, which can be costly on a huge block of text. When you only need to process matches one at a time, prefer re.finditer: it returns an iterator that yields Match objects lazily, so the full list of results is never held in memory at once.
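A short sketch of the loop this enables. Each item is a full Match object, so positions are available as well as the text; the input string is the one used earlier for re.findall.

```python
import re

text = "ho 42 mele, 17 pere, 3 banane"

total = 0
positions = []
for m in re.finditer(r"\d+", text):
    total += int(m.group())      # the matched text, as with re.search
    positions.append(m.span())   # (start, end) of this match

print(total)      # 62
print(positions)  # [(3, 5), (12, 14), (21, 22)]
```

On a string this small the memory difference is negligible; the benefit shows up when the text is large and you can discard each match after handling it.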

Try it

Exercise#python.m8.l4.e1

Given `text = 'ordine #123, totale €45.50, ricevuta #456'`, extract ALL the order numbers (the digits after #) into `orders` as a list of strings. Evaluate `orders`.

Hint:

re.findall(r"#(\d+)", text): the capture group keeps only the digits.

Review exercise

Exercise#python.m8.l4.e2

Given `s = 'tel: 06-12345678 e tel: 02-9876'`, replace all phone numbers (format `\d+-\d+`) with the string 'XXX' and assign the result to `masked`. Evaluate `masked`.

Hint:

re.sub(r"\d+-\d+", "XXX", s)

Additional challenge

Exercise#python.m8.l4.e3

Import the `re` module. Extract all numbers consisting of one or more digits from the string `log_line = "Error 404 in 15ms"`. Store the list in `numbers` and evaluate it.

Hint:

Use re.findall(r'\d+', log_line) to extract all numbers as a list of strings.
