Module 8 · Lesson 3 of 4 · 31/36 in the course · ~12 min

collections: Counter and defaultdict

The collections module provides specialized container types that complement the built-in ones (list, dict, set, tuple). Three of the most commonly used are Counter, defaultdict, and namedtuple.

`Counter`: frequency counting

Python

from collections import Counter

parole = ["mela", "pera", "mela", "kiwi", "mela", "pera"]
c = Counter(parole)
# Counter({'mela': 3, 'pera': 2, 'kiwi': 1})

c["mela"]            # 3
c["banana"]          # 0   (default for missing keys, NO KeyError)
c.most_common(2)     # [('mela', 3), ('pera', 2)]

It also works on strings (counts characters):

Python

Counter("ciao mondo")
# Counter({'o': 2, 'c': 1, 'i': 1, 'a': 1, ' ': 1, 'm': 1, 'n': 1, 'd': 1})

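It supports multiset operations on counts: + adds counts, - subtracts them (dropping zero and negative results), & keeps the minimum of each count, and | keeps the maximum. These are very handy for aggregating counts from different sources.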
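As a minimal sketch of the four operators, assume two hypothetical counters, `lunedi` and `martedi`, holding fruit counts from two different sources (the names and numbers are made up for illustration):

```python
from collections import Counter

# Hypothetical counts from two sources (illustrative data only)
lunedi = Counter({"mela": 3, "pera": 1})
martedi = Counter({"mela": 1, "pera": 2, "kiwi": 4})

totale = lunedi + martedi       # add counts key by key
differenza = lunedi - martedi   # subtract; only positive counts are kept
minimo = lunedi & martedi       # minimum of each count
massimo = lunedi | martedi      # maximum of each count

print(totale)       # mela: 4, pera: 3, kiwi: 4
print(differenza)   # Counter({'mela': 2})
print(minimo)       # mela: 1, pera: 1
print(massimo)      # mela: 3, pera: 2, kiwi: 4
```

Note that `differenza` contains neither 'pera' nor 'kiwi': their results would be negative, and these operators discard any count that is not positive.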

`defaultdict`: dict with automatic default

A dict subclass that, when you index a missing key with d[key], creates the entry by calling a factory function with no arguments and storing the result.

Python

from collections import defaultdict

gruppi = defaultdict(list)        # factory = list (empty list)
for nome in ["Ada", "Linus", "Ada", "Grace"]:
    gruppi[nome].append(1)
# defaultdict(<class 'list'>, {'Ada': [1, 1], 'Linus': [1], 'Grace': [1]})

Without defaultdict, you would have to write:

Python

gruppi = {}
for nome in ["Ada", "Linus", "Ada", "Grace"]:
    if nome not in gruppi:
        gruppi[nome] = []
    gruppi[nome].append(1)

Common factories: list, int (default 0), set, dict.
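To illustrate two of these factories, here is a short sketch using `int` and `set`. The data (`"mississippi"` and the page visits) is hypothetical and chosen only for the example. The last lines show that only indexing with `[]` creates a missing key, while `in` and `.get()` do not.

```python
from collections import defaultdict

# int() returns 0, so every new key starts as a zero counter
conteggio = defaultdict(int)
for lettera in "mississippi":
    conteggio[lettera] += 1

# set() returns an empty set, so repeated values are stored only once
visitatori = defaultdict(set)
for pagina, nome in [("home", "Ada"), ("home", "Ada"), ("docs", "Linus")]:
    visitatori[pagina].add(nome)

# Membership tests and .get() do not trigger the factory
presente = "blog" in visitatori      # False
valore = visitatori.get("blog")      # None, and "blog" is still not a key

print(dict(conteggio))    # {'m': 1, 'i': 4, 's': 4, 'p': 2}
print(dict(visitatori))   # {'home': {'Ada'}, 'docs': {'Linus'}}
```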

`namedtuple`: tuples with field names

A lightweight way to create immutable record classes. An instance is still a tuple, but its fields can also be accessed by name.

Python

from collections import namedtuple

Punto = namedtuple("Punto", ["x", "y"])
p = Punto(3, 4)
p.x          # 3 (access by name)
p[0]         # 3 (access by index, still a tuple)
p.x + p.y    # 7

# typical use: return multiple values from a function
# (define the type once at module level, not on every call)
Risultato = namedtuple("Risultato", ["quoziente", "resto"])

def divisione(a, b):
    return Risultato(a // b, a % b)

r = divisione(17, 5)
r.quoziente   # 3
r.resto       # 2

(For more sophisticated cases there is also the dataclasses module, available since Python 3.7; see Module 9.)
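Because a namedtuple is immutable, its fields cannot be reassigned. The sketch below reuses the `Punto` type from above and shows one way to work with that, using the standard `_replace` and `_asdict` methods. The variable names are illustrative.

```python
from collections import namedtuple

Punto = namedtuple("Punto", ["x", "y"])
p = Punto(3, 4)

# Assigning to a field raises AttributeError: the record is immutable
modificabile = True
try:
    p.x = 10
except AttributeError:
    modificabile = False

# _replace returns a new instance with the selected fields changed
q = p._replace(x=10)

# Tuple unpacking and conversion to a dict both work
x, y = q
come_dict = q._asdict()

print(p, q)        # Punto(x=3, y=4) Punto(x=10, y=4)
print(come_dict)
```

The original `p` is left untouched, so each "change" produces a new record instead of mutating the old one.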

namedtuple: lightweight immutable records

To recap, namedtuple lets you quickly build lightweight, class-like objects for structured data without writing a constructor or other boilerplate methods:

Python

from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(10, 20)
print(p.x, p.y)

Try it

Exercise#python.m8.l3.e1

Given the list `words = ['mela', 'pera', 'mela', 'kiwi', 'mela', 'pera']`, store the most frequent word in `top` as a string. Evaluate `top`.

Hint: Counter(...).most_common(1) returns [(word, count)].

Review exercise

Exercise#python.m8.l3.e2

Given the enrollments `enrollments = [('Ada', 'mate'), ('Linus', 'fisica'), ('Ada', 'storia'), ('Grace', 'mate')]`, build `courses_per_student` as a defaultdict(list) that maps each student to a list of their courses. Evaluate `dict(courses_per_student)`.

Hint: create a defaultdict(list), then loop with for s, c in enrollments.

Additional challenge

Exercise#python.m8.l3.e3

Import `Counter` from `collections`. Count the frequency of characters in the string `text = "abracadabra"`. Store the counter in `char_counter` and evaluate it.

Hint: Counter accepts the string as its argument and counts the occurrences of each character.
