Module lessons (2/4)
Self-JOIN: a table with itself
Sometimes the relationship you want to traverse is not between two
different tables, but between rows of the same table. Typical example: a
"category → subcategory" hierarchy, where both parent and child are rows of
the same categories table, linked by a parent_id.
To read together one row (child) and "its" other row (parent) you use a
self-join: the same table appears twice in the FROM, with two
different aliases.
The syntax
SELECT colonne
FROM tabella AS alias_a
JOIN tabella AS alias_b ON alias_a.foreign_key = alias_b.id;The alias is mandatory: if we omit AS a / AS b, PostgreSQL has no
way to know which of the two copies we mean when we write name or id.
-- Ogni sotto-categoria con il nome della categoria padre:
SELECT child.name AS subcategory,
parent.name AS category
FROM categories AS child
JOIN categories AS parent ON child.parent_id = parent.id;Read in plain English: "take the rows of categories named child, and
for each one find the row of categories named parent whose id is the
parent_id of the child".
The result excludes the root categories (those with parent_id = NULL)
because an INNER JOIN skips rows with no match. If you want to include
them, use LEFT JOIN.
Self-join with LEFT to "keep the roots too"
-- Tutte le categorie, con nome del padre o NULL se sono radice:
SELECT child.name AS category,
parent.name AS parent
FROM categories AS child
LEFT JOIN categories AS parent ON child.parent_id = parent.id;Now you get 9 rows (every category in the dataset), with parent = NULL
for the 3 roots.
Aggregating with a self-join
It combines very well with GROUP BY to ask "how many children does each
parent have":
SELECT p.name AS parent_category,
COUNT(c.id) AS children
FROM categories AS p
LEFT JOIN categories AS c ON c.parent_id = p.id
WHERE p.parent_id IS NULL -- solo le radici
GROUP BY p.name
ORDER BY p.name;Note two important details:
WHERE p.parent_id IS NULLfilters before the JOIN: we take only the roots as the "left-hand table".COUNT(c.id)(notCOUNT(*)) returns 0 for roots without children, whereasCOUNT(*)would return 1 because of theLEFT JOINrow with NULL.
Try it yourself
Per ogni sotto-categoria (categorie con parent_id NOT NULL), mostra il nome della sotto-categoria e il nome della sua categoria-padre. Due colonne: subcategory, category. Ordina per category poi per subcategory.
Show hint
La condizione del self-JOIN è child.parent_id = parent.id.
Solution available after 3 attempts
Review exercise
Per ogni categoria-radice (parent_id IS NULL), mostra il nome e il numero di sotto-categorie. Due colonne: name, children. Ordina per name. Includi una eventuale radice senza figlie con 0.
Show hint
LEFT JOIN su categories due volte (alias p per padre, c per figlio). Usa COUNT(c.id), non COUNT(*), per ottenere 0 sulle radici senza figli.
Solution available after 3 attempts