Skip to main content
eLearner.app
Module 5 · Lesson 2 of 418/57 in the course~10 min
Module lessons (2/4)

Self-JOIN: a table with itself

Sometimes the relationship you want to traverse is not between two different tables, but between rows of the same table. Typical example: a "category → subcategory" hierarchy, where both parent and child are rows of the same categories table, linked by a parent_id.

To read together one row (child) and "its" other row (parent) you use a self-join: the same table appears twice in the FROM, with two different aliases.

The syntax

SQL
SELECT colonne
FROM   tabella AS alias_a
JOIN   tabella AS alias_b ON alias_a.foreign_key = alias_b.id;

The alias is mandatory: if we omit AS a / AS b, PostgreSQL has no way to know which of the two copies we mean when we write name or id.

SQL
-- Ogni sotto-categoria con il nome della categoria padre:
SELECT child.name  AS subcategory,
       parent.name AS category
FROM   categories AS child
JOIN   categories AS parent ON child.parent_id = parent.id;

Read in plain English: "take the rows of categories named child, and for each one find the row of categories named parent whose id is the parent_id of the child".

The result excludes the root categories (those with parent_id = NULL) because an INNER JOIN skips rows with no match. If you want to include them, use LEFT JOIN.

Self-join with LEFT to "keep the roots too"

SQL
-- Tutte le categorie, con nome del padre o NULL se sono radice:
SELECT child.name AS category,
       parent.name AS parent
FROM   categories AS child
LEFT JOIN categories AS parent ON child.parent_id = parent.id;

Now you get 9 rows (every category in the dataset), with parent = NULL for the 3 roots.

Aggregating with a self-join

It combines very well with GROUP BY to ask "how many children does each parent have":

SQL
SELECT p.name AS parent_category,
       COUNT(c.id) AS children
FROM   categories AS p
LEFT JOIN categories AS c ON c.parent_id = p.id
WHERE  p.parent_id IS NULL          -- solo le radici
GROUP BY p.name
ORDER BY p.name;

Note two important details:

  1. WHERE p.parent_id IS NULL filters before the JOIN: we take only the roots as the "left-hand table".
  2. COUNT(c.id) (not COUNT(*)) returns 0 for roots without children, whereas COUNT(*) would return 1 because of the LEFT JOIN row with NULL.

Try it yourself

Exercise#sql.m5.l2.e1
Attempts: 0Loading…

Per ogni sotto-categoria (categorie con parent_id NOT NULL), mostra il nome della sotto-categoria e il nome della sua categoria-padre. Due colonne: subcategory, category. Ordina per category poi per subcategory.

Loading editor…
Show hint

La condizione del self-JOIN è child.parent_id = parent.id.

Solution available after 3 attempts

Review exercise

Exercise#sql.m5.l2.e2
Attempts: 0Loading…

Per ogni categoria-radice (parent_id IS NULL), mostra il nome e il numero di sotto-categorie. Due colonne: name, children. Ordina per name. Includi una eventuale radice senza figlie con 0.

Loading editor…
Show hint

LEFT JOIN su categories due volte (alias p per padre, c per figlio). Usa COUNT(c.id), non COUNT(*), per ottenere 0 sulle radici senza figli.

Solution available after 3 attempts