Module lessons (3/4)
Grouping with GROUP BY
In Module 2 you learned that COUNT(*), AVG(...), SUM(...) reduce all
rows to a single result row. But what if you want an average salary
for each department, not for the whole company? You want one row per
group. That's the job of GROUP BY.
The syntax
SELECT grouping_column, AGGREGATE_FUNCTION(...)
FROM table
GROUP BY grouping_column;Read in English: "split the rows into groups that share the same value of
grouping_column, and for each group compute the aggregate".
-- Average salary per department:
SELECT department_id, AVG(salary) AS average
FROM employees
GROUP BY department_id;The result has one row for each distinct department_id found in
employees. You'll get 5 rows: one for each of the 4 departments (1, 2,
3, 4) plus one for NULL (Andrea Costa).
With JOIN for readable names
Often you want to group by department_id but show the name of the
department. You combine GROUP BY with a JOIN:
SELECT d.name AS department,
COUNT(*) AS count,
AVG(e.salary) AS average_salary
FROM employees AS e
JOIN departments AS d ON e.department_id = d.id
GROUP BY d.name
ORDER BY count DESC;This is a full-fledged business intelligence query: in three lines it tells you how many employees each department has and how much they earn on average, sorted from largest to smallest.
The golden rule
When you use GROUP BY, every column in the SELECT must be:
- one of the grouping columns (e.g.
department_id,d.name), or - inside an aggregate function (e.g.
AVG(salary),COUNT(*)).
-- ERROR: first_name is neither in GROUP BY nor inside an aggregate
SELECT department_id, first_name, AVG(salary)
FROM employees
GROUP BY department_id;PostgreSQL returns an explicit error ("column must appear in the GROUP BY
clause or be used in an aggregate function"). It's a common early-weeks
mistake: if you see it, stop and ask yourself "what is the value of
first_name for a group of 3 people?" — there's no single answer, and
that's why the engine refuses.
Try it
For each department, compute how many employees it contains. Return two columns: the department name (departments.name) and the employee count. One row per department.
Show hint
Aggregate with COUNT(*) and group by d.name. The SELECT must contain only the grouping column and the aggregate.
Solution available after 3 attempts
Review exercise
For each department (departments.name), compute the average salary (AVG(salary)) of its employees, sorted from highest to lowest.
Show hint
AVG(e.salary) as the aggregate, GROUP BY d.name, ORDER BY average DESC.
Solution available after 3 attempts