Module lessons (4/4)
A query inside another: subqueries
You have learned how to extract, filter, group, join and transform. There is one last fundamental idea: putting a query inside another. These are called subqueries and they are the step that lets you answer questions like "who earns above average?" or "which departments have someone assigned to a project?".
The basic syntax
A subquery is a SELECT wrapped in parentheses, used in place of a single
value or a list of values in another query:
-- Find who earns MORE THAN THE COMPANY AVERAGE:
SELECT first_name, last_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);The engine runs the subquery first — SELECT AVG(salary) FROM employees
returns a single number — and then uses that number in the outer WHERE
as if you had written it by hand. This is precisely the advantage: you do
not need to know the average in advance, SQL computes it for you.
The three most common "types"
1. Subquery returning a single value
The example above: the subquery produces one row, one column (called
scalar). You use it with =, >, < and so on.
-- Employee with the highest salary:
SELECT first_name, last_name
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);2. Subquery returning a list
The subquery produces one column, many rows. You use it with
IN (...):
-- Departments that have at least one employee:
SELECT name
FROM departments
WHERE id IN (SELECT DISTINCT department_id
FROM employees
WHERE department_id IS NOT NULL);3. Subquery as a "table" (FROM)
A subquery can also take the place of a table in FROM. It is useful to
pre-compute aggregates and then filter or join them:
-- Departments with average salary above 40000:
SELECT d.name, medie.stipendio_medio
FROM departments AS d
JOIN (SELECT department_id, AVG(salary) AS stipendio_medio
FROM employees
GROUP BY department_id) AS medie
ON d.id = medie.department_id
WHERE medie.stipendio_medio > 40000;Subquery or JOIN? When to use which
Often the same result can be obtained with a JOIN or with a subquery.
Rule of thumb:
- To filter rows of one table based on a property of another
("employees whose departments are…"),
IN (subquery)is very readable. - To combine columns from multiple tables in the result, use
JOIN. - To compute an aggregate and use it as a threshold ("above the average"), the scalar subquery is the natural choice.
In practice you will see both: the SQL engine often optimises them the same way. Choose the form that reads best.
Try it
Find first name (first_name), last name (last_name) and salary (salary) of the employees who earn STRICTLY more than the company average. Three columns.
Show hint
Replace ? with the subquery SELECT AVG(salary) FROM employees — it returns a single number to use as the threshold.
Solution available after 3 attempts
Review exercise
List the name (departments.name) of departments that have at least one employee with salary strictly greater than 50000. Use a subquery with IN. Order alphabetically.
Show hint
The inner subquery returns the ids of the departments that have employees with salary > 50000. Then WHERE id IN (...) on the outer query.
Solution available after 3 attempts