You need to find every customer who has placed more than three orders. The order count lives in orders, but the customer name lives in customers. You could solve this with a JOIN and a HAVING clause, or with a subquery that computes the count first and then filters. Both work. Knowing which one to reach for, and why, is what this lesson covers.
Solving the Same Problem Two Ways
The best way to understand JOIN vs subquery is to see both approaches side by side on the same question. Take this one: return the names of customers who have placed at least one order.
With a JOIN:
SELECT DISTINCT c.first_name, c.last_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
The JOIN connects every matching row between customers and orders. DISTINCT removes the duplicates that appear when a customer has multiple orders.
With a subquery:
SELECT c.first_name, c.last_name
FROM customers c
WHERE c.customer_id IN (
SELECT o.customer_id FROM orders o
)
The inner query returns a list of customer_id values from orders. The outer query returns only the customers whose customer_id appears in that list.
Both queries return the same result here. The JOIN version is shorter. The subquery version reads more like a sentence: give me customers whose ID is in the list of IDs that placed orders.
When a Subquery Is the Cleaner Choice
A subquery becomes the natural fit when you're filtering based on an aggregate. You can't use a [HAVING](learn/sql-having-clause) clause directly in a WHERE condition, so a subquery gives you a clean way to pre-compute the aggregate and then filter against it.
Find customers who have placed more than two orders:
SELECT c.first_name, c.last_name
FROM customers c
WHERE c.customer_id IN (
SELECT o.customer_id
FROM orders o
GROUP BY o.customer_id
HAVING COUNT(o.order_id) > 2
)
The inner query groups orders by customer and keeps only those with more than two. The outer query uses that list to pull the customer names. The aggregate logic stays contained in the subquery. The outer query stays clean.
You could write this as a [JOIN with aggregate](learn/sql-join-with-aggregate), but it requires more structure:
SELECT c.first_name, c.last_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING COUNT(o.order_id) > 2
This also works. The subquery version is clearer when the filtering logic is the point of the query and the aggregate is just the condition. The JOIN version is better when you also need columns from orders in the output.
When a JOIN Is the Better Fit
Use a JOIN when you need columns from both tables in your SELECT. Subqueries used in WHERE only filter rows; they don't add columns to the result.
Return each customer's name alongside their total spend:
SELECT c.first_name, c.last_name, SUM(o.total_amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY total_spent DESC
You can't do this with a WHERE subquery. The aggregate result needs to appear in the output, not just in a filter condition. A JOIN is the right structure here.
Subqueries in FROM: A Middle Ground
A subquery can also sit in the FROM clause, where it behaves like a temporary table. This is called an inline view or derived table.
Return the average number of orders per customer:
SELECT AVG(order_count) AS avg_orders_per_customer
FROM (
SELECT o.customer_id, COUNT(o.order_id) AS order_count
FROM orders o
GROUP BY o.customer_id
) AS order_summary
The inner query counts orders per customer. The outer query averages those counts. You can't write this as a simple JOIN because you'd need to aggregate twice, which SQL doesn't allow in a single level.
This is also the first step toward CTEs, which give you a named version of the same pattern and are worth exploring once you're comfortable with inline subqueries.
How JOIN vs Subquery Connects to Other Concepts
The [INNER JOIN](learn/sql-inner-join) is the direct alternative to an IN subquery when both tables contain the columns you need. For existence checks, EXISTS is another subquery form that's worth knowing alongside IN.
Subqueries in FROM are the gateway to understanding [Common Table Expressions](learn/sql-cte), which do the same thing with a cleaner, named syntax.
Common Mistakes
1. Using a subquery in WHERE when you need the value in SELECT
A WHERE subquery filters rows but adds nothing to the output columns, so you can't reference its result in SELECT.
-- wrong: can't return total_spent this way
SELECT c.first_name, c.last_name,
(SELECT SUM(o.total_amount) FROM orders o WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c
WHERE total_spent > 500
Use a JOIN with HAVING to filter on an aggregated column that also appears in SELECT.
-- correct
SELECT c.first_name, c.last_name, SUM(o.total_amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING SUM(o.total_amount) > 500
2. Using = with a subquery that returns multiple rows
A subquery used with = must return exactly one row. If it returns more, the query errors.
-- wrong: this returns many customer_ids, not one
SELECT o.order_id, o.total_amount
FROM orders o
WHERE o.customer_id = (
SELECT c.customer_id FROM customers c WHERE c.country = 'USA'
)
Use IN when the subquery can return multiple values.
-- correct
SELECT o.order_id, o.total_amount
FROM orders o
WHERE o.customer_id IN (
SELECT c.customer_id FROM customers c WHERE c.country = 'USA'
)
3. Joining when existence is all you need
Using a JOIN to check whether a related row exists can return duplicate rows when the joined table has multiple matches.
-- wrong: returns one row per order, not one row per customer
SELECT c.first_name, c.last_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
Either add DISTINCT or use a subquery with IN to check existence cleanly without duplicates.
-- correct
SELECT c.first_name, c.last_name
FROM customers c
WHERE c.customer_id IN (
SELECT o.customer_id FROM orders o
)
Quick Recap
- A JOIN combines rows from multiple tables and lets you include columns from all of them in the output.
- A subquery in WHERE filters rows based on a separate query result but doesn't add columns to the SELECT.
- Use a subquery when the filtering logic involves an aggregate and you don't need the aggregate value in the output.
- Use a JOIN when you need columns from both tables, or when you want to aggregate and display the result in the same query.
- A subquery using = must return exactly one row; use IN when the subquery can return multiple rows.
- Subqueries in FROM act like temporary tables and let you aggregate in two steps, which is a stepping stone toward CTEs.
order_id and total_amount for every order where total_amount is above the average total_amount across all orders.SELECT o.order_id, o.total_amount
FROM orders o
WHERE o.total_amount > (
SELECT AVG(total_amount) FROM orders
)
product_name and price for every product where price is above the average price across all products.products.SELECT p.product_name, p.price
FROM products p
WHERE p.price > (
SELECT AVG(price) FROM products
)
first_name and last_name of every employee who has handled at least one order, using a subquery to check membership.employee_id values from orders.SELECT e.first_name, e.last_name
FROM employees e
WHERE e.employee_id IN (
SELECT o.employee_id FROM orders o
)
first_name and last_name of every employee who has handled more than two orders, using a subquery that groups and filters by count.employee_id from orders, groups by employee_id, and uses HAVING COUNT(*) > 2. The outer query selects from employees using IN.SELECT e.first_name, e.last_name
FROM employees e
WHERE e.employee_id IN (
SELECT o.employee_id
FROM orders o
GROUP BY o.employee_id
HAVING COUNT(o.order_id) > 2
)
first_name, last_name, and the total total_amount they handled across all orders, using the alias total_handled, sorted from highest to lowest.employees and orders on employee_id, GROUP BY employee columns, SUM total_amount, and ORDER BY the alias DESC.SELECT e.first_name, e.last_name, SUM(o.total_amount) AS total_handled
FROM employees e
JOIN orders o ON e.employee_id = o.employee_id
GROUP BY e.employee_id, e.first_name, e.last_name
ORDER BY total_handled DESC
A JOIN combines rows from two or more tables into one result set. A subquery is a query nested inside another query, used to filter or compute a value before the outer query runs.
Often yes, but it depends on the database engine and the query. Most modern databases optimise both well. Readability and correctness matter more than micro-optimising at the learning stage.
Use a subquery when you need to filter rows based on an aggregate result, or when the inner logic is easier to read as a separate step than as an additional join.
Yes. A subquery used with IN or EXISTS can return multiple rows. A subquery used with = must return exactly one row, or the query will error.