Lesson 48

SQL JOIN vs Subquery: When to Use Each

Choose between a JOIN and a subquery with confidence so you can write the right query structure for every reporting task.

You need to find every customer who has placed more than three orders. The order count lives in orders, but the customer name lives in customers. You could solve this with a JOIN and a HAVING clause, or with a subquery that computes the count first and then filters. Both work. Knowing which one to reach for, and why, is what this lesson covers.

Solving the Same Problem Two Ways

The best way to understand JOIN vs subquery is to see both approaches side by side on the same question. Take this one: return the names of customers who have placed at least one order.

With a JOIN:

SELECT DISTINCT c.first_name, c.last_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id

The JOIN connects every matching row between customers and orders. DISTINCT removes the duplicates that appear when a customer has multiple orders.

With a subquery:

SELECT c.first_name, c.last_name
FROM customers c
WHERE c.customer_id IN (
  SELECT o.customer_id FROM orders o
)

The inner query returns a list of customer_id values from orders. The outer query returns only the customers whose customer_id appears in that list.

Both queries return the same result here. The JOIN version is shorter. The subquery version reads more like a sentence: give me customers whose ID is in the list of IDs that placed orders.

Key idea: A JOIN works across rows. A subquery can work on a computed result. The right choice depends on what you're comparing.

When a Subquery Is the Cleaner Choice

A subquery becomes the natural fit when you're filtering based on an aggregate. You can't use a [HAVING](learn/sql-having-clause) clause directly in a WHERE condition, so a subquery gives you a clean way to pre-compute the aggregate and then filter against it.

Find customers who have placed more than two orders:

SELECT c.first_name, c.last_name
FROM customers c
WHERE c.customer_id IN (
  SELECT o.customer_id
  FROM orders o
  GROUP BY o.customer_id
  HAVING COUNT(o.order_id) > 2
)

The inner query groups orders by customer and keeps only those with more than two. The outer query uses that list to pull the customer names. The aggregate logic stays contained in the subquery. The outer query stays clean.

You could write this as a [JOIN with aggregate](learn/sql-join-with-aggregate), but it requires more structure:

SELECT c.first_name, c.last_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING COUNT(o.order_id) > 2

This also works. The subquery version is clearer when the filtering logic is the point of the query and the aggregate is just the condition. The JOIN version is better when you also need columns from orders in the output.

When a JOIN Is the Better Fit

Use a JOIN when you need columns from both tables in your SELECT. Subqueries used in WHERE only filter rows; they don't add columns to the result.

Return each customer's name alongside their total spend:

SELECT c.first_name, c.last_name, SUM(o.total_amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY total_spent DESC

You can't do this with a WHERE subquery. The aggregate result needs to appear in the output, not just in a filter condition. A JOIN is the right structure here.

Key idea: If you need the aggregated value in your SELECT output, use a JOIN. If you only need it to decide which rows to include, a subquery in WHERE works cleanly.

Subqueries in FROM: A Middle Ground

A subquery can also sit in the FROM clause, where it behaves like a temporary table. This is called an inline view or derived table.

Return the average number of orders per customer:

SELECT AVG(order_count) AS avg_orders_per_customer
FROM (
  SELECT o.customer_id, COUNT(o.order_id) AS order_count
  FROM orders o
  GROUP BY o.customer_id
) AS order_summary

The inner query counts orders per customer. The outer query averages those counts. You can't write this as a simple JOIN because you'd need to aggregate twice, which SQL doesn't allow in a single level.

This is also the first step toward CTEs, which give you a named version of the same pattern and are worth exploring once you're comfortable with inline subqueries.

How JOIN vs Subquery Connects to Other Concepts

The [INNER JOIN](learn/sql-inner-join) is the direct alternative to an IN subquery when both tables contain the columns you need. For existence checks, EXISTS is another subquery form that's worth knowing alongside IN.

Subqueries in FROM are the gateway to understanding [Common Table Expressions](learn/sql-cte), which do the same thing with a cleaner, named syntax.

Common Mistakes

1. Using a subquery in WHERE when you need the value in SELECT

A WHERE subquery filters rows but adds nothing to the output columns, so you can't reference its result in SELECT.

-- wrong: can't return total_spent this way
SELECT c.first_name, c.last_name,
  (SELECT SUM(o.total_amount) FROM orders o WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c
WHERE total_spent > 500

Use a JOIN with HAVING to filter on an aggregated column that also appears in SELECT.

-- correct
SELECT c.first_name, c.last_name, SUM(o.total_amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING SUM(o.total_amount) > 500

2. Using = with a subquery that returns multiple rows

A subquery used with = must return exactly one row. If it returns more, the query errors.

-- wrong: this returns many customer_ids, not one
SELECT o.order_id, o.total_amount
FROM orders o
WHERE o.customer_id = (
  SELECT c.customer_id FROM customers c WHERE c.country = 'USA'
)

Use IN when the subquery can return multiple values.

-- correct
SELECT o.order_id, o.total_amount
FROM orders o
WHERE o.customer_id IN (
  SELECT c.customer_id FROM customers c WHERE c.country = 'USA'
)

3. Joining when existence is all you need

Using a JOIN to check whether a related row exists can return duplicate rows when the joined table has multiple matches.

-- wrong: returns one row per order, not one row per customer
SELECT c.first_name, c.last_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id

Either add DISTINCT or use a subquery with IN to check existence cleanly without duplicates.

-- correct
SELECT c.first_name, c.last_name
FROM customers c
WHERE c.customer_id IN (
  SELECT o.customer_id FROM orders o
)

Quick Recap

  • A JOIN combines rows from multiple tables and lets you include columns from all of them in the output.
  • A subquery in WHERE filters rows based on a separate query result but doesn't add columns to the SELECT.
  • Use a subquery when the filtering logic involves an aggregate and you don't need the aggregate value in the output.
  • Use a JOIN when you need columns from both tables, or when you want to aggregate and display the result in the same query.
  • A subquery using = must return exactly one row; use IN when the subquery can return multiple rows.
  • Subqueries in FROM act like temporary tables and let you aggregate in two steps, which is a stepping stone toward CTEs.
Exercise 1 Easy
Write a query to return the order_id and total_amount for every order where total_amount is above the average total_amount across all orders.
Hint: Use a subquery in WHERE with a SELECT AVG(total_amount) FROM orders inside it. No alias needed on the subquery here.
Solution
SELECT o.order_id, o.total_amount
FROM orders o
WHERE o.total_amount > (
  SELECT AVG(total_amount) FROM orders
)
Exercise 2 Easy
Write a query to return the product_name and price for every product where price is above the average price across all products.
Hint: Use a subquery in WHERE that computes AVG(price) from products.
Solution
SELECT p.product_name, p.price
FROM products p
WHERE p.price > (
  SELECT AVG(price) FROM products
)
Exercise 3 Medium
Write a query to return the first_name and last_name of every employee who has handled at least one order, using a subquery to check membership.
Hint: Use WHERE employee_id IN with a subquery that selects distinct employee_id values from orders.
Solution
SELECT e.first_name, e.last_name
FROM employees e
WHERE e.employee_id IN (
  SELECT o.employee_id FROM orders o
)
Exercise 4 Medium
Write a query to return the first_name and last_name of every employee who has handled more than two orders, using a subquery that groups and filters by count.
Hint: The inner query selects employee_id from orders, groups by employee_id, and uses HAVING COUNT(*) > 2. The outer query selects from employees using IN.
Solution
SELECT e.first_name, e.last_name
FROM employees e
WHERE e.employee_id IN (
  SELECT o.employee_id
  FROM orders o
  GROUP BY o.employee_id
  HAVING COUNT(o.order_id) > 2
)
Exercise 5 Medium
Write a query to return each employee's first_name, last_name, and the total total_amount they handled across all orders, using the alias total_handled, sorted from highest to lowest.
Hint: JOIN employees and orders on employee_id, GROUP BY employee columns, SUM total_amount, and ORDER BY the alias DESC.
Solution
SELECT e.first_name, e.last_name, SUM(o.total_amount) AS total_handled
FROM employees e
JOIN orders o ON e.employee_id = o.employee_id
GROUP BY e.employee_id, e.first_name, e.last_name
ORDER BY total_handled DESC
Frequently Asked Questions

A JOIN combines rows from two or more tables into one result set. A subquery is a query nested inside another query, used to filter or compute a value before the outer query runs.

Often yes, but it depends on the database engine and the query. Most modern databases optimise both well. Readability and correctness matter more than micro-optimising at the learning stage.

Use a subquery when you need to filter rows based on an aggregate result, or when the inner logic is easier to read as a separate step than as an additional join.

Yes. A subquery used with IN or EXISTS can return multiple rows. A subquery used with = must return exactly one row, or the query will error.