Lesson 47

SQL JOIN with Aggregate Functions: Group After Joining

Combine JOIN and aggregate functions in one query so you can summarise data from multiple tables without writing separate queries.

Your finance lead wants a breakdown of total revenue by country, but the revenue is in order_items and the country is in customers. Neither table alone can answer the question. You need to join them and then aggregate. That's the pattern this lesson covers.

The Basic Pattern: JOIN First, Then Aggregate

The reason this combination matters is that most real reporting questions span more than one table. A JOIN with aggregate functions is not two separate operations you stitch together; it's one query where the join runs first and the aggregation runs second.

The structure is always the same: join the tables you need, then GROUP BY the dimension you want to summarise by, then apply the aggregate function to the measure column.

Here's the simplest version. Total revenue per customer:

SELECT c.first_name, c.last_name, SUM(o.total_amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name

The JOIN brings customers and orders together on customer_id. The GROUP BY collapses the rows so each customer appears once. The SUM adds up every total_amount for that customer across all their orders.

Notice customer_id is in GROUP BY even though it's not in SELECT. That's a common practice to ensure grouping is unambiguous, especially when two customers might share the same name.

Key idea: The JOIN runs first and creates one wide result set. GROUP BY and the aggregate functions then work on that combined result, not on the original tables separately.

Counting Rows Across Tables

COUNT works the same way. If you want to know how many orders each customer has placed, join the tables and count the order rows per customer:

SELECT c.first_name, c.last_name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY order_count DESC

COUNT(o.order_id) counts the number of non-NULL order_id values for each group. Because we're using an [INNER JOIN](learn/sql-inner-join), only customers who have at least one order appear in the result. Customers with no orders are excluded.

If you need to include customers with zero orders, switch to a LEFT JOIN. The COUNT would then return zero for those customers because o.order_id would be NULL.

Aggregating Across Three Tables

The pattern extends to more tables without changing the logic. The join chain runs first, and then you aggregate from whichever table holds the measure column.

Total items sold per product, with the product name:

SELECT p.product_name, SUM(oi.quantity) AS total_sold
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name
ORDER BY total_sold DESC

To add the brand alongside the product name, it's already in products, so no extra join is needed. To add something like the customer's country, you'd extend the chain through orders and then customers, as covered in the [joining multiple tables lesson](learn/sql-joining-multiple-tables).

Filtering Before and After Aggregation

You can filter rows before grouping with WHERE and filter groups after grouping with [HAVING](learn/sql-having-clause).

WHERE applies to the joined rows before any aggregation:

SELECT c.country, SUM(o.total_amount) AS revenue
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'delivered'
GROUP BY c.country
ORDER BY revenue DESC

Only delivered orders enter the aggregation. Cancelled or pending orders are excluded before the SUM runs.

HAVING applies after aggregation, so it can reference the result of the aggregate function:

SELECT c.country, SUM(o.total_amount) AS revenue
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'delivered'
GROUP BY c.country
HAVING SUM(o.total_amount) > 1000
ORDER BY revenue DESC

This returns only countries where delivered revenue exceeds 1000. The WHERE cuts the rows first. The HAVING cuts the groups second. Both can appear in the same query.

Key idea: WHERE filters rows before grouping. HAVING filters groups after aggregation. They solve different problems and both work alongside JOINs.

How JOIN with Aggregate Connects to Other Concepts

Understanding [aggregate functions](learn/sql-count-sum-avg-min-max) like COUNT and SUM independently is a prerequisite here. Once you know what they do on a single table, combining them with a JOIN is a direct extension, not a new concept.

The column aliasing rules from single-table queries carry over unchanged. Always prefix column names with the table alias when the query touches more than one table, including inside GROUP BY and HAVING.

Common Mistakes

1. Grouping by a column that isn't in SELECT or an aggregate

Including a non-aggregated column in SELECT without adding it to GROUP BY causes an error in most databases.

-- wrong: last_name is in SELECT but not in GROUP BY
SELECT c.first_name, c.last_name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.first_name

Add every non-aggregated column from SELECT to the GROUP BY clause.

-- correct
SELECT c.first_name, c.last_name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name

2. Using WHERE to filter on an aggregate result

WHERE doesn't have access to aggregate results because it runs before GROUP BY.

-- wrong
SELECT c.country, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE COUNT(o.order_id) > 5
GROUP BY c.country

Move aggregate conditions to HAVING, which runs after grouping.

-- correct
SELECT c.country, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.country
HAVING COUNT(o.order_id) > 5

3. Forgetting to alias-prefix aggregated columns from joined tables

When multiple joined tables share a column name, using the column without a prefix in the aggregate function causes an ambiguity error.

-- wrong: status exists in both orders and payments
SELECT o.order_id, COUNT(status) AS status_count
FROM orders o
JOIN payments pay ON o.order_id = pay.order_id
GROUP BY o.order_id

Always prefix the column with its table alias inside the aggregate function.

-- correct
SELECT o.order_id, COUNT(pay.status) AS payment_count
FROM orders o
JOIN payments pay ON o.order_id = pay.order_id
GROUP BY o.order_id

Quick Recap

  • The JOIN runs first and creates a combined result set; GROUP BY and aggregate functions then work on that result.
  • Every non-aggregated column in SELECT must appear in GROUP BY.
  • WHERE filters joined rows before aggregation; HAVING filters groups after aggregation.
  • COUNT with an INNER JOIN excludes rows with no match; use a LEFT JOIN to include unmatched rows with a count of zero.
  • Alias-prefix every column in aggregate functions when the query joins more than one table.
  • This pattern produces most standard reporting outputs: totals per customer, counts per category, revenue per region.
Exercise 1 Easy
Write a query to return the total amount collected and the number of payments per method in the payments table, using the aliases total_collected and payment_count.
Hint: GROUP BY method in payments, then use SUM on amount and COUNT(*).
Solution
SELECT pay.method, SUM(pay.amount) AS total_collected, COUNT(*) AS payment_count
FROM payments pay
GROUP BY pay.method
Exercise 2 Easy
Write a query to return each department from the employees table and the number of employees in that department, using the alias employee_count, sorted from highest to lowest.
Hint: GROUP BY department in employees and use COUNT(*). Sort with ORDER BY employee_count DESC.
Solution
SELECT e.department, COUNT(*) AS employee_count
FROM employees e
GROUP BY e.department
ORDER BY employee_count DESC
Exercise 3 Medium
Write a query to return each employee's first_name and last_name alongside the number of orders they handled, using the alias order_count.
Hint: JOIN employees and orders on employee_id, GROUP BY employee_id, first_name, and last_name, then COUNT order_id.
Solution
SELECT e.first_name, e.last_name, COUNT(o.order_id) AS order_count
FROM employees e
JOIN orders o ON e.employee_id = o.employee_id
GROUP BY e.employee_id, e.first_name, e.last_name
Exercise 4 Medium
Write a query to return each employee's first_name, last_name, and total total_amount they handled across all orders, using the alias total_handled, but only include employees where that total exceeds 500, sorted from highest to lowest.
Hint: JOIN employees and orders on employee_id, GROUP BY employee columns, use SUM on total_amount, then filter with HAVING.
Solution
SELECT e.first_name, e.last_name, SUM(o.total_amount) AS total_handled
FROM employees e
JOIN orders o ON e.employee_id = o.employee_id
GROUP BY e.employee_id, e.first_name, e.last_name
HAVING SUM(o.total_amount) > 500
ORDER BY total_handled DESC
Exercise 5 Medium
Write a query to return each brand from products and the total quantity sold across all order items, using the alias total_qty_sold, sorted from highest to lowest.
Hint: JOIN products and order_items on product_id, GROUP BY brand, and SUM quantity.
Solution
SELECT p.brand, SUM(oi.quantity) AS total_qty_sold
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.brand
ORDER BY total_qty_sold DESC
Frequently Asked Questions

Yes. You JOIN the tables first, then GROUP BY a column from any of the joined tables. The aggregate functions run on the combined rows after the join.

Use COUNT(*) or COUNT(column) in your SELECT after joining the tables. GROUP BY the column you want to count by, such as a customer name or product category.

WHERE runs after the JOIN but before GROUP BY. It filters the combined rows before aggregation happens.

WHERE filters individual rows before grouping. HAVING filters groups after aggregation. If you want to exclude groups based on a COUNT or SUM result, use HAVING.