๐Ÿ“ Theory Questions

SQL Aggregates & GROUP BY Theory Interview Questions

10 frequently asked SQL aggregate function and GROUP BY theory questions for data analyst interviews, with clear answers.

Q1 What are aggregate functions in SQL? Name the five core ones.

Aggregate functions perform a calculation on a set of rows and return a single value. The five core aggregate functions are: COUNT (counts rows), SUM (totals a numeric column), AVG (calculates the mean), MIN (finds the smallest value), and MAX (finds the largest value). They are almost always used with GROUP BY.

Q2 What is GROUP BY and when is it required?

GROUP BY collapses rows that share the same value in a specified column into a single summary row. It is required whenever you use an aggregate function alongside non-aggregated columns in your SELECT. Without GROUP BY, SQL would not know which rows to group together before aggregating.

Q3 What is the difference between WHERE and HAVING?

WHERE filters rows before grouping and aggregation โ€” it cannot reference aggregate results. HAVING filters groups after aggregation โ€” it can reference aggregate results like HAVING COUNT(*) > 5. The correct order is: WHERE filters rows โ†’ GROUP BY groups them โ†’ HAVING filters groups.

Q4 What does COUNT(*) count versus COUNT(column)?

COUNT(*) counts all rows in the group including those with NULLs. COUNT(column) counts only the rows where that specific column is NOT NULL. If you need to count distinct non-null values, use COUNT(DISTINCT column).

Q5 Can you GROUP BY multiple columns? What does it mean?

Yes. Grouping by multiple columns creates one row for each unique combination of values across all the specified columns. For example, GROUP BY department, role gives one row per department-role pair, not one per department.

Q6 Why does a column in SELECT need to be in GROUP BY (or be aggregated)?

SQL must return one row per group. If a non-aggregated column is not in GROUP BY, there could be multiple different values for it within the group โ€” SQL would not know which value to return. This is why non-aggregated columns must appear in GROUP BY; aggregate functions reduce the group to a single value and are therefore allowed in SELECT without appearing in GROUP BY.

Q7 What is the difference between SUM and COUNT?

COUNT counts the number of rows (or non-null values); SUM adds up numeric values. COUNT(*) on a table with 5 rows returns 5 regardless of the values. SUM(amount) on those same 5 rows returns the total of the amount column.

Q8 How do you find duplicate rows using GROUP BY?

Group by the column(s) that define uniqueness and use HAVING COUNT() > 1. For example: SELECT email, COUNT() AS cnt FROM customers GROUP BY email HAVING COUNT(*) > 1 finds emails that appear more than once.

Q9 What is a ROLLUP and how does it differ from regular GROUP BY?

GROUP BY ROLLUP(a, b) generates the normal grouped result plus subtotals. It produces rows for each unique (a, b) pair, then subtotals for each unique (a), then a grand total for the whole table. Regular GROUP BY only produces the finest level of grouping without any subtotals or grand totals.

Q10 Can you use aggregate functions in a WHERE clause?

No. Aggregate functions cannot be used in a WHERE clause because WHERE runs before the grouping and aggregation step. To filter on an aggregate result, use HAVING instead. If you need to filter individual rows based on a subquery result that uses aggregation, you can use a subquery in the WHERE clause.