SQL Joins Theory Interview Questions
10 frequently asked SQL joins theory questions for data analyst and BI developer interviews, with clear answers.
- What is a SQL JOIN and why is it needed?
- What is the difference between INNER JOIN and LEFT JOIN?
- What is a SELF JOIN? Give a use case.
- What is a CROSS JOIN?
- What is the difference between JOIN and UNION?
- What happens when you JOIN on a column with NULL values?
- What is an ANTI JOIN?
- What is a NATURAL JOIN? Why is it risky?
- How do you JOIN more than two tables?
- What is the difference between ON and USING in a JOIN?
A JOIN combines rows from two or more tables based on a related column. Relational databases store data across multiple tables to avoid duplication (normalization). JOINs let you bring that data together when you need it. For example, combining orders with customers to see who placed each order.
An INNER JOIN returns only rows that have a matching row in both tables. A LEFT JOIN returns all rows from the left table and the matched rows from the right table β rows with no match in the right table appear with NULL values for the right-side columns.
A SELF JOIN joins a table to itself. It is useful for hierarchical or relational data within a single table. A classic use case is an employees table where each employee has a manager_id that also refers to employee_id in the same table β you self-join to display both employee and manager name in one row.
A CROSS JOIN produces the Cartesian product of two tables β every row from the first table is paired with every row from the second. If table A has 10 rows and table B has 5 rows, the result has 50 rows. It is rarely used in analytics but useful for generating combinations (e.g., all productβregion pairs for a planning matrix).
A JOIN combines columns from multiple tables horizontally (widens the result). A UNION combines rows from multiple SELECT queries vertically (stacks results). JOIN requires a matching condition; UNION requires matching column count and compatible data types.
NULLs never match anything in a JOIN condition because NULL = NULL evaluates to UNKNOWN in SQL. Rows where the join column is NULL will be excluded from an INNER JOIN result. In a LEFT JOIN the left-side row will still appear, but right-side columns will be NULL.
An ANTI JOIN returns rows from one table that have no matching row in another table. SQL has no ANTI JOIN keyword β it is implemented using LEFT JOIN ... WHERE right.key IS NULL or NOT EXISTS (...) or NOT IN (...). It is useful for finding orphan records or customers who have never placed an order.
A NATURAL JOIN automatically joins two tables on all columns that share the same name and data type. The risk is that it silently joins on any column with a matching name, which can produce unexpected results if tables have coincidentally named columns (e.g., a status column in both tables). It also makes queries harder to read and maintain.
You chain multiple JOIN clauses. Each JOIN adds one more table. The order matters logically β later joins can reference columns from earlier joined tables. Example: orders JOIN customers ON ... JOIN order_items ON ... JOIN products ON ... builds a result combining four tables.
Both specify the join condition. ON is the general form: JOIN orders ON orders.customer_id = customers.customer_id. USING is a shorthand when the column name is identical in both tables: JOIN orders USING (customer_id). With USING, the column appears only once in the result set, whereas with ON it appears from both tables.