πŸ“ Theory Questions

SQL Joins Theory Interview Questions

10 frequently asked SQL joins theory questions for data analyst and BI developer interviews, with clear answers.

Q1 What is a SQL JOIN and why is it needed?

A JOIN combines rows from two or more tables based on a related column. Relational databases store data across multiple tables to avoid duplication (normalization). JOINs let you bring that data together when you need it. For example, combining orders with customers to see who placed each order.

Q2 What is the difference between INNER JOIN and LEFT JOIN?

An INNER JOIN returns only rows that have a matching row in both tables. A LEFT JOIN returns all rows from the left table and the matched rows from the right table β€” rows with no match in the right table appear with NULL values for the right-side columns.

Q3 What is a SELF JOIN? Give a use case.

A SELF JOIN joins a table to itself. It is useful for hierarchical or relational data within a single table. A classic use case is an employees table where each employee has a manager_id that also refers to employee_id in the same table β€” you self-join to display both employee and manager name in one row.

Q4 What is a CROSS JOIN?

A CROSS JOIN produces the Cartesian product of two tables β€” every row from the first table is paired with every row from the second. If table A has 10 rows and table B has 5 rows, the result has 50 rows. It is rarely used in analytics but useful for generating combinations (e.g., all product–region pairs for a planning matrix).

Q5 What is the difference between JOIN and UNION?

A JOIN combines columns from multiple tables horizontally (widens the result). A UNION combines rows from multiple SELECT queries vertically (stacks results). JOIN requires a matching condition; UNION requires matching column count and compatible data types.

Q6 What happens when you JOIN on a column with NULL values?

NULLs never match anything in a JOIN condition because NULL = NULL evaluates to UNKNOWN in SQL. Rows where the join column is NULL will be excluded from an INNER JOIN result. In a LEFT JOIN the left-side row will still appear, but right-side columns will be NULL.

Q7 What is an ANTI JOIN?

An ANTI JOIN returns rows from one table that have no matching row in another table. SQL has no ANTI JOIN keyword β€” it is implemented using LEFT JOIN ... WHERE right.key IS NULL or NOT EXISTS (...) or NOT IN (...). It is useful for finding orphan records or customers who have never placed an order.

Q8 What is a NATURAL JOIN? Why is it risky?

A NATURAL JOIN automatically joins two tables on all columns that share the same name and data type. The risk is that it silently joins on any column with a matching name, which can produce unexpected results if tables have coincidentally named columns (e.g., a status column in both tables). It also makes queries harder to read and maintain.

Q9 How do you JOIN more than two tables?

You chain multiple JOIN clauses. Each JOIN adds one more table. The order matters logically β€” later joins can reference columns from earlier joined tables. Example: orders JOIN customers ON ... JOIN order_items ON ... JOIN products ON ... builds a result combining four tables.

Q10 What is the difference between ON and USING in a JOIN?

Both specify the join condition. ON is the general form: JOIN orders ON orders.customer_id = customers.customer_id. USING is a shorthand when the column name is identical in both tables: JOIN orders USING (customer_id). With USING, the column appears only once in the result set, whereas with ON it appears from both tables.