Lesson 13

UNIQUE Constraint in SQL: Preventing Duplicate Values

Define uniqueness rules on your columns and write queries that detect existing duplicate values in any table.

Two customers register on your e-commerce platform with the same email address. Now you have duplicate accounts, split order history, and a broken password reset flow. A UNIQUE constraint on email would have stopped the second registration at the database level before the problem could exist. Constraints like this are your last line of defence against bad data.

What the UNIQUE Constraint Does

UNIQUE is a column or table-level constraint that guarantees no two rows share the same value in the constrained column or column combination. The database enforces this on every INSERT and UPDATE.

-- email and sku declared UNIQUE in their respective tables
CREATE TABLE customers (
    customer_id  INTEGER PRIMARY KEY,
    first_name   TEXT    NOT NULL,
    last_name    TEXT    NOT NULL,
    email        TEXT    NOT NULL UNIQUE,
    phone        TEXT,
    city         TEXT,
    country      TEXT,
    created_at   TEXT    NOT NULL
);

CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT    NOT NULL,
    price        REAL    NOT NULL,
    sku          TEXT    NOT NULL UNIQUE,
    is_active    INTEGER NOT NULL DEFAULT 1,
    created_at   TEXT    NOT NULL
);

With these definitions, inserting two customers with email = 'user@example.com' raises a constraint error immediately. The same applies to two products sharing a sku.

-- This would fail with: UNIQUE constraint failed: customers.email
INSERT INTO customers (customer_id, first_name, last_name, email, created_at)
VALUES (1, 'John', 'Doe', 'doe@example.com', '2024-05-01'),
       (2, 'Jane', 'Doe', 'doe@example.com', '2024-05-01');

Single-Column vs Composite UNIQUE Constraints

A single-column UNIQUE constraint ensures no two rows share the same value in one column. A composite UNIQUE constraint covers a combination of columns. The combination must be unique, but individual columns in the pair can repeat.

-- Composite UNIQUE: one product can only appear once per order
CREATE TABLE order_items (
    item_id     INTEGER PRIMARY KEY,
    order_id    INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    unit_price  REAL    NOT NULL,
    line_total  REAL    NOT NULL,
    UNIQUE (order_id, product_id)
);

This allows product_id = 5 to appear in multiple orders, and order_id = 100 to contain many products. What it blocks is the same product appearing twice in the same order, which would be a data entry error.

Key idea: A composite UNIQUE constraint does not mean each column in the group is unique on its own. It means the combination of values across those columns must be unique. This is exactly what you need for junction tables and line-item tables.

Auditing Existing Tables for Duplicate Values

If you inherit a database without constraints, you need to find duplicates manually before you can add a UNIQUE constraint. The standard approach uses GROUP BY with HAVING COUNT(*) > 1.

-- Find duplicate emails in the customers table
SELECT c.email, COUNT(*) AS occurrences
FROM customers c
WHERE c.email IS NOT NULL
GROUP BY c.email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;
-- Find duplicate SKUs in products
SELECT p.sku, COUNT(*) AS occurrences
FROM products p
WHERE p.sku IS NOT NULL
GROUP BY p.sku
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;
-- Find duplicate transaction_id values in payments
-- A transaction ID should never appear twice
SELECT pay.transaction_id, COUNT(*) AS occurrences
FROM payments pay
WHERE pay.transaction_id IS NOT NULL
GROUP BY pay.transaction_id
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;

If these queries return rows, you have a data quality problem. Fix the duplicates before adding the constraint, or the ALTER TABLE will fail.

Retrieving the Duplicate Rows Themselves

Finding the count of duplicates is step one. Step two is pulling the actual rows so you can investigate and clean them.

-- Get the full customer rows for duplicate emails
SELECT c.customer_id, c.first_name, c.last_name, c.email, c.created_at
FROM customers c
WHERE c.email IN (
    SELECT email
    FROM customers
    WHERE email IS NOT NULL
    GROUP BY email
    HAVING COUNT(*) > 1
)
ORDER BY c.email, c.created_at;
-- Get duplicate order_id and product_id combinations in order_items
SELECT oi.item_id, oi.order_id, oi.product_id, oi.quantity, oi.line_total
FROM order_items oi
WHERE (oi.order_id, oi.product_id) IN (
    SELECT order_id, product_id
    FROM order_items
    GROUP BY order_id, product_id
    HAVING COUNT(*) > 1
)
ORDER BY oi.order_id, oi.product_id;

UNIQUE and NULL Behaviour in SQLite

SQLite treats NULL as unknown, so multiple NULL values in a UNIQUE column are allowed. Two rows with phone = NULL do not violate a UNIQUE constraint on phone because neither is a known duplicate of the other.

-- These two inserts would both succeed in SQLite
-- even if phone has a UNIQUE constraint
-- because NULL is not considered equal to NULL

SELECT c.customer_id, c.first_name, c.phone
FROM customers c
WHERE c.phone IS NULL
ORDER BY c.customer_id;

This matters when you are auditing. If you are looking for duplicates in a nullable column, always add WHERE col IS NOT NULL to your duplicate-detection query, or you may get confused by the NULL behaviour.

Common Mistakes

-- wrong
SELECT c.email, COUNT(*) AS cnt
FROM customers c
GROUP BY c.email
HAVING COUNT(*) > 1;
-- correct
SELECT c.email, COUNT(*) AS cnt
FROM customers c
WHERE c.email IS NOT NULL
GROUP BY c.email
HAVING COUNT(*) > 1;

Including NULL in a duplicate check groups all NULL emails together and may falsely report them as duplicates. Filter out NULL first since multiple NULLs are allowed by the UNIQUE constraint.

---

-- wrong
SELECT DISTINCT p.sku
FROM products p;
-- correct
SELECT p.sku, COUNT(*) AS occurrences
FROM products p
WHERE p.sku IS NOT NULL
GROUP BY p.sku
HAVING COUNT(*) > 1;

DISTINCT returns unique values but tells you nothing about which values are duplicated or how many times they appear. Use GROUP BY with HAVING to actually identify the duplicates.

---

-- wrong
SELECT c.customer_id, c.email
FROM customers c
WHERE c.email IN (
    SELECT email FROM customers GROUP BY email HAVING COUNT(*) > 1
);
-- correct
SELECT c.customer_id, c.email
FROM customers c
WHERE c.email IS NOT NULL
  AND c.email IN (
    SELECT email FROM customers
    WHERE email IS NOT NULL
    GROUP BY email
    HAVING COUNT(*) > 1
  );

Without the IS NOT NULL filter in both the outer query and the subquery, you risk pulling rows with NULL email into your results when they are not true duplicates.

Quick Recap

  • UNIQUE prevents two rows from sharing the same value in a column or column combination. The database rejects any insert or update that would create a duplicate.
  • A composite UNIQUE constraint covers a combination of columns, not each column individually. Use it on junction tables and line-item tables where pairs must be unique.
  • Use GROUP BY with HAVING COUNT(*) > 1 to find existing duplicates before adding a UNIQUE constraint. Always filter out NULL values in nullable columns since multiple NULLs are permitted.
Exercise 1 Easy
Write a query to find any duplicate email values in the customers table. Show email and a count called occurrences. Exclude NULL emails. Only return emails that appear more than once. Sort by occurrences descending.
Hint: Use GROUP BY c.email with HAVING COUNT(*) > 1. Add WHERE c.email IS NOT NULL to exclude NULLs.
Solution
SELECT c.email, COUNT(*) AS occurrences FROM customers c WHERE c.email IS NOT NULL GROUP BY c.email HAVING COUNT(*) > 1 ORDER BY occurrences DESC;
Exercise 2 Medium
Write a query to detect any duplicate combinations of order_id and product_id in the order_items table. Show order_id, product_id, and a count called occurrences. Only return combinations that appear more than once. Sort by order_id ascending, then product_id ascending.
Hint: Use GROUP BY oi.order_id, oi.product_id and filter with HAVING COUNT(*) > 1.
Solution
SELECT oi.order_id, oi.product_id, COUNT(*) AS occurrences FROM order_items oi GROUP BY oi.order_id, oi.product_id HAVING COUNT(*) > 1 ORDER BY oi.order_id ASC, oi.product_id ASC;
Exercise 3 Hard
Find all customers who share an email with at least one other customer. Show customer_id, first_name, last_name, email, and created_at for every affected row. Join back to the customers table using a subquery to identify the duplicate emails. Sort by email ascending, then created_at ascending.
Hint: Use a subquery with GROUP BY email HAVING COUNT(*) > 1 to find the duplicate emails, then use WHERE c.email IN (...) to pull the full rows. Add IS NOT NULL in both places.
Solution
SELECT c.customer_id, c.first_name, c.last_name, c.email, c.created_at FROM customers c WHERE c.email IS NOT NULL AND c.email IN ( SELECT email FROM customers WHERE email IS NOT NULL GROUP BY email HAVING COUNT(*) > 1 ) ORDER BY c.email ASC, c.created_at ASC;
Frequently Asked Questions

It prevents two rows from having the same value in a column or combination of columns. Any INSERT or UPDATE that would create a duplicate is rejected by the database.

A PRIMARY KEY is implicitly UNIQUE and also NOT NULL. A UNIQUE constraint allows NULL values and a table can have multiple UNIQUE constraints but only one PRIMARY KEY.

Yes in most databases. NULL is treated as unknown so two NULL values are not considered duplicates. SQLite follows this behaviour and allows multiple NULLs in a UNIQUE column.

In most databases you use ALTER TABLE to add a unique index or constraint. In SQLite, ALTER TABLE support is limited and you typically need to recreate the table.