Two customers register on your e-commerce platform with the same email address. Now you have duplicate accounts, split order history, and a broken password reset flow. A UNIQUE constraint on email would have stopped the second registration at the database level before the problem could exist. Constraints like this are your last line of defence against bad data.
What the UNIQUE Constraint Does
UNIQUE is a column or table-level constraint that guarantees no two rows share the same value in the constrained column or column combination. The database enforces this on every INSERT and UPDATE.
-- email and sku declared UNIQUE in their respective tables
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
email TEXT NOT NULL UNIQUE,
phone TEXT,
city TEXT,
country TEXT,
created_at TEXT NOT NULL
);
CREATE TABLE products (
product_id INTEGER PRIMARY KEY,
product_name TEXT NOT NULL,
price REAL NOT NULL,
sku TEXT NOT NULL UNIQUE,
is_active INTEGER NOT NULL DEFAULT 1,
created_at TEXT NOT NULL
);
With these definitions, inserting two customers with email = 'user@example.com' raises a constraint error immediately. The same applies to two products sharing a sku.
-- This would fail with: UNIQUE constraint failed: customers.email
INSERT INTO customers (customer_id, first_name, last_name, email, created_at)
VALUES (1, 'John', 'Doe', 'doe@example.com', '2024-05-01'),
(2, 'Jane', 'Doe', 'doe@example.com', '2024-05-01');
Single-Column vs Composite UNIQUE Constraints
A single-column UNIQUE constraint ensures no two rows share the same value in one column. A composite UNIQUE constraint covers a combination of columns. The combination must be unique, but individual columns in the pair can repeat.
-- Composite UNIQUE: one product can only appear once per order
CREATE TABLE order_items (
item_id INTEGER PRIMARY KEY,
order_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
quantity INTEGER NOT NULL,
unit_price REAL NOT NULL,
line_total REAL NOT NULL,
UNIQUE (order_id, product_id)
);
This allows product_id = 5 to appear in multiple orders, and order_id = 100 to contain many products. What it blocks is the same product appearing twice in the same order, which would be a data entry error.
UNIQUE constraint does not mean each column in the group is unique on its own. It means the combination of values across those columns must be unique. This is exactly what you need for junction tables and line-item tables.Auditing Existing Tables for Duplicate Values
If you inherit a database without constraints, you need to find duplicates manually before you can add a UNIQUE constraint. The standard approach uses GROUP BY with HAVING COUNT(*) > 1.
-- Find duplicate emails in the customers table
SELECT c.email, COUNT(*) AS occurrences
FROM customers c
WHERE c.email IS NOT NULL
GROUP BY c.email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;
-- Find duplicate SKUs in products
SELECT p.sku, COUNT(*) AS occurrences
FROM products p
WHERE p.sku IS NOT NULL
GROUP BY p.sku
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;
-- Find duplicate transaction_id values in payments
-- A transaction ID should never appear twice
SELECT pay.transaction_id, COUNT(*) AS occurrences
FROM payments pay
WHERE pay.transaction_id IS NOT NULL
GROUP BY pay.transaction_id
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;
If these queries return rows, you have a data quality problem. Fix the duplicates before adding the constraint, or the ALTER TABLE will fail.
Retrieving the Duplicate Rows Themselves
Finding the count of duplicates is step one. Step two is pulling the actual rows so you can investigate and clean them.
-- Get the full customer rows for duplicate emails
SELECT c.customer_id, c.first_name, c.last_name, c.email, c.created_at
FROM customers c
WHERE c.email IN (
SELECT email
FROM customers
WHERE email IS NOT NULL
GROUP BY email
HAVING COUNT(*) > 1
)
ORDER BY c.email, c.created_at;
-- Get duplicate order_id and product_id combinations in order_items
SELECT oi.item_id, oi.order_id, oi.product_id, oi.quantity, oi.line_total
FROM order_items oi
WHERE (oi.order_id, oi.product_id) IN (
SELECT order_id, product_id
FROM order_items
GROUP BY order_id, product_id
HAVING COUNT(*) > 1
)
ORDER BY oi.order_id, oi.product_id;
UNIQUE and NULL Behaviour in SQLite
SQLite treats NULL as unknown, so multiple NULL values in a UNIQUE column are allowed. Two rows with phone = NULL do not violate a UNIQUE constraint on phone because neither is a known duplicate of the other.
-- These two inserts would both succeed in SQLite
-- even if phone has a UNIQUE constraint
-- because NULL is not considered equal to NULL
SELECT c.customer_id, c.first_name, c.phone
FROM customers c
WHERE c.phone IS NULL
ORDER BY c.customer_id;
This matters when you are auditing. If you are looking for duplicates in a nullable column, always add WHERE col IS NOT NULL to your duplicate-detection query, or you may get confused by the NULL behaviour.
Common Mistakes
-- wrong
SELECT c.email, COUNT(*) AS cnt
FROM customers c
GROUP BY c.email
HAVING COUNT(*) > 1;
-- correct
SELECT c.email, COUNT(*) AS cnt
FROM customers c
WHERE c.email IS NOT NULL
GROUP BY c.email
HAVING COUNT(*) > 1;
Including NULL in a duplicate check groups all NULL emails together and may falsely report them as duplicates. Filter out NULL first since multiple NULLs are allowed by the UNIQUE constraint.
---
-- wrong
SELECT DISTINCT p.sku
FROM products p;
-- correct
SELECT p.sku, COUNT(*) AS occurrences
FROM products p
WHERE p.sku IS NOT NULL
GROUP BY p.sku
HAVING COUNT(*) > 1;
DISTINCT returns unique values but tells you nothing about which values are duplicated or how many times they appear. Use GROUP BY with HAVING to actually identify the duplicates.
---
-- wrong
SELECT c.customer_id, c.email
FROM customers c
WHERE c.email IN (
SELECT email FROM customers GROUP BY email HAVING COUNT(*) > 1
);
-- correct
SELECT c.customer_id, c.email
FROM customers c
WHERE c.email IS NOT NULL
AND c.email IN (
SELECT email FROM customers
WHERE email IS NOT NULL
GROUP BY email
HAVING COUNT(*) > 1
);
Without the IS NOT NULL filter in both the outer query and the subquery, you risk pulling rows with NULL email into your results when they are not true duplicates.
Quick Recap
UNIQUEprevents two rows from sharing the same value in a column or column combination. The database rejects any insert or update that would create a duplicate.- A composite
UNIQUEconstraint covers a combination of columns, not each column individually. Use it on junction tables and line-item tables where pairs must be unique. - Use
GROUP BYwithHAVING COUNT(*) > 1to find existing duplicates before adding aUNIQUEconstraint. Always filter outNULLvalues in nullable columns since multiple NULLs are permitted.
email values in the customers table. Show email and a count called occurrences. Exclude NULL emails. Only return emails that appear more than once. Sort by occurrences descending.GROUP BY c.email with HAVING COUNT(*) > 1. Add WHERE c.email IS NOT NULL to exclude NULLs.SELECT c.email, COUNT(*) AS occurrences FROM customers c WHERE c.email IS NOT NULL GROUP BY c.email HAVING COUNT(*) > 1 ORDER BY occurrences DESC;
order_id and product_id in the order_items table. Show order_id, product_id, and a count called occurrences. Only return combinations that appear more than once. Sort by order_id ascending, then product_id ascending.GROUP BY oi.order_id, oi.product_id and filter with HAVING COUNT(*) > 1.SELECT oi.order_id, oi.product_id, COUNT(*) AS occurrences FROM order_items oi GROUP BY oi.order_id, oi.product_id HAVING COUNT(*) > 1 ORDER BY oi.order_id ASC, oi.product_id ASC;
customer_id, first_name, last_name, email, and created_at for every affected row. Join back to the customers table using a subquery to identify the duplicate emails. Sort by email ascending, then created_at ascending.GROUP BY email HAVING COUNT(*) > 1 to find the duplicate emails, then use WHERE c.email IN (...) to pull the full rows. Add IS NOT NULL in both places.SELECT c.customer_id, c.first_name, c.last_name, c.email, c.created_at FROM customers c WHERE c.email IS NOT NULL AND c.email IN ( SELECT email FROM customers WHERE email IS NOT NULL GROUP BY email HAVING COUNT(*) > 1 ) ORDER BY c.email ASC, c.created_at ASC;
It prevents two rows from having the same value in a column or combination of columns. Any INSERT or UPDATE that would create a duplicate is rejected by the database.
A PRIMARY KEY is implicitly UNIQUE and also NOT NULL. A UNIQUE constraint allows NULL values and a table can have multiple UNIQUE constraints but only one PRIMARY KEY.
Yes in most databases. NULL is treated as unknown so two NULL values are not considered duplicates. SQLite follows this behaviour and allows multiple NULLs in a UNIQUE column.
In most databases you use ALTER TABLE to add a unique index or constraint. In SQLite, ALTER TABLE support is limited and you typically need to recreate the table.