PRIMARY KEY in SQL: Definition, Syntax, and Best Practices

Understand what a primary key does, how to define one correctly, and why every table in a real schema needs one.

Every table in a real database needs a reliable way to identify each row. Without a primary key, you cannot safely join tables, update a specific record, or guarantee you are not working with duplicates. In the ecommerce schema, every table from customers to payments has a primary key column, and understanding why those columns were chosen that way is as useful as knowing the syntax.

What a PRIMARY KEY Enforces

A primary key combines two constraints in one declaration: NOT NULL and UNIQUE. Every value in the primary key column must exist and must be different from every other value in that column.

-- Single-column primary key on an integer ID
CREATE TABLE categories (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT    NOT NULL,
    description   TEXT,
    created_at    TEXT    NOT NULL
);

category_id is now the unique identifier for every category row. The database will reject any insert that supplies a duplicate category_id or tries to leave it as NULL.

-- Verify primary keys are unique and not null by querying the table
SELECT cat.category_id, cat.category_name
FROM categories cat
ORDER BY cat.category_id;

Key idea: A primary key is not just a naming convention. It is an enforced constraint. The database guarantees uniqueness and non-nullability at write time, which means you can always trust the primary key column to identify exactly one row.

Single-Column Primary Keys

The most common pattern is an integer surrogate key, an auto-incrementing number with no business meaning. Every table in the ecommerce schema uses this pattern.

CREATE TABLE customers (
    customer_id  INTEGER PRIMARY KEY,
    first_name   TEXT    NOT NULL,
    last_name    TEXT    NOT NULL,
    email        TEXT    NOT NULL UNIQUE,
    phone        TEXT,
    city         TEXT,
    country      TEXT,
    loyalty_tier TEXT,
    created_at   TEXT    NOT NULL
);

In SQLite, declaring a column as INTEGER PRIMARY KEY makes it an alias for the internal rowid. SQLite automatically assigns the next available integer when you insert a row without supplying that column.

-- Insert without specifying customer_id; SQLite assigns it automatically
INSERT INTO customers (first_name, last_name, email, created_at)
VALUES ('Arun', 'Mehta', 'arun@example.com', '2024-03-10');

-- Retrieve the row using the auto-assigned primary key
SELECT c.customer_id, c.first_name, c.last_name, c.email
FROM customers c
WHERE c.email = 'arun@example.com';

Composite Primary Keys

A composite primary key uses two or more columns together as the unique identifier. Neither column needs to be unique on its own. Only the combination must be unique.

This is the right choice for junction tables and line-item tables where a single integer ID is not meaningful.

-- Composite primary key on order_items
-- The same product cannot appear twice in the same order
CREATE TABLE order_items (
    item_id     INTEGER PRIMARY KEY,
    order_id    INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    unit_price  REAL    NOT NULL,
    line_total  REAL    NOT NULL,
    UNIQUE (order_id, product_id)
);

Note that order_items still uses a surrogate item_id as its primary key and enforces the business rule separately with a UNIQUE constraint. An alternative would be to make (order_id, product_id) the primary key directly:

-- Pure composite primary key with no surrogate
CREATE TABLE order_items_alt (
    order_id    INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    unit_price  REAL    NOT NULL,
    line_total  REAL    NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

Both approaches are valid. Surrogate keys are simpler to reference in foreign keys and application code. Composite primary keys make the uniqueness rule more explicit in the schema itself.

Natural Keys vs Surrogate Keys

A natural key is a column that already exists in your data and has a unique business meaning, like email for customers or sku for products. A surrogate key is a meaningless integer added purely to identify the row.

-- Natural key example: sku as the identifier for products
CREATE TABLE products_natural (
    sku          TEXT    PRIMARY KEY,
    product_name TEXT    NOT NULL,
    price        REAL    NOT NULL,
    is_active    INTEGER NOT NULL DEFAULT 1
);

-- Surrogate key example: product_id as the identifier
CREATE TABLE products_surrogate (
    product_id   INTEGER PRIMARY KEY,
    sku          TEXT    NOT NULL UNIQUE,
    product_name TEXT    NOT NULL,
    price        REAL    NOT NULL,
    is_active    INTEGER NOT NULL DEFAULT 1
);

The ecommerce schema uses surrogate keys everywhere. This is standard practice because natural keys can change over time. A customer might change their email. A product SKU might be reassigned. Surrogate integer IDs never change, which makes joins and foreign key references stable.

-- Look up a specific order using its surrogate primary key
SELECT o.order_id, o.order_date, o.status, o.total_amount
FROM orders o
WHERE o.order_id = 1042;

How SQLite Handles PRIMARY KEY

In SQLite, INTEGER PRIMARY KEY is special. It is an alias for the rowid and SQLite auto-assigns the next integer on insert. TEXT PRIMARY KEY or REAL PRIMARY KEY do not get this treatment and will not auto-increment.

For composite primary keys, you must use the table-level PRIMARY KEY (col1, col2) syntax. You cannot declare a composite key inline with the column definitions.

-- Correct composite primary key syntax in SQLite
CREATE TABLE employee_departments (
    employee_id  INTEGER NOT NULL,
    department   TEXT    NOT NULL,
    assigned_at  TEXT    NOT NULL,
    PRIMARY KEY (employee_id, department)
);

Common Mistakes

-- wrong
CREATE TABLE payments (
    payment_id  INTEGER,
    order_id    INTEGER NOT NULL,
    amount      REAL    NOT NULL,
    status      TEXT    NOT NULL
);

-- correct
CREATE TABLE payments (
    payment_id  INTEGER PRIMARY KEY,
    order_id    INTEGER NOT NULL,
    amount      REAL    NOT NULL,
    status      TEXT    NOT NULL
);

Omitting PRIMARY KEY means the table has no enforced row identifier. You can end up with duplicate payment_id values and no reliable way to reference a specific payment.

---

-- wrong
CREATE TABLE order_items (
    order_id   INTEGER PRIMARY KEY,
    product_id INTEGER PRIMARY KEY,
    quantity   INTEGER NOT NULL
);

-- correct
CREATE TABLE order_items (
    order_id   INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

You cannot declare PRIMARY KEY twice inline on two separate columns. For a composite primary key you must use the table-level PRIMARY KEY (col1, col2) syntax at the end of the column list.

---

-- wrong
SELECT * FROM orders WHERE order_id = NULL;

-- correct
SELECT * FROM orders WHERE order_id IS NULL;

Primary key columns are NOT NULL by definition, so this query returns no rows regardless of what is in the table. In practice you would never search for a null primary key, but using = NULL anywhere in SQL is always wrong.

Quick Recap

A primary key enforces both NOT NULL and UNIQUE on the identifying column. Every table should have one so rows can always be uniquely referenced.
Surrogate integer keys are the most common pattern. In SQLite, INTEGER PRIMARY KEY auto-increments by default so you do not need to supply the value on insert.
Composite primary keys cover a combination of columns and are declared at the table level using PRIMARY KEY (col1, col2). Use them for junction and line-item tables where the pair of values defines uniqueness.

Exercise 1 Easy

Write a query that retrieves customer_id, first_name, last_name, and email from the customers table for the single customer with customer_id = 5. Use the primary key in your WHERE clause.

Hint: Filter directly on c.customer_id = 5. A primary key lookup always returns at most one row.

Solution

SELECT c.customer_id, c.first_name, c.last_name, c.email FROM customers c WHERE c.customer_id = 5;

Exercise 2 Medium

Write a query that checks whether any duplicate order_id values exist in the payments table. An order should only have one payment record per transaction. Show order_id and a count called payment_count. Only return rows where the count is greater than 1. Sort by payment_count descending.

Hint: Use GROUP BY pay.order_id with HAVING COUNT(*) > 1. This is a data quality check against an implied uniqueness rule.

Solution

SELECT pay.order_id, COUNT(*) AS payment_count FROM payments pay GROUP BY pay.order_id HAVING COUNT(*) > 1 ORDER BY payment_count DESC;

Exercise 3 Hard

For each customer, retrieve their customer_id, first_name, last_name, and the total number of orders they have placed as total_orders. Only include customers who have placed more than 2 orders. Join customers and orders using the primary and foreign key relationship. Sort by total_orders descending, then by customer_id ascending.

Hint: Join on c.customer_id = o.customer_id. Use GROUP BY on the customer primary key and name columns. Filter with HAVING COUNT(o.order_id) > 2.

Solution

SELECT c.customer_id, c.first_name, c.last_name, COUNT(o.order_id) AS total_orders FROM customers c JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.first_name, c.last_name HAVING COUNT(o.order_id) > 2 ORDER BY total_orders DESC, c.customer_id ASC;

Frequently Asked Questions

What is a primary key in SQL?

A primary key is a column or combination of columns that uniquely identifies each row in a table. It cannot contain NULL values and no two rows can share the same primary key value.

What is the difference between a primary key and a unique key in SQL?

A table can only have one primary key, and it cannot contain NULLs. A unique key also enforces uniqueness but allows NULLs and a table can have multiple unique keys.

Can a primary key be made up of multiple columns?

Yes. This is called a composite primary key. The combination of the columns must be unique across all rows, even if individual columns repeat.

Should I use AUTOINCREMENT for primary keys in SQLite?

In SQLite, INTEGER PRIMARY KEY already auto-increments by default. Adding AUTOINCREMENT is stricter and prevents reuse of deleted IDs, but it is rarely necessary for most applications.