Your head of sales asks for a report: total revenue by country, broken down by product category, for the last six months. That question crosses at least three tables. If you don't understand how those tables relate to each other and why they're structured the way they are, you'll spend more time guessing joins than actually writing the query. The star schema is the pattern that makes those joins predictable.
What a Star Schema Is and Why It Exists
Most databases you'll query for reporting weren't designed to store data efficiently. They were designed to answer questions efficiently. The star schema is the most common layout for doing that.
At the centre sits a fact table. It records events: a sale, a payment, a login. Each row represents one thing that happened. The fact table contains foreign keys pointing outward and numeric columns you can measure, like total_amount or quantity.
Around the fact table sit dimension tables. Each one holds descriptive information about one aspect of the event: who was involved, what was bought, when it happened, where it was shipped. The visual result looks like a star, with the fact table in the middle and the dimensions around it.
In the ecommerce schema, order_items behaves like a fact table. Each row records one product being sold: how many units, at what price, for what total. The dimension tables around it are products (what was sold), orders (when and to whom), and through orders, customers (who bought it) and employees (who processed it).
The Fact Table Up Close
The fact table is the most queried table in a star schema. It's wide, it's long, and almost every analytical question starts there.
In order_items, the measurable columns are quantity, unit_price, and line_total. The foreign keys are order_id and product_id. Those keys are the spokes of the star. They don't describe the sale; they point to tables that do.
SELECT oi.order_id, oi.product_id, oi.quantity, oi.line_total
FROM order_items oi
On its own, this query tells you numbers. It doesn't tell you what the product was called or which customer placed the order. For that, you need to join the dimension tables.
Joining Dimensions to the Fact Table
The value of a star schema is that every dimension connects to the fact table through a single, direct foreign key. There are no intermediate tables to navigate. Each join is one step.
To add product names to the sales data:
SELECT p.product_name, oi.quantity, oi.line_total
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
products is a dimension table here. It adds context to the fact. The join is on product_id, which is the primary key of products and a foreign key in order_items.
To bring in the customer dimension, you go through orders first, because order_items doesn't hold customer_id directly:
SELECT c.country, p.product_name, oi.line_total
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON oi.product_id = p.product_id
This query crosses four tables but follows the star structure cleanly. order_items is the centre. Each join adds one layer of context.
When you want to aggregate across dimensions, you add GROUP BY:
SELECT c.country, SUM(oi.line_total) AS total_revenue
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.country
ORDER BY total_revenue DESC
This answers the kind of question your head of sales asked. The fact table provides the numbers. The dimension table provides the grouping.
How the Star Schema Connects to Other Concepts
The joins in a star schema follow exactly the same syntax as any other [JOIN in SQL](sqlround.com/lessons/sql-joins). The schema just makes the join paths obvious before you start.
Understanding the star schema also makes [GROUP BY](sqlround.com/lessons/sql-group-by) queries more intuitive. The fact table gives you what to measure. The dimension tables give you what to group by.
If you're not sure which table is the fact table in an unfamiliar schema, look at the [ER diagram](sqlround.com/lessons/sql-er-diagrams). The table with the most foreign keys and the most rows is almost always the fact table.
Common Mistakes
1. Selecting descriptive columns from the fact table instead of the dimension
The fact table holds foreign keys, not names or labels. Selecting product_id when you want product_name gives you a number, not a label.
-- wrong
SELECT oi.product_id, SUM(oi.line_total) AS revenue
FROM order_items oi
GROUP BY oi.product_id
Join the dimension table to get the label alongside the measure.
-- correct
SELECT p.product_name, SUM(oi.line_total) AS revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY p.product_name
2. Aggregating without joining the right dimension
Grouping by a column that isn't in the FROM clause or a joined table causes an error or missing results.
-- wrong
SELECT c.country, SUM(oi.line_total) AS revenue
FROM order_items oi
GROUP BY c.country
Always join the dimension table before referencing its columns in SELECT or GROUP BY.
-- correct
SELECT c.country, SUM(oi.line_total) AS revenue
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.country
3. Joining dimension tables to each other instead of through the fact table
Dimension tables don't connect directly to each other in a star schema. They all connect through the fact table.
-- wrong
SELECT c.country, p.product_name
FROM customers c
JOIN products p ON c.customer_id = p.product_id
Start from the fact table and join each dimension outward from there.
-- correct
SELECT c.country, p.product_name, oi.line_total
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON oi.product_id = p.product_id
Quick Recap
- A star schema has one fact table at the centre and multiple dimension tables around it, each connected by a foreign key.
- The fact table stores measurable events, such as sales or payments, and contains the numeric columns you aggregate.
- Dimension tables store descriptive context, such as customer names, product details, or employee roles.
- Every join in a star schema goes from the fact table outward to a dimension, never between two dimension tables directly.
- When writing analytical queries, SELECT descriptive columns from the dimensions and aggregate numeric columns from the fact table.
- The star schema pattern makes [GROUP BY](sqlround.com/lessons/sql-group-by) queries across multiple dimensions straightforward once you've identified which table is the centre.
order_id, quantity, and line_total for every row in the order_items table.order_items. No joins needed.SELECT oi.order_id, oi.quantity, oi.line_total
FROM order_items oi
product_name, price, and brand for every product in the products table, sorted by price descending.products. Sort with ORDER BY price DESC.SELECT p.product_name, p.price, p.brand
FROM products p
ORDER BY p.price DESC
product_name and the total line_total sold per product across all orders, using the column alias total_revenue.order_items and products on product_id, then GROUP BY product_name and use SUM on line_total.SELECT p.product_name, SUM(oi.line_total) AS total_revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY p.product_name
product_name and total line_total per product, but only include products where the total revenue is above 1000, sorted by total revenue descending.SELECT p.product_name, SUM(oi.line_total) AS total_revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY p.product_name
HAVING SUM(oi.line_total) > 1000
ORDER BY total_revenue DESC
country from the customers table and the total line_total from all their orders, using the column alias total_spent, sorted from highest to lowest.order_items to orders on order_id, then join orders to customers on customer_id. GROUP BY country and SUM line_total.SELECT c.country, SUM(oi.line_total) AS total_spent
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.country
ORDER BY total_spent DESC
A star schema is a database design pattern where one central fact table holds measurable events, and several dimension tables hold the descriptive details. The fact table connects to each dimension table through a foreign key.
A fact table stores what happened, such as a sale or a payment, and contains numbers you can measure. A dimension table stores the context around that event, such as who the customer was or what the product was.
Star schemas are most common in data warehouses and reporting databases, but the JOIN pattern behind them works in any relational database.
In a star schema, each dimension table is flat and connects directly to the fact table. In a snowflake schema, dimension tables are further split into sub-tables, adding more levels of normalisation.