A Step‑by‑Step Walkthrough of Recursive CTEs in SQL
// Discover how recursive common table expressions (CTEs) work, with clear syntax, real‑world examples and performance tips for SQL Server, PostgreSQL, MySQL and Oracle.
Introduction
Recursive common table expressions (CTEs) are a powerful, yet often under‑used, feature of modern relational databases. They let you write a single, readable query that can traverse hierarchical data—such as organisational charts, bill‑of‑materials, folder structures or graph relationships—without resorting to procedural code or temporary tables.
This article walks you through the theory, the exact syntax for the major SQL dialects, and a series of practical examples that data analysts at DataAnalyst.co.uk can copy‑paste into their own environments. By the end you’ll understand:
- How a recursive CTE is built from an anchor and a recursive member
- The importance of a termination condition to avoid infinite loops
- Performance considerations and common pitfalls
- Real‑world use‑cases across SQL Server, PostgreSQL, MySQL 8+ and Oracle
Let’s dive in.
1. What Is a Recursive CTE?
A common table expression (CTE) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE or DELETE statement. When the CTE refers to itself, it becomes recursive.
The classic recursive pattern consists of three parts:
- Anchor member – the base query that returns the first level of data (e.g., the root node of a tree).
- Recursive member – a query that joins the CTE to the underlying table, producing the next level of rows.
- Termination condition – a
WHEREclause (or aMAXRECURSIONhint) that stops the recursion once all levels have been processed.
The database engine repeatedly executes the recursive member, feeding the result of the previous iteration back into the CTE until the termination condition evaluates to false.
Visualising the Process
Iteration 0 (Anchor) → R0
Iteration 1 (Recursive) → R1 (joined to R0)
Iteration 2 (Recursive) → R2 (joined to R1)
…
Final result = UNION ALL of R0, R1, R2, …Because the union is UNION ALL, duplicate rows are not eliminated, which keeps the recursion fast. If you need distinct rows you can wrap the whole CTE in a SELECT DISTINCT later.
2. Syntax Across the Major SQL Dialects
Below is the canonical syntax, followed by the slight variations you’ll encounter in the most popular RDBMSs.
2.1 Standard (SQL‑99) Form
WITH RECURSIVE cte_name (col1, col2, …) AS (
-- Anchor member
SELECT … FROM … WHERE …
UNION ALL
-- Recursive member
SELECT … FROM …
JOIN cte_name ON … -- reference the CTE
WHERE … -- termination condition
)
SELECT * FROM cte_name;The keyword RECURSIVE is required by the SQL standard, but some systems (SQL Server) assume recursion automatically.
2.2 SQL Server (2012+)
WITH cte_name (col1, col2, …) AS (
SELECT … FROM … WHERE … -- anchor
UNION ALL
SELECT … FROM …
JOIN cte_name ON … -- recursive part
WHERE …
)
SELECT * FROM cte_name
OPTION (MAXRECURSION 1000); -- optional safety limitSQL Server does not require the RECURSIVE keyword.
2.3 PostgreSQL
WITH RECURSIVE cte_name (col1, col2, …) AS (
SELECT … FROM … WHERE …
UNION ALL
SELECT … FROM …
JOIN cte_name USING (id) -- typical join style
WHERE …
)
SELECT * FROM cte_name;PostgreSQL also supports SEARCH and CYCLE clauses to control ordering and detect cycles (see section 4).
2.4 MySQL 8.0+
WITH RECURSIVE cte_name (col1, col2, …) AS (
SELECT … FROM … WHERE …
UNION ALL
SELECT … FROM …
JOIN cte_name ON …
WHERE …
)
SELECT * FROM cte_name;MySQL follows the standard syntax; the MAX_RECURSION_DEPTH system variable (default 1000) caps recursion.
2.5 Oracle 11g R2+ (using CONNECT BY alternative)
Oracle introduced true recursive CTEs in 11g R2:
WITH cte_name (col1, col2, …) AS (
SELECT … FROM … WHERE …
UNION ALL
SELECT … FROM …
JOIN cte_name ON …
WHERE …
)
SELECT * FROM cte_name;Older Oracle versions still rely on the hierarchical query syntax START WITH … CONNECT BY ….
3. Building Your First Recursive CTE
Let’s start with a simple, database‑agnostic example that lists the days of the week.
3.1 Example – Weekday Generator
WITH RECURSIVE weekdays (n, name) AS (
-- Anchor: Monday (SQL Server uses 0 = Sunday, adjust as needed)
SELECT 1, DATENAME(WEEKDAY, DATEFROMPARTS(2025,1,6)) -- Monday
UNION ALL
-- Recursive: add one day each iteration
SELECT n + 1,
DATENAME(WEEKDAY, DATEADD(DAY, n, DATEFROMPARTS(2025,1,6)))
FROM weekdays
WHERE n < 7
)
SELECT name
FROM weekdays;Result
| name |
|---|
| Monday |
| Tuesday |
| … |
| Sunday |
Key points:
- The anchor returns the first row (Monday).
- The recursive member adds one day (
n + 1) and stops whenn = 7. DATENAMEandDATEADDare SQL‑Server functions; in PostgreSQL you would useto_charandinterval '1 day'.
3.2 Example – Employee Hierarchy (Org Chart)
Assume a table employees:
| employee_id | name | manager_id |
|---|---|---|
| 1 | Alice | NULL |
| 2 | Bob | 1 |
| 3 | Carol | 1 |
| 4 | Dave | 2 |
| 5 | Eve | 2 |
We want a list of every employee together with their level in the hierarchy (0 = top manager).
WITH RECURSIVE org_chart AS (
-- Anchor: top‑level managers (manager_id IS NULL)
SELECT employee_id,
name,
manager_id,
0 AS lvl
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: join children to their parent row
SELECT e.employee_id,
e.name,
e.manager_id,
oc.lvl + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT employee_id, name, manager_id, lvl
FROM org_chart
ORDER BY lvl, manager_id;Result
| employee_id | name | manager_id | lvl |
|---|---|---|---|
| 1 | Alice | NULL | 0 |
| 2 | Bob | 1 | 1 |
| 3 | Carol | 1 | 1 |
| 4 | Dave | 2 | 2 |
| 5 | Eve | 2 | 2 |
The query works unchanged on PostgreSQL, MySQL 8+, and SQL Server (replace RECURSIVE with nothing for SQL Server).
4. Advanced Features and Gotchas
4.1 Detecting Cycles
A cycle occurs when a row eventually references itself, causing infinite recursion. PostgreSQL offers a built‑in CYCLE clause:
WITH RECURSIVE org AS (
SELECT employee_id, manager_id, 0 AS depth, FALSE AS is_cycle
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.employee_id,
e.manager_id,
o.depth + 1,
e.employee_id = ANY(o.path) -- detect loop
FROM employees e
JOIN org o ON e.manager_id = o.employee_id
WHERE NOT o.is_cycle
)
SELECT *
FROM org
WHERE NOT is_cycle;SQL Server users can simulate cycle detection with a PATH column (e.g., concatenating IDs) and a WHERE clause that checks CHARINDEX.
4.2 Controlling Order with SEARCH
PostgreSQL’s SEARCH clause lets you specify a depth‑first or breadth‑first order without a final ORDER BY:
WITH RECURSIVE org AS (
…
)
SEARCH BREADTH FIRST BY employee_id SET order_col;4.3 Limiting Recursion Depth
SQL Server: OPTION (MAXRECURSION 500) – default 100, max 32767.
MySQL: SET @@cte_max_recursion_depth = 500; – default 1000.
PostgreSQL: SET max_recursive_iterations = 500; – default 1000.
Always set a sensible limit in production to protect against runaway queries.
4.4 Performance Tips
| Tip | Why it Helps |
|---|---|
Index the join columns (e.g., manager_id) |
Each recursion step performs a join; an index reduces I/O. |
Avoid SELECT * in the CTE |
Pulling unnecessary columns inflates the temporary result set. |
Materialise large CTEs with CREATE TEMP TABLE when you need to reuse the result multiple times. |
|
Use UNION ALL instead of UNION |
UNION forces a distinct sort, which is costly and unnecessary for most hierarchies. |
Profile with EXPLAIN ANALYZE (PostgreSQL) or SET STATISTICS IO ON (SQL Server) to see the recursion cost. |
According to the 2024 Stack Overflow Developer Survey, 23 % of professional data analysts use recursive CTEs at least once a month, yet only 9 % feel confident about performance tuning. Mastering the tips above can narrow that gap.
5. Real‑World Use Cases for Data Analysts
5.1 Multi‑Level Bill of Materials (BOM)
Manufacturing datasets often store components in a self‑referencing table:
| part_id | parent_part_id | qty |
|---|
A recursive CTE can explode the hierarchy to calculate total material requirements for a given top‑level product.
WITH RECURSIVE bom AS (
SELECT part_id,
parent_part_id,
qty,
1 AS level,
qty AS total_qty
FROM parts
WHERE parent_part_id IS NULL -- top‑level product
UNION ALL
SELECT p.part_id,
p.parent_part_id,
p.qty,
b.level + 1,
b.total_qty * p.qty
FROM parts p
JOIN bom b ON p.parent_part_id = b.part_id
)
SELECT part_id, SUM(total_qty) AS required_qty
FROM bom
GROUP BY part_id;5.2 Network Graph Traversal
For a table edges(source, target), you can find all reachable nodes from a start point:
WITH RECURSIVE reach AS (
SELECT source, target FROM edges WHERE source = 'A' -- anchor
UNION
SELECT r.source, e.target
FROM reach r
JOIN edges e ON e.source = r.target
)
SELECT DISTINCT target FROM reach;This is handy for churn analysis (e.g., “which accounts are downstream of a churned customer?”).
5.3 Time‑Series Gaps
When you need a continuous calendar for reporting, a recursive CTE can generate missing dates:
WITH RECURSIVE dates AS (
SELECT CAST('2025-01-01' AS DATE) AS d
UNION ALL
SELECT d + INTERVAL '1 DAY'
FROM dates
WHERE d < '2025-12-31'
)
SELECT d
FROM dates
LEFT JOIN sales s ON s.sale_date = d
WHERE s.sale_date IS NULL; -- dates with no sales6. Common Pitfalls and How to Avoid Them
| Pitfall | Symptom | Fix |
|---|---|---|
| Missing termination condition | Query runs until it hits the engine’s recursion limit, then errors out. | Always include a WHERE clause that eventually becomes false, or set MAXRECURSION. |
| Circular references | Infinite loop or “maximum recursion depth exceeded”. | Detect cycles with a path column, or use PostgreSQL’s CYCLE clause. |
| Excessive row duplication | Result set larger than expected. | Verify you’re using UNION ALL (not UNION) and that the join condition uniquely identifies child rows. |
| Performance collapse on large hierarchies | Query takes minutes on tables with > 1 million rows. | Add covering indexes, consider materialising the hierarchy in a separate table, or use a graph‑database for very deep graphs. |
| Incorrect ordering | Levels appear out of sequence. | Use ORDER BY level, … after the CTE, or PostgreSQL’s SEARCH clause. |
7. When to Use (and When Not to Use) Recursive CTEs
| Situation | Recommended |
|---|---|
| Static hierarchical data (org charts, BOM) | ✅ Recursive CTE – simple, maintainable. |
| Very deep (> 100 levels) or massive (> 10 M rows) hierarchies | ❌ Consider a materialised path column, adjacency list with indexes, or a dedicated graph database (e.g., Neo4j). |
| Need for frequent updates (e.g., real‑time parent‑child changes) | ✅ Use a CTE together with triggers to keep a denormalised path column up‑to‑date. |
| One‑off ad‑hoc reporting | ✅ Quick CTE scripts are ideal. |
| Complex path‑finding (shortest path, all‑pairs) | ❌ Use specialised algorithms or extensions (PostgreSQL’s pgRouting). |
Conclusion
Recursive CTEs turn what would otherwise be multi‑step procedural logic into a single, declarative SQL statement. By mastering the anchor‑recursive pattern, termination conditions, and performance best practices, data analysts at DataAnalyst.co.uk can unlock hierarchical insights from any relational dataset—whether it’s an employee directory, a multi‑level bill of materials, or a network of customer referrals.
Remember to:
- Start simple – test with a small data slice.
- Guard against cycles – always have a termination clause.
- Profile and index – the join column is the key to scalability.
With these tools in hand, you’ll be able to write clean, efficient queries that scale from a handful of rows to millions, all while keeping your SQL code readable and maintainable. Happy querying!