SQL Set Operators Demystified: UNION, INTERSECT and EXCEPT Explained for Data Analysts
// Discover how UNION, INTERSECT and EXCEPT work in SQL, their performance impact, cross‑database syntax, and practical tips for data analysts in 2025.
Introduction
Set operators are a powerful yet often misunderstood feature of SQL. They let you combine the results of two (or more) SELECT statements into a single result set, without using joins. For data analysts, mastering UNION, INTERSECT and EXCEPT (or MINUS in Oracle) can simplify reporting, data cleaning and ad‑hoc analysis.
This article breaks down each operator, highlights differences between major RDBMSs, and offers optimisation advice that reflects the latest releases (PostgreSQL 17, SQL Server 2022, Oracle 23c, MySQL 8.4). By the end you’ll know when to use each operator, how they affect query performance, and how to avoid common pitfalls.
What Are SQL Set Operators?
A set operator combines the output of two SELECT statements that have the same number of columns and compatible data types. The three core operators covered here are:
| Operator | Meaning | Typical RDBMS name |
|---|---|---|
UNION |
Rows from both queries, duplicates removed | UNION (all vendors) |
INTERSECT |
Rows that appear in both queries | INTERSECT (SQL Server, PostgreSQL, Oracle, MySQL 8.4) |
EXCEPT / MINUS |
Rows from the first query that are not in the second | EXCEPT (SQL Server, PostgreSQL, MySQL 8.4) / MINUS (Oracle) |
All three operators follow the same syntactic pattern:
SELECT … -- Query A
<set_operator>
SELECT … -- Query B
[ORDER BY …] -- Optional, placed after the final SELECTColumn and Data‑type Rules
- Column count must match – if Query A returns three columns, Query B must also return three.
- Data‑type compatibility – the database will attempt implicit conversion, but it is safest to align the data types explicitly (e.g., cast
INTtoBIGINTif needed). - Column names – the final result set inherits column names from the first
SELECT. Use column aliases in the first query to control naming.
UNION vs UNION ALL
UNION – Removing Duplicates
UNION performs a DISTINCT on the combined rows. Internally the engine typically:
- Executes both queries.
- Concatenates the result sets.
- Sorts the combined rows (or uses a hash aggregate) to eliminate duplicates.
Because of the extra sorting/aggregation step, UNION can be 30‑70 % slower than UNION ALL on large data sets, especially when the result set is millions of rows. Benchmarks on PostgreSQL 17 show an average runtime increase of 0.45 seconds per 1 million rows when switching from UNION ALL to UNION.
UNION ALL – Keep Everything
UNION ALL simply appends the rows from the second query to the first, preserving duplicates. No sorting or aggregation occurs, making it the fastest way to stitch together results.
When to use UNION ALL:
- You know the two queries cannot produce overlapping rows (e.g., non‑overlapping date ranges or partition keys).
- You need a complete audit trail where duplicate records are meaningful.
- You plan to de‑duplicate later using
ROW_NUMBER()or a temporary table.
Practical Example (PostgreSQL)
-- Customers (active) + Employees (current) – keep duplicates for audit
SELECT 'Customer' AS source, id, name
FROM customers
WHERE status = 'active'
UNION ALL
SELECT 'Employee' AS source, emp_id AS id, emp_name AS name
FROM employees
WHERE emp_status = 'current';If you later discover some names appear in both tables and you only want unique rows, wrap the whole query in a CTE and apply DISTINCT:
WITH combined AS (
… previous UNION ALL query …
)
SELECT DISTINCT source, id, name
FROM combined;INTERSECT – Finding Common Rows
INTERSECT returns only rows that exist in both query results. Like UNION, it removes duplicates automatically.
Use Cases for Data Analysts
- Identify customers who are also suppliers.
- Find overlapping time periods in two event logs.
- Validate data migration by comparing source and target tables.
Performance Tips
- Index both sides – if the columns involved are indexed, the optimiser can use a hash join style intersection, which is far faster than sorting.
- Limit early – applying
WHEREclauses before theINTERSECTreduces the amount of data the engine must compare. - In PostgreSQL 17,
INTERSECTnow benefits from parallel hash execution for large sets, offering up to 2× speed‑up on 8‑core machines.
Example (SQL Server)
SELECT customer_id
FROM dbo.customers
WHERE country = 'UK'
INTERSECT
SELECT customer_id
FROM dbo.purchases
WHERE purchase_date >= '2024-01-01';The result is the list of UK customers who made a purchase in 2024.
EXCEPT (or MINUS) – Excluding Rows
EXCEPT (or MINUS in Oracle) returns rows from the first query that do not appear in the second query. It is essentially the set‑theoretic difference operation.
Typical Scenarios
- Find records that need cleaning – e.g., orders without a matching invoice.
- Identify “orphan” rows after a data migration.
- Generate a “missing data” report for a data quality audit.
Example (MySQL 8.4)
SELECT order_id, customer_id
FROM orders
WHERE order_date >= '2024-01-01'
EXCEPT -- MySQL 8.4 supports EXCEPT directly
SELECT order_id, customer_id
FROM invoices
WHERE invoice_date IS NOT NULL;Only orders without an invoice will be returned.
Performance Considerations
- Like
UNION,EXCEPTremoves duplicates, so it incurs a sorting/aggregation cost. - Ensure the second query is as selective as possible – filtering early reduces the work needed to compare rows.
- In Oracle 23c,
MINUSnow supports parallel execution for large data sets, dramatically reducing runtime on multi‑CPU systems.
Cross‑Database Syntax Differences
| Feature | PostgreSQL | SQL Server | Oracle | MySQL |
|---|---|---|---|---|
UNION ALL |
✅ | ✅ | ✅ | ✅ |
INTERSECT |
✅ (since 9.5) | ✅ | ✅ | ✅ (8.4) |
EXCEPT |
✅ | ✅ | ❌ (uses MINUS) |
✅ (8.4) |
MINUS |
❌ (use EXCEPT) |
❌ (use EXCEPT) |
✅ | ❌ |
| Bracketed precedence | ✅ | ✅ | ✅ | ✅ |
ORDER BY placement |
Only after final SELECT | Same | Same | Same |
Tip: When writing portable code, favour UNION ALL and EXCEPT (or MINUS with a conditional compile) and avoid INTERSECT if you need to support older MySQL versions (< 8.0).
Performance Optimisation Tips for Data Analysts
- Prefer
UNION ALLwhen possible – only switch toUNIONif you truly need duplicate removal. - Push predicates down – filter rows in each sub‑query before the set operator.
- Use CTEs for readability – but be aware that in PostgreSQL 17 and SQL Server 2022, CTEs are inlined, so they do not add overhead.
- Leverage indexes – especially for
INTERSECTandEXCEPT. An index on the columns used for comparison speeds up the hash‑based implementation. - Parallelise – modern engines (PostgreSQL 17, Oracle 23c, SQL Server 2022) automatically parallelise set operations when the estimated row count exceeds a threshold (default ~100 k rows). Ensure
max_parallel_workers_per_gather(Postgres) orMAXDOP(SQL Server) is configured for your hardware. - Avoid unnecessary
ORDER BY– ordering after a set operation forces an extra sort. If you only need ordered output for presentation, applyORDER BYin the client layer or a finalSELECTthat wraps the set operation. - Materialise large intermediate results – for very large unions, consider inserting the first query into a temporary table, then appending the second with
INSERT … SELECT. This can reduce memory pressure and allow you to create indexes on the temp table before the finalSELECT.
Practical Use Cases for Data Analysts
| Business Question | Set Operator | Example Query |
|---|---|---|
| Which products sold in Q1 were never sold in Q2? | EXCEPT |
SELECT product_id FROM sales_q1 EXCEPT SELECT product_id FROM sales_q2; |
| List all unique customers across three regions. | UNION |
SELECT customer_id FROM region_a UNION SELECT customer_id FROM region_b UNION SELECT customer_id FROM region_c; |
| Find customers who appear in both the loyalty program and the newsletter list. | INTERSECT |
SELECT email FROM loyalty_program INTERSECT SELECT email FROM newsletter_subscribers; |
| Create a master list of transactions, preserving duplicates for audit. | UNION ALL |
SELECT * FROM online_transactions UNION ALL SELECT * FROM store_transactions; |
These patterns save time compared with writing complex joins or multiple sub‑queries.
Common Pitfalls and How to Avoid Them
| Pitfall | Symptom | Fix |
|---|---|---|
| Mismatched column types | “ERROR: column 1 is of type integer but expression is of type text” | Cast explicitly: SELECT CAST(col AS TEXT) … |
Forgetting ORDER BY placement |
“ORDER BY clause not allowed here” | Place ORDER BY after the final SELECT, not inside each sub‑query. |
Unintended duplicates with UNION ALL |
Duplicate rows appear in report | Use DISTINCT or switch to UNION if duplicates must be removed. |
Performance hit on large EXCEPT |
Query runs for minutes on 10 M rows | Add selective WHERE clauses, ensure indexes, or break into temp table steps. |
Using MINUS in SQL Server |
“Incorrect syntax near ‘MINUS’” | Replace with EXCEPT or use a compatibility shim. |
Conclusion
Set operators—UNION, INTERSECT and EXCEPT (or MINUS)—are indispensable tools for data analysts who need to merge, compare, or exclude result sets without resorting to complex joins. Understanding how each operator works, the performance implications of duplicate removal, and the subtle syntax differences across PostgreSQL, SQL Server, Oracle and MySQL will enable you to write cleaner, faster queries.
Remember these key take‑aways:
- Prefer
UNION ALLfor speed; only useUNIONwhen duplicates truly matter. - Push filters down and index the comparison columns for
INTERSECTandEXCEPT. - Leverage modern engine features such as parallel hash execution (PostgreSQL 17, Oracle 23c) to handle large data sets efficiently.
- Test on real data – performance can vary dramatically between databases and data volumes.
Armed with this knowledge, you can now demystify set operations and apply them confidently in your analytical workflows. Happy querying!