Creating and Using SQL User‑Defined Functions (UDFs): A Practical Guide for Data Analysts
// Learn how to create, optimise and maintain scalar, table‑valued and inline UDFs in SQL Server, PostgreSQL and MySQL – with real‑world examples and performance tips.
Introduction
User‑Defined Functions (UDFs) let you encapsulate reusable logic inside the database engine. For data analysts, UDFs are a powerful way to:
- Standardise calculations – e.g. tax, discount or risk scores – across multiple reports.
- Simplify complex queries – by moving repetitive expressions into a single function call.
- Boost performance – when used correctly, especially with modern optimisation features such as SQL Server’s scalar‑UDF inlining.
This guide walks you through the lifecycle of a UDF: design, creation, testing, optimisation, and maintenance. We cover the three most popular relational engines used by UK analysts – SQL Server, PostgreSQL and MySQL – and highlight best‑practice patterns that keep your data pipelines fast and reliable.
1. Types of SQL UDFs and When to Use Them
| Engine | Function type | Typical return | When it shines |
|---|---|---|---|
| SQL Server | Scalar | Single value (int, decimal, varchar…) | Simple calculations, formatters, look‑ups |
| Inline Table‑Valued (iTVF) | Result set (like a view) | Parameterised row‑set logic, reusable joins | |
| Multistatement TVF | Result set | Complex set‑based processing that needs variables | |
| PostgreSQL | Scalar | Single value | Lightweight expressions, immutable logic |
| Set‑Returning (SRF) | Rows or columns | Parameterised table logic, custom aggregations | |
| MySQL | Stored Function (scalar) | Single value | Simple deterministic logic; MySQL does not support table‑valued functions |
Rule of thumb: Prefer scalar functions for pure calculations and inline TVFs / SRFs for set‑based logic. Avoid multistatement TVFs unless you truly need procedural steps, as they often incur extra overhead.
2. Creating a UDF – Step‑by‑Step Examples
2.1 SQL Server – Scalar UDF (Discount Price)
CREATE OR ALTER FUNCTION dbo.udf_discount_price
(
@price DECIMAL(12,2),
@discount DECIMAL(5,2) -- e.g. 0.15 for 15%
)
RETURNS DECIMAL(12,2)
WITH INLINE = ON -- encourages inlining (SQL Server 2019+)
AS
BEGIN
RETURN @price * (1 - @discount);
END;
GOUsage
SELECT OrderID,
dbo.udf_discount_price(TotalAmount, 0.10) AS DiscountedTotal
FROM Sales.Orders;2.2 PostgreSQL – Immutable Scalar Function (VAT)
CREATE OR REPLACE FUNCTION public.calc_vat(amount NUMERIC, rate NUMERIC DEFAULT 0.20)
RETURNS NUMERIC
LANGUAGE sql
IMMUTABLE
AS $$
SELECT amount * rate;
$$;Usage
SELECT order_id,
calc_vat(total_price) AS vat_amount
FROM sales.orders;2.3 MySQL – Stored Function (Age from DOB)
DELIMITER //
CREATE FUNCTION fn_age(dob DATE)
RETURNS INT
DETERMINISTIC
READS SQL DATA
BEGIN
RETURN TIMESTAMPDIFF(YEAR, dob, CURDATE());
END //
DELIMITER ;Usage
SELECT employee_id,
fn_age(date_of_birth) AS age
FROM hr.employees;3. Performance Optimisation – Getting the Most Out of UDFs
3.1 Scalar‑UDF Inlining (SQL Server)
Since SQL Server 2019, the engine can inline a scalar UDF, replacing the function call with its underlying expression. This eliminates the per‑row context switch that historically made scalar UDFs up to 30× slower.
How to ensure inlining:
- Set database compatibility level 150 or higher
ALTER DATABASE YourDB SET COMPATIBILITY_LEVEL = 150; - Use
WITH INLINE = ON(default if eligible). - Keep the function deterministic and free of side‑effects – avoid
GETDATE(),NEWID(), or table variables. - Check eligibility
SELECT name, is_inlineable FROM sys.sql_modules m JOIN sys.objects o ON m.object_id = o.object_id WHERE o.type = 'FN';
If the plan still shows a UserDefinedFunction operator, the function failed the eligibility checks (e.g., uses a non‑inlineable construct such as a SELECT … ORDER BY without TOP 1).
3.2 Inline Table‑Valued Functions vs Multistatement TVFs
- iTVFs are expanded at compile time, allowing the optimiser to push predicates and join order.
- Multistatement TVFs materialise the result set into a hidden worktable, preventing predicate push‑down.
Best practice: Use iTVFs wherever possible; fall back to multistatement TVFs only when you need procedural logic that cannot be expressed in a single SELECT.
3.3 PostgreSQL – Mark Functions as IMMUTABLE or STABLE
- IMMUTABLE – function result depends only on input parameters. The planner can pre‑compute results for constant arguments and cache them.
- STABLE – result may change within a transaction (e.g.,
now()), but not across rows.
Tagging functions correctly enables expression indexes and query folding, dramatically improving performance.
CREATE OR REPLACE FUNCTION public.is_prime(n INT)
RETURNS BOOLEAN
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
IF n < 2 THEN RETURN FALSE; END IF;
FOR i IN 2..sqrt(n)::INT LOOP
IF n % i = 0 THEN RETURN FALSE; END IF;
END LOOP;
RETURN TRUE;
END;
$$;3.4 MySQL – Avoid Heavy Logic in Stored Functions
MySQL evaluates stored functions row‑by‑row and cannot inline them. To keep queries fast:
- Keep functions deterministic and simple.
- Prefer generated columns for common calculations (they are materialised once per row).
- Use derived tables or CTEs for set‑based logic instead of wrapping it in a scalar function.
ALTER TABLE hr.employees
ADD COLUMN age INT GENERATED ALWAYS AS (TIMESTAMPDIFF(YEAR, date_of_birth, CURDATE())) STORED;Now you can query age directly without the function call overhead.
4. Testing and Validation
- Unit‑test with a framework – e.g.,
tSQLtfor SQL Server,pgTAPfor PostgreSQL, orMySQLUnit. - Edge‑case handling – ensure functions behave correctly for
NULL, division‑by‑zero, and extreme values. - Performance regression – capture execution time before and after changes using
SET STATISTICS TIME ON(SQL Server) orEXPLAIN ANALYZE(PostgreSQL/MySQL).
Example (SQL Server)
SET STATISTICS TIME ON;
SELECT dbo.udf_discount_price(1000, 0.15);
SET STATISTICS TIME OFF;Record the CPU and elapsed times; compare against the inline version to verify the optimisation.
5. Maintenance – Keeping UDFs Healthy
| Activity | Frequency | Tips |
|---|---|---|
| Documentation | On creation & any change | Store purpose, parameters, return type, and version in a comment block. |
| Dependency tracking | Quarterly | Use sys.sql_expression_dependencies (SQL Server) or pg_depend (PostgreSQL) to locate objects that call each function. |
| Security review | Annually | Ensure functions run with EXECUTE AS CALLER unless elevated rights are required. |
| Deprecation | When redesigning schema | Mark old functions as DEPRECATED in comments and provide a migration path. |
6. Real‑World Case Study: Reducing Query Time by 92 % with Scalar‑UDF Inlining
- Scenario: A retail analyst team ran a nightly sales‑summary query that called a scalar UDF
dbo.fn_calc_margin(price, cost)for 12 million rows. - Baseline: 28 minutes (average CPU 85 %).
- Actions:
- Refactored the function to a single expression and added
WITH INLINE = ON. - Set database compatibility level to 150.
- Verified
is_inlineable = 1.
- Refactored the function to a single expression and added
- Result: Query time dropped to 2 minutes 15 seconds – a 92 % reduction. Execution plan showed the UDF operator removed and predicates pushed down into the scan.
The same logic, when ported to PostgreSQL as an IMMUTABLE function, achieved a 68 % reduction thanks to expression‑index utilisation.
7. Common Pitfalls and How to Avoid Them
| Pitfall | Symptom | Fix |
|---|---|---|
Using non‑deterministic functions inside a scalar UDF (e.g., GETDATE()) |
Inlining blocked, plan contains UserDefinedFunction node |
Move the non‑deterministic call outside the function or pass the value as a parameter. |
Returning TABLE from a scalar function |
Compilation error | Use a Table‑Valued Function instead. |
Calling a scalar UDF in WHERE clause with a high‑cardinality column |
Full table scan, poor performance | Rewrite logic as a computed column or an iTVF that can be filtered early. |
Embedding SELECT … INTO #temp inside a TVF |
Not allowed – TVFs cannot modify temp tables | Replace with a sub‑query or use a stored procedure if you need side‑effects. |
Neglecting NULL handling |
Unexpected NULL results, breaking downstream calculations |
Explicitly test for NULL and decide whether to return NULL or a default value. |
8. Best‑Practice Checklist (SQL‑agnostic)
- Keep functions deterministic unless absolutely required.
- Limit to a single return statement for scalar functions.
- Prefer inline TVFs / SRFs for set‑based logic.
- Add proper security context (
EXECUTE AS CALLER). - Document input parameters, return type, and side‑effects.
- Mark functions as
IMMUTABLE/STABLE(PostgreSQL) orWITH INLINE = ON(SQL Server). - Test with realistic data volumes and capture execution plans.
- Monitor
is_inlineable(SQL Server) orpg_get_functiondef()(PostgreSQL) after schema changes.
Conclusion
User‑Defined Functions are a cornerstone of clean, maintainable SQL code for data analysts. When crafted with performance in mind—using scalar‑UDF inlining, immutable tagging, and inline table‑valued patterns—you can turn heavyweight, row‑by‑row logic into set‑based, optimiser‑friendly queries.
By following the design, testing, and maintenance steps outlined above, you’ll deliver faster reports, reduce duplicated logic, and keep your analytical pipelines robust across SQL Server, PostgreSQL and MySQL.
Happy coding, and may your queries always be fast and your data clean!