~/articles/sql-user-defined-functions-udf.md

type: SQL read_time: 8 min words: 1481

Creating and Using SQL User‑Defined Functions (UDFs): A Practical Guide for Data Analysts

// Learn how to create, optimise and maintain scalar, table‑valued and inline UDFs in SQL Server, PostgreSQL and MySQL – with real‑world examples and performance tips.

Introduction

User‑Defined Functions (UDFs) let you encapsulate reusable logic inside the database engine. For data analysts, UDFs are a powerful way to:

Standardise calculations – e.g. tax, discount or risk scores – across multiple reports.
Simplify complex queries – by moving repetitive expressions into a single function call.
Boost performance – when used correctly, especially with modern optimisation features such as SQL Server’s scalar‑UDF inlining.

This guide walks you through the lifecycle of a UDF: design, creation, testing, optimisation, and maintenance. We cover the three most popular relational engines used by UK analysts – SQL Server, PostgreSQL and MySQL – and highlight best‑practice patterns that keep your data pipelines fast and reliable.

1. Types of SQL UDFs and When to Use Them

Engine	Function type	Typical return	When it shines
SQL Server	Scalar	Single value (int, decimal, varchar…)	Simple calculations, formatters, look‑ups
	Inline Table‑Valued (iTVF)	Result set (like a view)	Parameterised row‑set logic, reusable joins
	Multistatement TVF	Result set	Complex set‑based processing that needs variables
PostgreSQL	Scalar	Single value	Lightweight expressions, immutable logic
	Set‑Returning (SRF)	Rows or columns	Parameterised table logic, custom aggregations
MySQL	Stored Function (scalar)	Single value	Simple deterministic logic; MySQL does not support table‑valued functions

Rule of thumb: Prefer scalar functions for pure calculations and inline TVFs / SRFs for set‑based logic. Avoid multistatement TVFs unless you truly need procedural steps, as they often incur extra overhead.

2. Creating a UDF – Step‑by‑Step Examples

2.1 SQL Server – Scalar UDF (Discount Price)

CREATE OR ALTER FUNCTION dbo.udf_discount_price
(
    @price   DECIMAL(12,2),
    @discount DECIMAL(5,2)   -- e.g. 0.15 for 15%
)
RETURNS DECIMAL(12,2)
WITH INLINE = ON      -- encourages inlining (SQL Server 2019+)
AS
BEGIN
    RETURN @price * (1 - @discount);
END;
GO

Usage

SELECT OrderID,
       dbo.udf_discount_price(TotalAmount, 0.10) AS DiscountedTotal
FROM Sales.Orders;

2.2 PostgreSQL – Immutable Scalar Function (VAT)

CREATE OR REPLACE FUNCTION public.calc_vat(amount NUMERIC, rate NUMERIC DEFAULT 0.20)
RETURNS NUMERIC
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT amount * rate;
$$;

Usage

SELECT order_id,
       calc_vat(total_price) AS vat_amount
FROM sales.orders;

2.3 MySQL – Stored Function (Age from DOB)

DELIMITER //

CREATE FUNCTION fn_age(dob DATE)
RETURNS INT
DETERMINISTIC
READS SQL DATA
BEGIN
    RETURN TIMESTAMPDIFF(YEAR, dob, CURDATE());
END //

DELIMITER ;

Usage

SELECT employee_id,
       fn_age(date_of_birth) AS age
FROM hr.employees;

3. Performance Optimisation – Getting the Most Out of UDFs

3.1 Scalar‑UDF Inlining (SQL Server)

Since SQL Server 2019, the engine can inline a scalar UDF, replacing the function call with its underlying expression. This eliminates the per‑row context switch that historically made scalar UDFs up to 30× slower.

How to ensure inlining:

Set database compatibility level 150 or higher

ALTER DATABASE YourDB SET COMPATIBILITY_LEVEL = 150;

Use WITH INLINE = ON (default if eligible).
Keep the function deterministic and free of side‑effects – avoid GETDATE(), NEWID(), or table variables.

Check eligibility

SELECT name, is_inlineable
FROM sys.sql_modules m
JOIN sys.objects o ON m.object_id = o.object_id
WHERE o.type = 'FN';

If the plan still shows a UserDefinedFunction operator, the function failed the eligibility checks (e.g., uses a non‑inlineable construct such as a SELECT … ORDER BY without TOP 1).

3.2 Inline Table‑Valued Functions vs Multistatement TVFs

iTVFs are expanded at compile time, allowing the optimiser to push predicates and join order.
Multistatement TVFs materialise the result set into a hidden worktable, preventing predicate push‑down.

Best practice: Use iTVFs wherever possible; fall back to multistatement TVFs only when you need procedural logic that cannot be expressed in a single SELECT.

3.3 PostgreSQL – Mark Functions as `IMMUTABLE` or `STABLE`

IMMUTABLE – function result depends only on input parameters. The planner can pre‑compute results for constant arguments and cache them.
STABLE – result may change within a transaction (e.g., now()), but not across rows.

Tagging functions correctly enables expression indexes and query folding, dramatically improving performance.

CREATE OR REPLACE FUNCTION public.is_prime(n INT)
RETURNS BOOLEAN
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
    IF n < 2 THEN RETURN FALSE; END IF;
    FOR i IN 2..sqrt(n)::INT LOOP
        IF n % i = 0 THEN RETURN FALSE; END IF;
    END LOOP;
    RETURN TRUE;
END;
$$;

3.4 MySQL – Avoid Heavy Logic in Stored Functions

MySQL evaluates stored functions row‑by‑row and cannot inline them. To keep queries fast:

Keep functions deterministic and simple.
Prefer generated columns for common calculations (they are materialised once per row).
Use derived tables or CTEs for set‑based logic instead of wrapping it in a scalar function.

ALTER TABLE hr.employees
ADD COLUMN age INT GENERATED ALWAYS AS (TIMESTAMPDIFF(YEAR, date_of_birth, CURDATE())) STORED;

Now you can query age directly without the function call overhead.

4. Testing and Validation

Unit‑test with a framework – e.g., tSQLt for SQL Server, pgTAP for PostgreSQL, or MySQLUnit.
Edge‑case handling – ensure functions behave correctly for NULL, division‑by‑zero, and extreme values.
Performance regression – capture execution time before and after changes using SET STATISTICS TIME ON (SQL Server) or EXPLAIN ANALYZE (PostgreSQL/MySQL).

Example (SQL Server)

SET STATISTICS TIME ON;
SELECT dbo.udf_discount_price(1000, 0.15);
SET STATISTICS TIME OFF;

Record the CPU and elapsed times; compare against the inline version to verify the optimisation.

5. Maintenance – Keeping UDFs Healthy

Activity	Frequency	Tips
Documentation	On creation & any change	Store purpose, parameters, return type, and version in a comment block.
Dependency tracking	Quarterly	Use `sys.sql_expression_dependencies` (SQL Server) or `pg_depend` (PostgreSQL) to locate objects that call each function.
Security review	Annually	Ensure functions run with `EXECUTE AS CALLER` unless elevated rights are required.
Deprecation	When redesigning schema	Mark old functions as `DEPRECATED` in comments and provide a migration path.

6. Real‑World Case Study: Reducing Query Time by 92 % with Scalar‑UDF Inlining

Scenario: A retail analyst team ran a nightly sales‑summary query that called a scalar UDF dbo.fn_calc_margin(price, cost) for 12 million rows.
Baseline: 28 minutes (average CPU 85 %).
Actions:
1. Refactored the function to a single expression and added WITH INLINE = ON.
2. Set database compatibility level to 150.
3. Verified is_inlineable = 1.
Result: Query time dropped to 2 minutes 15 seconds – a 92 % reduction. Execution plan showed the UDF operator removed and predicates pushed down into the scan.

The same logic, when ported to PostgreSQL as an IMMUTABLE function, achieved a 68 % reduction thanks to expression‑index utilisation.

7. Common Pitfalls and How to Avoid Them

Pitfall	Symptom	Fix
Using non‑deterministic functions inside a scalar UDF (e.g., `GETDATE()`)	Inlining blocked, plan contains `UserDefinedFunction` node	Move the non‑deterministic call outside the function or pass the value as a parameter.
Returning `TABLE` from a scalar function	Compilation error	Use a Table‑Valued Function instead.
Calling a scalar UDF in `WHERE` clause with a high‑cardinality column	Full table scan, poor performance	Rewrite logic as a computed column or an iTVF that can be filtered early.
Embedding `SELECT … INTO #temp` inside a TVF	Not allowed – TVFs cannot modify temp tables	Replace with a sub‑query or use a stored procedure if you need side‑effects.
Neglecting `NULL` handling	Unexpected `NULL` results, breaking downstream calculations	Explicitly test for `NULL` and decide whether to return `NULL` or a default value.

8. Best‑Practice Checklist (SQL‑agnostic)

Keep functions deterministic unless absolutely required.
Limit to a single return statement for scalar functions.
Prefer inline TVFs / SRFs for set‑based logic.
Add proper security context (EXECUTE AS CALLER).
Document input parameters, return type, and side‑effects.
Mark functions as IMMUTABLE / STABLE (PostgreSQL) or WITH INLINE = ON (SQL Server).
Test with realistic data volumes and capture execution plans.
Monitor is_inlineable (SQL Server) or pg_get_functiondef() (PostgreSQL) after schema changes.

Conclusion

User‑Defined Functions are a cornerstone of clean, maintainable SQL code for data analysts. When crafted with performance in mind—using scalar‑UDF inlining, immutable tagging, and inline table‑valued patterns—you can turn heavyweight, row‑by‑row logic into set‑based, optimiser‑friendly queries.

By following the design, testing, and maintenance steps outlined above, you’ll deliver faster reports, reduce duplicated logic, and keep your analytical pipelines robust across SQL Server, PostgreSQL and MySQL.

Happy coding, and may your queries always be fast and your data clean!

Introduction

1. Types of SQL UDFs and When to Use Them

2. Creating a UDF – Step‑by‑Step Examples

2.1 SQL Server – Scalar UDF (Discount Price)

2.2 PostgreSQL – Immutable Scalar Function (VAT)

2.3 MySQL – Stored Function (Age from DOB)

3. Performance Optimisation – Getting the Most Out of UDFs

3.1 Scalar‑UDF Inlining (SQL Server)

3.2 Inline Table‑Valued Functions vs Multistatement TVFs

3.3 PostgreSQL – Mark Functions as IMMUTABLE or STABLE

3.4 MySQL – Avoid Heavy Logic in Stored Functions

4. Testing and Validation

5. Maintenance – Keeping UDFs Healthy

6. Real‑World Case Study: Reducing Query Time by 92 % with Scalar‑UDF Inlining

7. Common Pitfalls and How to Avoid Them

8. Best‑Practice Checklist (SQL‑agnostic)

Conclusion

3.3 PostgreSQL – Mark Functions as `IMMUTABLE` or `STABLE`

6. Real‑World Case Study: Reducing Query Time by 92 % with Scalar‑UDF Inlining