Query Optimization
Functions
16
Copyright
© Postgres Professional, 2019–2024
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov
Photo by: Oleg Bartunov (Phu monastery, Bhrikuti summit, Nepal)
Use of course materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
edu@postgrespro.ru
Disclaimer
In no event shall Postgres Professional company be liable for any damages
or loss, including loss of profits, that arise from direct or indirect, special or
incidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
Topics
Volatility Categories
Inlining function code into the query
Invoking Table Functions
Settings for COST and ROWS
Planner Helper Functions
Configuration Parameters
Volatility Categories
Volatile
may return different values for the same input arguments
is used by default
Stable
the value cannot change within a single SQL statement
the function cannot change the database state
Immutable
the value cannot change, the function is deterministic
the function cannot change the database state
Each function is mapped to a particular volatility category, which defines the
properties of the return value for the same input arguments.
The Volatile category indicates that the return value may vary arbitrarily.
Such functions will be executed each time they are called. If the function is
declared without a category specification, it is assumed to be volatile.
The Stable category is used for functions whose return value remains
constant during a single SQL statement. In particular, such functions cannot
change the state of the database. The planner may therefore execute such a
function only once during the query and then reuse the computed value.
The Immutable category is even stricter: the return value always
remains the same for the same input arguments.
Such a function can be executed at the planning stage, before the query
is actually executed.
This does not mean that it always happens, but the planner has the right
to perform such optimizations.
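As a minimal sketch of how a category is declared and checked: the category is specified in CREATE FUNCTION and is recorded in the provolatile column of the pg_proc catalog. The function names below are hypothetical and are not part of the course materials.

```sql
-- Hypothetical example functions; the names are illustrative only.

-- No category specified: the function is Volatile by default.
CREATE FUNCTION get_rnd() RETURNS double precision
AS $$ SELECT random() $$ LANGUAGE sql;

-- Reads a configuration parameter, which stays the same
-- within a single statement: Stable.
CREATE FUNCTION cur_tz() RETURNS text
AS $$ SELECT current_setting('TimeZone') $$ LANGUAGE sql STABLE;

-- Depends only on its argument: Immutable.
CREATE FUNCTION twice(x integer) RETURNS integer
AS $$ SELECT x * 2 $$ LANGUAGE sql IMMUTABLE;

-- provolatile: v = volatile, s = stable, i = immutable.
SELECT proname, provolatile
FROM pg_proc
WHERE proname IN ('get_rnd', 'cur_tz', 'twice');
```

The category is a promise made by the function's author. PostgreSQL does not verify that the function body actually behaves as declared.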
Inlining of Function Code
Scalar SQL functions
a single SELECT statement without a FROM clause that returns a single value
called functions must not be more volatile than the calling function
etc.
Set-returning SQL functions
a single SELECT statement
Stable or Immutable category
non-strict function
etc.
The PostgreSQL optimizer can inline the function body into an SQL query.
This applies to both scalar and set-returning functions.
In both cases, there are several limitations: the function must be written in
SQL, consist of a single SELECT statement, and so on. Additionally, scalar
functions must not access database tables or call functions of a more volatile
category, while set-returning functions must be Stable or Immutable.
A key benefit of inlining the function body into the query is that the function
becomes transparent to the query planner. For instance, additional
conditions in the main query can be applied to the query within the function's
body, enabling early filtering of unnecessary rows.
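The effect of inlining can be observed in a query plan. The following is a minimal sketch, assuming a hypothetical table items and a hypothetical function all_items (neither comes from the course materials).

```sql
-- Hypothetical table and function, for illustration only.
CREATE TABLE items (id integer PRIMARY KEY, name text);

CREATE FUNCTION all_items() RETURNS SETOF items
AS $$ SELECT * FROM items $$ LANGUAGE sql STABLE;

-- If the body is inlined, the plan refers to the items table directly,
-- no Function Scan node appears, and the condition on id can use
-- the primary key index.
EXPLAIN (COSTS OFF)
SELECT * FROM all_items() WHERE id = 42;

-- A Volatile set-returning function is not inlined: the plan is expected
-- to show a Function Scan with the condition applied as a filter.
ALTER FUNCTION all_items() VOLATILE;

EXPLAIN (COSTS OFF)
SELECT * FROM all_items() WHERE id = 42;
```

Comparing the two plans shows whether the outer condition reaches the query inside the function body.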
Invoking Table Functions
[Diagram: SELECT * FROM fun() is executed by a Function Scan node; SELECT fun() is executed by a ProjectSet node. Rows on demand: SQL, C, and Python iterators.]
When a set-returning function is called in the FROM clause, the Function
Scan node is responsible for its execution. In this case, the rows returned
by the function are first materialized in full and only then passed to the
parent plan node. This is a limitation of the current implementation.
If the function is called in the SELECT clause, it is executed by the
ProjectSet node in the query plan. In this case, the function can take
advantage of the on-demand row return interface (value-per-call) without
materialization.
SQL functions, most built-in functions (written in C), and PL/Python functions
that return an iterator operate in this manner. Functions written in other
programming languages can also use this interface if the feature is
implemented.
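The two ways of calling a set-returning function can be compared with EXPLAIN. This sketch uses the built-in generate_series function; the plan shapes noted in the comments are what such calls are expected to produce.

```sql
-- Call in the FROM clause: executed by a Function Scan node.
EXPLAIN (COSTS OFF)
SELECT * FROM generate_series(1, 3);

-- Call in the SELECT clause: executed by a ProjectSet node
-- placed above a Result node.
EXPLAIN (COSTS OFF)
SELECT generate_series(1, 3);
```

With a large number of rows, EXPLAIN (ANALYZE, BUFFERS) can additionally show whether materialization in the Function Scan node spilled to temporary files.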
Settings for COST and ROWS
Estimates by default:
CREATE FUNCTION fun( )
SELECT * FROM fun()
Function Scan ( rows=1000 cost=100 )
With ROWS and COST specified:
CREATE FUNCTION fun( ) ROWS 12 COST 123
SELECT * FROM fun()
Function Scan ( rows=12 cost=123 )
Usually (when the function body cannot be integrated into the query), the
optimizer can't analyze the function code and treats it as a "black box".
However, you can provide the optimizer with approximate information about
the cost of calling the function and the number of rows it returns.
The COST parameter defines the cost of a user-defined function in units of
cpu_operator_cost. By default, C functions are given a cost estimate of 1,
whereas functions in other languages are estimated at 100.
The ROWS parameter specifies the approximate number of rows returned by a
set-returning function (1000 by default).
COST and ROWS can be set when creating a function or changed later for an
existing function with ALTER FUNCTION.
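A minimal sketch of setting these parameters follows. The function name fun_cr is hypothetical, and PL/pgSQL is used so that the body cannot be inlined.

```sql
-- Hypothetical function; it always returns 12 rows.
CREATE FUNCTION fun_cr() RETURNS SETOF integer
AS $$
BEGIN
    RETURN QUERY SELECT generate_series(1, 12);
END;
$$ LANGUAGE plpgsql;

-- Default estimate: the planner assumes rows=1000.
EXPLAIN SELECT * FROM fun_cr();

-- Provide the planner with better estimates for the existing function.
ALTER FUNCTION fun_cr() ROWS 12 COST 123;

-- The Function Scan node is now expected to show rows=12.
EXPLAIN SELECT * FROM fun_cr();
```

The same ROWS and COST clauses can be written directly in CREATE FUNCTION.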
Helper Functions
With a fixed ROWS value:
CREATE FUNCTION fun(x) ROWS 18
SELECT * FROM fun(1): Function Scan ( rows=18 )
SELECT * FROM fun(2): Function Scan ( rows=18 )
With a helper function:
CREATE FUNCTION fun(x) SUPPORT fun_support
SELECT * FROM fun(1): fun_support( 1 ), Function Scan ( rows=12 )
SELECT * FROM fun(2): fun_support( 2 ), Function Scan ( rows=24 )
The COST and ROWS parameters let you specify the cost and the number of
rows returned by the function as fixed values. But constants don't
always yield the intended result, because both may depend on the arguments.
In PostgreSQL, a function can be assigned a helper (support) function that
supplies the planner with information based on the argument values of the
main (target) function:
cost estimation
estimate of the number of rows returned
an expression equivalent to the function call
Additionally, for functions returning boolean:
selectivity estimate
an equivalent predicate using an indexable operator
The helper function must be implemented in C.
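Since writing a helper function requires C, the effect is easiest to observe on a built-in function that already has one. This sketch uses generate_series and relies on the prosupport column of pg_proc, which stores the helper function assigned to each function.

```sql
-- Which variants of generate_series have a helper function assigned?
SELECT oid::regprocedure AS function, prosupport
FROM pg_proc
WHERE proname = 'generate_series';

-- With constant arguments the row estimate is expected to follow
-- the arguments instead of the fixed default: rows=12 ...
EXPLAIN SELECT * FROM generate_series(1, 12);

-- ... and rows=24.
EXPLAIN SELECT * FROM generate_series(1, 24);
```

A user-defined function is linked to its helper with the SUPPORT clause of CREATE FUNCTION or ALTER FUNCTION.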
Parallelism Annotations
CREATE FUNCTION fun( ) PARALLEL
SAFE: parallel-safe
RESTRICTED: partially parallelizable
UNSAFE: not parallel-safe
The topic of "Parallel Processing" explained that not all queries can be run in
parallel. Since the optimizer can't analyze the function body, it uses
parallelism annotations to determine if parallel processing is feasible.
When defining a function (or later), you can choose one of three
annotations:
UNSAFE: prohibits parallel execution plans when the query includes a
function call;
RESTRICTED: allows parallel execution plans, but prohibits function
calls in the parallel section of the plan;
SAFE: imposes no restrictions, the function is safe for parallel execution.
The UNSAFE annotation is the default setting.
Parallelism annotations are also applied to user-defined aggregate
functions.
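A minimal sketch of setting and inspecting the annotation follows. The function name fun_par is hypothetical; the annotation is stored in the proparallel column of pg_proc.

```sql
-- Hypothetical function declared as safe for parallel execution.
CREATE FUNCTION fun_par(x integer) RETURNS integer
AS $$ SELECT x + 1 $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;

-- The annotation can be changed later for an existing function.
ALTER FUNCTION fun_par(integer) PARALLEL RESTRICTED;

-- proparallel: s = safe, r = restricted, u = unsafe.
SELECT proname, proparallel
FROM pg_proc
WHERE proname = 'fun_par';
```

As with volatility, the annotation is a declaration by the author. Marking a function SAFE when it is not can lead to errors or wrong results in parallel plans.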
Takeaways
A function is a black box for the planner if its body is not inlined
into the query.
The result of a set-returning function call is typically
materialized.
The planner can be provided with additional information:
volatility category
cardinality and cost
parallelism annotation
Helper functions aid in optimizing invocations of built-in
functions.
Practice
1. Disable the materialization of a common table expression that
uses the random function. Provide an explanation for the result.
2. Write a SQL wrapper function for the query
SELECT * FROM generate_series(1, 10_000_000).
Consider three scenarios: the original query and function calls in
the FROM and SELECT clauses. Compare the query execution
plans and the use of temporary files.
What changes when the volatility category is set to Stable?
3. What volatility category does the days_of_week function in the
demonstration have? What is its actual volatility?
Function definition:
CREATE FUNCTION days_of_week() RETURNS SETOF text
AS $$
BEGIN
    FOR i IN 7 .. 13 LOOP
        RETURN NEXT to_char(to_date(i::text,'J'),'TMDy');
    END LOOP;
END;
$$ LANGUAGE plpgsql;