Query Planning and Execution
Copyright
© Postgres Professional, 2019–2024
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov
Photo by: Oleg Bartunov (Phu monastery, Bhrikuti summit, Nepal)
Use of course materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
edu@postgrespro.com
Disclaimer
In no event shall Postgres Professional company be liable for any damages
or loss, including loss of profits, that arise from direct or indirect, special or
incidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
Topics
Common Approaches to Optimization
Simple Protocol and Query Processing Phases
Extended Protocol
More on planning
Optimization Strategies
Parameter configuration
  tuning for the current workload
  global impact on the entire system
  relies on monitoring
Query performance tuning
  reducing the workload
  localized impact (a single query or a group of queries)
  relies on profiling
This course focuses on query optimization. Generally speaking, optimization
is a broad concept; it's important to consider it during the system design and
architecture selection stage. We will only discuss the tasks executed during
the operation of an existing application.
Two main approaches can be identified. The first approach is about
monitoring the system's state and making sure it can handle the existing
workload. To achieve this, you can tune DBMS parameters (the key ones
covered in the DBA2 course and partially in this course) and also configure
the operating system. If the settings don't help, the only remaining option
with this approach is to upgrade hardware (which doesn't always work).
Another approach, which we will mainly discuss next, involves not adapting
to the workload but rather reducing it. The "productive" workload consists of
queries. If a bottleneck can be identified, you can attempt to influence the
query's execution in some way to achieve the same result with fewer
resources. This approach is more targeted (affecting specific queries or a
group of queries), but reducing the workload positively impacts the entire
system's performance.
We'll begin by exploring query execution mechanisms in depth, then move
on to discussing how to detect inefficient operations and practical
approaches for optimization.
Phases of Query Execution
Parsing
Query Rewriting (Transformation)
Query Planning (Optimization)
Execution
We'll start by examining how a query is executed in a straightforward
scenario—for instance, when you issue a SELECT command in psql.
Parsing
[Diagram: query text → parsing (parse) → query tree; the parser consults the system catalog]
The processing of a regular query is carried out in multiple stages.
First, the query is parsed (parse).
The first step is syntax analysis, where the query text is converted into a tree
structure — this makes it easier to work with.
Next, semantic analysis (parse analysis) occurs, during which the query's
referenced database objects are identified, and the user's access to them is
verified (the parser consults the system catalog for this).
Syntax Parsing
[Parse tree (simplified): QUERY → FROMEXPR (RTE pg_tables), TARGETENTRY, SORTGROUPCLAUSE, OPEXPR (tableowner = 'postgres')]
SELECT schemaname, tablename
FROM pg_tables
WHERE tableowner = 'postgres'
ORDER BY tablename;
Let's take a look at a simple example: the query shown on the slide.
During the parsing phase, a tree is constructed in the backend process's
memory, as illustrated in the simplified diagram below the query. Color
indicates the approximate mapping between parts of the query text and tree
nodes.
RTE is a non-obvious abbreviation for Range Table Entry. In PostgreSQL,
this term refers to tables, subqueries, and join results — essentially, sets of
rows that SQL can manipulate.
For those curious, the actual parse tree can be examined by setting the
debug_print_parse parameter and checking the server log. There's no
practical value in this (unless, of course, you're a PostgreSQL kernel
developer).
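
For example, the parameter can be enabled for the current session in psql; the tree itself is written to the server log, so where you see it depends on your logging configuration:

```sql
-- Print the parse tree of subsequent queries to the server log
SET debug_print_parse = on;

SELECT schemaname, tablename
FROM pg_tables
WHERE tableowner = 'postgres'
ORDER BY tablename;

RESET debug_print_parse;
```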
Semantic Parsing
[Parse tree (simplified): QUERY → FROMEXPR (RTE pg_tables), TARGETENTRY, SORTGROUPCLAUSE, OPEXPR (tableowner = 'postgres')]
SELECT schemaname, tablename
FROM pg_tables
WHERE tableowner = 'postgres'
ORDER BY tablename;
[The parser resolves the name pg_tables to its OID in the system catalog and checks access rights]
During semantic analysis, the parser consults the system catalog to
associate the name "pg_tables" with a view that has a specific object
identifier (OID) within the system catalog. Access permissions for this view
will be verified as well.
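
The same name-to-OID resolution that the parser performs internally can be reproduced by hand, for instance with a regclass cast:

```sql
-- Resolve the name to an OID, as semantic analysis does
SELECT 'pg_tables'::regclass::oid;

-- The same object as stored in the system catalog
-- (relkind = 'v' marks it as a view)
SELECT oid, relname, relkind
FROM pg_class
WHERE relname = 'pg_tables';
```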
Rewriting
[Diagram: query text → parsing (parse) → query tree → rewriting (rewrite) → query tree; rewriting is driven by the rules]
Second, the query is rewritten, or transformed (rewrite), according to the
rules.
An important special case of rewriting is substituting the view's definition in
place of the view name. Note that the view's text must itself be parsed, so
saying that the first two stages simply follow one another is a simplification.
Rewriting
[Rewritten query tree (simplified): the RTE for pg_tables is replaced by a subquery joining pg_class, pg_namespace, and pg_tablespace — JOINEXPR nodes with conditions n.oid = c.relnamespace and t.oid = c.reltablespace, and the filter c.relkind = 'r'::"char"]
SELECT schemaname, tablename
FROM (
SELECT ...
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
WHERE c.relkind = 'r'::"char"
)
WHERE tableowner = 'postgres'
ORDER BY tablename;
The slide presents a query with the view's text included (this is an
abstraction: the query in this form does not actually exist — all rewriting is
performed on the query tree).
The parent node of the subtree associated with the subquery is the node
that references this view. The figure clearly shows the tree structure of the
query within this subtree.
The rewritten query tree can be seen in the server log by enabling the
debug_print_rewritten parameter.
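
The stored definition that gets substituted during rewriting can also be inspected directly, for example:

```sql
-- Show the definition of the pg_tables view
-- (the second argument enables pretty-printing)
SELECT pg_get_viewdef('pg_tables'::regclass, true);
```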
Planning
[Diagram: query text → parsing (parse) → query tree → rewriting (rewrite) → query tree → planning (plan) → execution plan]
Third, the query is optimized (plan).
SQL is a declarative language, so a single query can be executed in various
ways. The planner (also known as the optimizer) considers different
execution approaches and evaluates them. The evaluation is based on a
mathematical model that uses statistics about the data being processed.
The execution approach with the lowest estimated cost is represented as an
execution plan.
Planning
[Plan tree (simplified): PLANNEDSTMT → SORT (c.relname) → NESTLOOP (join filter n.oid = c.relnamespace) over Seq Scan on pg_class (filter: c.relkind = 'r'::"char" AND pg_get_userbyid(relowner) = 'postgres'::name) and Seq Scan on pg_namespace]
Sort (cost=19.59..19.59 rows=1 width=128)
Sort Key: c.relname
-> Nested Loop Left Join (cost=0.00..19.58 rows=1 width=128)
Join Filter: (n.oid = c.relnamespace)
-> Seq Scan on pg_class c (cost=0.00..18.44 rows=1 width=72)
Filter: ((relkind = 'r'::"char") AND
(pg_get_userbyid(relowner) = 'postgres'::name))
-> Seq Scan on pg_namespace n (cost=0.00..1.06 rows=6 width=68)
The slide shows an example of an execution plan, demonstrating how the
query is executed.
Here, the Seq Scan steps involve reading the relevant tables, while the
Nested Loop represents the method for joining two tables. The slide displays
the execution plan in the format shown by the EXPLAIN command. We will
discuss data access methods, join methods, and the EXPLAIN command in
more detail in the next sections.
For now, it's worth noting two key points:
Out of the three tables, only two remain: the planner determined that one
table wasn't necessary for the result and could be safely removed from
the execution plan.
Each node in the tree includes information about the estimated number of
rows (rows) and the cost (cost).
For those interested, the actual execution plan can be viewed by setting the
debug_print_plan parameter.
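
The plan shown on the slide can be reproduced with EXPLAIN on our example query (the exact estimates will vary between installations):

```sql
EXPLAIN
SELECT schemaname, tablename
FROM pg_tables
WHERE tableowner = 'postgres'
ORDER BY tablename;
```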
Execution
[Diagram: query text → parsing (parse) → query tree → rewriting (rewrite) → query tree → planning (plan) → execution plan → execution (execute) → result]
Fourth, the query is executed (execute) according to the selected plan, and
the result is returned to the client.
Setting the log_parser_stats, log_planner_stats, and log_executor_stats
parameters to 'on' logs detailed statistics for each stage. In practice,
though, this is rarely needed.
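
A sketch of enabling executor statistics for a single session; whether the LOG messages also reach the client depends on client_min_messages:

```sql
SET client_min_messages = log;  -- also show LOG-level messages in the client
SET log_executor_stats = on;

SELECT count(*) FROM pg_class;  -- executor timing statistics are emitted

RESET log_executor_stats;
RESET client_min_messages;
```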
Execution
[Plan tree (simplified): SORT (c.relname) → NESTLOOP (n.oid = c.relnamespace) → Seq Scan on pg_class (c.relkind = 'r' AND pg_get_userbyid(relowner) = 'postgres'), Seq Scan on pg_namespace]
Pipeline
  tree traversal starting at the root
  data is transmitted upward, either as it arrives or all at once
Data access
  table and index reads
Joins
  always pairwise; order matters
Other operations
During the execution phase, the execution plan tree (shown on the slide in a
simplified form to highlight the essentials) functions like a pipeline.
Execution starts at the root node. The root node (in our case, the SORT
operation) requests data from its child node; once it receives the data, it
performs the sort and sends the results upward (that is, to the client).
Some nodes (such as the NESTLOOP node) join data from different
sources. Here, the node accesses two child nodes in sequence (the join is
always performed pairwise) and, upon receiving rows from them, joins them
and sends them upward to the sort node.
The two lower nodes represent table access for data retrieval. They read
rows from the relevant tables and pass them up to the join node.
Some nodes can only return a result once they have received all data from
their child nodes. Sorting is one such operation: it cannot sort an incomplete
data set. Other nodes can return data as it arrives. For example, a table
scan can return rows as they are read (enabling quick retrieval of the initial
portion of the result — say, for paginated display on a web page).
To get a handle on execution plans, you need to understand the available
data access methods, the techniques for joining data, and examine some
other operations.
JIT Compilation
Compiling parts of queries into machine code
  evaluation of expressions in the WHERE clause
  evaluation of expressions in the SELECT list
  aggregates and projections
Tuple deforming
  converting on-disk tuple versions into their in-memory representation
JIT (just-in-time) compilation dynamically compiles code, or parts of it,
during program execution. This technology enables faster execution of
interpreted code and is used in many systems.
In PostgreSQL, JIT compilation can compile parts of the code executed
when processing SQL queries. PostgreSQL needs to be compiled with
support for LLVM.
JIT compilation is more suitable for long-running, CPU-intensive analytical
queries. For short OLTP queries, JIT compilation overhead can exceed the
queries' execution time.
JIT compilation can be influenced by configuration parameters. There are
several JIT-related optimizations that are enabled only when the query cost
exceeds the threshold value set in the relevant configuration parameters.
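
A sketch of observing JIT in action. It assumes the server was built with LLVM support; the cost threshold is lowered purely for demonstration:

```sql
SET jit = on;
SET jit_above_cost = 0;  -- force JIT regardless of query cost (demo only)

-- When JIT is used, the EXPLAIN ANALYZE output gains a "JIT:" section
-- with compilation and optimization timings
EXPLAIN (ANALYZE)
SELECT sum(g) FROM generate_series(1, 1000000) AS g;

RESET jit_above_cost;
```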
Extended Protocol
Refining the Query Processing Schema
Prepared Statements
Cursors
PostgreSQL also supports the Extended Query Protocol. In practice, this
allows for the use of prepared statements and cursors.
Simple Query Protocol
[Diagram: SELECT … → parsing (parse, consults the system catalog) → query tree → rewriting (rewrite, applies the rules) → query tree → planning (plan) → execution plan → execution (execute, within a transaction) → result]
The slide revisits the complete query processing workflow via the simple
protocol, which we have already discussed.
Extended Protocol
[Diagram: a parameterized query is parsed and rewritten once (prepared statements, PREPARE); on each execution (EXECUTE) the parameters are bound, a private execution plan is built, and the query is executed, possibly returning a partial result (cursors)]
The extended protocol provides finer-grained control over query processing.
First, the query can be prepared. To do this, the client sends a query to the
server (possibly in a parameterized form), and the server parses and
rewrites it, storing the resulting query tree in the backend process's local
memory.
To execute a prepared query, the client identifies it by name and provides
specific parameter values. The server constructs a private query plan,
considering the parameter values, and executes it.
Preparation helps avoid the need for repeated parsing and rewriting of the
same query when executed multiple times within a single session.
Second, you can use cursors. The cursor mechanism allows retrieving query
results row by row instead of all at once. The information about the open
cursor is also kept in the backend process's local memory.
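
At the SQL level, the same machinery is exposed through the PREPARE/EXECUTE commands and cursors, for example:

```sql
-- A prepared statement: parsed and rewritten once per session
PREPARE tables_by_owner(name) AS
SELECT schemaname, tablename
FROM pg_tables
WHERE tableowner = $1;

EXECUTE tables_by_owner('postgres');
DEALLOCATE tables_by_owner;

-- A cursor: fetch the result row by row (inside a transaction)
BEGIN;
DECLARE c CURSOR FOR SELECT relname FROM pg_class ORDER BY relname;
FETCH 5 FROM c;
CLOSE c;
COMMIT;
```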
Extended Protocol
[Diagram: the same flow as on the previous slide, but planning produces a generic execution plan that is cached and reused across executions (EXECUTE)]
If a query has no parameters, there's no need for the server to re-plan it
every time it's executed. In this case, it immediately caches a generic
execution plan. This helps save even more resources.
If the query includes parameters, the server can use a generic plan if it
determines that, on average, it performs as well as or better than specific
plans. For more information on when the switch occurs, see the "Basic
Statistics" section.
Another reason to use prepared statements is to prevent SQL injection when
the query's input data originates from an untrusted source, such as input
fields on a web form.
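
Whether a generic or a custom plan is used can be observed, and even forced, with the plan_cache_mode parameter (available since PostgreSQL 12):

```sql
PREPARE q(text) AS
SELECT count(*) FROM pg_class WHERE relname = $1;

-- Force the generic plan; EXPLAIN then shows the $1 placeholder
-- in the filter instead of a concrete value
SET plan_cache_mode = force_generic_plan;
EXPLAIN EXECUTE q('pg_tables');

RESET plan_cache_mode;
DEALLOCATE q;
```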
More on planning
Planning Process
Cardinality Estimation
Cost estimation
Selecting the Optimal Plan
Query planning is a critical and complex process, so we'll explore it in more
detail.
Planning Process
Statistics
  table sizes and data distribution
Cardinality estimation
  selectivity of a condition: the proportion of rows selected
  cardinality: the total number of rows
  calculation requires statistics
Cost estimation
  mainly determined by the node type and the number of rows processed
Optimizer's plan evaluation
  the plan with the lowest cost is chosen
The optimizer evaluates all possible execution plans and selects the one
with the lowest cost.
When evaluating the cost of a plan node, the optimizer takes into account
the node's type (notably, the cost of reading data directly from a table differs
from that of using an index) and the amount of data processed by the node.
Other factors are regarded as less important.
Two key concepts are crucial for assessing data volume:
  cardinality — the total number of rows;
  selectivity — the proportion of rows that meet the conditions (predicates).
To evaluate selectivity and cardinality, you need to have information about
the data, such as table sizes, value distribution in columns and other factors.
Ultimately, everything relies on statistics — data gathered and maintained
through the auto-analyze process or the ANALYZE command.
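
The collected per-column statistics can be examined through the standard pg_stats view, for instance:

```sql
-- Column statistics gathered by ANALYZE for pg_class.relkind
SELECT attname, null_frac, n_distinct, most_common_vals
FROM pg_stats
WHERE schemaname = 'pg_catalog'
  AND tablename = 'pg_class'
  AND attname = 'relkind';
```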
If cardinality is correctly estimated, the cost is usually quite accurate. The
optimizer's primary issues stem from inaccurate cardinality estimation. This
can occur due to insufficient statistics, the inability to use them, or flaws in
the models underlying the optimizer. We'll go into more detail about this.
Cardinality Estimation
Access method cardinality
  rows(A WHERE cond) = rows(A) · sel(cond)
Join cardinality
  rows(A JOIN B ON cond) = rows(A) · rows(B) · sel(cond)
[Plan tree (simplified), annotated with per-node row estimates (rows = …) and condition selectivities]
It is convenient to view cardinality estimation as a recursive process. To
estimate the cardinality of a node, you first need to estimate the cardinalities
of its child nodes, and then calculate its cardinality based on those values
once you know the node's type.
Therefore, the first step is to calculate the cardinalities of leaf nodes
containing data access methods. To do this, we need to know the table's
size and the selectivity of the conditions applied to it. We'll discuss how
exactly this is done later.
We can note for now that it's enough to estimate the selectivity of simple
conditions, as the selectivity of conditions constructed with logical operations
can be calculated with simple formulas:
sel(x AND y) = sel(x) · sel(y);    sel(x OR y) = 1 − (1 − sel(x)) · (1 − sel(y))
Keep in mind that these formulas assume the predicates are independent. If
the predicates are correlated, the estimate will be inaccurate (it can be
improved with extended statistics).
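
A sketch of creating extended statistics; the table and column names here (addresses, city, zip_code) are hypothetical, chosen because the columns would be strongly correlated:

```sql
-- Hypothetical example: city functionally determines zip_code,
-- so multiplying the individual selectivities would underestimate rows
CREATE STATISTICS addr_stats (dependencies)
    ON city, zip_code FROM addresses;

ANALYZE addresses;  -- populate the new statistics object

-- The planner can now use the functional dependency when estimating
-- conditions like: WHERE city = '...' AND zip_code = '...'
```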
Next, you can calculate the join cardinalities. We already know the
cardinalities of the joined data sets, but we still need to estimate the
selectivity of the join conditions. Let's just assume it's possible for now.
Similarly, this approach can be applied to other nodes, such as sorts or
aggregations.
Note that a cardinality calculation mistake in a lower node will propagate
upward, resulting in inaccurate cost estimation and, ultimately, a sub-optimal
plan.
Cost estimation
Calculated using mathematical models
  cost(node) = node's own cost + Σ cost(child nodes)
Two components
  initial (startup) cost .. cost of retrieving all rows (total cost)
[Plan tree (simplified), annotated with per-node cost estimates (cost = startup..total)]
Now let's look at the overall process of cost estimation. It is also inherently
recursive: to calculate the cost of a subtree, you first compute the costs of
the child nodes and sum them, then add the node's own cost.
The cost of a node's operation is calculated using the mathematical model
built into the planner, taking into account the estimated number of rows
processed (which has already been computed).
The cost consists of two components, each evaluated separately. The first
is the initial (startup) cost, while the second is the cost of retrieving all the
rows in the result set (total cost).
Some operations don't require any preparation; their initial cost is zero.
Others do require preliminary steps. For example, the sorting operation in
the example must first retrieve all data from its child node before it can
begin. For such nodes, the initial cost is not zero: it must be incurred
regardless of how many result rows are ultimately required.
It's important to recognize that the cost represents the planner's estimate
and might not align with the actual execution time. It's often viewed as the
cost being measured in hypothetical "units" that don't carry any inherent
meaning on their own. Cost is only used to allow the planner to compare
different plans for the same query.
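
These abstract units are anchored by the planner's cost parameters: by default, reading one page sequentially costs 1.0, and everything else is scaled relative to it. The current values can be inspected like so:

```sql
-- Planner cost constants (seq_page_cost is the 1.0 reference point)
SELECT name, setting
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost',
               'cpu_tuple_cost', 'cpu_operator_cost');
```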
Selecting the Optimal Plan
Exploring Plans
Join order, join methods, access methods
Exhaustive search where possible; when there are many options, the
search space is reduced.
Simple Queries and Prepared Statements
Optimizing retrieval of all rows
Minimum total cost
Cursors
Optimizing retrieval of a portion of the first rows
Minimum cost for cursor_tuple_fraction rows
The optimizer tries to evaluate all possible query execution plans to choose
the best one.
A dynamic programming algorithm is used to reduce the search space, but
when the number of options is high—especially due to the number of tables
being joined—finding an exact solution to the optimization problem becomes
impractical within a reasonable time. In such cases, the planner reduces the
number of plans it evaluates by either not considering all pairwise join
options or switching to a genetic optimization algorithm (GEQO - Genetic
Query Optimization). This could result in the optimizer choosing a
suboptimal plan not because of evaluation errors, but merely because the
best plan wasn't considered.
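
The thresholds that govern this behavior are ordinary configuration parameters:

```sql
SHOW join_collapse_limit;  -- exhaustive join-order search limit (default 8)
SHOW geqo_threshold;       -- switch to genetic optimization at this many
                           -- relations in the FROM list (default 12)
```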
What defines the "best plan"?
For typical queries, the optimal plan minimizes the resources needed to
retrieve all rows, meaning it has the lowest total cost.
However, when using cursors, it's important to retrieve the first rows as soon
as possible. Therefore, the cursor_tuple_fraction parameter (default 0.1)
determines the fraction of rows to be retrieved as quickly as possible. The
lower the parameter value, the more the initial cost influences plan selection
instead of the total cost.
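
A sketch of adjusting this trade-off for a single transaction (the value 1.0 is chosen for illustration: it makes cursor planning optimize for all rows, like a plain query):

```sql
SHOW cursor_tuple_fraction;             -- 0.1 by default

BEGIN;
SET LOCAL cursor_tuple_fraction = 1.0;  -- optimize for retrieving all rows
DECLARE c CURSOR FOR SELECT relname FROM pg_class ORDER BY relname;
FETCH 3 FROM c;
CLOSE c;
COMMIT;
```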
Takeaways
Query processing involves multiple stages: parsing and
rewriting, planning, and execution.
There are two protocols for executing queries.
Simple — direct execution and immediate result retrieval
Extended: prepared statements and cursors
Runtime depends on the quality of planning.
The optimizer constructs the plan based on cost.
Practice
1. The Effect of Preparing a Long Statement on Its Execution
Compute the average cost of a flight and determine the average
run time of this query.
Prepare the statement for this query and then recalculate the
average run time.
How many times faster did the execution become?
2. The Impact of Preparing Short Statements on Their Execution
Run the query for booking data with ID 0824C5 multiple times;
then calculate the average execution time.
Prepare the statement for this query and then recalculate the
average run time.
How many times faster did the execution become in this case?
The execution time of the same query can vary significantly, especially
during the first execution. To reduce the variation, average the execution
time by running the query multiple times. It's convenient to use PL/pgSQL,
keeping in mind that:
A dynamic query executed via the PL/pgSQL EXECUTE command (note:
not to be confused with the SQL EXECUTE command) goes through all
stages every time;
An SQL query embedded in PL/pgSQL code is executed using prepared
statements.
Example syntax for executing a dynamic query:
DO $$
BEGIN
FOR i IN 1..10 LOOP
EXECUTE 'SELECT ... FROM ...';
END LOOP;
END;
$$ LANGUAGE plpgsql;
For the prepared statement (in this case, SELECT is replaced with
PERFORM because the result is not needed):
DO $$
BEGIN
FOR i IN 1..10 LOOP
PERFORM ... FROM ...;
END LOOP;
END;
$$ LANGUAGE plpgsql;