Query Optimization: Materialization
16
Copyright
© Postgres Professional, 2019–2024
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov
Photo by: Oleg Bartunov (Phu monastery, Bhrikuti summit, Nepal)
Use of course materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
edu@postgrespro.ru
Disclaimer
In no event shall Postgres Professional company be liable for any damages
or loss, including loss of profits, that arise from direct or indirect, special or
incidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
Topics
Query Materialization
Temporary tables
Controlling the Join Order
Materialized Views
Materialize Node
Materialization: storing an intermediate set of rows for later repeated use
[Diagram: a Nested Loop node joins a Scan with the result of a Join over two more Scans; a Materialize node can be placed above that Join. Related parameters: work_mem, temp_file_limit]
Materialization refers to storing an intermediate set of rows for later reuse,
often multiple times. The set can be stored at various levels: for a particular
query, at the session level, or at the database level.
Plan nodes typically exchange data using a pipeline model: when the
algorithm in a plan node needs the next row of the data set, the node
requests it from one of its child nodes. However, in some cases it is
beneficial (and sometimes necessary) for the query executor to retrieve
all rows at once, store them, and be able to read the stored result again.
This kind of materialization is handled by the Materialize node.
Rows are stored in main memory as long as their size does not exceed the
work_mem limit. When the limit is exceeded, all rows are written to a
temporary file and read from it as needed. The total size of the temporary
files used by a session's process is capped by the temp_file_limit parameter.
In the plan shown on the slide, the top Nested Loop node performs a nested
loop join, but there is no efficient way to access the inner data set because
it is itself computed by another join. To avoid executing that inner join
repeatedly, its result can be materialized.
Once the nested join's result is materialized, the Nested Loop node
accesses the already prepared row set.
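The effect of the Materialize node can be observed with a small experiment. The following is only a sketch: the tables a and b are hypothetical and not part of the course database, and whether the planner actually picks a Materialize node depends on its cost estimates, so the plans on a given server may differ.

```sql
-- Hypothetical demo tables; the names and sizes are illustrative only.
CREATE TEMP TABLE a AS SELECT g AS id FROM generate_series(1, 1000) AS g;
CREATE TEMP TABLE b AS SELECT g AS id FROM generate_series(1, 1000) AS g;
ANALYZE a;
ANALYZE b;

-- The stored rows stay in memory while they fit into work_mem.
SET work_mem = '4MB';

-- A non-equality join condition rules out hash and merge joins,
-- so a nested loop is used; the planner may put a Materialize node
-- above the inner scan.
EXPLAIN (COSTS OFF)
SELECT * FROM a JOIN b ON a.id < b.id;

-- Discourage Materialize nodes and compare the two plans.
SET enable_material = off;
EXPLAIN (COSTS OFF)
SELECT * FROM a JOIN b ON a.id < b.id;

RESET enable_material;
RESET work_mem;
```

Running both EXPLAIN commands side by side shows whether the planner relied on materialization for the inner row set.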
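CTE
CTE Materialization
With materialization: as determined by the planner or by the developer's choice
Without materialization
[Diagram: with materialization, a CTE Scan node reads the stored result of a Join over Scan 1 and Scan 2 and is then joined with Scan 3; without materialization, the planner is free to reorder the joins, for example joining Scan 3 with Scan 2 first and then with Scan 1]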
Common Table Expressions (CTE), also known as WITH subqueries,
provide an effective way to structure a query and enhance its clarity. Unlike
regular subqueries, CTEs avoid deep nesting.
The planner inlines CTE subqueries into the main query (without materializing
them) whenever possible, which lets it choose the optimal join order. By
default, a subquery is materialized in two cases:
1. The main query references the subquery more than once; materializing it
avoids redundant computation. In this case, materialization can be disabled
by specifying the AS NOT MATERIALIZED clause.
2. The subquery has side effects (it modifies data); materializing it ensures
that the change occurs exactly once. Calls to volatile functions also count
as side effects; see the "Functions" section. Materialization cannot be
disabled in this case.
You can always force materialization by specifying the AS MATERIALIZED
clause.
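The rules for CTE materialization can be tried out as follows. This is a minimal sketch: the orders table and its columns are hypothetical, and the comments describe what the plans are expected to look like, not guaranteed output.

```sql
-- Hypothetical table used only for this illustration.
CREATE TEMP TABLE orders AS
SELECT g AS id, g % 100 AS customer_id, (g % 7) * 10.0 AS amount
FROM generate_series(1, 10000) AS g;
ANALYZE orders;

-- Referenced once and free of side effects:
-- by default the planner can inline the subquery.
EXPLAIN (COSTS OFF)
WITH big AS (SELECT * FROM orders WHERE amount > 30)
SELECT * FROM big WHERE customer_id = 42;

-- Forced materialization: a CTE Scan node is expected in the plan.
EXPLAIN (COSTS OFF)
WITH big AS MATERIALIZED (SELECT * FROM orders WHERE amount > 30)
SELECT * FROM big WHERE customer_id = 42;

-- Referenced twice: materialized by default,
-- unless NOT MATERIALIZED is specified.
EXPLAIN (COSTS OFF)
WITH big AS NOT MATERIALIZED (SELECT * FROM orders WHERE amount > 30)
SELECT * FROM big b1 JOIN big b2 ON b1.id = b2.id;
```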
CTE
Recursive queries
[Diagram: a CTE Scan node above a Recursive Union node with two children, Result and Worktable Scan; the Recursive Union node uses a working table and an intermediate table]
Recursive queries are based on common table expressions.
The Recursive Union node relies on a working table and an intermediate
table: the working table holds the rows produced by the previous iteration,
while the intermediate table collects the rows generated during the current
one. The Worktable Scan node reads the content of the working table when the
recursive part of the query is evaluated. At the end of each iteration, the
content of the intermediate table replaces the content of the working table.
Each of these two tables is materialized following the same rules: its
content is kept in memory while it fits into work_mem; once it no longer
fits, all rows are written to disk.
The rows produced by each iteration are passed up to the CTE Scan node as
the result of the query. Once the recursive query completes, both tables
are discarded.
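A tiny recursive query is enough to see these nodes in a plan. The sketch below uses a hypothetical CTE named t that counts from 1 to 5; the plan is expected to contain CTE Scan, Recursive Union, and WorkTable Scan nodes (the spelling EXPLAIN uses), although the exact output depends on the server version.

```sql
EXPLAIN (COSTS OFF)
WITH RECURSIVE t(n) AS (
  SELECT 1               -- non-recursive term: evaluated once
  UNION ALL
  SELECT n + 1 FROM t    -- recursive term: reads the working table
  WHERE n < 5
)
SELECT n FROM t;
```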
Window functions
[Diagram: a WindowAgg node above Sort and Seq Scan; the WindowAgg node stores the rows of the current partition]
Materialization is also used when window functions are evaluated in the
WindowAgg node (see the "Sorting" section): a row of the current partition
(PARTITION BY) may fall into the window frame multiple times, so the rows of
the partition are materialized.
In this case, only the WindowAgg node uses the materialized rows; the
parent node receives the computed result, not the intermediate data.
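As an illustration, here is a sketch of a query in which every row takes part in several window frames. The payments table is hypothetical; the plan is expected to show a WindowAgg node above a Sort node, but the actual plan may differ.

```sql
-- Hypothetical table used only for this illustration.
CREATE TEMP TABLE payments AS
SELECT g AS id, g % 3 AS account, g * 1.5 AS amount
FROM generate_series(1, 9) AS g;

-- Each row belongs to up to three frames:
-- its own and those of its two neighbors within the partition.
EXPLAIN (COSTS OFF)
SELECT id, account, amount,
       sum(amount) OVER (
         PARTITION BY account
         ORDER BY id
         ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
       ) AS moving_sum
FROM payments;
```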
Temporary tables
A table visible only within a single session
Registered in the system catalog
Not written to the write-ahead log (WAL)
Stored in files on disk
Cached in session-local memory (temp_buffers)
Vacuuming and analysis are performed only manually
Intermediate data may be produced by a complex algorithm, for example one
written in a procedural language. In that case the data cannot be computed
in a CTE, but it can be stored in a temporary table and reused across
multiple queries.
Temporary tables suit intermediate data better than regular tables because
they exist only within a session or a transaction (depending on the mode
specified at creation) and are removed automatically together with their
data and dependent objects such as views and indexes. Moreover, temporary
tables are not WAL-logged and are cached in the local memory of the backend
process serving the session, which makes them more efficient to use.
The local memory of a process is not accessible to autovacuum processes, so
vacuuming and analyzing must be done manually. The cache memory is allocated
on demand and is limited by the temp_buffers parameter for the session (once
a temporary table has been accessed for the first time, the limit can no
longer be changed).
However, temporary tables generate entries in the system catalog and are
stored as files on disk. As a result, heavy use of temporary tables, as in
the 1C system, can bloat the system catalog and put additional load on the
file system. For this reason, 1C relies on specific patches that mitigate
these unwanted side effects.
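A typical life cycle of a temporary table might look like the sketch below. The table tmp_totals and the 32MB value are arbitrary assumptions; the script is meant to be run in a fresh session, because temp_buffers cannot be changed after the session has accessed a temporary table.

```sql
-- Must be set before the session first accesses a temporary table.
SET temp_buffers = '32MB';

BEGIN;

-- Hypothetical table; ON COMMIT DROP limits its lifetime to the transaction.
CREATE TEMP TABLE tmp_totals ON COMMIT DROP AS
SELECT g % 10 AS grp, sum(g) AS total
FROM generate_series(1, 100000) AS g
GROUP BY g % 10;

-- Autovacuum does not process temporary tables,
-- so statistics have to be gathered manually.
ANALYZE tmp_totals;

-- The stored rows can now be reused by any number of queries.
SELECT * FROM tmp_totals ORDER BY grp;
SELECT max(total) FROM tmp_totals;

COMMIT;  -- the table is dropped here
```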
Join Order
The number of join combinations increases exponentially as the
number of tables in the query increases.
JOIN syntax
Full enumeration while the number of tables does not exceed join_collapse_limit = 8
Beyond that, planning in groups of join_collapse_limit tables
Comma-separated joins
Full enumeration while the number of tables is below geqo_threshold = 12
Beyond that, a genetic algorithm
Materialization as a planner hint
As the number of tables in a query increases, the number of join
combinations, and with it the cost of choosing a plan, grows exponentially.
If the number of tables joined with the JOIN syntax exceeds
join_collapse_limit (default 8), the planner evaluates the possible join
combinations in groups of join_collapse_limit tables and then joins these
groups together.
A similar parameter, from_collapse_limit, applies to subqueries in the FROM
clause.
If tables are joined with commas (without the JOIN keyword), the planner
evaluates all join combinations but switches to a genetic algorithm once the
number of tables reaches geqo_threshold (default 12).
This can result in suboptimal execution plans. Details can be found in Pavel
By managing materialization manually, either with a CTE or by splitting the
query into parts and using temporary tables, developers can group tables so
that the optimizer plans each group separately. A CTE is usually the simpler
and more efficient option, but the second approach makes it possible to
analyze the temporary table, which gives the planner more information about
the data.
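Both ways of grouping tables can be sketched on three small hypothetical tables (t1, t2, and t3 are not part of the course database). With so few tables the planner does not need any help, so the example only illustrates the syntax.

```sql
-- Current values of the parameters that limit join enumeration.
SHOW join_collapse_limit;
SHOW from_collapse_limit;
SHOW geqo_threshold;

-- Hypothetical tables used only for this illustration.
CREATE TEMP TABLE t1 AS SELECT g AS id FROM generate_series(1, 1000) AS g;
CREATE TEMP TABLE t2 AS SELECT g AS id, g * 2 AS val FROM generate_series(1, 1000) AS g;
CREATE TEMP TABLE t3 AS SELECT g AS id FROM generate_series(1, 1000) AS g;

-- Option 1: a materialized CTE fixes the group (t1, t2);
-- the planner plans it separately from the join with t3.
EXPLAIN (COSTS OFF)
WITH t12 AS MATERIALIZED (
  SELECT t1.id, t2.val FROM t1 JOIN t2 ON t2.id = t1.id
)
SELECT * FROM t12 JOIN t3 ON t3.id = t12.id;

-- Option 2: a temporary table, which can additionally be analyzed.
CREATE TEMP TABLE t12 AS
SELECT t1.id, t2.val FROM t1 JOIN t2 ON t2.id = t1.id;
ANALYZE t12;

EXPLAIN (COSTS OFF)
SELECT * FROM t12 JOIN t3 ON t3.id = t12.id;
```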
Materialized views
Materialized query result: read-only; indexes can be created
Data refresh: manual synchronization; incremental update with the pg_ivm extension
Vacuum and analysis: both manual and automatic
A materialized view is a named query result stored at the database level. A
materialized view can be treated like a regular read-only table.
You can create indexes on a materialized view (but not integrity
constraints; these must be enforced in the base tables). The same statistics
are collected for a materialized view as for a regular table.
Unlike regular views, the rows of a materialized view remain unchanged
when the base tables are modified. Synchronization is done manually with the
REFRESH MATERIALIZED VIEW command.
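The behavior of a materialized view can be sketched as follows. The sales table and the sales_by_region view are hypothetical names chosen for this illustration; a regular table is used because a materialized view cannot be built on temporary tables.

```sql
-- Hypothetical base table.
CREATE TABLE sales (id int PRIMARY KEY, region text, amount numeric);
INSERT INTO sales
SELECT g, 'r' || (g % 5)::text, g
FROM generate_series(1, 1000) AS g;

-- The query result is stored at the database level.
CREATE MATERIALIZED VIEW sales_by_region AS
SELECT region, sum(amount) AS total
FROM sales
GROUP BY region;

-- Indexes are allowed on a materialized view.
CREATE UNIQUE INDEX ON sales_by_region (region);

-- A change in the base table is not reflected in the view...
INSERT INTO sales VALUES (1001, 'r0', 500);
SELECT * FROM sales_by_region WHERE region = 'r0';

-- ...until the view is synchronized manually.
REFRESH MATERIALIZED VIEW sales_by_region;
SELECT * FROM sales_by_region WHERE region = 'r0';

-- Clean up the demo objects.
DROP MATERIALIZED VIEW sales_by_region;
DROP TABLE sales;
```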
Full synchronization of a materialized view may be too expensive. Incremental
update (of individual rows as the base tables change) is not natively
supported, but the pg_ivm extension enables this feature (author – Suguru
Takeaways
The optimizer can materialize rows produced by plan nodes for reuse
You can control materialization using CTEs, temporary tables, and materialized views
Materialization enables control over the join order
Practice
1. The demo began with two example queries whose plans contained a
Materialize node. Tell the planner not to use this node and check in which
of the two cases it can avoid materialization and in which it cannot.
2. Check the query execution plan with the from_collapse_limit
parameter set to 8 (default) and 1:
SELECT *
FROM (
  SELECT * FROM ticket_flights tf, tickets t
  WHERE tf.ticket_no = t.ticket_no
) ttf, flights f
WHERE f.flight_id = ttf.flight_id;
1. Use the enable_material parameter.
2. See the documentation for details on the