Sorting and Grouping: Sorting
Copyright
© Postgres Professional, 2019–2024
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov
Photo by: Oleg Bartunov (Phu monastery, Bhrikuti summit, Nepal)
Use of course materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
edu@postgrespro.ru
Disclaimer
In no event shall Postgres Professional company be liable for any damages
or loss, including loss of profits, that arise from direct or indirect, special or
incidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
Topics
Retrieving Sorted Data
In-memory sorting
External sort
Incremental sort
Sorting in Parallel Execution Plans
Sorting During Index Construction
Window functions involving sorting
Index Access (B-Tree)
returns a sorted row set
Sequential Access (Seq Scan)
returns an unordered set of rows
An additional sorting step is required
Data Access
As previously stated, index access (using a B-tree) enables the immediate
retrieval of a row set sorted by the indexing keys.
But what happens when the data must be sorted by columns that have no index?
In this case, the server first retrieves the rows from the table using a
sequential scan and then performs an additional sorting step.
Quicksort
Top-N heapsort (partial heap sort)
when only a portion of the values is needed
In-memory sorting
In the ideal scenario, the set of rows to be sorted fits entirely within the
memory limited by the work_mem parameter. In this case, all rows are
sorted using the quicksort algorithm, and the result is passed up
to the parent node.
If only the first few rows of the sorted set are needed (when a
LIMIT clause is used), there is no need to sort the entire set. In this case,
the server can use a partial heap sort (top-N heapsort), which also runs
entirely in memory.
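The two strategies can be illustrated with Python's heapq module (a sketch of the idea only, not the server's implementation):

```python
import heapq

rows = [42, 7, 99, 3, 18, 55, 1, 76]

# Quicksort case: the whole set is sorted in memory.
full = sorted(rows)

# Top-N heapsort case: for LIMIT 3, keep a bounded heap of the
# three smallest values seen so far; the rest of the set is
# never fully sorted.
top3 = heapq.nsmallest(3, rows)

print(top3)  # [1, 3, 7]
```

heapq.nsmallest maintains a heap of at most N elements, which is exactly the saving the planner exploits when a LIMIT is present.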
work_mem
External sort
[Slide diagram: rows (album_id, name) are read until the work_mem limit is reached; the chunk in memory is sorted with quicksort and written out as a sorted run.]
If the row set is too large, it won't fit into memory all at once. In such cases,
an external sort is used.
Rows are loaded into memory for as long as there is room, then sorted and
written to a temporary file.
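The run-generation phase can be sketched as follows; plain lists stand in for temporary files, a fixed row count stands in for the work_mem limit, and the make_runs name is made up for the illustration:

```python
def make_runs(rows, capacity):
    """Split the input into chunks that 'fit in memory', sort each
    chunk, and emit it as a separate sorted run (in the server,
    each run would be written to a temporary file)."""
    runs = []
    for i in range(0, len(rows), capacity):
        runs.append(sorted(rows[i:i + capacity]))
    return runs

names = ["A Day in the Life", "All Together Now", "Another Girl",
         "All You Need Is Love", "Act Naturally", "Across the Universe"]

# With room for only three rows at a time we get two sorted runs.
runs = make_runs(names, capacity=3)
```

Each run is ordered internally, but the runs overlap, so a merge step is still needed to produce the final order.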
work_mem
External sort
[Slide diagram: the process repeats; each time memory fills up, the current chunk is sorted with quicksort and written to a temporary file as another sorted run.]
temp_file_limit
This procedure is repeated as needed to write all the data to files, each
sorted separately.
Recall that the total size of session temporary files (excluding temporary
tables) is constrained by the value of the temp_file_limit parameter.
work_mem
External sort
[Slide diagram: Merge; the sorted runs from the temporary files are merged, comparing one current row from each file and sending the smaller one to the output.]
Subsequently, multiple files (two or more) are merged while maintaining the
sort order.
Merging does not require much memory: it is enough to keep one row from
each file (as shown in the example on the slide). The minimum (or maximum)
of these rows is selected and returned as part of the output, and a new
row is read from the same file to replace it. In practice, rows are read in
batches rather than one at a time, to speed up input/output.
When there isn't enough RAM to merge all files at once, the process begins
by merging a subset of files and saving the result to a temporary file. This is
then merged with other temporary files, and the process continues
iteratively.
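This k-way merge is exactly what heapq.merge from the Python standard library does: it keeps just one current element per run in a small heap and repeatedly yields the minimum (again a sketch of the idea, not PostgreSQL's code):

```python
import heapq

# Two sorted runs, as left behind by the run-generation phase.
run1 = ["A Day in the Life", "All Together Now", "Another Girl"]
run2 = ["Across the Universe", "Act Naturally", "All You Need Is Love"]

# heapq.merge is lazy: it never materializes both runs at once,
# mirroring how the server merges temporary files with little memory.
merged = list(heapq.merge(run1, run2))
```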
We won't go into all the details of the sorting implementation here; you can
find them in the file src/backend/utils/sort/tuplesort.c.
The evolution of sorting techniques in PostgreSQL is thoroughly covered in
Gregory Stark's talk «Sorting Through The Ages»
(a synchronized Russian translation of the talk is also available).
Incremental Sort
[Slide diagram: rows already ordered by album_id are split into groups. Large pre-sorted groups are sorted by the name field only; small adjacent groups are combined into full-sort groups and sorted by all fields.]
Suppose you want to sort a set by keys K1 … Kn, and it is known that the set
is already sorted by the first m keys K1 … Km. Then there is no need to
re-sort the entire dataset. Instead, the set can be split into groups of rows
that share the same K1 … Km values (such rows are arranged consecutively),
and each group can be sorted by the remaining keys Km+1 … Kn. This
approach is referred to as incremental sorting.
The algorithm sorts relatively large row groups separately, while small
adjacent groups are combined and sorted by all keys. As the data sets being
sorted become smaller, the memory requirements also decrease; however,
a larger number of groups means higher overhead.
Sets can be handled either in RAM or on disk when there's not enough
work_mem available.
Incremental sorting enables delivering results as soon as the first group is
processed, without waiting for the entire dataset to be sorted.
Incremental sorting can be disabled with the enable_incremental_sort
parameter.
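Under the assumption that the input is already ordered by the prefix key, the idea can be sketched with itertools.groupby: only each group is sorted by the remaining key, never the whole set:

```python
from itertools import groupby

# Rows already sorted by album_id (the presorted prefix key K1).
rows = [(1, "All You Need Is Love"), (1, "All Together Now"),
        (2, "Another Girl"), (2, "Act Naturally"),
        (3, "Across the Universe"), (5, "A Day in the Life")]

result = []
for album_id, group in groupby(rows, key=lambda r: r[0]):
    # Sort each group by the remaining key (name); the groups
    # themselves are already in the right order.
    result.extend(sorted(group, key=lambda r: r[1]))
```

Because results can be returned as soon as the first group is done, the first rows arrive much earlier than with a full sort.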
The Gather Merge node preserves the sorted order:
it merges the data streams arriving from the child nodes
In parallel execution plans
[Slide diagram: a Gather Merge node on top; each worker process runs Sort over Parallel Seq Scan, and Gather Merge merges the sorted streams.]
Sorting can be part of parallel execution plans. Each worker process sorts its
portion of the data and sends the sorted results to a parent node, which
combines them into a single set.
But the Gather node isn't suitable for this purpose because it outputs results
in the order they are received from the worker processes.
Therefore, such plans use a Gather Merge node to maintain the sorted order
of incoming rows. To achieve this, it uses a merge algorithm to combine
multiple sorted sets into a single set.
Building the B-tree
Sorting is employed:
all rows are sorted first,
then the rows are gathered into leaf index pages,
the pointers are collected into next-level pages,
and so on until the root is reached.
Can run in parallel: max_parallel_maintenance_workers
Memory limit: maintenance_work_mem, as the operation is not frequent
When building an index (specifically a B-tree), the server could add entries
to an empty index one at a time, processing the table's rows in sequence.
However, this approach would be highly inefficient.
Therefore, sorting is used when creating or rebuilding indexes: all table rows
are sorted and then organized into leaf index pages. The upper levels of the
tree, consisting of references to elements from lower-level pages, are then
constructed until only one page remains — this becomes the root of the tree.
Sorting is done in the same way as described earlier. However, the memory
limit is not determined by work_mem but by maintenance_work_mem, as
the index creation process isn't too frequent and it's beneficial to allocate
more memory for it.
Index building can be done in parallel. The number of worker processes is
determined in the same way as for parallel queries (based on the table
size), but is limited by the max_parallel_maintenance_workers parameter.
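The bottom-up construction can be sketched as a toy model, where pages are plain lists, the fanout is fixed, and build_btree_levels is a made-up name for the illustration:

```python
def build_btree_levels(sorted_keys, fanout):
    """Pack sorted keys into leaf pages, then collect the first key
    of each page into the next level, repeating until a single
    (root) page remains."""
    level = [sorted_keys[i:i + fanout]
             for i in range(0, len(sorted_keys), fanout)]
    levels = [level]
    while len(level) > 1:
        separators = [page[0] for page in level]
        level = [separators[i:i + fanout]
                 for i in range(0, len(separators), fanout)]
        levels.append(level)
    return levels  # levels[0] is the leaf level, levels[-1] the root

# 27 keys with a fanout of 3: nine leaf pages, three internal
# pages, and one root page.
levels = build_btree_levels(list(range(1, 28)), fanout=3)
```

Because the input is sorted up front, every page is filled left to right and no page ever has to be split, which is why this is far faster than row-by-row insertion.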
Window functions
The window defines the set of rows to be aggregated for each row.
In a sorted set, the frame boundaries can shift.
[Slide diagram: the window and the window frame for OVER () versus OVER (ORDER BY).]
Sorting in window functions has its own peculiarities.
Unlike ordinary aggregate functions, window functions process every row in
the set without reducing its size. In the OVER clause after the function
name, a window is specified that defines the set of rows processed by the
window function for each row of the dataset. This set of rows is referred to
as the window frame.
If the window is specified as OVER (), the window frame is the same for
every row and includes all rows of the dataset. With OVER (ORDER BY ...),
the default frame extends from the start of the sorted set up to the current
row (and its peers), so the frame boundary shifts from row to row.
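The difference between the two frames can be sketched in Python (a toy model; real frames also include peer rows with equal sort keys, which are ignored here):

```python
amounts = [10, 20, 30, 40]

# OVER (): the frame is the whole set for every row,
# so an aggregate like sum() yields the same value everywhere.
over_empty = [sum(amounts) for _ in amounts]

# OVER (ORDER BY ...): by default the frame runs from the start
# of the sorted set up to the current row, i.e. a running total.
over_order = [sum(amounts[:i + 1]) for i in range(len(amounts))]
```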
Takeaways
Sorting is used during query execution and for building B-trees
There are various ways to implement sorting
in-memory
external (uses temporary files)
incremental
Indexes can help avoid sorting
Practice
1. What execution plan will be selected for the following query?
SELECT * FROM flights
Will the execution plan change if we increase work_mem to 32
MB?
2. Create an index on the passenger_name and passenger_id
columns of the tickets table. Did this operation require a
temporary file?
Turn on logging for temporary file usage by setting the log_temp_files
parameter to zero.