Data Organization
Low Level
16
Copyright
© Postgres Professional, 2017–2025
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov, Alexey Beresnev
Translated by: Liudmila Mantrova, Alexander Meleshko, Elena Sharafutdinova
Photo: Oleg Bartunov (Phu Monastery and Bhrikuti Peak, Nepal)
Use of Course Materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
edu@postgrespro.ru
Disclaimer
Postgres Professional assumes no responsibility for any damages and losses,
including loss of income, caused by direct or indirect, intentional or accidental
use of course materials. Postgres Professional company specifically disclaims
any warranties on course materials. Course materials are provided “as is,” and
Postgres Professional company has no obligations to provide maintenance,
support, updates, enhancements, or modifications.
2
Topics
Data Files
Forks: Main, Visibility Map, Free Space Map
Oversized Row Versions and TOAST
3
Object Forks
NNN_vm
NNN_fsm
1 GB
segment
main
fsm
vm
NNN_fsm
NNN_fsm.1
NNN
NNN.1
NNN.2
pg_relation_size
Usually, each database object that stores data (table, index, sequence,
materialized view) has several corresponding forks. Each fork contains a
specific type of data.
Initially, each fork contains a single file. The file name is a numeric identifier
and may include a suffix derived from the fork name.
The file gradually increases in size until it reaches 1 GB, at which point the
next file for the same fork is created. Such files are sometimes called
segments.The segment sequence number is appended to the end of the file
name. The pg_relation_size function displays the total size of a fork.
The 1 GB file size limit was established in the past to support file systems
that cannot operate with larger file sizes. A different file size limit can be set
during source code compilation with the --with-segsize option.
So, a single database object may consist of multiple files on disk. A small
table will have three corresponding files on disk, and an index will have two.
All object files belonging to the same tablespace and the same database are
stored in the same directory. This may become an issue as some file
systems may perform poorly on directories with a large number of files.
4
Forks
Main fork
actual data (row versions)
exists for all objects
Initialization fork (init)
“template” of the main fork
used in case of failure; exists only for unlogged tables
Visibility map (vm)
exists only for tables
Free space map (fsm)
exists for both tables and indexes
There are several types of forks.
The main fork contains the data itself, namely table row versions and index
records. The main fork file names match the identifier. All objects have a
main fork.
The file names of the initialization fork end with the _init suffix. This fork
exists only for unlogged tables (created with the UNLOGGED clause) and
their indexes. Such objects do not differ from regular ones, except that
actions performed on them are not logged in WAL. This makes operations
on them faster, but their content cannot be recovered if a failure occurs.
During a recovery PostgreSQL just removes all the forks of such objects and
writes the initialization fork in place of the main fork. The result is an empty
table.
The vm (visibility map) fork’s filenames end in _vm. The fork exists only for
tables, separate MVCC for indexes is not supported.
The fsm (free space map) fork’s filenames end in _fsm. This fork exists for
both tables and indexes.
These two maps were discussed in the Architecture module.
6
TOAST
A row version must fit into one page
some fields can be compressed
some fields can be moved into a TOAST table
fields can be both compressed and moved
TOAST table
located in the pg_toast (pg_toast_temp_N) schema
supported by its own index
contains chunks of oversized values, each chunk is smaller than a page
retrieved only when the oversized field is accessed
has its own row versions
used transparently for the application
Any row version in PostgreSQL must fit entirely into one page. Oversized
row versions are stored using TOAST, The Oversized Attributes Storage
Technique. TOAST comprises several approaches to storing oversized field
values. Firstly, the value can be compressed so that the row version fits into
the page. Secondly, the value can be moved from the row version to a
separate service table. Both strategies can be applied to the same row
versions: some values would be compressed, some moved, some
compressed and moved.
Any table can have a separate TOAST table (with a dedicated index)
created for it, if necessary. Such tables and indexes are located in a
separate schema named pg_toast and, therefore, are usually not visible (for
temporary tables, pg_toast_temp_N schema is used, similarly to the regular
pg_temp_N).
The row versions in the TOAST table must also fit into one page each, so
longer values are split into multiple chunks, and are transparently “glued
together” by PostgreSQL when the application demands.
TOAST table is used only when oversized values are accessed. Besides
that, TOAST tables have their own row versions. Whenever a data update in
the main table does not affect the oversized value, the new row version will
refer to the same TOAST value, saving disk space.
8
Table Size
pg_total_relation_size
pg_indexes_sizepg_table_size
Table TOAST Indexes
As already mentioned, the size of a single fork can be obtained by the
pg_relation_size function. To get the total object size, other functions can be
used:
pg_table_size shows the size of the table and its TOAST part (the TOAST
table and its index), but not the regular index sizes. The same function
can be used to find the size of an individual index: both tables and
indexes are relations, and despite the name, the function accepts any
relation as an input.
pg_indexes_size sums up the sizes of all table indexes except the
TOAST table index.
pg_total_relation_size shows the full size of the table, along with all its
indexes.
10
Takeaways
An object comprises several forks
A fork consists of one or more segment files
Oversized row versions are stored using TOAST
11
Practice
1. Create an unlogged table in a custom tablespace and make sure
that it has an init fork.
Delete the created tablespace.
2. Create a table with a column of the text type. What storage
strategy is used for this column?
Change the strategy to external and insert a short and a long row
into the table.
Check if the rows are in the TOAST table by making a direct
query to it. Explain why.
12
Practice+
1. Create a database.
Compare the database size returned by the pg_database_size
function with the total size of all tables in the database.
Explain the result.
2. TOAST supports two compression methods: pglz and lz4.
Use SQL to check whether PostgreSQL was compiled with these
methods support.
3. Create a text file of at least 10 MB size.
Load its contents into a table with a text field, first without
compression, and then using each of the algorithms. Compare the
final table size and the data loading time of all three options.
1. You can get the list of database tables from the pg_class table.
2. Using the pg_config view, you can find out which options were set for the
configure script when the server software was compiled. The string
containing the list of options is long; you can extract the necessary options
using the string_to_table function.
3. To obtain text for the experiment, you can take a sufficiently large binary
file (for example, the postgres executable) and convert it to text. For the
conversion, you can use the Base32 algorithm (the -w0 option disables line
breaks):
base32 -w0 < binary-file > text-file