Data organization
Low level
© Postgres Professional, 2015–2022
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov
Translated by Alexander Meleshko
Use of course materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Please send your feedback, comments and suggestions to:
Postgres Professional assumes no responsibility for any damages and
losses, including loss of income, caused by direct or indirect, intentional or
accidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
Data files
Forks: main, visibility map, free space map
Oversized row versions and TOAST
Object forks
1 GB
Usually, each database object that stores data (table, index, sequence,
materialized view) has several corresponding forks. Each fork contains a
specific type of data.
Initially, each fork contains a single file. The file name is a numeric identifier
and may include a suffix derived from the fork name.
The file gradually increases in size until it reaches 1 GB, at which point the
next file for the same fork is created. Such files are sometimes called
segments.The segment sequence number is appended to
the end of the file name. The pg_relation_size function displays the total
size of a fork.
The 1 GB file size limit was established in the past to support file systems
that cannot operate with larger file sizes. A different file size limit can be set
during source code compilation with the --with-segsize flag.
So, a single database object may consist of multiple files on disk. A small
table will have three corresponding files on disk, and an index will have two.
All object files belonging to the same tablespace and the same database are
stored in the same directory. This may become an issue as some file
systems may perform poorly on directories with a large number of files.
Main fork
actual data (row versions)
exists for all objects
Initialization fork (init)
A “template” of the main fork
used in case of failure; exists only for unlogged tables
Visibility map (vm)
exists only for tables
Free space map (fsm)
exists for both tables and indexes
There are multiple types of forks.
The main fork contains the actual data, such as table row versions and
index records. The main fork file names match the identifier. All objects have
a main fork.
The file names of the initialization fork end with the “_init” suffix. This fork
exists only for unlogged tables (created with the UNLOGGED keywords)
and their indexes. Unlogged tables are no different from regular ones,
except that actions performed on them are not logged in WAL. This makes
operations on them faster, but their content cannot be recovered if a failure
occurs. When recovering after a failure, PostgreSQL simply wipes all
unlogged table forks and copies the initialization fork into the main fork. The
result is an empty table.
The vm (visibility map) forks filenames end in “_vm”. The fork exists only for
tables. Separate MVCC for indexes is not supported.
The fsm (free space map) fork’s filenames end in “_fsm”. This fork exists for
both tables and indexes.
These two maps were discussed in the Architecture module.
A row version must fit into one page
some of the fields can be compressed
some fields can be moved into a TOAST table
fields can be both compressed and moved
TOAST table
located in the pg_toast (pg_toast_temp_N) schema
supported by its own index
contains chunks of oversized values, each chunk is smaller than a page
accessed by querying a corresponding oversized field
has its own MVCC
used transparently for the application
Any row version in PostgreSQL must fit entirely into one page. Oversized
row versions are stored using TOAST, The Oversized Attributes Storage
Technique. TOAST comprises several approaches to storing oversized field
values. Firstly, the value can be compressed so that the row version fits into
the page. Secondly, the value can be moved from the row version to a
separate service table. Both strategies can be applied to the same row
versions: some values would be compressed, some moved, some
compressed and moved.
Any table can have a separate TOAST table (with a dedicated index)
created for it, if necessary. The dedicated indexes are located in the
pg_toast schema and therefore are usually not visible (temporary TOAST
tables are stored in the pg_toast_temp_N schema, similarly to the regular
The row versions in the TOAST table must also fit into one page each, so
longer values are split into multiple chunks, and are transparently “glued
together” by PostgreSQL when the application demands.
TOAST tables are used only when oversized values are queried. The tables
have their own versioning mechanism. Whenever a data update in the main
table does not modify the oversized value in the TOAST table, the new row
version in the table will refer to the same old TOAST value, saving disk
Table size
table TOAST Indexes
As already mentioned, the size of a single fork can be obtained by the
pg_relation_size function. To get the total object size, other functions can be
pg_table_size shows the size of the table and its TOAST part (the
TOAST table and its index), but not the regular index sizes. The same
function can be used to find the size of an individual index: both tables
and indexes are relations, and despite the name, the function accepts
any relation as input.
pg_indexes_size sums up the sizes of all table indexes except the
TOAST table index.
pg_total_relation_size shows the full size of the table, along
with all its indexes.
An object comprises several forks
A fork consists of one or more segment files
Oversized row versions are stored using TOAST
1. Create an unlogged table in a custom tablespace and make sure
that it has an init fork.
Delete the created tablespace.
2. Create a table with a column of the text type.
What storage strategy is used for this column?
Change the strategy to external and insert a short and a long
row into the table.
Check if the rows are in the TOAST table by making a direct
query to it. Explain why.
1. Create a database.
Compare the database size returned by the pg_database_size
command with the total size of all tables in the database.
Explain the result.
1. You can get the list of database tables from the pg_class table.