Data Organization

Low Level

Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov, Aleksei Beresnev

Translated by: Liudmila Mantrova, Alexander Meleshko, Elena Sharafutdinova

Photo: Oleg Bartunov (Phu Monastery and Bhrikuti Peak, Nepal)

Use of Course Materials

Non-commercial use of course materials (presentations, demonstrations) is

allowed without restrictions. Commercial use is possible only with the written

permission of Postgres Professional. It is prohibited to make changes to the

course materials.

Feedback

Please send your feedback, comments and suggestions to:

edu@postgrespro.ru

Disclaimer

Postgres Professional assumes no responsibility for any damages and losses,

including loss of income, caused by direct or indirect, intentional or accidental

use of course materials. Postgres Professional company specifically disclaims

any warranties on course materials. Course materials are provided “as is,” and

Postgres Professional company has no obligations to provide maintenance,

support, updates, enhancements, or modifications.

Topics

Data Files

Forks: Main, Visibility Map, Free Space Map

Oversized Row Versions and TOAST

Object Forks

NNN_vm

NNN_fsm

1 GB

segment

main

fsm

NNN_fsm

NNN_fsm.1

NNN

NNN.1

NNN.2

pg_relation_size

Usually, each database object that stores data (table, index, sequence,

materialized view) has several corresponding forks. Each fork contains a

specific type of data.

Initially, each fork contains a single file. The file name is a numeric identifier

and may include a suffix derived from the fork name.

The file gradually increases in size until it reaches 1 GB, at which point the

next file for the same fork is created. Such files are sometimes called

segments.The segment sequence number is appended to the end of the file

name. The pg_relation_size function displays the total size of a fork.

The 1 GB file size limit was established in the past to support file systems

that cannot operate with larger file sizes. A different file size limit can be set

during source code compilation with the --with-segsize option.

So, a single database object may consist of multiple files on disk. A small

table will have three corresponding files on disk, and an index will have two.

All object files belonging to the same tablespace and the same database are

stored in the same directory. This may become an issue as some file

systems may perform poorly on directories with a large number of files.

Forks

Main fork

actual data (row versions)

exists for all objects

Initialization fork (init)

“template” of the main fork

used in case of failure; exists only for unlogged tables

Visibility map (vm)

exists only for tables

Free space map (fsm)

exists for both tables and indexes

There are several types of forks.

The main fork contains the data itself, namely table row versions and index

records. The main fork file names match the identifier. All objects have a

main fork.

The file names of the initialization fork end with the _init suffix. This fork

exists only for unlogged tables (created with the UNLOGGED clause) and

their indexes. Such objects do not differ from regular ones, except that

actions performed on them are not logged in WAL. This makes operations

on them faster, but their content cannot be recovered if a failure occurs.

During a recovery PostgreSQL just removes all the forks of such objects and

writes the initialization fork in place of the main fork. The result is an empty

table.

https://postgrespro.com/docs/postgresql/16/storage-init

The vm (visibility map) fork’s filenames end in _vm. The fork exists only for

tables, separate MVCC for indexes is not supported.

The fsm (free space map) fork’s filenames end in _fsm. This fork exists for

both tables and indexes.

These two maps were discussed in the Architecture module.

https://postgrespro.com/docs/postgresql/16/storage-fsm

https://postgrespro.com/docs/postgresql/16/storage-vm

TOAST

A row version must fit into one page

some fields can be compressed

some fields can be moved into a TOAST table

fields can be both compressed and moved

TOAST table

located in the pg_toast (pg_toast_temp_N) schema

supported by its own index

contains chunks of oversized values, each chunk is smaller than a page

retrieved only when the oversized field is accessed

has its own row versions

used transparently for the application

Any row version in PostgreSQL must fit entirely into one page. Oversized

row versions are stored using TOAST, The Oversized Attributes Storage

Technique. TOAST comprises several approaches to storing oversized field

values. Firstly, the value can be compressed so that the row version fits into

the page. Secondly, the value can be moved from the row version to a

separate service table. Both strategies can be applied to the same row

versions: some values would be compressed, some moved, some

compressed and moved.

Any table can have a separate TOAST table (with a dedicated index)

created for it, if necessary. Such tables and indexes are located in a

separate schema named pg_toast and, therefore, are usually not visible (for

temporary tables, pg_toast_temp_N schema is used, similarly to the regular

pg_temp_N).

The row versions in the TOAST table must also fit into one page each, so

longer values are split into multiple chunks, and are transparently “glued

together” by PostgreSQL when the application demands.

TOAST table is used only when oversized values are accessed. Besides

that, TOAST tables have their own row versions. Whenever a data update in

the main table does not affect the oversized value, the new row version will

refer to the same TOAST value, saving disk space.

https://postgrespro.com/docs/postgresql/16/storage-toast

Table Size

pg_total_relation_size

pg_indexes_sizepg_table_size

Table TOAST Indexes

As already mentioned, the size of a single fork can be obtained by the

pg_relation_size function. To get the total object size, other functions can be

used:

●

pg_table_size shows the size of the table and its TOAST part (the TOAST

table and its index), but not the regular index sizes. The same function

can be used to find the size of an individual index: both tables and

indexes are relations, and despite the name, the function accepts any

relation as an input.

●

pg_indexes_size sums up the sizes of all table indexes except the

TOAST table index.

●

pg_total_relation_size shows the full size of the table, along with all its

indexes.

Takeaways

An object comprises several forks

A fork consists of one or more segment files

Oversized row versions are stored using TOAST

Practice

1. Create an unlogged table in a custom tablespace and make sure

that it has an init fork.

Delete the created tablespace.

2. Create a table with a column of the text type. What storage

strategy is used for this column?

Change the strategy to external and insert a short and a long row

into the table.

Check if the rows are in the TOAST table by making a direct

query to it. Explain why.

Practice+

1. Create a database.

Compare the database size returned by the pg_database_size

function with the total size of all tables in the database.

Explain the result.

2. TOAST supports two compression methods: pglz and lz4.

Use SQL to check whether PostgreSQL was compiled with these

methods support.

3. Create a text file of at least 10 MB size.

Load its contents into a table with a text field, first without

compression, and then using each of the algorithms. Compare the

final table size and the data loading time of all three options.

1. You can get the list of database tables from the pg_class table.

2. Using the pg_config view, you can find out which options were set for the

configure script when the server software was compiled. The string

containing the list of options is long; you can extract the necessary options

using the string_to_table function.

3. To obtain text for the experiment, you can take a sufficiently large binary

file (for example, the postgres executable) and convert it to text. For the

conversion, you can use the Base32 algorithm (the -w0 option disables line

breaks):

base32 -w0 < binary-file > text-file