Data organization

Low level

Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov

Translated by Alexander Meleshko

Use of course materials

Non-commercial use of course materials (presentations, demonstrations) is

allowed without restrictions. Commercial use is possible only with the written

permission of Postgres Professional. It is prohibited to make changes to the

course materials.

Feedback

Please send your feedback, comments and suggestions to:

edu@postgrespro.ru

Disclaimer

Postgres Professional assumes no responsibility for any damages and

losses, including loss of income, caused by direct or indirect, intentional or

accidental use of course materials. Postgres Professional company

specifically disclaims any warranties on course materials. Course materials

are provided “as is,” and Postgres Professional company has no obligations

to provide maintenance, support, updates, enhancements, or modifications.

Topics

Data files

Forks: main, visibility map, free space map

Oversized row versions and TOAST

Object forks

NNN_vm

NNN_fsm

1 GB

segment

main

fsm

NNN_fsm

NNN_fsm.1

NNN

NNN.1

NNN.2

pg_relation_size

Usually, each database object that stores data (table, index, sequence,

materialized view) has several corresponding forks. Each fork contains a

specific type of data.

Initially, each fork contains a single file. The file name is a numeric identifier

and may include a suffix derived from the fork name.

The file gradually increases in size until it reaches 1 GB, at which point the

next file for the same fork is created. Such files are sometimes called

segments.The segment sequence number is appended to

the end of the file name. The pg_relation_size function displays the total

size of a fork.

The 1 GB file size limit was established in the past to support file systems

that cannot operate with larger file sizes. A different file size limit can be set

during source code compilation with the --with-segsize flag.

So, a single database object may consist of multiple files on disk. A small

table will have three corresponding files on disk, and an index will have two.

All object files belonging to the same tablespace and the same database are

stored in the same directory. This may become an issue as some file

systems may perform poorly on directories with a large number of files.

Forks

Main fork

actual data (row versions)

exists for all objects

Initialization fork (init)

A “template” of the main fork

used in case of failure; exists only for unlogged tables

Visibility map (vm)

exists only for tables

Free space map (fsm)

exists for both tables and indexes

There are multiple types of forks.

The main fork contains the actual data, such as table row versions and

index records. The main fork file names match the identifier. All objects have

a main fork.

The file names of the initialization fork end with the “_init” suffix. This fork

exists only for unlogged tables (created with the UNLOGGED keywords)

and their indexes. Unlogged tables are no different from regular ones,

except that actions performed on them are not logged in WAL. This makes

operations on them faster, but their content cannot be recovered if a failure

occurs. When recovering after a failure, PostgreSQL simply wipes all

unlogged table forks and copies the initialization fork into the main fork. The

result is an empty table.

https://postgrespro.com/docs/postgresql/13/storage-init

The vm (visibility map) fork’s filenames end in “_vm”. The fork exists only for

tables. Separate MVCC for indexes is not supported.

The fsm (free space map) fork’s filenames end in “_fsm”. This fork exists for

both tables and indexes.

These two maps were discussed in the Architecture module.

https://postgrespro.com/docs/postgresql/13/storage-fsm

https://postgrespro.com/docs/postgresql/13/storage-vm

TOAST

A row version must fit into one page

some of the fields can be compressed

some fields can be moved into a TOAST table

fields can be both compressed and moved

TOAST table

located in the pg_toast (pg_toast_temp_N) schema

supported by its own index

contains chunks of oversized values, each chunk is smaller than a page

accessed by querying a corresponding oversized field

has its own MVCC

used transparently for the application

Any row version in PostgreSQL must fit entirely into one page. Oversized

row versions are stored using TOAST, The Oversized Attributes Storage

Technique. TOAST comprises several approaches to storing oversized field

values. Firstly, the value can be compressed so that the row version fits into

the page. Secondly, the value can be moved from the row version to a

separate service table. Both strategies can be applied to the same row

versions: some values would be compressed, some moved, some

compressed and moved.

Any table can have a separate TOAST table (with a dedicated index)

created for it, if necessary. The dedicated indexes are located in the

pg_toast schema and therefore are usually not visible (temporary TOAST

tables are stored in the pg_toast_temp_N schema, similarly to the regular

pg_temp_N).

The row versions in the TOAST table must also fit into one page each, so

longer values are split into multiple chunks, and are transparently “glued

together” by PostgreSQL when the application demands.

TOAST tables are used only when oversized values are queried. The tables

have their own versioning mechanism. Whenever a data update in the main

table does not modify the oversized value in the TOAST table, the new row

version in the table will refer to the same old TOAST value, saving disk

space.

https://postgrespro.com/docs/postgresql/13/storage-toast

Table size

pg_total_relation_size

pg_indexes_sizepg_table_size

table TOAST Indexes

As already mentioned, the size of a single fork can be obtained by the

pg_relation_size function. To get the total object size, other functions can be

used:

●

pg_table_size shows the size of the table and its TOAST part (the

TOAST table and its index), but not the regular index sizes. The same

function can be used to find the size of an individual index: both tables

and indexes are relations, and despite the name, the function accepts

any relation as input.

●

pg_indexes_size sums up the sizes of all table indexes except the

TOAST table index.

●

pg_total_relation_size shows the full size of the table, along

with all its indexes.

Takeaways

An object comprises several forks

A fork consists of one or more segment files

Oversized row versions are stored using TOAST

Practice

1. Create an unlogged table in a custom tablespace and make sure

that it has an init fork.

Delete the created tablespace.

2. Create a table with a column of the text type.

What storage strategy is used for this column?

Change the strategy to external and insert a short and a long

row into the table.

Check if the rows are in the TOAST table by making a direct

query to it. Explain why.

Practice+

1. Create a database.

Compare the database size returned by the pg_database_size

command with the total size of all tables in the database.

Explain the result.

1. You can get the list of database tables from the pg_class table.