Data Organization
Physical Structure
15
Copyright
© Postgres Professional, 2023
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov
Translated by Liudmila Mantrova
Cover photo by Oleg Bartunov (Phu monastery and Bhrikuti peak, Nepal)
Use of course materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
Disclaimer
Postgres Professional assumes no responsibility for any damages and
losses, including loss of income, caused by direct or indirect, intentional or
accidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
2
Agenda
Tablespaces and directories
Files and data pages
Forks: data, visibility map, free space map
TOAST
3
template1
postgres
Tablespaces
pg_catalog public pg_catalog publicnew schema
pg_global
tablespace
pg_default
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
default
tablespace
default
tablespace
Tablespaces are used to manage physical storage of data; they define the
layout of files in the file system.
For example, one tablespace can be created on slow disks to store archive
data, while another tablespace located on fast disks will be used for the data
that is being actively updated.
During cluster initialization, two tablespaces are created: pg_default and
pg_global.
The same tablespace can be used by several databases, and one database
can store data in several tablespaces.
Besides, each database has a so-called “default tablespace,” in which all
database objects are created unless another location is specified. This
tablespace also stores system catalog objects. Initially, the pg_default
tablespace is used as the default one, but you can change this behavior.
The pg_global tablespace is special: it stores those system catalog
objects that are common to the whole cluster.
4
template1postgres
Catalogs
pg_global
tablespace
pg_default
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
PGDATA/global/
...
...
...
PGDATA/base/dboid /
...
NNN
NNN.1
NNN.2
...
PGDATA/pg_tblspc/tsoid
/path-to-catalog/ver/dboid/
...
...
...
Basically, a tablespace is a reference to the directoty that stores data. The
pre-defined pg_global and pg_default tablespaces are always located
in the PGDATA/global/ and PGDATA/base/ directories, respectively. When
creating a custom tablespace, you can specify an arbitrary directory; the
server always uses relative links, so it creates an additional symlink in
PGDATA/pg_tblspc/ that points to this custom directory.
Within the PGDATA/base/ directory, all data is distributed between
subdirectories of different databases (PGDATA/global/ has no such
subdirectories because it contains the data related to the cluster as a
whole).
Custom tablespaces have one more nesting level to reflect the PostgreSQL
version. It is done to facilitate server upgrades.
Actual database objects are stored in files within these directories: each
object has its own files.
Each file, which is called a segment, takes no more than 1 GB of disk space
(this value can be altered at build time). That’s why each object can have
several corresponding files. Thus, a database can potentially have plenty of
files, and their impact on the file system has to be taken into account.
6
The main fork
the data itself
Free space map
appears in pages
after vacuum
Visibility map
marks pages
in which all
row versions are
visible in all snapshots
Files and Forks
segment
1 GB
page
8 kB
Each object is usually represented by several forks. Each fork is a set of
segments (i.e., a file or several files). All segments are split into separate
pages. The page size is usually 8 kB; it can be configured for the whole
cluster, but only at build time. The pages of different objects are read from
disk using the same buffer cache mechanism.
The main fork is the data itself: index rows or different versions of table
rows.
The vm fork is a visibility map (a bitmap). It marks the pages that contain
only the current tuples visible in all data snapshots. Visibility map is used to
optimize vacuum and to speed up index access.
The fsm fork is a free space map. It tracks free space within pages, which
can appear after vacuum. This map is used to quickly find a page that is
suitable for inserting a new tuple.
8
TOAST
A tuple must fit one page
some attributes can be compressed,
moved to a separate TOAST table,
or both compressed and moved
A TOAST table
is stored in the pg_toast schema
provides its own index
splits “oversized” attributes into chunks that are smaller than a page
is read only when an “oversized” attribute is queried
uses its own versioning
is seamlessly used by applications
In PostgreSQL, any row version must fit a single page. For “oversized”
tuples, the TOAST mechanism is used, which stands for “The Oversized
Attributes Storage Technique”. It implies several strategies. Some oversized
attributes can be compressed for the tuple to fit a page. If it is impossible,
the attribute can be moved into a separate service table. These two
approaches can also be combined.
If required, for each main table, PostgreSQL creates a separate TOAST
table (with a special index). Such tables and indexes are located in a
separate schema called pg_toast, so they are usually invisible.
Tuples in TOAST tables must also fit a single page, so to store oversized
values, PostgreSQL splits them into chunks. When required by an
application, these chunks are seamlessly put together to produce a full
value.
TOAST tables are accessed only if the oversized value has to be returned.
Besides, TOAST tables have their own versioning: if an update does not
affect the oversized value, the new tuple refers to the same value in the
TOAST table. This approach allows us to save some space.
10
Takeaways
At the physical level
the data is distributed between tablespaces (directories)
an object is represented by several forks
each fork consists of one or more segments
Tablespaces are managed by DBAs
Files, forks, and TOAST are managed internally by PostgreSQL
11
Practice
1. Create a new database and establish a connection with it.
Create a tablespace called ts.
Create table t in tablespace ts
and insert several rows into this table.
2. Calculate the size of the database, the table, as well as the size of
ts and pg_default tablespaces.
3. Move the table into the pg_default tablespace.
How has the tablespace size changed?
4. Delete the ts tablespace.