Data Organization
Physical Structure
12
Copyright
© Postgres Professional, 2017–2021
Authors: Egor Rogov, Pavel Luzanov
Translated by Liudmila Mantrova
Usage of Course Materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed on an unrestricted basis. Commercial use is only possible with prior
written permission of Postgres Professional company. Modification of course
materials is forbidden.
Contact Us
Please send your feedback to: edu@postgrespro.ru
Disclaimer
In no event shall Postgres Professional company be liable for any damages
or loss, including loss of profits, that arise from direct or indirect, special or
incidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
2
Agenda
Tablespaces and directories
Files and data pages
Forks: data, visibility map, free space map
TOAST
3
template1
postgres
Tablespaces
pg_catalog public pg_catalog publicnew schema
pg_global
tablespace
pg_default
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
default
tablespace
default
tablespace
Tablespaces are used to manage physical storage of data; they define the
layout of files in the file system.
For example, one tablespace can be created on slow disks to store archive
data, while another tablespace located on fast disks will be used for the data
that is being actively updated.
During cluster initialization, two tablespaces are created: pg_default and
pg_global.
One and the same tablespace can be used by several databases, and one
database can store data in several tablespaces.
Besides, each database has a so-called “default tablespace,” in which all
database objects are created unless another location is specified. This
tablespace also stores system catalog objects. Initially, the pg_default
tablespace is used as the default one, but you can change this behavior.
The pg_global tablespace is special: it stores those system catalog
objects that are common to the whole cluster.
4
template1postgres
Catalogs
pg_global
tablespace
pg_default
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
таблица
таблица
object
PGDATA/global/
...
...
...
PGDATA/base/dboid /
...
NNN
NNN.1
NNN.2
...
PGDATA/pg_tblspc/tsoid
/path-to-catalog/ver/dboid/
...
...
...
Basically, a tablespace is a reference to the catalog that stores data. The
pre-defined pg_global and pg_default tablespaces are always located
in the PGDATA/global/ and PGDATA/base/ directories, respectively. When
creating a custom tablespace, you can specify an arbitrary directory; the
server always uses relative links, so it creates an additional symlink in
PGDATA/pg_tblspc/ that points to this custom directory.
Within the PGDATA/base/ directory, all data is distributed between
subdirectories of different databases (PGDATA/global/ has no such
subdirectories because it contains the data related to the cluster as a
whole).
Custom tablespaces have one more nesting level to reflect the PostgreSQL
version. It is done to facilitate server upgrades.
Actual database objects are stored in files within these directories: each
object has its own files.
Each file, which is called a segment, takes no more than 1 GB of disk space
(this value can be altered at build time). That’s why each object can have
several corresponding files. Thus, a database can potentially have plenty of
files, and their impact on the file system has to be taken into account.
6
The main fork
the data itself
Free space map
appears in pages
after vacuum
Visibility map
marks pages
in which all
row versions are
visible in all snapshots
Files and Forks
segment
1 GB
page
8 kB
Each object is usually represented by several forks. Each fork is a set of
segments (i.e., a file or several files). All segments are split into separate
pages. The page size is usually 8 kB; it can be configured for the whole
cluster, but only at build time. The pages of different objects are read from
disk using the same buffer cache mechanism.
The main fork is the data itself: index rows or different versions of table
rows (which are called tuples).
The vm fork is a visibility map (a bitmap). It marks the pages that contain
only the current tuples visible in all data snapshots. In other words, these
are the pages that have not been modified in a while and have no outdated
tuples left.
Visibility map is used to optimize vacuum (the pages marked in the visibility
map do not have to be vacuumed) and to speed up index access. The point
is that versioning information is stored only for tables, but not for indexes
(that’s why indexes have no visibility maps). Once a reference to a tuple is
found in index, the corresponding table page has to be read to check the
visibility status of this tuple. But this step can be skipped if the index itself
already contains all the required columns, and the page is present in the
visibility map.
The fsm fork is a free space map. It tracks free space within pages, which
can appear after vacuum. This map is used to quickly find a page that is
suitable for inserting a new tuple.
8
TOAST
A tuple must fit one page
some attributes can be compressed,
moved to a separate TOAST table,
or both compressed and moved
A TOAST table
is stored in the pg_toast schema
provides its own index
splits “oversized” attributes into chunks that are smaller than a page
is read only when an “oversized” attribute is queried
uses its own versioning
is seamlessly used by applications
In PostgreSQL, any row version must fit a single page. For “oversized”
tuples, the TOAST mechanism is used, which stands for “The Oversized
Attributes Storage Technique”. It implies several strategies. Some oversized
attributes can be compressed for the tuple to fit a page. If it is impossible,
the attribute can be moved into a separate service table. These two
approaches can also be combined.
If required, for each main table, PostgreSQL creates a separate TOAST
table (with a special index). Such tables and indexes are located in a
separate schema called pg_toast, so they are usually invisible.
Tuples in TOAST tables must also fit a single page, so to store oversized
values, PostgreSQL splits them into chunks. When required by an
application, these chunks are seamlessly put together to produce a full
value.
TOAST tables are accessed only if the oversized value has to be returned.
Besides, TOAST tables have their own versioning: if an update does not
affect the oversized value, the new tuple refers to the same value in the
TOAST table. This approach allows us to save some space.
10
Summary
At the physical level
the data is distributed between tablespaces (directories)
an object is represented by several forks
each fork consists of one or more segments
Tablespaces are managed by DBAs
Files, forks, and TOAST are managed internally by PostgreSQL
11
Practice
1. Create a new database and establish a connection with it.
Create a tablespace called ts.
Create table t in tablespace ts
and insert several rows into this table.
2. Calculate the size of the database, the table, as well as the size of
ts and pg_default tablespaces.
3. Move the table into the pg_default tablespace.
How has the tablespace size changed?
4. Delete the ts tablespace.