Data organization
Tablespaces
13
Copyright
© Postgres Professional, 2015–2022
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov
Translated by Alexander Meleshko
Use of course materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
Disclaimer
Postgres Professional assumes no responsibility for any damages and
losses, including loss of income, caused by direct or indirect, intentional or
accidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
2
Topics
Tablespaces and catalogs
Creating, modifying, and deleting tablespaces
Storing data in the file system
Moving data
3
Tablespaces
appdb
pg_catalog public
postgres
pg_catalog publicschema
pg_global
tablespace
tablespace
pg_default
tablespace
table
table
object
table
table
object
table
table
object
table
table
object
table
table
object
table
table
object
table
table
object
table
table
object
table
table
object
table
table
object
table
table
object
default
tablespace
default
tablespace
Tablespaces are used to organize the physical storage of data and
determine the location of data in the file system.
For example, one tablespace can be used on slow disks for archived data,
and another on fast disks with frequent activity.
On cluster initialization, two tablespaces are created: pg_default and
pg_global.
A tablespace can be used by multiple databases, and a database can use
multiple tablespaces at once.
Each database has a default tablespace where all database objects are
created (unless specified otherwise). System catalog objects are also stored
in the default tablespace. Databases will use the pg_default tablespace as
their default, unless another one is set by the user.
The pg_global tablespace is special as it stores only those objects that are
shared by the whole cluster.
4
Directories
appdb
postgres
pg_global
tablespace
tablespace
pg_default
tablespace
table
table
object
table
table
object
table
table
object
table
table
object
table
table
object
table
table
object
table
table
object
PGDATA/global/
PGDATA/base/dboid /
PGDATA/pg_tblspc/tsoid
/path-to-catalog/ver/dboid/
Essentially, a tablespace is a reference to the directory in which the data is
located. The standard tablespaces pg_global and pg_default are always
located in PGDATA/global/ and PGDATA/base/, respectively. When a
custom tablespace is created, an arbitrary directory can be specified.
For convenience, PostgreSQL also creates a symbolic link to the directory in
PGDATA/pg_tblspc/.
The PGDATA/base/ directory comprises different directories for each
database (unlike PGDATA/global/, which stores data referring to the whole
cluster).
Inside a custom tablespace directory, there is another level of directories for
different PostgreSQL server versions. This is helpful during server upgrade.
Finally, these directories are where the actual objects are stored, one or
more files per object.
6
Takeaways
Tablespaces organize physical data storage
Logical (databases, schemas) and physical (tablespaces) forms
of data separation are independent
7
Practice
Why does pg_default become the default tablespace when creating a
database without specifying the TABLESPACE keyword?
1. Create a new tablespace.
2. Set it as the default tablespace for the template1 database.
3. Create a new database.
Check which default tablespace is set for the new database.
4. Find the symbolic link to the tablespace catalog in PGDATA.
5. Delete the created tablespace.
8
Practice+
1. Set the random_pa ge_cost parameter for the pg_default
tablespace to 1.1.
1. Use the ALTER TABLESPACE ... SET command:
The seq_page_cost and random_page_cost parameters are used by the
planner. They refer to the approximate cost of reading one page of data from
disk with sequential and random access, respectively.
The lower the ratio between these parameters, the more often the planner
will prefer index access over sequential table scanning.
The parameters *_cost and, more specifically, random_page_cost are
discussed in more detail in the “Query performance tuning” course (QPT).