Data organization

Tablespaces

Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov

Translated by Alexander Meleshko

Use of course materials

Non-commercial use of course materials (presentations, demonstrations) is

allowed without restrictions. Commercial use is possible only with the written

permission of Postgres Professional. It is prohibited to make changes to the

course materials.

Feedback

Please send your feedback, comments and suggestions to:

edu@postgrespro.ru

Disclaimer

Postgres Professional assumes no responsibility for any damages and

losses, including loss of income, caused by direct or indirect, intentional or

accidental use of course materials. Postgres Professional company

specifically disclaims any warranties on course materials. Course materials

are provided “as is,” and Postgres Professional company has no obligations

to provide maintenance, support, updates, enhancements, or modifications.

Topics

Tablespaces and catalogs

Creating, modifying, and deleting tablespaces

Storing data in the file system

Moving data

Tablespaces

appdb

pg_catalog public

postgres

pg_catalog publicschema

pg_global

tablespace

pg_default

tablespace

table

object

table

object

table

object

table

object

table

object

table

object

table

object

table

object

table

object

table

object

table

object

default

tablespace

default

tablespace

Tablespaces are used to organize the physical storage of data and

determine the location of data in the file system.

For example, one tablespace can be used on slow disks for archived data,

and another on fast disks with frequent activity.

On cluster initialization, two tablespaces are created: pg_default and

pg_global.

A tablespace can be used by multiple databases, and a database can use

multiple tablespaces at once.

Each database has a default tablespace where all database objects are

created (unless specified otherwise). System catalog objects are also stored

in the default tablespace. Databases will use the pg_default tablespace as

their default, unless another one is set by the user.

The pg_global tablespace is special as it stores only those objects that are

shared by the whole cluster.

https://postgrespro.com/docs/postgresql/13/manage-ag-tablespaces

Directories

appdb

postgres

pg_global

tablespace

pg_default

tablespace

table

object

table

object

table

object

table

object

table

object

table

object

table

object

PGDATA/global/

PGDATA/base/dboid /

PGDATA/pg_tblspc/tsoid

/path-to-catalog/ver/dboid/

Essentially, a tablespace is a reference to the directory in which the data is

located. The standard tablespaces pg_global and pg_default are always

located in PGDATA/global/ and PGDATA/base/, respectively. When a

custom tablespace is created, an arbitrary directory can be specified.

For convenience, PostgreSQL also creates a symbolic link to the directory in

PGDATA/pg_tblspc/.

The PGDATA/base/ directory comprises different directories for each

database (unlike PGDATA/global/, which stores data referring to the whole

cluster).

Inside a custom tablespace directory, there is another level of directories for

different PostgreSQL server versions. This is helpful during server upgrade.

Finally, these directories are where the actual objects are stored, one or

more files per object.

Takeaways

Tablespaces organize physical data storage

Logical (databases, schemas) and physical (tablespaces) forms

of data separation are independent

Practice

Why does pg_default become the default tablespace when creating a

database without specifying the TABLESPACE keyword?

1. Create a new tablespace.

2. Set it as the default tablespace for the template1 database.

3. Create a new database.

Check which default tablespace is set for the new database.

4. Find the symbolic link to the tablespace catalog in PGDATA.

5. Delete the created tablespace.

Practice+

1. Set the random_pa ge_cost parameter for the pg_default

tablespace to 1.1.

1. Use the ALTER TABLESPACE ... SET command:

https://postgrespro.com/docs/postgresql/13/sql-altertablespace

The seq_page_cost and random_page_cost parameters are used by the

planner. They refer to the approximate cost of reading one page of data from

disk with sequential and random access, respectively.

The lower the ratio between these parameters, the more often the planner

will prefer index access over sequential table scanning.

The parameters *_cost and, more specifically, random_page_cost are

discussed in more detail in the “Query performance tuning” course (QPT).