Data organization

Databases and schemas

Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov

Translated by Alexander Meleshko

Use of course materials

Non-commercial use of course materials (presentations, demonstrations) is

allowed without restrictions. Commercial use is possible only with the written

permission of Postgres Professional. It is prohibited to make changes to the

course materials.

Feedback

Please send your feedback, comments and suggestions to:

edu@postgrespro.ru

Disclaimer

Postgres Professional assumes no responsibility for any damages and

losses, including loss of income, caused by direct or indirect, intentional or

accidental use of course materials. Postgres Professional company

specifically disclaims any warranties on course materials. Course materials

are provided “as is,” and Postgres Professional company has no obligations

to provide maintenance, support, updates, enhancements, or modifications.

Topics

Databases and templates

Schemas and search path

Special schemas, temporary objects

Managing databases, schemas, and objects

Database cluster

Cluster initialization creates three databases

A new database is always cloned from an existing one

postgres

table

object

template0

table

object

template1

table

object

new database

table

object

does not

change

connection

by default

common

changes

A PostgreSQL instance manages a database cluster that comprises multiple

databases. When a cluster is initialized, three identical databases are

created in a specific way. All other databases created by users are cloned

from one of the existing ones.

https://postgrespro.com/docs/postgresql/13/bki

The template1 database is used by default when creating new databases.

Any objects and extensions added to the template will be copied into any

database created from it.

The template0 database must never be modified. It is required in at least

two scenarios. Firstly, it is necessary when restoring a database from a

backup created via pg_dump (since the copy will include not only the

database objects, but also any objects from template1). Secondly, it is

used when creating a new database with an encoding different from the one

specified during cluster initialization.

The postgres database is used to connect to by default by the postgres

user. It is not a hard requirement to have it, but some utilities expect it to be

there, so removing it isn’t a good idea, even if you never use the database

directly.

https://postgrespro.com/docs/postgresql/13/manage-ag-templatedbs

Schemas

Database object namespace

each object belongs to a schema

Purposes

dividing objects into logical groups

preventing name conflicts between applications

Schemas and users are different entities

Schemas are namespaces for database objects. They separate objects into

groups for easier management and serve to prevent name conflicts when

multiple users or applications access the same database.

Every object that exists in the database belongs to a schema.

In PostgreSQL, schema and user are different entities (although the default

settings allow users to seamlessly operate schemas of the same name).

https://postgrespro.com/docs/postgresql/13/ddl-schemas

Databases and schemas

appdb

pg_catalog public

postgres

pg_catalog public

table

object

table

object

table

object

table

object

app

table

object

schema

created by

a user

$ psql -d appdb

SELECT * FROM app.t;

system

catalog

“shared”

schema

A cluster comprises multiple databases. Each database contains various

schemas across which database objects are distributed.

There are several standard schemas that exist in any database. More

schemas can be added by users.

Clients may connect to only one database at a time, but within the database,

the client is free to access objects from any schemas.

Search path

Determining an object’s schema

a schema is explicitly defined by a qualified name (schema.name)

a name without a qualifier is looked up among all schemas specified in the

search path

Search path

defined by the search_path parameter

non-existent schemas and schemas with restricted access are excluded;

implicit schemas are included

the real value is shown by the current_schemas function

the first explicitly specified schema in the path is where objects are created

When specifying an object, the schema it belongs to must be determined,

since different schemas may contain objects of the same name.

When the object name is qualified (with the schema name), the object is

looked up in the given schema, as shown on the previous slide. If the name

is used without a qualifier, PostgreSQL tries to look the name up in one of

the schemas listed in the search path, which is determined by the

search_path configuration parameter.

The actual search path may differ from the search_path parameter value:

any non-existent schemas are excluded, as well as schemas to which the

user does not have access (more on that in the Access control module).

In addition, some special schemas are implicitly added to the beginning of

the search path.

The actual search path, including implicit schemas, returns a call to the

current_schemas(true) function. The schemas are searched in the

order specified in the search path, from left to right. If the desired object

name is not found in the first schema, the next one is searched, and so on.

When an object is created with an unqualified name, it is placed into the first

explicitly defined schema in the path.

The concept of search_path is similar to the PATH variable in popular

operating systems.

https://postgrespro.com/docs/postgresql/13/ddl-schemas#DDL-SCHEMAS-P

ATH

https://postgrespro.com/docs/postgresql/13/ddl-schemas#GUC-SEARCH-PA

Special schemas

The public schema

included in the search path by default

all objects will belong to this schema, unless configured otherwise

The schema named after the user

included in the search path by default but is not created automatically

if created, the user’s objects will belong to this schema

The pg_catalog schema

a schema for system catalog objects

if not explicitly included in the path, implicitly included as the first one

There are several special schemas usually present in any database.

The public schema is used by default for storing objects, unless intentionally

configured otherwise.

The pg_catalog schema stores system catalog objects. The system catalog

is a collection of tables containing metadata about objects belonging to the

cluster. information_schema is another schema with an alternative

representation of the system catalog (as defined in the SQL standard).

The pg_catalog schema contains system catalog objects (in particular, the

pg_* tables).

If pg_catalog is not specified in the search path, the schema will be implicitly

placed in the front of the path so that system objects remain visible (but

after pg_temp).

Special schemas

Temporary tables

exist for the duration of the session or transaction

not logged (no recovery after a crash)

do not utilize the shared buffer cache

pg_temp_N schema

created automatically for temporary tables

pg_temp: a link to a specific temporary schema for the session

if not explicitly included in the path, implicitly included as the very first one

at the end of the session, all objects of the temporary schema are dropped;

the schema itself remains and is reused for other sessions

PostgreSQL can work with temporary tables. Temporary tables store data

that should only be available to the current session and only for as long as

the session is active (or even for the duration of a single transaction within

the session).

Temporary tables are unlogged. This means that if a crash occurs, the table

contents will be lost. Additionally, temporary table pages are never stored in

the shared buffer cache, residing in the internal memory of their backend

process instead. Thanks to that, temporary tables can be accessed a bit

more quickly than regular ones.

Temporary tables are organized using schemas. When a session is started,

a temporary schema named pg_temp_N (pg_temp_1, pg_temp_2, etc.) is

created for it. It can be accessed by the name pg_temp (without a number,

because the name will always refer to the temporary schema specific for the

session).

If pg_temp is not specified in the path, it is searched before all others.

Otherwise, you can specify its position in the path explicitly, like with

pg_catalog.

After the session ends, all objects belonging to the temporary schema are

dropped, and the schema itself remains to be reused later on.

There are other special schemas, more technical in nature.

Takeaways

On the logical level,

a cluster contains databases,

a database contains schemas,

a schema contains specific objects (tables, indexes etc.)

Databases are created by cloning existing ones

A schema can be specified explicitly or determined using the

search path

Some schemas have specific purposes

Practice

1. Create a new database and connect to it.

2. Check the size of the created database.

3. Create two schemas. Name one app, and the other after your

user name.

Create several tables in both schemas and populate them with

some data.

4. Check how much the database size has increased.

5. Modify the search path value so that when connecting to the

database, tables from both schemas are accessible by an

unqualified name. The “username” schema should have priority.

Practice+

1. Create a database. For all sessions connecting to this database,

set the temp_buffers parameter to four times the default value.

1. Use the command ALTER DATABASE ... SET

https://postgrespro.com/docs/postgresql/13/sql-alterdatabase

More about the temp_buffers parameter:

https://postgrespro.com/docs/postgresql/13/runtime-config-resource#GUC-T

EMP-BUFFERS