Backup

Logical Backup

Authors: Egor Rogov, Pavel Luzanov

Translated by Liudmila Mantrova

Usage of Course Materials

Non-commercial use of course materials (presentations, demonstrations) is

allowed on an unrestricted basis. Commercial use is only possible with prior

written permission of Postgres Professional company. Modification of course

materials is forbidden.

Please send your feedback to: edu@postgrespro.ru

Disclaimer

In no event shall Postgres Professional company be liable for any damages

or loss, including loss of profits, that arise from direct or indirect, special or

incidental use of course materials. Postgres Professional company

specifically disclaims any warranties on course materials. Course materials

are provided “as is,” and Postgres Professional company has no obligations

to provide maintenance, support, updates, enhancements, or modifications.

Agenda

Logical and physical backups

Backup and restore of separate tables

Backup and restore of separate databases

Backup and restore of the whole cluster

Logical Backup

SQL commands to create objects and fill them with data

+ backup of a separate object or a database

+ recovery on a different architecture or PostgreSQL version

+ (binary compatibility is not required)

+ ease of use

− modest operation speed

− no point-in-time recovery

A logical backup is a set of SQL commands that can restore the database

cluster (or a separate database/table) from scratch: it creates all the

required objects and fills them with data.

These commands can be run on a different server version (if it provides

compatibility at the command level) or on a different platform/architecture

(binary compatibility is not required).

In particular, a logical backup can be used for long-term storage: you can

restore it even after upgrading the server to a higher version.

The process of creating a logical backup is relatively easy. It is usually

enough to run a single command or launch a single utility.

But for large databases, the execution of these commands can take a very

long time. Using a logical backup, you can restore your database system

only to its state exactly at the time the process of taking a backup was

started.

https://postgrespro.com/docs/postgresql/12/backup-dump

Physical Backup

A copy of the database cluster’s file system

+ faster than logical backup

+ statistics are restored

− recovery is only possible on a compatible system, with the same

PostgreSQL major version installed

− partial backup is impossible, the whole cluster is copied

WAL archive

+ point-in-time recovery is available

A physical backup implies creating a copy of all files related to the database

cluster, i.e., creating its full binary copy.

It is faster to copy files than dump SQL commands; besides, unlike restoring

a logical backup, starting a server using a physical copy is a matter of

several minutes. Another advantage is that you do not have to recollect

statistical data: it is also restored from a physical copy.

But this approach has its own shortcomings. A physical backup can be used

to restore the system only on a compatible platform (that has the same OS,

architecture, etc.) with the same PostgreSQL major version installed.

Besides, it is impossible to create a physical copy of a separate database:

you can only back up the whole database cluster.

Physical backups are usually used together with WAL archives. It enables

system recovery not only at the time of backup creation, but also at an

arbitrary point in time.

https://postgrespro.com/docs/postgresql/12/backup-file

https://postgrespro.com/docs/postgresql/12/continuous-archiving

Creating physical backups for any important production systems is a

common practice. It is the responsibility of a DBA to take such backups.

Making a Table Copy in SQL

YKS Yakutsk (129.770

MJZ Mirnyj (114.039

KHV Khabarovsk (135.188

PKC Petropavlovsk (1

UUS Yuzhno-Sakhalinsk (1

VVO Vladivostok (1

LED St. Petersburg (3

KGD Kaliningrad (2

KEJ Kemorovo (86.1072

CEK Chelyabinsk (6

MQF Magnetiogorsk (5

PEE Perm (56.0211

SGC Surgut (73.4018

=# COPY table TO 'file';

=# COPY table FROM 'file';

the file is located in the server file system and can be accessed by the owner of the

PostgreSQL instance

you can specify the columns to copy (or use an arbitrary query)

new rows are added to already existing ones during recovery

the format

is configurable

If you only need to save the contents of a single table, you can use the

COPY command.

The COPY TO flavor of this command enables you to save the table (or

some of its columns, or even the result of an arbitrary query) into a file,

display it in the terminal, or provide it as input to an application. You can also

specify some additional parameters, such as the format (plain text, CSV, or

binary), delimiter characters, text representation of NULL values, etc.

The COPY FROM flavor does the opposite: it reads data from a file or from

the terminal and inserts the retrieved rows into the table. The table is not

cleared in this case: new rows are simply appended to the already existing

ones.

The COPY command is much faster than the analogous INSERT commands:

the client does not have to access the server multiple times, and the server

does not have to repeatedly analyze the received commands.

Here is a subtle point: the COPY FROM command ignores the defined rules,

although integrity constraints and triggers are respected.

https://postgrespro.com/docs/postgresql/12/sql-copy

Making a Table Copy in psql

YKS Yakutsk (129.770

MJZ Mirnyj (114.039

KHV Khabarovsk (135.188

PKC Petropavlovsk (1

UUS Yuzhno-Sakhalinsk (1

VVO Vladivostok (1

LED St. Petersburg (3

KGD Kaliningrad (2

KEJ Kemorovo (86.1072

CEK Chelyabinsk (6

MQF Magnetiogorsk (5

PEE Perm (56.0211

SGC Surgut (73.4018

=# \copy table to 'file'

=# \copy table from 'file'

the file is located in the client file system and can be accessed by the OS user who

has started psql

the data is transferred between the client and the server

the syntax and the supported features are analogous to those provided by COPY

The psql utility provides a client version of the COPY command with a similar

syntax.

The file name provided in the COPY command corresponds to the file on the

database server. The user on whose behalf PostgreSQL is started (usually

postgres) must have access to this file.

The client implementation of this command refers to the file located on the

client, which allows keeping a local copy of data even if there is no access

to the server file system. The table contents is automatically sent between

the client and the server.

https://postgrespro.com/docs/postgresql/12/app-psql

A Database Backup

-- PostgreSQL database dum

-- Dumped from database ve

-- Dumped by pg_dump versi

SET statement_timeout = 0;

SET lock_timeout = 0;

SET

idle_in_transaction_sessio

SET client_encoding = 'UTF

SET standard_conforming_st

$ pg_dump -d database -f file

$ psql -f file

format: SQL commands

you can specify the database objects to be dumped

the new database must be cloned from template0

roles and tablespaces must be created in advance

it makes sense to perform ANALYZE after recovery

To create a full-fledged backup of a database, use the pg_dump utility.

If you omit the file name (-f, --file), the utility’s output will be displayed in the

terminal. The produced output is a script to be run in psql; it contains the

commands that will create the required objects and fill them with data.

You can use optional parameters to limit the set of backed up objects: for

example, you can choose to back up only particular tables, objects in

particular schemas, or use some other filters.

To restore the objects from the backup, it is enough to run the received

script in psql.

Note that the database to be restored should be cloned from template0

since all the changes made in template1 will also make it into the backup.

Besides, all the required roles and tablespaces must be set up in advance.

Since these objects do not belong to any particular database, they won’t be

included into the dump.

Once the database is restored, it makes sense to run the ANALYZE

command: it will collect statistics that the optimizer requires for query

planning.

https://postgrespro.com/docs/postgresql/12/app-pgdump

The custom Format

;

; Archive created at 2017-

; dbname: demo

; TOC Entries: 146

; Compression: -1

; Dump Version: 1.12-0

; Format: CUSTOM

;

; Selected TOC Entries:

;

2297; 1262 475453 DATABASE

5; 2615 475454 SCHEMA - bo

2298; 0 0 COMMENT - SCHEMA

$ pg_dump -d database -F c -f file

$ pg_restore -d database -j N file

an internal format with a table of contents (TOC)

database objects to be restored can be selected at the time of recovery

recovery can be performed in the parallel mode

The pg_dump utility allows you to specify the backup format. By default, the

plain format is used; it provides pure psql commands.

The custom format (-F c, --format=custom) creates a backup in a special

format that contains not only the backed up objects, but also a table of

contents (TOC). Having a TOC allows you to choose the objects to be

restored right at the time of recovery, not while making the dump.

By default, the output of the custom format is compressed.

To restore the database, you need to run another utility: pg_restore.

It reads the file and converts it to psql commands. If you do not explicitly

provide the database name (in the -d option), all commands will be output

to the terminal. If the database is specified, pg_restore will connect to this

database and execute the commands; you won’t have to start psql.

To restore only some of the objects, you can use one of the following

approaches. The first one is to filter the objects to be restored, just like it is

done in pg_dump. In fact, pg_restore supports many pg_dump parameters.

The second option is to use the TOC to retrieve the list of objects included

into the backup (via the --list option). Then you can edit this list manually:

delete the objects you do not need and pass the modified list to pg_restore

(via the --use-list option).

https://postgrespro.com/docs/postgresql/12/app-pgrestore.html

The directory Format

$ pg_dump -d database -F d -j N -f directory

$ pg_restore -d database -j N directory

the directory contains a separate file for each database object and a TOC

database objects to be restored can be selected at the time of recovery

both dump and restore operations can be performed in the parallel mode

a common data

snapshot guarantees

data consistency

You can also create backups in the directory format. In this case, pg_dump

produces a whole directory instead of a single file; it contains the backed up

objects and the table of contents. By default, all files in the directory are

compressed.

Its advantage over the custom format is that such a backup can be created

concurrently using several processes (the number of processes is specified

in option -j, --jobs).

Naturally, the backup will contain consistent data even though it has been

created concurrently. Consistency is ensured by using a single data

snapshot for all parallel processes.

https://postgrespro.com/docs/postgresql/12/functions-admin.html#FUNCTIO

NS-SNAPSHOT-SYNCHRONIZATION

Data recovery can also be performed in the parallel mode (it is also

supported for the custom format).

Other capabilities are quite similar to those provided by the previously

discussed formats: the directory format supports the same options and

approaches.

Format Comparison

plain custom directory tar

recovery

utility

psql pg_restore

compression zlib

partial restore yes yes yes

parallel backup yes

parallel restore yes yes

This slide compares different features provided by different backup formats.

Note that there is also one more format available: tar. We do not cover it

here as it does not bring anything new and has no advantages as compared

to other formats. In fact, this format is simply a tar-ed version of the directory

format, but it does not support compression or parallel execution.

A Database Cluster Backup

-- PostgreSQL database clu

SET default_transaction_re

SET client_encoding = 'UTF

SET standard_conforming_st

-- Roles

CREATE ROLE postgres;

$ pg_dumpall -f file

$ psql -f file

format: SQL commands

dumps the whole cluster, including roles and tablespaces

the user must have access to all objects of the database cluster

parallel backups are not supported

To back up the whole cluster, including roles and tablespaces, you can use

the pg_dumpall utility.

Since pg_dumpall requires access to all objects of all databases, it make

sense to run it on behalf of a superuser. The utility connects to each

database and dumps their contents using pg_dump. Besides, it also saves

cluster-wide data.

To start this process, pg_dumpall has to establish a connection with any

available database. By default, either postgres or template1 is selected,

but you can also specify a different database.

The pg_dumpall utility produces a psql script. This is the only supported

format. It means that pg_dumpall cannot perform parallel dumps, and it can

turn out to be a problem for large volumes of data. In this case, you can use

the --globals-only option to dump only roles and tablespaces, while all

databases will be dumped separately using pg_dump in the parallel mode.

https://postgrespro.com/docs/postgresql/12/app-pg-dumpall

Summary

Logical backups can be taken for the whole cluster, a particular

database, or separate database objects

Are good for

small amounts of data

long-term storage during which the server can been upgraded

migration to a different platform

Are not so good for

crash recovery with minimal data loss

Practice

1. Back up the bookstore database in the custom format.

“Accidentally” empty the authorship table. Check that the

application has stopped displaying book titles in “Bookstore,”

“Books,” and “Catalog” tabs.

Use the created backup to restore the lost data.

Check that normal operation of the bookstore is restored.

Task 1. Use the --data-only option for the restore operation as an attempt to

create a table will result in an error.

Practice

1. Create a table with a policy that allows reading only some of the

rows. Create an unprivileged role for Alice and grant her access

to this table.

Alice is responsible for creating table backups. Can she do it

without superuser rights? Try it out.

2. The \copy command provided by psql enables you to pass the

result as input to an arbitrary application. Use this capability to

open the result of some query in the Calc spreadsheet of

LibreOffice.

Task 1. While superuser roles bypass RLS policies, Alice won’t be able to

access some of the table rows, without even knowing that the received data

is incomplete.

If the row_security parameter is set, Alice will be notified that some data has

not been selected. Granting the BYPASSRLS privilege to this role will solve

the problem.

Task 2. The command must save the result into a file and then start

libreoffice with this file passed as a parameter. The file must be saved in the

CSV format.

Naturally, this approach is platform-dependent and will require modifications,

say, on Windows.