Logical Backup
© Postgres Professional, 2017–2021
Authors: Egor Rogov, Pavel Luzanov
Translated by Liudmila Mantrova
Usage of Course Materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed on an unrestricted basis. Commercial use is only possible with prior
written permission of Postgres Professional company. Modification of course
materials is forbidden.
Contact Us
Please send your feedback to:
In no event shall Postgres Professional company be liable for any damages
or loss, including loss of profits, that arise from direct or indirect, special or
incidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
Logical and physical backups
Backup and restore of separate tables
Backup and restore of separate databases
Backup and restore of the whole cluster
Logical Backup
SQL commands to create objects and fill them with data
+ backup of a separate object or a database
+ recovery on a different architecture or PostgreSQL version
+ (binary compatibility is not required)
+ ease of use
− modest operation speed
− no point-in-time recovery
A logical backup is a set of SQL commands that can restore the database
cluster (or a separate database/table) from scratch: it creates all the
required objects and fills them with data.
These commands can be run on a different server version (if it provides
compatibility at the command level) or on a different platform/architecture
(binary compatibility is not required).
In particular, a logical backup can be used for long-term storage: you can
restore it even after upgrading the server to a higher version.
The process of creating a logical backup is relatively easy. It is usually
enough to run a single command or launch a single utility.
But for large databases, the execution of these commands can take a very
long time. Using a logical backup, you can restore your database only to its
state at the moment the backup process was started; point-in-time recovery
is not possible.
Physical Backup
A copy of the database cluster's file system
+ faster than logical backup
+ statistics are restored
− recovery is only possible on a compatible system, with the same
PostgreSQL major version installed
− partial backup is impossible, the whole cluster is copied
WAL archive
+ point-in-time recovery is available
A physical backup implies creating a copy of all files related to the database
cluster, i.e., creating its full binary copy.
It is faster to copy files than dump SQL commands; besides, unlike restoring
a logical backup, starting a server using a physical copy is a matter of
several minutes. Another advantage is that you do not have to recollect
statistical data: it is also restored from a physical copy.
But this approach has its own shortcomings. A physical backup can be used
to restore the system only on a compatible platform (that has the same OS,
architecture, etc.) with the same PostgreSQL major version installed.
Besides, it is impossible to create a physical copy of a separate database:
you can only back up the whole database cluster.
Physical backups are usually used together with WAL archives, which enable
system recovery not only to the time of backup creation, but also to an
arbitrary point in time.
Creating physical backups for any important production systems is a
common practice. It is the responsibility of a DBA to take such backups.
Making a Table Copy in SQL
(sample query output: airport codes, city names, and coordinates from the
airports table)
=# COPY table TO 'file';
=# COPY table FROM 'file';
the file is located in the server file system and can be accessed by the owner of the
PostgreSQL instance
you can specify the columns to copy (or use an arbitrary query)
new rows are added to already existing ones during recovery
the format is configurable
If you only need to save the contents of a single table, you can use the
COPY command.
The COPY TO flavor of this command enables you to save the table (or
some of its columns, or even the result of an arbitrary query) into a file,
display it in the terminal, or provide it as input to an application. You can also
specify some additional parameters, such as the format (plain text, CSV, or
binary), delimiter characters, text representation of NULL values, etc.
The COPY FROM flavor does the opposite: it reads data from a file or from
the terminal and inserts the retrieved rows into the table. The table is not
cleared in this case: new rows are simply appended to the existing ones.
The COPY command is much faster than the analogous INSERT commands:
the client does not have to access the server multiple times, and the server
does not have to repeatedly analyze the received commands.
Here is a subtle point: the COPY FROM command ignores any rules
(CREATE RULE) defined on the table, although integrity constraints and
triggers are respected.
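As a sketch, exporting and re-importing with COPY might look like the following (the demo database, the airports table, and the file paths are illustrative):

```shell
# Export selected columns to a server-side CSV file;
# the file is created on the database server, not the client
psql -d demo -c "COPY airports (airport_code, city) TO '/tmp/airports.csv' WITH (FORMAT csv, HEADER true);"

# Export the result of an arbitrary query instead of a whole table
psql -d demo -c "COPY (SELECT * FROM airports WHERE city LIKE 'K%') TO '/tmp/k_airports.csv' WITH (FORMAT csv);"

# Load the rows back: existing rows are kept, new ones are appended
psql -d demo -c "COPY airports FROM '/tmp/airports.csv' WITH (FORMAT csv, HEADER true);"
```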
Making a Table Copy in psql
=# \copy table to 'file'
=# \copy table from 'file'
the file is located in the client file system and can be accessed by the OS user who
has started psql
the data is transferred between the client and the server
the syntax and the supported features are analogous to those provided by COPY
The psql utility provides a client version of the COPY command with similar
syntax.
The file name provided in the COPY command corresponds to the file on the
database server. The user on whose behalf PostgreSQL is started (usually
postgres) must have access to this file.
The client implementation of this command refers to the file located on the
client, which allows keeping a local copy of data even if there is no access
to the server file system. The table contents are automatically transferred
between the client and the server.
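A minimal sketch of the client-side variant (table and file names are illustrative); the files end up on the machine where psql runs:

```shell
# \copy runs on the client: the file is created where psql is started,
# so no access to the server file system is required
psql -d demo -c "\copy airports to '/tmp/airports_local.csv' csv header"

# Restore from the local file: rows are sent to the server over the connection
psql -d demo -c "\copy airports from '/tmp/airports_local.csv' csv header"
```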
A Database Backup
-- PostgreSQL database dump
-- Dumped from database version ...
-- Dumped by pg_dump version ...
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
$ pg_dump -d database -f file
$ psql -f file
format: SQL commands
you can specify the database objects to be dumped
the new database must be cloned from template0
roles and tablespaces must be created in advance
it makes sense to perform ANALYZE after recovery
To create a full-fledged backup of a database, use the pg_dump utility.
If you omit the file name (-f, --file), the utility’s output will be displayed in the
terminal. The produced output is a script to be run in psql; it contains the
commands that will create the required objects and fill them with data.
You can use optional parameters to limit the set of backed up objects: for
example, you can choose to back up only particular tables, objects in
particular schemas, or use some other filters.
To restore the objects from the backup, it is enough to run the received
script in psql.
Note that the database to be restored should be cloned from template0
since all the changes made in template1 will also make it into the backup.
Besides, all the required roles and tablespaces must be set up in advance.
Since these objects do not belong to any particular database, they won’t be
included into the dump.
Once the database is restored, it makes sense to run the ANALYZE
command: it will collect the statistics that the optimizer requires for query
planning.
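A typical dump-and-restore cycle in the plain format might look like this sketch (the demo database and object names are illustrative):

```shell
# Dump a single database into a plain SQL script
pg_dump -d demo -f demo.sql

# Limit the dump to particular objects, e.g. one table or one schema
pg_dump -d demo -t airports -f airports.sql
pg_dump -d demo -n bookings -f bookings_schema.sql

# Restore into a fresh database cloned from template0,
# then refresh the optimizer statistics
psql -c "CREATE DATABASE demo_restored TEMPLATE template0;"
psql -d demo_restored -f demo.sql
psql -d demo_restored -c "ANALYZE;"
```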
The custom Format
; Archive created at 2017-...
; dbname: demo
; TOC Entries: 146
; Compression: -1
; Dump Version: 1.12-0
; Format: CUSTOM
;
; Selected TOC Entries:
;
2297; 1262 475453 DATABASE ...
5; 2615 475454 SCHEMA - bo...
2298; 0 0 COMMENT - SCHEMA ...
$ pg_dump -d database -F c -f file
$ pg_restore -d database -j N file
an internal format with a table of contents (TOC)
database objects to be restored can be selected at the time of recovery
recovery can be performed in the parallel mode
The pg_dump utility allows you to specify the backup format. By default, the
plain format is used; it provides pure psql commands.
The custom format (-F c, --format=custom) creates a backup in a special
format that contains not only the backed up objects, but also a table of
contents (TOC). Having a TOC allows you to choose the objects to be
restored right at the time of recovery, not while making the dump.
By default, the output of the custom format is compressed.
To restore the database, you need to run another utility: pg_restore.
It reads the file and converts it to psql commands. If you do not explicitly
provide the database name (in the -d option), all commands will be output
to the terminal. If the database is specified, pg_restore will connect to this
database and execute the commands; you won’t have to start psql.
To restore only some of the objects, you can use one of the following
approaches. The first one is to filter the objects to be restored, just like it is
done in pg_dump. In fact, pg_restore supports many pg_dump parameters.
The second option is to use the TOC to retrieve the list of objects included
into the backup (via the --list option). Then you can edit this list manually:
delete the objects you do not need and pass the modified list to pg_restore
(via the --use-list option).
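The list-based workflow described above can be sketched as follows (database and file names are illustrative):

```shell
# Dump the database in the custom format
pg_dump -d demo -F custom -f demo.dump

# Extract the table of contents into an editable text file
pg_restore --list demo.dump > demo.toc

# Edit demo.toc: comment out (with ';') or delete unneeded entries,
# then restore only the objects that remain in the list
pg_restore -d demo_restored --use-list=demo.toc demo.dump
```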
The directory Format
$ pg_dump -d database -F d -j N -f directory
$ pg_restore -d database -j N directory
the directory contains a separate file for each database object and a TOC
database objects to be restored can be selected at the time of recovery
both dump and restore operations can be performed in the parallel mode
a common data snapshot guarantees data consistency
You can also create backups in the directory format. In this case, pg_dump
produces a whole directory instead of a single file; it contains the backed up
objects and the table of contents. By default, all files in the directory are
compressed.
Its advantage over the custom format is that such a backup can be created
concurrently using several processes (the number of processes is specified
in option -j, --jobs).
Naturally, the backup will contain consistent data even though it has been
created concurrently. Consistency is ensured by using a single data
snapshot for all parallel processes.
Data recovery can also be performed in the parallel mode (it is also
supported for the custom format).
Other capabilities are quite similar to those provided by the previously
discussed formats: the directory format supports the same options and
filters.
Format Comparison
                  plain   custom      directory   tar
restored with     psql    pg_restore  pg_restore  pg_restore
compression       no      zlib        zlib        no
partial restore   no      yes         yes         yes
parallel backup   no      no          yes         no
parallel restore  no      yes         yes         no
This slide compares different features provided by different backup formats.
Note that there is also one more format available: tar. We do not cover it
here as it does not bring anything new and has no advantages as compared
to other formats. In fact, this format is simply a tar-ed version of the directory
format, but it does not support compression or parallel execution.
A Database Cluster Backup
-- PostgreSQL database cluster dump
SET default_transaction_read_only = off;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
-- Roles
CREATE ROLE postgres;
$ pg_dumpall -f file
$ psql -f file
format: SQL commands
dumps the whole cluster, including roles and tablespaces
the user must have access to all objects of the database cluster
parallel backups are not supported
To back up the whole cluster, including roles and tablespaces, you can use
the pg_dumpall utility.
Since pg_dumpall requires access to all objects of all databases, it makes
sense to run it on behalf of a superuser. The utility connects to each
database and dumps their contents using pg_dump. Besides, it also saves
cluster-wide data.
To start this process, pg_dumpall has to establish a connection with any
available database. By default, either postgres or template1 is selected,
but you can also specify a different database.
The pg_dumpall utility produces a psql script. This is the only supported
format. It means that pg_dumpall cannot perform parallel dumps, and it can
turn out to be a problem for large volumes of data. In this case, you can use
the --globals-only option to dump only roles and tablespaces, while all
databases will be dumped separately using pg_dump in the parallel mode.
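This combined approach might be sketched as follows (database names and the job count are illustrative):

```shell
# Dump only cluster-wide objects: roles and tablespaces
pg_dumpall --globals-only -f globals.sql

# Dump each database separately in the directory format, in parallel
pg_dump -d demo -F directory -j 4 -f demo.dir

# On the target cluster: restore the globals first, then each database
psql -f globals.sql
psql -c "CREATE DATABASE demo TEMPLATE template0;"
pg_restore -d demo -j 4 demo.dir
```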
Logical backups can be taken for the whole cluster, a particular
database, or separate database objects
Are good for
small amounts of data
long-term storage, during which the server can be upgraded
migration to a different platform
Are not so good for
crash recovery with minimal data loss
1. Back up the bookstore database in the custom format.
“Accidentally” empty the authorship table. Check that the
application has stopped displaying book titles in the “Bookstore,”
“Books,” and “Catalog” tabs.
Use the created backup to restore the lost data.
Check that normal operation of the bookstore is restored.
Task 1. Use the --data-only option for the restore operation: the emptied
table still exists, so an attempt to re-create it would result in an error.
1. Create a table with a policy that allows reading only some of the
rows. Create an unprivileged role for Alice and grant her access
to this table.
Alice is responsible for creating table backups. Can she do it
without superuser rights? Try it out.
2. The \copy command provided by psql enables you to pass the
result as input to an arbitrary application. Use this capability to
open the result of some query in the LibreOffice Calc spreadsheet.
Task 1. While superusers bypass RLS policies, Alice won't be able to
access some of the table rows; she won't even know that the received data
is incomplete.
If the row_security parameter is set to off, Alice will be notified (with an
error) that not all the data can be selected. Granting the BYPASSRLS
attribute to her role will solve the problem.
Task 2. The command must save the result into a file and then start
libreoffice with this file passed as a parameter. The file must be saved in the
CSV format.
Naturally, this approach is platform-dependent and will require modifications,
say, on Windows.
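On Linux, a sketch of this approach might look like the following (the query and file path are illustrative):

```shell
# Save a query result as CSV on the client side,
# then open the file in LibreOffice Calc
psql -d bookstore -c "\copy (SELECT * FROM books) to '/tmp/result.csv' csv header"
libreoffice --calc /tmp/result.csv
```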