Replication
Overview
15
Copyright
© Postgres Professional, 2023
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov
Translated by Alexander Meleshko
Cover photo by Oleg Bartunov (Phu monastery and Bhrikuti peak, Nepal)
Use of course materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
Disclaimer
Postgres Professional assumes no responsibility for any damages and
losses, including loss of income, caused by direct or indirect, intentional or
accidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
2
Agenda
Replication purposes and types
Physical replication
Logical replication
Replication use cases
3
Replication purposes
Replication
the process of synchronizing multiple copies of a database cluster
on different servers
Purposes
reliability if one of the servers fails,
the system must maintain availability
(performance degradation is acceptable)
scalability load distribution between servers
A single database server may not meet all the requirements.
First, reliability. One physical server is a possible point of failure. If the
server fails, the system becomes unavailable.
Secondly, performance. One server may not be able to handle
the load. Often, the ability to scale and distribute the load between multiple
servers is preferable to increasing single server capacity.
The solution is to have multiple servers managing the same databases.
Replication refers to the process of synchronizing these servers.
4
Replication types
Physical
primary-replica: data flow in one direction only
delivery of WAL records or files
binary server compatibility is required
only the cluster as a whole can be replicated
Logical
publication-subscription: data flow is possible in both directions
row level information (log_level = logical)
protocol-level compatibility is required
can replicate individual tables
There are multiple ways to set up synchronization between servers. The two
main venues available in PostgreSQL are physical and logical replication.
During physical replication, one server is assigned the main server role and
the other becomes a replica. The main server transfers WAL records to a
replica (in the form of files or a stream of records). The replica applies these
records to its data files. The WAL record application is purely mechanical,
without "understanding the meaning" of the changes, so binary compatibility
between servers is necessary (the same platforms and major PostgreSQL
versions). Since the WAL is shared across the entire cluster, only the cluster
as a whole can be replicated.
During logical replication, higher-level information is added to the WAL,
allowing the replica to sort out changes at the row level (requires the
parameter wal_level = logical). This sort of replication does not require
binary compatibility, it only needs the replica to be able to understand the
incoming WAL information. Logical replication allows, if necessary, to
replicate only the changes made to individual tables.
Logical replication was introduced in PostgreSQL 10. Before, you had to use
the pg_logical extension or set up trigger-based replication.
5
Physical replication
How physical replication works
WAL transmission modes
Replica usage
Switching to replica and back
Replica configuration and use cases
Let's discuss physical replication first.
It works by translating changes to the replica in the form of WAL records.
This is a very efficient mechanism, but it requires binary compatibility
between servers (the major version of the server, the operating system, the
hardware platform).
Physical replication is one-way only: while there may be any number of
replicas, there is always only one main server.
6
Replication
Backup
base backup via pg_basebackup
WAL files archive
Continuous recovery
deploy the backup
set configuration parameters
create a standby.signal file
launch the server
the server restores consistency and continues to apply incoming logs
delivery by replication protocol stream or WAL archive
connections (read-only) are allowed immediately after consistency is restored
Setting up replication is very similar to setting up a physical backup. The
difference is that the backup deploys immediately, without waiting for the
main server to crash, and works in continuous recovery mode: it
continuously reads and applies new WAL segments coming from the main
server. To tell the replica to start in this mode, a standby.signal file is created
instead of recovery.signal.
This way, the replica is constantly maintained in an almost up-to-date state
and if the main server fails, the replica is ready to take over.
By default, the replica operates in the "hot standby" mode. This means that
during the recovery process, it allows connections to read data (as soon as
consistency is restored). You can prohibit these connections (this is called
"warm standby").
Unlike backup, replication does not allow you to recover to an arbitrary point
in the past. In other words, replication cannot be used to correct an error
(although it is possible to configure a replica so that it lags behind the main
server by a certain amount of time).
7
Replica usage
Allowed
read-only queries (select, copy to, cursors)
setting server parameters (set, reset)
transaction management (begin, commit, rollback...)
creating a backup (pg_basebackup)
Not allowed
any changes (insert, update, delete, truncate, nextval...)
locks expecting changes (select for update...)
DDL commands (create, drop...), including creating temporary tables
maintenance commands (vacuum, analyze, reindex...)
access control (grant, revoke...)
triggers and advisory locks are disabled
In hot standby mode, no data changes (including sequences), locks, DDL
commands, service commands such as VACUUM and ANALYZE, or access
control commands are allowed on the replica. Basically, anything that
changes the data in any way is prohibited.
The replica can still process read-only queries. Changing server parameters
and executing transaction management commands is allowed. For example,
you can start a (reading) transaction with a specific isolation level.
In addition, the replica can also be used for making backups (of course,
taking into account the possible lag behind the main server).
8
Streaming replication
main server
(primary)
WAL segments
select, insert
update, delete
backup server
(replica)
wal sender wal receiver
continuous
recovery
feedback
There are two ways to deliver WALs from the primary server to the replica.
The one used more commonly in production is streaming replication.
In this case, the replica connects to the primary server via the replication
protocol and receives the WAL record stream. This minimizes the replica lag
and can even eliminate it entirely (in synchronous mode).
There's a notable possible issue with reading from a replica. While a query
on a replica takes a MVCC snapshot, the primary server may vacuum the
row versions required for the snapshot. The affected query on the replica will
have to be terminated in this case. With streaming replication, the issue is
resolved by the feedback mechanism. It lets the primary server know if any
transaction IDs are in use by the replica so that it can delay the vacuuming.
10
Replication via WAL archive
archive_command
WAL archive
main server
(primary)
WAL segments
select, insert
update, delete
backup server
(replica)
alternative
WAL source
wal sender wal receiver
During streaming replication, there's a chance that the primary server will
delete a WAL segment that hasn't been received by the replica yet. To
ensure that it does not happen, you have to use either a replication slot or
streaming replication together with a WAL archive (that you need for
backups anyway).
When using a WAL archive, a special archiver process on the primary
server archives full WAL segments using the archive_command (this
mechanism is discussed in the Backup module).
If the replica cannot receive the next WAL entry via the replication protocol,
it will try to read it from the archive using the command from the
restore_command parameter.
In fact, replication can work with just the archive, without streaming
replication. But in this case:
- the replica is forced to lag behind the primary server by the time it takes to
fill the WAL segment,
- the primary server is not aware of the replica's existence, so vacuuming
can delete the row versions needed for replica snapshots (you can set up a
delay for applying conflicting records, but it is not always clear how long the
delay should it be).
11
Switching to a replica
Scheduled switchover
shutdown of the main server for maintenance without interruption of service
manual mode
Emergency switchover
switch to a replica due to a primary server failure
manual mode, but can be automated with external cluster software
There are different reasons for switching to a backup server. The switchover
can be performed routinely at a convenient time to allow for maintenance
shutdown of the main server. If it is a main server failure, on the other hand,
the switchover has to be performed as quickly as possible to avoid service
downtime.
Even an emergency switchover must be performed manually, because
PostgreSQL does not come with integrated cluster management software
that should monitor the state of the servers and initiate switchovers.
12
Switching to a replica
main server
(primary)
WAL segments
select, insert
update, delete
backup server
(replica)
wal sender wal receiver
The image above illustrates the state of the servers before a switchover.
The main server is on the left, the replica on the right, and replication is set
up between the two.
13
Switching to a replica
former main server
WAL segments
main server
(former replica)
select, insert
update, delete
In case of a main server failure or a planned switchover, the replica is given
the command to stop recovering and become an independent server, and
the former main server is disconnected.
Of course, a way to redirect users to a new server is required, but this is
done by means outside of PostgreSQL.
15
Main server recovery
backup server
(former main server)
main server
select, insert
update, delete
wal senderwal receiver
After the former main server is recovered or any maintenance on it is
complete, it connects as a replica to the new main server.
16
Main server recovery
Simply reconnecting the server will not work
WAL records missed by the replica due to delay will be lost
Restoring from a backup from scratch
a fresh new replica is deployed at the former primary server
this is time-consuming (rsync can accelerate it somewhat)
pg_rewind
“rolls back” the lost WAL records, replacing the corresponding pages on the
disk with pages from the new primary server
comes with a number of limitations
If the switchover has occurred due to a hardware failure (disk or server
replacement is required) or an operating system failure (OS reinstallation is
required), then the only option is to create a completely new replica on the
server.
If the switchover was a planned one, the server can be reconnected quickly
(now as a replica).
Unfortunately, you can't simply switch the server back on and connect it to
the new primary server over the replication protocol. Because of replication
delay, some WAL records could have not made it to the replica. If the old
primary server has such records and the new primary server doesn't, then
applying WALs from the new primary server will ruin the database.
You can always scrap the old primary server data and create a brand new
replica instead from a base backup. However, for large databases, this can
take a long time. The rsync process can speed this up to an extent.
An even faster option is to use pg_rewind
pg_rewind detects WAL records that have not reached the replica (starting
from the last common checkpoint) and finds the pages affected by these
records. The pages (should be just a few) are replaced with pages from the
new primary server. In addition, pg_rewind copies all service files from the
source server (the new primary server). The usual recovery process takes it
from there.
17
high availability
and distributed load for reading
1. Multiple replicas
main server
(primary)
WAL segments
select, insert
update, delete
backup server
(replica A)
wal sender
wal receiver
backup server
(replica B)
wal receiver
wal sender
The replication mechanism offers flexible system design options for a variety
of applications. Let's consider several typical cases and possible solutions.
Case: a system with high availability and reading load distribution.
Solution: a primary server and multiple replicas. Replicas can serve read-
only queries and can take over immediately if the main server fails.
Each replica will have a dedicated wal sender process on the main server
and a replication slot, if necessary.
The reading load distribution between replicas must be done by external
software.
18
data storage reliability
2. Synchronous replication
main server
(primary)
WAL segments
select, insert
update, delete
backup server
(replica A)
wal sender
wal receiver
backup server
(replica B)
wal receiver
wal sender
synchronous
replica sees
changes before the
main server
asynchronous
replica may
lag behind
Case: in the event of a main server failure, no data must be lost when
switching to a replica.
Synchronous replication is the solution. In a single server environment,
synchronous WAL recording ensures that the committed data will not be lost
in the event of a failure. Replication works in a similar manner. Changes on
the main server are committed only when a confirmation from the replica is
received. If necessary, synchronization can be managed at the transaction
level.
Synchronous replication does not ensure perfect data consistency between
servers. Changes can become visible on the primary server and on the
replica at different points in time.
Synchronization can be set up with multiple replicas. In this case, you can
also set up quorum-based voting.
In the image above, replica B is asynchronous and may lag behind; replica A
is synchronous. When committing changes, the main server performs
following actions:
- makes a WAL record (so that the change is not lost in case of failure),
- waits for a confirmation from the replica that the WAL record was received
on its end,
- changes the state of the transaction in the clog buffer.
With this setup, a query to a synchronous replica can see changes even
earlier than a query to the primary server.
19
3. Cascading replication
main server
(primary)
WAL segments
select, insert
update, delete
backup server
(replica B)
wal sender wal receiver
backup server
(replica A)
wal receiver
wal sender
multiple replicas
no additional load on the primary server
Case: have multiple replicas without creating additional load on the main
server.
The solution is to use cascading replication. With this setup, each replica
sequentially transfers WAL records to the next one.
Cascading replication does not support synchronization, but the main server
still collects feedback from all replicas, so the functionality is there.
When switching over, the replica closest to the main server in the replication
chain should be selected, as it lags behind the least.
The image shows that the main server has only one wal sender process,
and replicas transfer WAL records to each other along the chain. The further
away a replica from the main server, the more the delay. With this setup, the
replication process has to be monitored at multiple servers at once, making
monitoring more complicated.
20
4. Delayed replication
main server
(primary)
select, insert
update, delete
backup server
(replica)
wal sender wal receiver
WAL application delay
WAL segments
“time machine”
can recover to a specific moment in time without a WAL archive
Case: have the ability to view data at and recover to an arbitrary point in
time.
The usual archive-based point-in-time recovery mechanism can work here,
but it requires a lot of preparation and takes a lot of time. And PostgreSQL
itself doesn't allow to make data snapshots for a given moment in the past.
The solution is to have a replica apply WAL records not immediately, but
with a certain delay.
In order for the delay to work correctly, clock synchronization between
servers is necessary.
If a replica switches from continuous recovery mode into normal operation,
the rest of the records will be applied immediately.
Feedback is tricky in this setup. A large delay will cause table bloating on the
main server, since vacuuming will not delete old versions of rows that may
be needed by the replica as quickly as it usually does.
21
Logical replication
Publications and subscriptions
Conflict detection and resolution
Replica configuration and usage options
Stock logical replication tools first appeared in PostgreSQL 10.
Row-level logical changes are transmitted over the replication protocol.
Logical replication requires the wal_level parameter set to logical.
There is no main server or replica roles in logical replication, so it is possible
to set up bidirectional replication.
22
Logical replication
Publisher
streams data changes row by row in the order they are committed
(replicates INSERT, UPDATE, DELETE and TRUNCATE commands)
can do initial synchronization
always uses the logical replication slot
wal_level = logical
Subscriber
receives and applies changes
no parsing, rewriting and planning, just blind execution
possible conflicts with local data
Logical replication uses the publish-subscribe pattern. A publication is
created on one server, which can include a number of tables in a single
database. Other servers can subscribe to this publication to receive and
apply changes to the tables.
Only table row modifications are replicated, not SQL commands. DDL
commands are not transmitted, so target tables on the subscriber must be
created manually. Initial synchronization can be used to synchronize the
tables when a subscription is created.
The information about modified rows is extracted and decoded from existing
WAL records on the publishing server, and then sent by the wal sender
process to the subscriber over the replication protocol. The transmission
format is independent of the platform and server version. The log level on
the publishing server (the wal_level parameter) must be set to ''logical'' for
this all to work.
The logical replication worker process on the subscriber accepts and applies
the changes. In order to guarantee transmission reliability (no losses and
repetitions), a logical replication slot (similar to the physical replication slot)
is required.
The changes are applied without executing SQL commands, avoiding the
overhead of parsing and planning. On the other hand, a single SQL
command can result in multiple one-row changes.
23
Logical replication
publisher
select, insert
update, delete
subscriber
wal sender
WAL segments
select, insert
update, delete
WAL segments
logical repl.
worker
has
superuser
privileges
The image shows the logical replication worker process on the subscriber
that receives data from the publisher and applies it. Meanwhile, the server
works normally and accepts both read and write queries.
25
Conflicts
Identification modes for modifying and deleting rows
primary key columns (default)
columns of a specific unique index with the NOT NULL constraint
all columns
no identification (default for the system catalog)
Conflicts: violation of integrity constraints
replication is suspended until the conflict is resolved manually
correct the data or skip the conflicting transaction
Inserting new rows is straightforward. Changes and deletions are more
complicated. These operations need to somehow identify the old version of
the row. By default, primary key columns are used for this, but you can
specify other ways (replica identity) when defining a table, i.e. use a unique
index or all the table columns. Or you can disable replication for some tables
altogether (system catalog tables have it disabled by default).
Since the table on the publisher and the table on the subscriber can change
independently of each other, conflicts in the form of integrity constraint
violations are possible when inserting new row versions. Whenever this
happens, the process of applying records is suspended until the conflict is
resolved manually. You can either correct the data on the subscriber to
resolve the conflict, or cancel the application of the conflicting records.
27
receipt and consolidation
of data from regional servers
1. Consolidation
main
server
select, insert
update, delete
regional
server
wal sender
regional
server
wal sender
select, insert
update, delete
WAL segments
select, insert
update, delete
WAL segments
logical repl.
worker
logical repl.
worker
WAL segments
Let's discuss some logical replication use cases.
Suppose there are several regional branches, each of which runs on its own
PostgreSQL server. The goal is to consolidate some of the data on a central
server.
First, publications of the necessary data are created on regional servers.
The central server subscribes to these publications. The received data can
be processed using triggers on the central server (for example, unifying the
data format).
Inverted, the setup allows, for example, to transfer reference information
from the central server to regional ones.
Note that the replication relies on maintaining logical replication slots, and
the slots require a stable connection. If the connection breaks, the main
server will be forced to save its WAL files on disk.
An existing business logic may apply additional constraints on the system.
In some cases, it may be easier to transmit data in batches every now and
again.
The image shows two WAL receivers running on the central server, one for
each subscription.
28
2. Rolling out server updates
old server
select, insert
update, delete
new server
WAL segments
updating the PostgreSQL version
without interruption of service
14.7
15.2
Case: update the PostgreSQL major version on the server without
interrupting the service.
The two major versions don't have binary compatibility, so physical
replication will not work. However, logical replication can solve the problem.
As usual, external tools are required to switch users between servers.
First, a new server is created with the desired PostgreSQL version.
29
2. Rolling out server updates
old server
select, insert
update, delete
new server
wal sender
logical repl.
worker
WAL segments
updating the PostgreSQL version
without interruption of service
14.7 15.2
initial
synchronization
Then, logical replication of all required databases is set up between the
servers, and the servers are synchronized. This is possible because logical
replication does not require binary compatibility between servers.
30
updating the PostgreSQL version
without interruption of service
2. Rolling out server updates
new server
15.2
select, insert
update, delete
WAL segments
After that, clients switch to the new server, and the old one shuts down.
In practice, the process of updating major server versions using logical
replication is much more complicated and difficult. It is discussed in more
detail in the the topic "Server Update" in the DBA2 course.
31
3. Primary-primary
main server
main server
select, insert
update, delete
WAL segments
select, insert
update, delete
WAL segments
wal sender
logical repl.
worker
logical repl.
worker
wal sender
a cluster where
multiple servers can modify data
Case: provide reliable data storage on multiple servers with the ability to
change the data on any node.
Regular physical replication allows you to change data only on the primary
server. Logical replication makes it possible to change data simultaneously
on multiple servers. This requires that the applications working with the
cluster are built with certain considerations in mind in order to avoid conflicts
when modifying data in the same table. One of those is to ensure that
different servers work with different ranges of keys.
Keep in mind that the primary-primary setup with logical replication will not
support global distributed transactions. With synchronous replication,
reliability can be ensured, but consistency of data between servers cannot.
In addition, PostgreSQL does not offer any tools for automatic failure
processing, connecting or removing nodes from the cluster, etc. These tasks
must be solved by external means.
The image shows a primary-primary setup. Each of the servers creates a
publication and a subscription, establishing a bidirectional exchange of WAL
records. PostgreSQL 15 does not support such replication just yet, but this
feature is bound to appear sooner or later (see extensions pg_logical
32
Takeaways
The replication mechanism works by delivering
WAL records to the replica and applying them there
streaming WAL records or transferring files
Physical replication creates an exact copy of the entire cluster
unidirectional
requires binary compatibility
Logical replication streams individual row changes
multidirectional
requires protocol-level compatibility
33
Practice
1. Set up physical streaming replication between the two servers
in synchronous mode.
2. Verify that replication works as intended. Make sure that when
the replica is stopped, commits on the primary server could not
be completed.
3. Exit recovery mode on the replica.
4. Create two tables on both servers.
5. Set up logical replication for the first table from the first server
to the second, and the other way around for the second table.
6. Verify that replication works as intended.
Task 1. To do this, set the parameters on the main server:
- synchronous_commit = on,
- synchronous_standby_names = 'replica',
and on the replica in the postgresql.auto.conf file, add
"application_name=replica" to the primary_conninfo parameter.