Architecture

Isolation and MVCC

Authors: Egor Rogov, Pavel Luzanov

Translated by Liudmila Mantrova

Usage of Course Materials

Non-commercial use of course materials (presentations, demonstrations) is

allowed on an unrestricted basis. Commercial use is only possible with prior

written permission of Postgres Professional company. Modification of course

materials is forbidden.

Please send your feedback to: edu@postgrespro.ru

Disclaimer

In no event shall Postgres Professional company be liable for any damages

or loss, including loss of profits, that arise from direct or indirect, special or

incidental use of course materials. Postgres Professional company

specifically disclaims any warranties on course materials. Course materials

are provided “as is,” and Postgres Professional company has no obligations

to provide maintenance, support, updates, enhancements, or modifications.

Agenda

Multi-version concurrency control (MVCC)

Data snapshots

Isolation levels

Locks

Vacuum

MVCC

There can be multiple versions of one and the same row

different versions belong to different time frames

time = transaction ID (xid), assigned in the ascending order

row:

xid

DELETE

INSERT

UPDATE

version 1 version 2 version 3

UPDATE

Running several concurrent sessions poses an interesting question: what

should be done if two transactions access one and the same row

simultaneously? The answer is obvious if both of them read data. Handling

two writing transactions is also easy (they are simply queued to update the

data sequentially). The most intricate case is managing reads and writes.

There are two straightforward ways to do it. Transactions can lock each

other, but then performance will suffer. Alternatively, a reading transaction

could immediately see the changes made by a writing transaction, even if

they are not committed yet (it is called “dirty read”); but it is a very bad

scenario because these changes can be rolled back.

PostgreSQL does it the hard way: it keeps several versions of one and the

same row using multi-version concurrency control (MVCC). Thus, while a

reading transaction sees one version, a writing transaction modifies another

version.

To tell one version from another, PostgreSQL marks them with two tags that

define the “lifetime” of each version. Instead of timestamps, these tags use

transaction IDs, which are always increasing (it’s a bit more complex than

that, but we won’t get into details.) When a row is inserted, it is tagged with

ID of the transaction that has performed the INSERT command. When it is

deleted, it is tagged with ID of the transaction that has run the DELETE

command (but is not physically deleted). UPDATE consists of two

operations: DELETE and INSERT.

https://postgrespro.com/docs/postgresql/12/mvcc-intro

Data Snapshot

A consistent slice of data at a particular moment

transaction ID (xid) defines the snapshot creation moment

all changes that are not yet committed at this moment are filtered out using

the list of active transactions

row 3:

xid

snapshot

row 2:

row 1:

PostgreSQL uses snapshot isolation.

When accessing a table, a transaction must see only one version of each

row (or no versions at all). To achieve this, transactions use data snapshots

taken at a particular moment. Each snapshot contains only the latest

versions of committed data; if the data was not committed at that moment, it

won’t be visible in the snapshot. In other words, each row is represented by

its version that was current at the time of snapshot creation.

A snapshot is not a physical copy of all data; it’s just several numbers:

- the ID of the last transaction committed by the moment of snapshot

creation (which defines that moment)

- the list of transactions that were active at that time

The list is required to ensure that the snapshot does not contain any

changes of those transactions that had started before the snapshot

creation, but had not been committed by that time.

Knowing these numbers, we can always say which row version is visible in

the snapshot. Sometimes it is the current (the most recent) version, like in

the case of row 1 on this slide. Sometimes it is an earlier version: row 2 is

deleted (and this change is already committed), but the transaction still sees

this row while using the snapshot. Such behavior is correct: it ensures that

the data is consistent at each point in time.

Some rows will not make it into the snapshot at all: row 3 had been deleted

before the snapshot was built, so it was not included into the snapshot.

Isolation Levels

Read Uncommitted

is not supported by PostgreSQL: works as Read Committed

Read Committed (default)

the snapshot is taken at the beginning of an operator

identical queries can return different data

Repeatable Read

the snapshot is taken at the beginning of the first operator in the transaction

transactions can end with a serialization error

Serializable

full isolation, but additional overhead

transactions can end with a serialization error

The SQL standard defines four isolation levels: the stricter the level, the

less the interference between concurrent transactions. At the time when the

standard was adopted, it was assumed that stricter levels are harder to

implement and have higher negative impact on performance (since then

these views have somewhat changed).

The most relaxed level is Read Uncommitted, which allows dirty reads.

This level is not supported by PostgreSQL since it is of no practical use and

provides no performance gains.

The Read Committed level is the default isolation level in PostgreSQL.

At this level, snapshots are built at the beginning of each SQL operator.

Thus, each operator works with constant and consistent data, but two

identical queries can return different results when run one after another.

At the Repeatable Read level, a snapshot is built at the beginning of each

transaction (while executing the first operator), so all queries within one

transaction see the same data. This level is convenient for cases like

creating reports using series of queries.

The Serializable level provides full isolation: you can write operators as if

there is only one transaction. The price you pay for convenience is that

some transactions complete with an error; your application should be able

to retry such transactions.

https://postgrespro.com/docs/postgresql/12/transaction-iso

Locks

Row-level locks

reading never blocks writing

an update locks the affected rows for other updates, but not for reading

Table-level locks

forbid changing and deleting the table while it is worked with

forbid reading the table while it is being rebuilt or moved

address other similar needs

Lifetime of locks

locks are acquired either automatically as needed or manually

locks are released automatically at the end of transactions

What does MVCC give us? It minimizes the number of required locks, thus

increasing system performance.

Most of the locks are set at the row level. That being said, reading never

blocks neither reading, nor writing transactions. A row update does not

block reading of this row. The only case when a transaction has to wait for

the lock release is when it tries to modify the row that is already updated by

another transaction, but this change is not committed yet.

Locks can also be acquired at a higher level, e.g., on tables. They are

needed to ensure that no one can delete the table while other transactions

are reading data from it, or to forbid access to the table that is being rebuilt.

As a rule, such locks cause no issues because deleting and rebuilding

tables are very rare operations.

All the required locks are acquired automatically and are released

automatically once the transaction completes. It is also possible to set

custom locks; such need arises not too often.

https://postgrespro.com/docs/postgresql/12/explicit-locking

Transaction Status

Transaction status (xact)

service information; two bits per transaction

special files on disk

buffers in shared memory

Commit

sets the bit “transaction is committed”

Abort

sets the bit “transaction is aborted”

completes as fast as a commit (no data rollback is required)

To maintain MVCC, it is required to know the status of transactions.

A transaction can be either active or completed. A completed transaction is

either committed or aborted. Thus, keeping the state of each transaction

requires two bits. Status information is stored in special service files; for

recent transactions, this data is usually kept in shared memory to avoid

frequent disk access.

By any transaction outcome (both successful and unsuccessful), it’s enough

to simply set the corresponding status bits. Both commits and aborts

happen equally fast.

If an aborted transaction has already created new row versions, these

versions are not deleted (there is no “physical” data rollback). Thanks to the

status information, other transactions will see that the transaction that has

created or deleted row versions is actually aborted, and will ignore these

changes.

Vacuum

Both previous and current row versions are stored together

the size of tables and indexes grows over time

Vacuum

deletes row versions that are no longer required (i.e., they are not visible

in any data snapshot anymore)

works in parallel with other processes

when deleting row versions, leaves “holes” in data files,

to be later filled with new row versions

Vacuum full

fully rebuilds data files, making them compacted

locks the table while running

In PostgreSQL, all row versions (both current and outdated) are stored

together, in the same data files. It’s clear that outdated versions pile up over

time, which leads to table (and index) bloating and causes performance

degradation.

However, there is no need to store outdated row versions that are not

visible in any data snapshot anymore. Such versions are cleaned up by the

vacuum process. It physically removes deprecated row versions from data

files, leaving “holes” that will be later filled with new row versions.

Vacuum does not lock other processes and can be run concurrently.

You can fully rebuild the table and its indexes by running the VACUUM FULL

command. This process makes files more compacted, but the table is fully

locked while it is running.

https://postgrespro.com/docs/postgresql/12/routine-vacuuming

Autovacuum

Autovacuum launcher

a background process

launches background

workers from time to time

Autovacuum worker

performs cleanup

of a particular database,

for those tables that

actually need it

PostgreSQL

backend

postmaster

background workers

autovacuum

shared memory

xact

transaction

status

cache

Autovacuum is set up by DBA to perform timely data cleanup, so that

considerable file bloating is avoided. Instead of cleaning up all tables on

schedule, it is triggered automatically by update activity in tables.

However, abnormally long transactions can interfere with timely vacuuming.

The DEV2 course provides more information on how to avoid impeding

system operation.

Autovacuum is performed as follows. The background process called

autovacuum launcher regularly starts autovacuum workers in various

databases. These workers, in their turn, create the list of tables that require

vacuuming, and then perform the cleanup.

Summary

Files can store several versions of each row

Each transaction accesses its own data snapshot, which is

a consistent slice of data at a particular point in time

Writes do not block reads, reads do not block any operations

Isolation levels differ in snapshot creation times

Row versions pile up, so they require periodic cleanup

Practice

1. Create a table with a single row.

Start the first transaction at the Read Committed isolation level

and run a query on the table.

In another session, delete the row and commit this change.

How many rows will the first transaction see if it re-runs the

query? Check the result.

Commit the first transaction.

2. Repeat the experiment at the Repeatable Read isolation level:

BEGIN ISOLATION LEVEL REPEATABLE READ;

Explain the difference.