Data Snapshot
A consistent slice of data at a particular moment
- a transaction ID (xid) defines the moment the snapshot was created
- all changes not yet committed at that moment are filtered out using
the list of active transactions
[Figure: versions of row 1, row 2, and row 3 along the xid axis, with the snapshot marked]
PostgreSQL uses snapshot isolation.
When accessing a table, a transaction must see at most one version of each
row. To achieve this, transactions use data snapshots taken at a particular
moment. Each snapshot contains only the latest version of each row that had
been committed by that moment; changes that were not yet committed then
won’t be visible in the snapshot. In other words, each row is represented by
the version that was current when the snapshot was created.
A snapshot is not a physical copy of all data; it’s just several numbers:
- the ID of the last transaction committed by the time the snapshot was
created (which defines that moment)
- the list of transactions that were active at that time
The list is required to ensure that the snapshot does not contain any
changes made by transactions that had started before the snapshot was
created but had not yet been committed by that time.
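The two numbers described above can be modeled in a few lines. This is a minimal sketch of the simplified model from this slide, not PostgreSQL's actual internal structure; the names Snapshot, changes_visible, and the committed set are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Simplified snapshot: just the numbers described in the text."""
    xid: int            # ID that marks the snapshot creation moment
    active: frozenset   # IDs of transactions in progress at that moment


def changes_visible(snapshot: Snapshot, xid: int, committed: set) -> bool:
    """Return True if the changes made by transaction `xid` are in the snapshot.

    `committed` is a hypothetical stand-in for the commit status of
    transactions; aborted transactions are simply absent from it.
    """
    if xid > snapshot.xid:
        return False    # committed (or started) after the snapshot moment
    if xid in snapshot.active:
        return False    # started earlier, but not committed at that moment
    return xid in committed   # finished earlier; visible only if it committed


# Assumed example: transaction 98 was still running when the snapshot was taken.
snap = Snapshot(xid=100, active=frozenset({98}))
committed = {95, 98, 100, 101}   # 98 and 101 committed only later

for xid in (95, 97, 98, 100, 101):
    print(xid, changes_visible(snap, xid, committed))
```

Transaction 98 shows why the list is needed: its ID is lower than the snapshot's, and it did commit eventually, yet its changes must stay out of the snapshot because it was still active when the snapshot was created.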
Knowing these numbers, we can always say which row version is visible in
the snapshot. Sometimes it is the current (most recent) version, as in
the case of row 1 on this slide. Sometimes it is an earlier version: row 2 has
since been deleted (and this change is already committed), but the deletion
had not been committed when the snapshot was created, so a transaction using
the snapshot still sees this row. Such behavior is correct: it ensures that
the transaction sees a consistent state of the data as of one point in time.
Some rows will not make it into the snapshot at all: row 3 was deleted
before the snapshot was created, so it is not included in the snapshot.
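The three cases on the slide can be reproduced with a small visibility check. This is a hedged sketch under the simplified model of this section: the xmin and xmax fields mirror the names PostgreSQL uses for the creating and deleting transaction of a row version, while the classes, helper functions, and transaction IDs are made up for illustration. It ignores details such as a transaction seeing its own changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    xid: int            # marks the snapshot creation moment
    active: frozenset   # transactions in progress at that moment


@dataclass(frozen=True)
class RowVersion:
    value: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that deleted it, if any


def in_snapshot(xid, snap, committed):
    """True if the effects of transaction `xid` are part of the snapshot."""
    return xid <= snap.xid and xid not in snap.active and xid in committed


def is_visible(version, snap, committed):
    """A version is visible if its creation is in the snapshot and its deletion is not."""
    if not in_snapshot(version.xmin, snap, committed):
        return False
    if version.xmax is not None and in_snapshot(version.xmax, snap, committed):
        return False
    return True


def visible_version(versions, snap, committed):
    """Return the single visible version of a row, or None."""
    matches = [v for v in versions if is_visible(v, snap, committed)]
    assert len(matches) <= 1, "a snapshot must show at most one version per row"
    return matches[0] if matches else None


# Assumed IDs: the snapshot is taken at xid 100; transaction 105 commits later.
snap = Snapshot(xid=100, active=frozenset())
committed = {80, 90, 95, 105}

row1 = [RowVersion("row 1, old", xmin=80, xmax=90),   # replaced before the snapshot
        RowVersion("row 1, current", xmin=90)]
row2 = [RowVersion("row 2", xmin=80, xmax=105)]       # deleted after the snapshot
row3 = [RowVersion("row 3", xmin=80, xmax=95)]        # deleted before the snapshot

for name, row in (("row 1", row1), ("row 2", row2), ("row 3", row3)):
    print(name, visible_version(row, snap, committed))
```

In this sketch, row 1 resolves to its current version, row 2 is still visible even though it has since been deleted, and row 3 yields None because its deletion is already part of the snapshot.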