3
Storing multiple versions of the same row
versions have different time frames
timestamp = transaction ID (IDs are given in ascending order)
Row
xid
x
1
x
2
x
3
x
4
DELETE
INSERT
UPDATE
version 1 version 2 version 3
UPDATE
MVCC
When multiple sessions are running at the same time, two transactions may
access the same row at the same time. If both transactions are just reading
the row, there is no problem. If both transactions try to write, no problem
either (in this case, they line up and make the changes one after the other).
The tricky part is when one transaction wants to read a row and another
one wants to change it at the same time.
There are two simple ways about it. You can let such transactions block
each other, but then performance suffers. Otherwise, you can let the
reading transaction immediately see the changes made by the writing
transaction, even if they are not committed (this is called a "dirty read").
This is dangerous, because the changes can be rolled back.
PostgreSQL goes the hard way and utilizes what is known as Multiversion
concurrency control. In essence, the system stores multiple versions of
each row. This allows transactions to see different versions of the same row,
while only one transaction can modify a version at a time.
To distinguish between the versions, PostgreSQL marks each one with two
timestamps, which together specify a version's "time frame".
The timestamps are essentially just transaction IDs, which always come in
ascending order. (In reality, the whole thing is a bit more complicated, but
not worth getting into right now.) Upon creation, a row version is marked
with the ID if the transaction that executed the INSERT command. When
deleted, the version is marked with the the ID if the transaction that did the
DELETE command (but is not physically deleted). An UPDATE command is
a DELETE and an INSERT executed back to back.