Checkpoints
Periodic flushing of all dirty buffers to disk
guarantees that all changes before the checkpoint are saved on disk
reduces the size of WAL required for recovery
Crash recovery
starts from the latest checkpoint
replays WAL entries one by one if the corresponding changes are missing on disk
[Diagram: transaction (xid) timeline showing checkpoints, a crash, the WAL files required for recovery, and the start of recovery]
When the PostgreSQL server is started after a failure, it enters recovery
mode. At this point, the data stored on disk is inconsistent: some "hot"
pages may not have been flushed yet, even though they were updated before
other pages that have already made it to disk.
To restore consistency, PostgreSQL reads WAL and replays the entries one
by one whenever the corresponding changes have not been flushed to disk.
In effect, it repeats the work of all transactions and then aborts those
that were not registered in WAL as committed.
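
For illustration only, a minimal sketch of peeking at these WAL entries from SQL, assuming PostgreSQL 15 or later with the pg_walinspect contrib extension available and sufficient privileges; the LSN range below is hypothetical and must be replaced with values valid for your server:

    -- Assumption: PostgreSQL 15+, pg_walinspect installed as a contrib extension.
    CREATE EXTENSION IF NOT EXISTS pg_walinspect;

    -- Each record names the resource manager and record type that recovery
    -- would replay, e.g. Heap/INSERT or Transaction/COMMIT.
    SELECT start_lsn, xid, resource_manager, record_type
    FROM pg_get_wal_records_info('0/1E84D60', '0/1E85000');  -- hypothetical LSN range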
However, WAL can grow very large while the server is running, and it is
simply not feasible to keep all WAL entries and replay them all after a
failure. That is why the database system periodically performs a checkpoint:
all dirty buffers are forced to disk (including the xact buffers that store
transaction status metadata). This guarantees that all transaction changes
made before the checkpoint are saved on disk.
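
As a rough illustration (the setting names are real, but their values are installation-specific), a superuser can force a checkpoint manually and look at the settings that drive the periodic ones:

    -- Force an immediate checkpoint (superuser, or the pg_checkpoint role since PostgreSQL 15).
    CHECKPOINT;

    -- Periodic checkpoints are driven mainly by these two settings.
    SHOW checkpoint_timeout;   -- maximum time between automatic checkpoints
    SHOW max_wal_size;         -- WAL volume that triggers an extra checkpoint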
A checkpoint can take a long time, and that is fine. The "point" itself, in
the sense of a particular moment in time, marks the beginning of the process,
but the checkpoint is considered complete only when all the buffers that were
dirty at that moment have been flushed to disk.
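
Part of the reason is that the writes are deliberately spread out over the checkpoint interval, as controlled by checkpoint_completion_target. A hedged way to see how long checkpoints actually take, assuming PostgreSQL 16 or earlier, where these counters live in the pg_stat_bgwriter view:

    SHOW checkpoint_completion_target;  -- fraction of the interval used for writing

    -- Cumulative statistics: how many checkpoints ran and how many milliseconds
    -- were spent writing and syncing buffers.
    SELECT checkpoints_timed, checkpoints_req,
           checkpoint_write_time, checkpoint_sync_time
    FROM pg_stat_bgwriter;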
Crash recovery starts from the latest checkpoint, which allows PostgreSQL
to keep only those WAL files that were written after the last completed
checkpoint.
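
To see where recovery would start, pg_control_checkpoint() (available since PostgreSQL 9.6) reports the location of the last completed checkpoint and its REDO point, the position from which WAL has to be kept and replayed; the size estimate below additionally assumes PostgreSQL 10 or later for pg_current_wal_lsn():

    -- Location of the last completed checkpoint and its REDO point.
    SELECT checkpoint_lsn, redo_lsn, redo_wal_file
    FROM pg_control_checkpoint();

    -- Roughly how much WAL (in bytes) would have to be replayed after a crash right now.
    SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), redo_lsn) AS bytes_since_redo
    FROM pg_control_checkpoint();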