Architecture
PostgreSQL overview
15
Copyright
© Postgres Professional, 2023
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov
Translated by Alexander Meleshko
Cover photo by Oleg Bartunov (Phu monastery and Bhrikuti peak, Nepal)
Use of course materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
Disclaimer
Postgres Professional assumes no responsibility for any damages and
losses, including loss of income, caused by direct or indirect, intentional or
accidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
2
Agenda
Client-server protocol
Transactionality and its implementation
Query processing and execution
Processes and memory structures
Storing data on disk and disk operations
System extensibility
3
Client and server
connection authentication
query generation query execution
transaction management transactionality control
protocol
PostgreSQL
Python client
psycopg2
Java client
JDBC
SQL client
libpq
A client application, such as psql or any other program written in any
programming language, connects to the server and "communicates" with it
in one way or another. In order for the client and the server to understand
each other, they must use the same communication protocol. Usually, the
client uses a driver that implements the protocol and provides a set of
functions to use in the program. Internally, the driver can use the standard
protocol implementation (the libpq library), or can implement the protocol
itself.
The language the client is written in is unimportant, as the functionality
behind the syntax is defined by the protocol. As an example, we will use the
SQL language and the psql client. Of course, no one really would program a
client in SQL, but we will use it here purely for educational purposes. It
should not be that difficult to substitute any of the SQL commands provided
below with corresponding statements in your programming language of
choice.
Generally speaking, a connection protocol allows the client to connect to a
database in a cluster. The server performs authentication: decides if the
client should be allowed to connect, i.e. by demanding a password.
Then, the client sends the server queries in the SQL language, the server
executes the queries and sends back the results. A powerful and convenient
query language is one of the fundamentals of relational databases.
Another one is the ability to maintain consistency between transactions.
4
Transactions
client
application
PostgreSQL
driver
atomicity everything or nothing
consistency integrity constraints and user constraints
isolation parallel processes impact
durability no data loss after a failure
operations
COMMIT /
ROLLBACK;
BEGIN;
PostgreSQL
A transaction is a sequence of operations that preserves the consistency of
data, provided that the operations are performed completely and without
interference from other transactions.
Transactions must satisfy four properties collectively known as ACID:
- Atomicity. A transaction is either completed in full or not completed at all.
To that end, the beginning of a transaction is marked with the BEGIN
command, and the end with either COMMIT (commit changes) or
ROLLBACK (undo changes).
- Consistency. Transactions move the database from one consistent state to
another consistent one (consistency here means that certain restrictions are
fulfilled).
- Isolation. Transactions running simultaneously should not affect each
other.
- Durability. Once data is committed, it should not be lost even after a server
failure.
In PostgreSQL, the client application is the side usually responsible for
transaction management (that is, for determining what commands make up
a transaction, and for committing or canceling the transaction).
Transactions can also be managed on the backend by stored procedures.
6
Query execution
client
application
driver
parsing ← system catalog
rewriting ← rules
planning ← statistics
execution ← data
query
result
PostgreSQL
Query execution is complicated. First, a query is sent from a client to the
server as text. The server parses the text, analyzing its syntax (whether
letters are formed into words, and words into commands) and semantics
(whether there are tables and other objects in the database that the query
refers to by name). To do that, the server needs data on what is actually
stored in the database. This meta-data is called the system catalog and is
stored in special tables in the same database.
A query can be rewritten (transformed). For example, a view name can be
substituted with the query text. Users can implement their own
transformations using the rule system.
SQL is a declarative language: a query defines what data to get, but not how
to get it. It is at this point when the query (already parsed and presented in
the form of a tree) is passed on to the planner, which develops an execution
plan. For example, the planner can decide whether or not to use indexes to
find the data for the query. To plan the execution efficiently, the planner
needs certain information about the tables it is going to work with, such as
the size of the tables and the distribution of data within them. Together, this
information is called statistics.
When a plan is selected, the query is executed in accordance with it, and
the result is returned to the client in its entirety.
This is convenient and simple when we're talking about just a row or two,
but it can quickly become problematic with large outputs.
7
Prepared statements
client
application
driver
parsing
rewriting
binding ← parameter values
planning
execution
bonding
result
preparation
PostgreSQL
Each query goes through the steps listed above: parsing, rewriting, planning
and execution. But if the same query (possibly with different parameters) is
executed over and over again, there is no point in parsing it anew every
time.
Therefore, in addition to the usual query execution process, the PostgreSQL
protocol provides an extended mode that can control statement execution
more precisely.
One of its features is the ability to prepare a statement. When a statement is
prepared, it is parsed and rewritten as usual and its parse tree is saved.
When the statement is executed, specific parameter values are bound to it.
If necessary, planning is redone (in some cases, PostgreSQL remembers
the query plan and does not repeat this step). Then, the statement is
executed.
Another advantage of prepared statements is that they are protected from
possible SQL injections.
9
Cursors
client
application
PostgreSQL
driver
parsing
rewriting
binding ← parameter values
planning
execution
output
preparation
result
result
bonding
PostgreSQL
The client may not want to get all the output at once. There can be too much
data, and not all of it may be needed.
This issue is solved by cursors, another feature of the extended mode. The
protocol can open a cursor for any operator, and then receive the output row
by row.
A cursor can be imagined as a sliding window that shows only a part of the
output at a time. When an output row is received, the window shifts down. In
other words, cursors allow you to work with relational data (that comes in
sets) iteratively, row by row.
An open cursor is represented on the server by a so-called portal. This term
is mentioned in the documentation, but in general, the words "cursor" and
"portal" can be considered synonyms.
A statement used within a cursor is implicitly prepared (that is, its parsing
tree and possibly execution plan are saved).
11
backend
Processes and memory
client
application
PostgreSQL
postmaster
backend
background processes
shared memory
local
memory
parsed queries,
cursor states,
system catalog cache,
space for sorting,
joining etc.
Between processing queries from clients, the server must store technical
information, such as parsed queries and their plans and the status of open
cursors (portals). But where is it stored and how?
Under the hood, a PostgreSQL server consists of several interacting
processes.
First of all, when the server starts, a process traditionally called postmaster
is started. It starts all other processes (using the fork system call in Unix).
It also "babysits" them: if any of the processes crashes, postmaster will
restart it (or restart the entire server if it considers that the failed process
could have damaged any of the shared data).
Operations of the server are maintained by a number of background
processes. The main ones will be discussed in later topics.
In order for the processes to exchange information between them,
postmaster allocates shared memory that all the processes can access.
In addition to shared memory, each process has its own local memory,
accessible only to itself.
Postmaster also listens for incoming connections. For each connecting
client, postmaster generates a designated backend process for the client to
communicate with on the server side, and each client gets its own process.
The space required to execute a client's query (parsed queries
and their plans, cursor states, system catalog cache, a place to sort data,
etc.) is allocated in the local memory of the backend process of this client.
12
Multiple clients
client
application
PostgreSQL
postmaster
backend
background processes
shared memory
MVCC
locks
When multiple clients connect to a server, each one gets a backend process
created for it. As long as there are not too many clients, RAM is sufficient,
and connections do not occur too often, sustaining many connections at
once isn't a problem in itself.
However, when multiple processes try to access the same database object,
things must be done to ensure that one process will not change the data
while another is in the process of reading it.
For objects in shared memory, this is ensured by short-term locks.
PostgreSQL does this carefully enough so that the system scales well with
an increase in the number of processors (cores).
Tables are more complicated. Locks will have to be held until the end of
transactions (that is, potentially for a long time), so scalability may suffer.
To avoid that, PostgreSQL uses a multiversion concurrency control
mechanism (MVCC) and snapshot isolation: multiple versions of the same
data can exist simultaneously, and each process sees only its own (but
always consistent) data snapshot. Now, only those processes that are trying
to change data that has already been changed, but not yet committed by
other processes, will be locked.
MVCC is the main mechanism that enables the first three properties of
transactions (atomicity, consistency, and isolation). We will talk about it more
in its dedicated topic.
13
Connection pool
client
application
PostgreSQL
postmaster
background processes
shared memory
backend
pool
If there are too many clients, or connections are established
and broken too often, using a connection pool may help. This functionality is
usually provided by the application server or third-party pool managers (the
most popular of which is PgBouncer).
With a connection pool in place, clients connect not to the PostgreSQL
server directly, but to the pool manager. The manager keeps several
connections to the database server open and uses free ones to fulfill client
queries. From the server's point of view, the number of clients remains
constant regardless of how many clients access the pool manager.
The drawback is that multiple clients end up sharing the same backend
process, which, as we remember, stores client-specific state in its local
memory (such as parsed queries for prepared statements). Therefore, care
should be taken when developing applications for such deployments.
The use of connection pools is discussed in greater detail in the DEV2
course.
14
Data storage
header
data
buffer cache
page
(usually 8kB)
file
Data is stored as regular OS files on disks.
Logically, the files are divided into pages (sometimes the term block is
used). A page is usually 8kB in size. It can be changed within some limits
(16kB or 32kB), but only during compilation. A cluster that has been
compiled and started up can work with pages of only one size.
Each page has a certain internal structure. It contains a header and actual
data. There may be free space between them if the page is not fully
occupied.
Since disks work much slower than RAM (especially HDD, but SSD too),
data read from disk gets cached. A buffer cache, a certain amount of space
in RAM, is allocated for recently read pages. The idea is that the system
may want to read the same pages multiple times, and keeping them on hand
may save time compared to repeated disk scans. Any recently changed data
is also cached for some time before being written on disk.
15
Data storage
client
application
OS
WAL
PostgreSQL
postmaster
background processes
shared memory
buffer cache
cache
backend
The PostgreSQL buffer cache is located in shared memory so that all
processes have access to it.
PostgreSQL doesn't directly access disks storing its data. Instead, it relies
on the operating system. The operating system also has its own data cache.
Therefore, if a page is not found in the buffer cache, there is a chance that it
is in the OS cache and access to the disk will be avoided.
In case of a failure (for example, power supply dies), the contents of the
RAM are lost and the data changed but not yet written to disk will be lost.
This is unacceptable and breaks the durability property of transactions. To
avoid that, PostgreSQL keeps a log that allows it to redo lost operations and
restore data to a consistent state. We will talk about the buffer cache and
the write-ahead log in the dedicated topic later on.
16
OS
Extensibility
client
application
PostgreSQL
postmaster
background processes
backend
shared memory
background
workers
programming
languages
кеш
index
types
data
types
functions,
operators,
triggers
FDW
PostgreSQL is designed with extensibility in mind.
An application developer can create their own data types based on existing
ones (composite types, ranges, arrays, enumerations) and stored functions
for data processing (including triggers for specific events).
And with the C programming language, extensions can be developed to add
arbitrary functionality to the system. Most extensions can be installed "hot",
without stopping the server. Thanks to this architecture, there are plenty of
existing extensions doing things such as:
- adding support for programming languages (in addition to standard SQL,
PL/pgSQL PL/Perl, PL/Python and PL/Tcl),
- introducing new data types and operators to work with them,
- creating new index types that work more efficiently with specific data types
(in addition to standard B-trees, GiST, SP-GiST, GIN, BRIN, Bloom),
- interfacing with external systems using foreign data wrappers (FDW),
- starting background worker processes to perform periodic tasks.
Extensibility is discussed in more detail the DEV2 course.
17
Takeaways
The protocol allows clients to connect to the server, transmit
queries and manage transactions
Each client is served by a dedicated backend process
Data is stored in files and accessed via the operating system
Data is cached both in local memory (system catalog, parsed
queries) and in shared memory (buffer cache)