Administrative tasks
Monitoring
15
Copyright
© Postgres Professional, 2023
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov
Translated by Alexander Meleshko
Cover photo by Oleg Bartunov (Phu monastery and Bhrikuti peak, Nepal)
Use of course materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
Disclaimer
Postgres Professional assumes no responsibility for any damages and
losses, including loss of income, caused by direct or indirect, intentional or
accidental use of course materials. Postgres Professional company
specifically disclaims any warranties on course materials. Course materials
are provided “as is,” and Postgres Professional company has no obligations
to provide maintenance, support, updates, enhancements, or modifications.
2
Agenda
OS tools
Database statistics
Server message log
External monitoring systems
3
OS tools
Processes
ps (grep postgres)
update_process_title parameter for updating the status of processes
Resource usage
iostat, vmstat, sar, top...
Disk space
df, du, quota...
PostgreSQL runs on an operating system and to a certain extent depends
on its configuration.
Unix provides multiple state and performance monitoring tools.
In particular, you can monitor the processes belonging to PostgreSQL.
The server parameter update_process_title (on by default) displays the state
of each process next to its title, making it even more convenient.
Various tools are available to monitor the use of system resources (CPU,
RAM, disks): iostat, vmstat, sar, top, etc.
Disk space monitoring is also necessary. The space occupied by the
database on disk can be viewed both from the database itself (see the Data
Organization module) and from the OS (with the du command). The amount
of disk space available is also displayed with the df command in the OS.
If disk quotas are used, they must also be taken into account.
The tools and approaches to monitoring differ significantly between various
OS and file systems, so we will not discuss them in detail.
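As an illustration, the OS-level checks above might look like this on a typical Linux installation (the PGDATA path is an assumption; adjust it to your setup):

```shell
# List PostgreSQL processes; the [p] trick keeps grep itself out of the output.
# With update_process_title = on, each process title shows its current state.
ps aux | grep '[p]ostgres' || true

# Assumed data directory location; adjust for your installation
PGDATA=${PGDATA:-/var/lib/postgresql/data}

# Free space on the file system holding the data directory
df -h "$PGDATA" 2>/dev/null || df -h /

# Space occupied by the data directory itself
du -sh "$PGDATA" 2>/dev/null || true
```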
4
Database statistics
Statistics collection
Ongoing system activities
Command execution monitoring
Extensions
There are two primary sources of information about the state of the system.
The first one is statistical information collected by PostgreSQL and stored
inside the database.
5
Statistics collection
Statistics collection settings (statistics: parameter)
table and index access (touched rows): track_counts (on by default; needed for vacuuming)
page accesses: track_io_timing (off by default)
user function calls: track_functions (off by default)
In addition to tracking ongoing activities, PostgreSQL also collects some
statistics.
The amount of information collected is controlled by several server
parameters, since the more information is collected, the greater the
overhead.
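For example, you can check the current values of these parameters from SQL, and track_io_timing can be switched on without a server restart (ALTER SYSTEM requires superuser rights); a sketch:

```sql
-- Current values of the statistics-collection parameters
SELECT name, setting
FROM pg_settings
WHERE name IN ('track_counts', 'track_io_timing', 'track_functions');

-- Enable page access timing cluster-wide (superuser required);
-- pg_reload_conf() applies the change without a restart
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();
```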
6
Architecture (pre-15)
[Diagram: backends accumulate statistics by transaction and send them to the stats collector process, which aggregates them; twice a second the collector dumps a statistics snapshot to PGDATA/pg_stat_tmp/, which backends read on first access; on shutdown the collector saves statistics to PGDATA/pg_stat/]
Prior to PostgreSQL 15, statistics collection looked like this.
Backends collect statistics from the transactions they execute. The stats collector process receives statistics from all backends and aggregates them. Once every half a second, the collector dumps the statistics to temporary files in the PGDATA/pg_stat_tmp directory. (Therefore, moving this directory to an in-memory file system can improve overall performance.)
When a backend requests statistics data (via views or functions), it is served a statistics snapshot: the most recent version of the statistics provided by the collector. To ensure consistency, the backend will not read newer snapshots until the end of the transaction unless explicitly requested to.
Because of this latency, a backend may not always see the very latest statistics, but perfectly up-to-date data is seldom needed.
On server shutdown, the collector dumps statistics data into permanent files inside the PGDATA/pg_stat directory. When the server starts up again, it continues using this data. Statistics can be reset manually by the administrator, and they are always reset after a crash.
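The manual reset mentioned above is done with functions such as these (a sketch; the exact set of reset functions varies between versions):

```sql
-- Reset statistics for the current database
SELECT pg_stat_reset();

-- Reset cluster-wide background writer statistics
SELECT pg_stat_reset_shared('bgwriter');
```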
7
Architecture (15+)
[Diagram: backends write per-transaction statistics directly into the server's shared memory, where they are aggregated; backends read them according to stats_fetch_consistency = none, cache, or snapshot; on shutdown statistics are saved to PGDATA/pg_stat/]
In PostgreSQL 15, the separate stats collector process was removed. Instead, backends write the statistics they collect directly into the server's shared memory.
When a backend requests statistics data, it either re-reads the data from shared memory every time (if the stats_fetch_consistency parameter is set to none), caches the data it reads until the current transaction ends (cache), or fetches and caches a snapshot of all statistics data at once (snapshot).
On server shutdown, the server still flushes statistics data into permanent
files inside the PGDATA/pg_stat directory.
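The consistency mode can be changed per session; for example, to get a fully consistent picture across several statistics views within one transaction (a sketch for PostgreSQL 15+):

```sql
BEGIN;
SET stats_fetch_consistency = 'snapshot';
-- Both reads now come from the same cached snapshot of all statistics
SELECT sum(n_live_tup) FROM pg_stat_user_tables;
SELECT * FROM pg_stat_database WHERE datname = current_database();
COMMIT;
```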
9
Ongoing activities
Configuration
statistics: parameter
current activities and waits of backends and background processes: track_activities (on by default)
The current activities of all backends and background processes are
displayed in the pg_stat_activity view. We will focus on it more in the demo.
This view depends on the track_activities parameter (enabled by default).
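A typical query against this view filters out idle sessions; for example:

```sql
-- Active sessions, what they are executing, and what they are waiting for
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state IS DISTINCT FROM 'idle'
  AND pid <> pg_backend_pid();   -- exclude our own session
```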
11
Command execution
Views for monitoring command execution (command: view)
ANALYZE: pg_stat_progress_analyze
CREATE INDEX, REINDEX: pg_stat_progress_create_index
VACUUM (including autovacuum): pg_stat_progress_vacuum
CLUSTER, VACUUM FULL: pg_stat_progress_cluster
base backup creation: pg_stat_progress_basebackup
COPY: pg_stat_progress_copy
You can monitor the progress of some potentially long-running commands
using the corresponding views.
The structures of the views are described in the documentation.
Backup is discussed in the Backup module.
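For instance, the progress of a running vacuum (including autovacuum) can be watched like this:

```sql
-- One row per backend currently running VACUUM
SELECT pid, relid::regclass AS table_name, phase,
       heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_vacuum;
```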
12
Additional statistics
Stock extensions
pg_stat_statements query statistics
pgstattuple row versions statistics
pg_buffercache buffer cache status
Other extensions
pg_wait_sampling statistics for waits
pg_stat_kcache CPU and I/O statistics
pg_qualstats predicate statistics
There are extensions, both stock and third-party, that enable the collection
of additional statistics.
For example, the pg_stat_statements extension collects information
about queries executed by the system, pg_buffercache provides tools for
monitoring the buffer cache, etc.
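As an illustration, a typical pg_stat_statements session looks like this (the extension must first be added to shared_preload_libraries; the total_exec_time column is named total_time before PostgreSQL 13):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- The five most expensive query patterns by total execution time
SELECT query, calls, total_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```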
13
Server message log
Log record configuration
Log file rotation
Log analysis
The other primary source of information about the state of the server is the
message log.
14
Message receiver (log_destination = list)
stderr error stream
csvlog CSV format
jsonlog JSON format
syslog the syslog daemon
eventlog Windows event log
Message collector (logging_collector = on)
can provide additional info
never loses messages (unlike syslog)
writes stderr, csvlog, and jsonlog output to files in log_directory
(the csvlog and jsonlog destinations require the message collector)
The server log can be output in various formats and forwarded to various
destinations. The format and the destination are determined primarily by the
log_destination parameter (you can list multiple destinations separated by a
comma).
The stderr flag (on by default) sends log messages as plain text to the standard error stream. The syslog flag forwards messages to the syslog daemon (on Unix systems), and the eventlog flag does the same for the Windows event log.
The message collector is an auxiliary process that collects additional information from all PostgreSQL processes to supplement the basic log messages. It is designed never to lose a message, so in high-load environments it can become a bottleneck.
The message collector is switched on and off by the logging_collector flag.
When stderr is on, the log writes into the file defined by the log_filename
parameter, which is located in the directory defined by the log_directory
parameter.
When the collector is on and csvlog is selected as a destination, the log is also written to a CSV file (log_filename with a .csv extension). Similarly, with the jsonlog option, output goes into a JSON file (log_filename with a .json extension).
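Put together, a minimal collector-based logging setup in postgresql.conf might look like this (values shown are illustrative, not recommendations):

```
logging_collector = on
log_destination = 'stderr,csvlog'   # plain text plus CSV
log_directory = 'log'               # relative to PGDATA
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
```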
15
What to log
Settings
information: parameter
message severity level: log_min_messages
commands running longer than a threshold: log_min_duration_statement
command execution time: log_duration
application name: application_name
checkpoints: log_checkpoints
connections and disconnections: log_connections, log_disconnections
long lock waits: log_lock_waits
executed command text: log_statement
temporary file usage: log_temp_files
etc.
A lot of useful information can be output to the server message log. By
default, almost all output is disabled so as not to turn logging into the
bottleneck for the disk subsystem. The administrator must decide what
information is important, provide the necessary disk space to store it, and
evaluate the impact on the overall system performance.
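A reasonable starting point, to be tuned for each system, might be (the threshold values are illustrative assumptions):

```
log_min_duration_statement = 250    # log statements running longer than 250 ms
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on                 # waits longer than deadlock_timeout
log_temp_files = 0                  # log every temporary file created
```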
16
Log file rotation
By the message collector
setting: parameter
file name mask: log_filename
rotation time (minutes): log_rotation_age
rotation file size (kB): log_rotation_size
allow overwriting old files: log_truncate_on_rotation = on
Different file name masks and rotation times allow for different schemes:
'postgresql-%H.log', '1h': 24 files a day
'postgresql-%a.log', '1d': 7 files a week
External tools
logrotate system utility
If all the log output goes into a single file, sooner or later the file will grow to
an unmanageable size, making administration and analysis highly
inconvenient. Therefore, a log rotation scheme is usually employed.
The message collector has its own rotation tools. Some of the parameters
that configure them are listed on the slide.
The log_filename parameter can specify not just a name, but a file name
mask using designated date and time characters.
The log_rotation_age parameter determines how long a file is used before the log switches to a new one, and log_rotation_size is the file size at which the log switches to the next one.
The log_truncate_on_rotation flag determines if the log should overwrite
existing files or not.
Different rotation schemes can be defined by using various file name mask
and switch time combinations.
Alternatively, external rotation management tools can be used, such as the logrotate system utility (on Ubuntu it is configured through the /etc/logrotate.d/postgresql-common file).
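For example, the weekly scheme from the slide is configured like this; with log_truncate_on_rotation enabled, each weekday file is overwritten a week later:

```
log_filename = 'postgresql-%a.log'  # %a = abbreviated weekday name: Mon, Tue, ...
log_rotation_age = '1d'
log_truncate_on_rotation = on
```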
17
Log analysis
OS tools
grep, awk...
Special analysis tools
pgBadger — requires a certain log configuration
There are different ways to analyze logs.
You can search for certain information using OS tools or specially designed
scripts.
The de facto standard for log analysis is the pgBadger application (https://github.com/dalibo/pgbadger), but it imposes certain restrictions on the contents of the log. In particular, only messages in English are allowed.
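pgBadger parses the log most reliably when the log line prefix contains the fields it expects; the prefix below is the one commonly suggested in the pgBadger documentation (treat it as a starting point):

```
log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
lc_messages = 'C'                   # messages must be in English
```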
19
External monitoring
Universal monitoring systems
Zabbix, Munin, Cacti...
cloud-based: Okmeter, NewRelic, Datadog...
PostgreSQL monitoring systems
PGObserver
PostgreSQL Workload Analyzer (PoWA)
Open PostgreSQL Monitoring (OPM)
pg_profile, pgpro_pwr
etc.
In practice, for any serious environment, you need a full-fledged monitoring
system that collects various metrics from both PostgreSQL and the
operating system, stores the history of these metrics, displays them as
readable graphs, notifies when certain metrics reach certain thresholds, etc.
PostgreSQL does not come with such a system by itself; it only provides the means by which such information can be acquired, as we have seen. Therefore, for full-scale monitoring, an external system is required.
There are quite a few such systems on the market. Some are universal and
come with PostgreSQL plugins or settings. These include Zabbix, Munin,
Cacti, cloud services such as Okmeter, NewRelic, Datadog, and others.
There are also systems specifically designed for PostgreSQL: PGObserver, PoWA, OPM, etc. The pg_profile extension allows you to take snapshots of statistics data and compare them, identifying resource-intensive operations and their dynamics. pgpro_pwr is its extended, commercially available version.
An incomplete but representative list of monitoring systems can be viewed
20
Takeaways
Monitoring collects data on server operations
both from the operating system
and from the database points of view
PostgreSQL provides collected statistics
and the server message log
Full-scale monitoring requires an external system
21
Practice
1. In a new database, create a table, insert several rows, and then
delete all rows.
Look at the table access statistics and reference the values
(n_tup_ins, n_tup_del, n_live_tup, n_dead_tup)
against your activity.
Perform a vacuum, check the statistics again
and compare with the previous figures.
2. Create a deadlock with two transactions.
See what information is recorded in
the server message log.
Task 2. A deadlock is a situation where two (or more) transactions wait for each other to complete. Unlike with a normal lock, the transactions have no way to get out of a deadlock on their own, and the DBMS is forced to resolve it by forcibly aborting one of the transactions.
The easiest way to reproduce a deadlock is on a table with two rows. The
first transaction changes (and locks) the first row, and the second one locks
the second row. Then the first transaction tries to change the second row,
discovers that it's locked, and starts waiting. And then the second
transaction tries to change the first row, and also waits for the lock to be
released.
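The scenario described above can be scripted in two parallel psql sessions; a sketch (the table name and columns are illustrative):

```sql
-- Setup (once): a table with two rows
CREATE TABLE t(id int PRIMARY KEY, n int);
INSERT INTO t VALUES (1, 0), (2, 0);

-- session 1
BEGIN;
UPDATE t SET n = n + 1 WHERE id = 1;   -- locks row 1

-- session 2
BEGIN;
UPDATE t SET n = n + 1 WHERE id = 2;   -- locks row 2

-- session 1
UPDATE t SET n = n + 1 WHERE id = 2;   -- blocks, waiting for session 2

-- session 2
UPDATE t SET n = n + 1 WHERE id = 1;   -- after deadlock_timeout, one session
                                       -- fails with ERROR: deadlock detected,
                                       -- and the message goes to the server log
```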