Administrative Tasks
Monitoring
16
Copyright
© Postgres Professional, 2017–2025
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov, Alexey Beresnev
Translated by: Liudmila Mantrova, Alexander Meleshko, Elena Sharafutdinova
Photo: Oleg Bartunov (Phu Monastery and Bhrikuti Peak, Nepal)
Use of Course Materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
edu@postgrespro.ru
Disclaimer
Postgres Professional assumes no responsibility for any damages and losses,
including loss of income, caused by direct or indirect, intentional or accidental
use of course materials. Postgres Professional company specifically disclaims
any warranties on course materials. Course materials are provided “as is,” and
Postgres Professional company has no obligations to provide maintenance,
support, updates, enhancements, or modifications.
2
Topics
OS Tools
Cumulative Statistics
Server Message Log
External Monitoring Systems
3
OS Tools
Processes
ps, pgrep...
update_process_title parameter for updating the status of processes
cluster_name parameter for setting the cluster name
Resource usage
iostat, vmstat, sar, top...
Disk space
df, du, quota...
PostgreSQL runs on an operating system and to a certain extent depends
on its configuration.
PostgreSQL process information is accessible via OS tools. When the
update_process_title parameter is on (the default), each process shows its
current activity in its title. The cluster_name parameter specifies the instance
name used to identify it among running processes.
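As a minimal sketch (the instance name 'main' is an arbitrary example, not a required value):

```sql
-- Name the instance so its processes are easy to spot in ps output
-- (cluster_name takes effect only after a server restart).
ALTER SYSTEM SET cluster_name = 'main';

-- Check that process titles are being updated (on by default).
SHOW update_process_title;
```

With cluster_name set, a command like ps aux | grep 'postgres: main' lists the instance's processes together with their current state.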
Various tools are available to monitor the use of system resources (CPU,
RAM, disks) in Unix: iostat, vmstat, sar, top, etc.
Disk space monitoring is also necessary. The space occupied by the
database on disk can be viewed both from the database itself (see the Data
Organization module) and from the OS (with the du command). The amount
of disk space available is also displayed with the df command in the OS. If
disk quotas are used, they must also be taken into account.
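For example, the space occupied by databases and tables can be queried from within PostgreSQL (a sketch using standard system functions):

```sql
-- Size of the current database as seen from within PostgreSQL
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Largest tables, including their indexes and TOAST data
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 5;
```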
The tools and approaches to monitoring differ significantly between operating
systems and file systems, so we will not discuss them in detail.
4
Cumulative Statistics
Statistics Collection
Current System Activities
Command Execution Monitoring
Extensions
There are two primary sources of information about the state of the system.
The first one is statistical information collected by PostgreSQL and stored
inside the cluster.
5
Statistics Collection
Settings of cumulative statistics
parameter action
track_activities monitor current commands
track_counts collect table and index access statistics
track_functions monitor user function calls
off by default
track_io_timing monitor block read and write timing statistics
off by default
track_wal_io_timing monitor write timing of WAL operations
off by default
The cumulative statistics system in PostgreSQL collects and provides data
on server operations. Cumulative statistics track access to tables and
indexes at both the disk block and row levels. Additionally, they record
details such as the number of rows, vacuum and analyze operations for
each table.
It is also possible to track the number of user function calls and their
execution time.
The amount of information collected is controlled by several server
parameters, since the more information is collected, the greater the
overhead.
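For instance, the optional counters can be switched on like this (a sketch; track_io_timing in particular is off by default because measuring timings adds overhead):

```sql
-- Enable I/O timing statistics and user function call tracking;
-- a superuser can apply the change with a configuration reload.
ALTER SYSTEM SET track_io_timing = on;
ALTER SYSTEM SET track_functions = 'all';  -- 'none' (default), 'pl', or 'all'
SELECT pg_reload_conf();
```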
6
Architecture
[Diagram: each backend accumulates transaction statistics locally and flushes
them to the cumulative statistics in shared memory between transactions; on a
normal server shutdown the cumulative statistics are saved to PGDATA/pg_stat/;
backends read the statistics through a snapshot, with caching controlled by
stats_fetch_consistency (none, cache, snapshot).]
Backends collect statistics while executing transactions. Between transactions,
each process flushes this data to shared memory, at most once per second
(a compile-time setting).
Cumulative statistics are stored in PGDATA/pg_stat/ during a normal server
shutdown and reloaded upon startup. In case of a crash shutdown, all
counters are reset.
A backend may cache statistical data. The caching level is controlled by the
stats_fetch_consistency parameter:
none — no caching; statistics reside only in shared memory.
cache — statistics for a single object are cached.
snapshot — statistics for the entire database are cached.
The default mode is cache, which balances consistency and performance.
Cached statistics are not refreshed and are discarded at the end of a
transaction, or when pg_stat_clear_snapshot() is called.
Because of the flush delay and caching, a backend will not always see the very
latest statistics, but this is rarely a problem.
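The snapshot behavior can be sketched as follows (the table names returned depend on the database):

```sql
-- Take a consistent snapshot of all statistics for this transaction
BEGIN;
SET LOCAL stats_fetch_consistency = 'snapshot';
SELECT relname, n_tup_ins, n_tup_del FROM pg_stat_user_tables;

-- Discard the cached snapshot explicitly to see fresher values
SELECT pg_stat_clear_snapshot();
COMMIT;
```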
8
Current Activities
Configuration
statistics parameter
current activities and waits track_activities
of backends on by default
and background processes
The current activities of all backends and background processes are
displayed in the pg_stat_activity view. We will focus on it more in the demo.
This view depends on the track_activities parameter (enabled by default).
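A typical query against this view might look like the following sketch, which shows active client sessions together with their current waits:

```sql
-- Client sessions that are doing something right now, with their waits
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND state <> 'idle';
```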
10
Command Execution
Views for monitoring command executions
command execution
ANALYZE pg_stat_progress_analyze
CREATE INDEX, REINDEX pg_stat_progress_create_index
VACUUM pg_stat_progress_vacuum
including autovacuum
CLUSTER, VACUUM FULL pg_stat_progress_cluster
Create base backup pg_stat_progress_basebackup
COPY pg_stat_progress_copy
You can monitor the progress of some potentially long-running commands
using the corresponding views.
The structures of these views are described in the documentation.
Base backups are discussed in the Backup module.
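For example, the progress of a running vacuum (including autovacuum) can be watched like this:

```sql
-- Progress of any vacuum currently running
SELECT pid, relid::regclass AS relation, phase,
       heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_vacuum;
```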
11
Additional Statistics
Additional supplied extensions
pg_stat_statements query statistics
pgstattuple row versions statistics
pg_buffercache buffer cache status
Other extensions
pg_wait_sampling statistics for waits
pg_stat_kcache CPU and I/O statistics
pg_qualstats predicate statistics
Extensions, both supplied with PostgreSQL and third-party, enable the collection
of additional statistics.
For example, the pg_stat_statements extension collects information about
queries executed by the system, pg_buffercache provides tools for
monitoring the buffer cache, etc.
Many key extensions are discussed in more detail in the DBA2 and DEV2
courses.
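As a sketch of using pg_stat_statements (the extension must first be loaded via shared_preload_libraries, which requires a restart):

```sql
-- After adding pg_stat_statements to shared_preload_libraries
-- and restarting the server:
CREATE EXTENSION pg_stat_statements;

-- Top five queries by total execution time
SELECT query, calls, round(total_exec_time) AS total_ms, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```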
12
Server Message Log
Log Record Configuration
Log File Rotation
Log Analysis
The other primary source of information about the state of the server is the
message log.
13
Server Message Log
Message receiver (log_destination = list)
stderr error stream
csvlog CSV format (if the collector is enabled)
jsonlog JSON format (if the collector is enabled)
syslog the syslog daemon
eventlog Windows event log
Message collector (logging_collector = on)
can provide additional info
never loses messages (unlike syslog)
writes stderr, csvlog and jsonlog to the log_directory/log_filename
The server log can be output in various formats and forwarded to various
destinations. The format and the destination are determined primarily by the
log_destination parameter (multiple receivers can be listed, separated by
commas).
The stderr value (on by default) sends messages to the standard error
stream as plain text. The syslog value forwards messages to the syslog
daemon (for Unix systems), and the eventlog value does the same for the
Windows event log.
The message collector is an auxiliary process that collects additional
information from all PostgreSQL processes to supplement the basic log
messages. It is designed to keep track of every message, therefore it can
become the bottleneck in high-load environments.
The message collector is switched on and off by the logging_collector
parameter. When stderr is enabled, the messages are written into the file
defined by the log_filename parameter, which is located in the directory
defined by the log_directory parameter.
When the collector is on and csvlog is selected as a receiver, the info will
also be output into a CSV file log_filename.csv. With the jsonlog output
enabled, log files are written in JSON format and use the .json file extension.
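Putting this together, a configuration that writes both plain-text and CSV logs via the collector might look like this sketch:

```sql
-- logging_collector takes effect only after a server restart
ALTER SYSTEM SET logging_collector = on;
ALTER SYSTEM SET log_destination = 'stderr,csvlog';
ALTER SYSTEM SET log_directory = 'log';   -- relative to PGDATA by default
```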
14
What to Log?
Settings
information parameter
level of messages log_min_messages
long command execution time log_min_duration_statement
command execution time log_duration
application name application_name
checkpoints log_checkpoints
connections and disconnections log_(dis)connections
long lock waits log_lock_waits
command executed log_statement
temporary file usage log_temp_files
...
A lot of useful information can be output to the server message log. By
default, almost all output is disabled so as not to turn logging into the
bottleneck for the I/O subsystem. The administrator should decide what
information is important, provide the necessary disk space to store it, and
evaluate the impact of the log output on the overall system performance.
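A common starting point might look like the following sketch (the thresholds are illustrative examples, not recommendations):

```sql
ALTER SYSTEM SET log_min_duration_statement = '1s'; -- log slow statements
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_lock_waits = on;   -- waits longer than deadlock_timeout
ALTER SYSTEM SET log_temp_files = '10MB'; -- temp files larger than this
SELECT pg_reload_conf();
```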
15
Log File Rotation
By the message collector
setting parameter
file name pattern log_filename
rotation time, minutes log_rotation_age
rotation file size, KB log_rotation_size
overwrite file log_truncate_on_rotation = on
Combining file name patterns and rotation times allows for different rotation
schemes:
'postgresql-%H.log', '1h' 24 files a day
'postgresql-%a.log', '1d' 7 files a week
External tools
logrotate system utility
If all the log output goes into a single file, sooner or later the file will grow to
an unmanageable size, making administration and analysis highly
inconvenient. Therefore, a log rotation scheme is usually employed.
The message collector has its own rotation tools. Some of the parameters
that configure them are listed on the slide.
The log_filename parameter can specify not just a name, but a file name
pattern using designated date and time characters.
The log_rotation_age parameter determines how long a file is used before
the output switches to a new one (and log_rotation_size is the file size at
which to switch to the next one).
The log_truncate_on_rotation flag determines whether PostgreSQL should
overwrite existing files or append messages to them.
Different rotation schemes can be defined by using various file name
patterns and switch time combinations.
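For example, the seven-files-a-week scheme from the slide can be configured like this:

```sql
-- One file per weekday, overwritten on each rotation: 7 files in total
ALTER SYSTEM SET log_filename = 'postgresql-%a.log';
ALTER SYSTEM SET log_rotation_age = '1d';
ALTER SYSTEM SET log_truncate_on_rotation = on;
SELECT pg_reload_conf();
```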
Alternatively, rotation can be managed by external tools. For example, the
Ubuntu package uses the logrotate system utility (configured through the
/etc/logrotate.d/postgresql-common file).
16
Log Analysis
OS tools
grep, awk...
Special analysis tools
pgBadger — requires a certain log configuration
There are different ways to analyze logs.
You can search for certain information using OS tools or specially designed
scripts.
The de facto standard for log analysis is the pgBadger application
(https://github.com/darold/pgbadger), but it imposes certain requirements on
the contents of the log. In particular, only messages in English are supported.
18
External Monitoring
Universal monitoring systems
Zabbix, Munin, Cacti...
cloud-based: Okmeter, NewRelic, Datadog...
PostgreSQL monitoring systems
pg_profile, pgpro_pwr
PGObserver
PostgreSQL Workload Analyzer (PoWA)
Open PostgreSQL Monitoring (OPM)
...
In practice, you need a full-fledged monitoring system that collects various
metrics from both PostgreSQL and the operating system, stores the history
of these metrics, displays them as readable graphs, notifies when certain
metrics exceed certain limits, etc.
PostgreSQL does not come with such a system itself; it only provides the means
by which this information can be acquired, and we have already covered them.
Therefore, full-scale monitoring requires an external system.
There are quite a few such systems on the market. Some are universal and
come with PostgreSQL plugins or agents. These include Zabbix, Munin,
Cacti, cloud services such as Okmeter, NewRelic, Datadog, and others.
There are also systems specifically designed for PostgreSQL: PGObserver,
PoWA, OPM, etc. The pg_profile extension builds snapshots of statistics data
and compares them, identifying resource-intensive operations and their
dynamics. pgpro_pwr is its extended, commercially available version.
An incomplete but representative list of monitoring systems is available online.
19
Takeaways
Monitoring collects data on server operations
from both the operating system and the database point of view
PostgreSQL provides cumulative statistics
and the server message log
Full-scale monitoring requires an external system
20
Practice
1. In a new database, create a table, insert several rows, and then
delete all rows.
Look at the table access statistics and check the values
(n_tup_ins, n_tup_del, n_live_tup, n_dead_tup) against your
activity.
Perform a vacuum, check the statistics again and compare with
the previous figures.
2. Create a deadlock with two transactions.
See what information is recorded in the server message log.
2. A deadlock is a situation in which two (or more) transactions wait for each
other to complete. Unlike a normal lock, the transactions cannot get out of a
deadlock by themselves, and the DBMS has to resolve it by forcibly
terminating one of them.
The easiest way to reproduce a deadlock is on a table with two rows. The
first transaction changes (and locks) the first row, and the second one locks
the second row. Then the first transaction tries to change the second row,
discovers that it is locked, and starts waiting. And then the second
transaction tries to change the first row, and also waits for the lock to be
released.
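The scenario above can be sketched as two concurrent sessions (the table t and its columns id and n are assumed to exist with at least two rows):

```sql
-- Session 1                          -- Session 2
BEGIN;
UPDATE t SET n = n + 1 WHERE id = 1;
                                      BEGIN;
                                      UPDATE t SET n = n + 1 WHERE id = 2;
UPDATE t SET n = n + 1
WHERE id = 2;  -- waits
                                      UPDATE t SET n = n + 1
                                      WHERE id = 1;  -- deadlock
-- One of the transactions is aborted with a deadlock error,
-- and a corresponding message appears in the server log.
```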
21
Practice+
1. Install the pg_stat_statements extension.
Execute several queries.
See what information gets into the pg_stat_statements view.
1. To install the extension, first add pg_stat_statements to the
shared_preload_libraries parameter and restart the server, and only then
execute the CREATE EXTENSION command.