Administrative Tasks
Monitoring
16
Copyright
© Postgres Professional, 2017–2025
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov, Alexey Beresnev
Translated by: Liudmila Mantrova, Alexander Meleshko, Elena Sharafutdinova
Photo: Oleg Bartunov (Phu Monastery and Bhrikuti Peak, Nepal)
Use of Course Materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
edu@postgrespro.ru
Disclaimer
Postgres Professional assumes no responsibility for any damages and losses,
including loss of income, caused by direct or indirect, intentional or accidental
use of course materials. Postgres Professional company specifically disclaims
any warranties on course materials. Course materials are provided “as is,” and
Postgres Professional company has no obligations to provide maintenance,
support, updates, enhancements, or modifications.
2
Topics
OS Tools
Cumulative Statistics
Server Message Log
External Monitoring Systems
3
OS Tools
Processes
ps, pgrep...
update_process_title parameter for updating the status of processes
cluster_name parameter for setting the cluster name
Resource usage
iostat, vmstat, sar, top...
Disk space
df, du, quota...
PostgreSQL runs on an operating system and to a certain extent depends
on its configuration.
PostgreSQL process information is accessible via OS tools. When the
update_process_title parameter is on (the default), each process shows its
current activity in its title. The cluster_name parameter specifies the instance
name used to identify it among running processes.
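As a minimal sketch (the instance name 'main' is an arbitrary example, not a required value):

```sql
-- Name the instance so its processes are easy to spot in ps output
-- (cluster_name takes effect only after a server restart).
ALTER SYSTEM SET cluster_name = 'main';

-- Check that process titles are being updated (on by default).
SHOW update_process_title;
```

With cluster_name set, a command like ps aux | grep 'postgres: main' lists the instance's processes together with their current state.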
Various tools are available to monitor the use of system resources (CPU,
RAM, disks) in Unix: iostat, vmstat, sar, top, etc.
Disk space monitoring is also necessary. The space occupied by the
database on disk can be viewed both from the database itself (see the Data
Organization module) and from the OS (with the du command). The amount
of disk space available is also displayed with the df command in the OS. If
disk quotas are used, they must also be taken into account.
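For example, the space occupied by databases and tables can be queried from within PostgreSQL (a sketch using standard system functions):

```sql
-- Size of the current database as seen from within PostgreSQL
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Largest tables, including their indexes and TOAST data
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 5;
```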
The tools and approaches to monitoring differ significantly between operating
systems and file systems, so we will not discuss them in detail.
4
Cumulative Statistics
Statistics Collection
Current System Activities
Command Execution Monitoring
Extensions
There are two primary sources of information about the state of the system.
The first one is statistical information collected by PostgreSQL and stored
inside the cluster.
5
Statistics Collection
Settings of cumulative statistics
parameter action
track_activities monitor current commands
track_counts collect table and index access statistics
track_functions monitor user function calls
off by default
track_io_timing monitor block read and write timing statistics
off by default
track_wal_io_timing monitor write timing of WAL operations
off by default
The cumulative statistics system in PostgreSQL collects and provides data
on server operations. Cumulative statistics track access to tables and
indexes at both the disk block and row levels. Additionally, they record
details such as the number of rows, vacuum and analyze operations for
each table.
It is also possible to track the number of user function calls and their
execution time.
The amount of information collected is controlled by several server
parameters, since the more information is collected, the greater the
overhead.
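For instance, the optional counters can be switched on like this (a sketch; track_io_timing in particular is off by default because measuring timings adds overhead):

```sql
-- Enable I/O timing statistics and user function call tracking;
-- a superuser can apply the change with a configuration reload.
ALTER SYSTEM SET track_io_timing = on;
ALTER SYSTEM SET track_functions = 'all';  -- 'none' (default), 'pl', or 'all'
SELECT pg_reload_conf();
```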
6
Architecture
[Diagram: each backend accumulates transaction statistics locally and flushes
them to the cumulative statistics in shared memory between transactions; on a
normal server shutdown the cumulative statistics are saved to PGDATA/pg_stat/;
backends read the statistics through a snapshot, with caching controlled by
stats_fetch_consistency (none, cache, snapshot).]
Backends collect statistics while executing transactions. Between transactions,
each process flushes this data to shared memory, at most once per second
(a compile-time setting).
Cumulative statistics are stored in PGDATA/pg_stat/ during a normal server
shutdown and reloaded upon startup. In case of a crash shutdown, all
counters are reset.
A backend may cache statistical data. The caching level is controlled by the
stats_fetch_consistency parameter:
none — no caching; statistics reside only in shared memory.
cache — statistics for a single object are cached.
snapshot — statistics for the entire database are cached.
The default mode is cache, which balances consistency and performance.
Cached statistics are not refreshed and are discarded at the end of a
transaction, or when pg_stat_clear_snapshot() is called.
Because of the flush delay and caching, a backend will not always see the very
latest statistics, but this is rarely a problem.
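The snapshot behavior can be sketched as follows (the table names returned depend on the database):

```sql
-- Take a consistent snapshot of all statistics for this transaction
BEGIN;
SET LOCAL stats_fetch_consistency = 'snapshot';
SELECT relname, n_tup_ins, n_tup_del FROM pg_stat_user_tables;

-- Discard the cached snapshot explicitly to see fresher values
SELECT pg_stat_clear_snapshot();
COMMIT;
```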
8
Current Activities
Configuration
statistics parameter
current activities and waits track_activities
of backends on by default
and background processes
The current activities of all backends and background processes are
displayed in the pg_stat_activity view. We will focus on it more in the demo.
This view depends on the track_activities parameter (enabled by default).
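A typical query against this view might look like the following sketch, which shows active client sessions together with their current waits:

```sql
-- Client sessions that are doing something right now, with their waits
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND state <> 'idle';
```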
10
Command Execution
Views for monitoring command executions
command execution
ANALYZE pg_stat_progress_analyze
CREATE INDEX, REINDEX pg_stat_progress_create_index
VACUUM pg_stat_progress_vacuum
including autovacuum
CLUSTER, VACUUM FULL pg_stat_progress_cluster
Create base backup pg_stat_progress_basebackup
COPY pg_stat_progress_copy
You can monitor the progress of some potentially long-running commands
using the corresponding views.
The structures of these views are described in the documentation.
Base backups are discussed in the Backup module.
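For example, the progress of a running vacuum (including autovacuum) can be watched like this:

```sql
-- Progress of any vacuum currently running
SELECT pid, relid::regclass AS relation, phase,
       heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_vacuum;
```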
11
Additional Statistics
Additional supplied extensions
pg_stat_statements query statistics
pgstattuple row versions statistics
pg_buffercache buffer cache status
Other extensions
pg_wait_sampling statistics for waits
pg_stat_kcache CPU and I/O statistics
pg_qualstats predicate statistics
Extensions, both supplied with PostgreSQL and third-party, enable the collection
of additional statistics.
For example, the pg_stat_statements extension collects information about
queries executed by the system, pg_buffercache provides tools for
monitoring the buffer cache, etc.
Many key extensions are discussed in more detail in the DBA2 and DEV2
courses.
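As a sketch of using pg_stat_statements (the extension must first be loaded via shared_preload_libraries, which requires a restart):

```sql
-- After adding pg_stat_statements to shared_preload_libraries
-- and restarting the server:
CREATE EXTENSION pg_stat_statements;

-- Top five queries by total execution time
SELECT query, calls, round(total_exec_time) AS total_ms, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```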
12
Server Message Log
Log Record Configuration
Log File Rotation
Log Analysis
The other primary source of information about the state of the server is the
message log.
13
Server Message Log
Message receiver (log_destination = list)
stderr error stream
csvlog CSV format (if the collector is enabled)
jsonlog JSON format (if the collector is enabled)
syslog the syslog daemon
eventlog Windows event log
Message collector (logging_collector = on)
can provide additional info
never loses messages (unlike syslog)
writes stderr, csvlog and jsonlog to the log_directory/log_filename
The server log can be output in various formats and forwarded to various
destinations. The format and the destination are determined primarily by the
log_destination parameter (multiple receivers can be listed, separated by
commas).
The stderr value (on by default) sends messages to the standard error
stream as plain text. The syslog value forwards messages to the syslog
daemon (for Unix systems), and the eventlog value does the same for the
Windows event log.
The message collector is an auxiliary process that collects additional
information from all PostgreSQL processes to supplement the basic log
messages. It is designed to keep track of every message, therefore it can
become the bottleneck in high-load environments.
The message collector is switched on and off by the logging_collector
parameter. When stderr is enabled, the messages are written into the file
defined by the log_filename parameter, which is located in the directory
defined by the log_directory parameter.
When the collector is on and csvlog is selected as a receiver, the info will
also be output into a CSV file log_filename.csv. With the jsonlog output
enabled, log files are written in JSON format and use the .json file extension.
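Putting this together, a configuration that writes both plain-text and CSV logs via the collector might look like this sketch:

```sql
-- logging_collector takes effect only after a server restart
ALTER SYSTEM SET logging_collector = on;
ALTER SYSTEM SET log_destination = 'stderr,csvlog';
ALTER SYSTEM SET log_directory = 'log';   -- relative to PGDATA by default
```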
14
What to Log?
Settings
information parameter
level of messages log_min_messages
long command execution time log_min_duration_statement
command execution time log_duration
application name application_name
checkpoints log_checkpoints
connections and disconnections log_(dis)connections
long lock waits log_lock_waits
command executed log_statement
temporary file usage log_temp_files
...
A lot of useful information can be output to the server message log. By
default, almost all output is disabled so as not to turn logging into the
bottleneck for the I/O subsystem. The administrator should decide what
information is important, provide the necessary disk space to store it, and
evaluate the impact of the log output on the overall system performance.
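A common starting point might look like the following sketch (the thresholds are illustrative examples, not recommendations):

```sql
ALTER SYSTEM SET log_min_duration_statement = '1s'; -- log slow statements
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_lock_waits = on;   -- waits longer than deadlock_timeout
ALTER SYSTEM SET log_temp_files = '10MB'; -- temp files larger than this
SELECT pg_reload_conf();
```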
15
Log File Rotation
By the message collector
setting parameter
file name pattern log_filename
rotation time, minutes log_rotation_age
rotation file size, KB log_rotation_size
overwrite file log_truncate_on_rotation = on
Combining file name patterns and rotation times allows for different rotation
schemes:
'postgresql-%H.log', '1h' 24 files a day
'postgresql-%a.log', '1d' 7 files a week
External tools
logrotate system utility
If all the log output goes into a single file, sooner or later the file will grow to
an unmanageable size, making administration and analysis highly
inconvenient. Therefore, a log rotation scheme is usually employed.
The message collector has its own rotation tools. Some of the parameters
that configure them are listed on the slide.
The log_filename parameter can specify not just a name, but a file name
pattern using designated date and time characters.
The log_rotation_age parameter determines how long a file is used before
the output switches to a new one (and log_rotation_size is the file size at
which to switch to the next one).
The log_truncate_on_rotation flag determines whether PostgreSQL should
overwrite existing files or append messages to them.
Different rotation schemes can be defined by using various file name
patterns and switch time combinations.
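For example, the seven-files-a-week scheme from the slide can be configured like this:

```sql
-- One file per weekday, overwritten on each rotation: 7 files in total
ALTER SYSTEM SET log_filename = 'postgresql-%a.log';
ALTER SYSTEM SET log_rotation_age = '1d';
ALTER SYSTEM SET log_truncate_on_rotation = on;
SELECT pg_reload_conf();
```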
Alternatively, rotation can be managed by external tools. For example, the
Ubuntu package uses the logrotate system utility (configured through the
/etc/logrotate.d/postgresql-common file).
16
Log Analysis
OS tools
grep, awk...
Special analysis tools
pgBadger — requires a certain log configuration
There are different ways to analyze logs.
You can search for certain information using OS tools or specially designed
scripts.
The de facto standard for log analysis is the pgBadger application
(https://github.com/darold/pgbadger), but it imposes certain requirements on
the contents of the log. In particular, only messages in English are supported.
18
External Monitoring
Universal monitoring systems
Zabbix, Munin, Cacti...
cloud-based: Okmeter, NewRelic, Datadog...
PostgreSQL monitoring systems
pg_profile, pgpro_pwr
PGObserver
PostgreSQL Workload Analyzer (PoWA)
Open PostgreSQL Monitoring (OPM)
...
In practice, you need a full-fledged monitoring system that collects various
metrics from both PostgreSQL and the operating system, stores the history
of these metrics, displays them as readable graphs, notifies when certain
metrics exceed certain limits, etc.
PostgreSQL does not come with such a system itself; it only provides the means
by which this information can be acquired, and we have already covered them.
Therefore, full-scale monitoring requires an external system.
There are quite a few such systems on the market. Some are universal and
come with PostgreSQL plugins or agents. These include Zabbix, Munin,
Cacti, cloud services such as Okmeter, NewRelic, Datadog, and others.
There are also systems specifically designed for PostgreSQL: PGObserver,
PoWA, OPM, etc. The pg_profile extension builds snapshots of statistics data
and compares them, identifying resource-intensive operations and their
dynamics. pgpro_pwr is its extended, commercially available version.
An incomplete but representative list of monitoring systems is available online.
19
Takeaways
Monitoring collects data on server operations
from both the operating system and the database point of view
PostgreSQL provides cumulative statistics
and the server message log
Full-scale monitoring requires an external system
20
Practice
1. In a new database, create a table, insert several rows, and then
delete all rows.
Look at the table access statistics and check the values
(n_tup_ins, n_tup_del, n_live_tup, n_dead_tup) against your
activity.
Perform a vacuum, check the statistics again and compare with
the previous figures.
2. Create a deadlock with two transactions.
See what information is recorded in the server message log.
2. A deadlock is a situation in which two (or more) transactions wait for each
other to complete. Unlike a normal lock, the transactions cannot get out of a
deadlock by themselves, and the DBMS has to resolve it by forcibly
terminating one of them.
The easiest way to reproduce a deadlock is on a table with two rows. The
first transaction changes (and locks) the first row, and the second one locks
the second row. Then the first transaction tries to change the second row,
discovers that it is locked, and starts waiting. And then the second
transaction tries to change the first row, and also waits for the lock to be
released.
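The scenario above can be sketched as two concurrent sessions (the table t and its columns id and n are assumed to exist with at least two rows):

```sql
-- Session 1                          -- Session 2
BEGIN;
UPDATE t SET n = n + 1 WHERE id = 1;
                                      BEGIN;
                                      UPDATE t SET n = n + 1 WHERE id = 2;
UPDATE t SET n = n + 1
WHERE id = 2;  -- waits
                                      UPDATE t SET n = n + 1
                                      WHERE id = 1;  -- deadlock
-- One of the transactions is aborted with a deadlock error,
-- and a corresponding message appears in the server log.
```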
21
Practice+
1. Install the pg_stat_statements extension.
Execute several queries.
See what information gets into the pg_stat_statements view.
1. To install the extension, first add pg_stat_statements to the
shared_preload_libraries parameter and restart the server, and only then
execute the CREATE EXTENSION command.