Administrative tasks
Monitoring
Copyright
© Postgres Professional, 2015–2022
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov
Translated by Alexander Meleshko
Use of course materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
Disclaimer
Postgres Professional assumes no responsibility for any damages or
losses, including loss of income, caused by direct or indirect, intentional or
accidental use of the course materials. Postgres Professional
specifically disclaims any warranties on the course materials. The course materials
are provided “as is,” and Postgres Professional has no obligation
to provide maintenance, support, updates, enhancements, or modifications.
Topics
OS tools
Database statistics
Server message log
External monitoring systems
OS tools
Processes
ps (grep postgres)
update_process_title parameter for updating the status of processes
Resource usage
iostat, vmstat, sar, top...
Disk space
df, du, quota...
PostgreSQL runs on an operating system and to a certain extent depends
on its configuration.
Unix provides multiple state and performance monitoring tools.
In particular, you can monitor the processes belonging to PostgreSQL.
The update_process_title server parameter (on by default) appends the current
state of each process to its title, which makes such monitoring even more convenient.
Various tools are available to monitor the use of system resources (CPU,
RAM, disks): iostat, vmstat, sar, top, etc.
Disk space monitoring is also necessary. The space occupied by a
database can be viewed both from the database itself (see the Data
Organization module) and from the OS (with the du command). The amount
of available disk space is shown by the df command.
If disk quotas are used, they must be taken into account as well.
The tools and approaches to monitoring differ significantly between various
OS and file systems, so we will not discuss them in detail.
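As a rough illustration, a quick check from the shell might look like this; the PGDATA variable is assumed to point to the data directory, and the exact tool set depends on your OS:

    # PostgreSQL processes; with update_process_title on, the current
    # activity of each process is shown next to its name
    ps aux | grep [p]ostgres

    # CPU, memory and disk activity, sampled every 5 seconds
    vmstat 5
    iostat -x 5

    # free space on the volume holding the data directory,
    # and the space the data directory actually occupies
    df -h "$PGDATA"
    du -sh "$PGDATA"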
Database statistics
Stats collector
Ongoing system activities
Command execution monitoring
Extensions
There are two primary sources of information about the state of the system.
The first one is statistical information collected by PostgreSQL and stored
inside the database.
Stats collector
Stats collector process settings
table and index accesses (accesses, affected rows): track_counts (on by default, needed for vacuuming)
page accesses: track_io_timing (off by default)
user function calls: track_functions (off by default)
In addition to tracking ongoing activities, PostgreSQL also collects some
statistics.
These statistics are collected by the stats collector background process.
The amount of information collected is controlled by several server
parameters, since the more information is collected, the greater the
overhead.
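For example, the current values can be inspected and changed from psql (a sketch, assuming psql connects to the server as a superuser; enabling track_io_timing is shown only to illustrate the trade-off between detail and overhead):

    psql -c "SELECT name, setting FROM pg_settings
             WHERE name IN ('track_counts', 'track_io_timing', 'track_functions');"

    # ALTER SYSTEM writes the change to postgresql.auto.conf;
    # pg_reload_conf() makes the running server pick it up
    psql -c "ALTER SYSTEM SET track_io_timing = on;"
    psql -c "SELECT pg_reload_conf();"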
Architecture
[diagram: each backend accumulates statistics for its transactions and passes them to the stats collector; the collector aggregates them and, twice a second, dumps them into PGDATA/pg_stat_tmp/; on shutdown they go to PGDATA/pg_stat/; a server process reads a statistics snapshot on first access]
Backends collect statistics from the transactions they execute. The stats collector
process gathers statistics from all backends and aggregates them. Once every
half a second (can be changed during compilation), the collector dumps
statistics to temporary files in the PGDATA/pg_stat_tmp directory.
(Therefore, moving this directory to an in-memory file system can improve
overall performance.)
When a worker process requests statistics data (via views or functions), it’s
served a statistics snapshot, the most recent version of statistics provided
by the collector. Unless explicitly requested, the process will not read new
snapshots until the end of the transaction to ensure consistency.
Because of this latency, a backend will not always see the very latest statistics,
but that is rarely needed anyway.
On server shutdown, the collector dumps the statistics into permanent files
inside the PGDATA/pg_stat directory, so the server can keep using the data
after it starts up again. Statistics can be reset manually by the administrator,
and they are always reset after a crash.
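A minimal sketch of how the snapshot behavior and the manual reset can be observed; it relies on repeated -c options being executed in a single psql session (psql 9.6 or later), so the explicit transaction spans all of them:

    # inside a transaction, repeated reads of the statistics views return the
    # same snapshot; pg_stat_clear_snapshot() discards it and fetches fresh data
    psql -c "BEGIN;" \
         -c "SELECT sum(n_tup_ins) FROM pg_stat_user_tables;" \
         -c "SELECT pg_stat_clear_snapshot();" \
         -c "SELECT sum(n_tup_ins) FROM pg_stat_user_tables;" \
         -c "COMMIT;"

    # reset the accumulated statistics of the current database
    psql -c "SELECT pg_stat_reset();"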
Ongoing activities
Configuration
current activities and waits of backends and background processes: track_activities (on by default)
The current activities of all backends and background processes are
displayed in the pg_stat_activity view. We will focus on it more in the demo.
This view depends on the track_activities parameter (enabled by default).
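For instance, a quick look at what the server is doing right now might be (a sketch; the column list is abridged and assumes a reasonably recent PostgreSQL version):

    psql -x -c "SELECT pid, usename, datname, state,
                       wait_event_type, wait_event, query
                FROM pg_stat_activity
                WHERE state <> 'idle';"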
Command execution
Views for monitoring command execution
  ANALYZE: pg_stat_progress_analyze
  CREATE INDEX, REINDEX: pg_stat_progress_create_index
  VACUUM (including autovacuum): pg_stat_progress_vacuum
  CLUSTER, VACUUM FULL: pg_stat_progress_cluster
  base backup creation: pg_stat_progress_basebackup
You can monitor the progress of some potentially long-running commands
using the corresponding views.
The structures of the views are described in the documentation:
Backup is discussed in the Backup module.
In PostgreSQL 14, the pg_stat_progress_copy view was added to this list to
track the COPY command.
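For example, while a vacuum (including autovacuum) is running, its progress could be watched roughly like this:

    psql -c "SELECT pid, datname, relid::regclass AS relation,
                    phase, heap_blks_total, heap_blks_scanned
             FROM pg_stat_progress_vacuum;"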
Additional statistics
Stock extensions
  pg_stat_statements: query statistics
  pgstattuple: row version statistics
  pg_buffercache: buffer cache status
Other extensions
  pg_wait_sampling: wait statistics
  pg_stat_kcache: CPU and I/O statistics
  pg_qualstats: predicate statistics
  etc.
There are extensions, both stock and third-party, that enable the collection
of additional statistics.
For example, the pg_stat_statements extension collects information
about queries executed by the system, pg_buffercache provides tools for
monitoring the buffer cache, etc.
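Assuming pg_stat_statements is already installed (see the Practice+ task at the end of this module), the most expensive statements might be listed roughly as follows; total_exec_time is the column name in PostgreSQL 13 and later, older versions call it total_time:

    psql -c "SELECT calls, round(total_exec_time::numeric, 1) AS total_ms,
                    left(query, 60) AS query
             FROM pg_stat_statements
             ORDER BY total_exec_time DESC
             LIMIT 5;"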
Server message log
Log record configuration
Log file rotation
Log analysis
The other primary source of information about the state of the server is the
message log.
Server message log
Message destination (log_destination = list)
  stderr: the standard error stream
  csvlog: CSV format (if the collector is enabled)
  syslog: the syslog daemon
  eventlog: the Windows event log
Message collector (logging_collector = on)
  can provide additional information
  never loses messages (unlike syslog)
  writes the stderr and csvlog output to the log_directory/log_filename file
The server log can be output in various formats and forwarded to various
destinations. The format and the destination are determined primarily by the
log_destination parameter (you can list multiple destinations separated by a
comma).
The stderr flag (on by default) sends log messages as plain text to the standard
error stream. The syslog flag forwards messages to the syslog daemon
(on Unix systems), and the eventlog flag does the same for the Windows
event log.
The message collector is an auxiliary process that captures messages from
all PostgreSQL processes and can supplement them with additional
information. It is designed never to lose a message, so it can
become a bottleneck in high-load environments.
The message collector is switched on and off by the logging_collector flag.
When the collector is enabled and stderr is selected as a destination, log output
is written to the file defined by the log_filename parameter, located in the
directory set by the log_directory parameter.
If csvlog is selected as well, the log is additionally written to a CSV file
with the same name and the .csv suffix. In PostgreSQL 15,
jsonlog is added as a destination option.
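As an illustration rather than a recommendation, enabling the collector with both plain-text and CSV output might look like this (psql is assumed to connect as a superuser; logging_collector only takes effect after a restart):

    psql -c "ALTER SYSTEM SET logging_collector = on;"
    psql -c "ALTER SYSTEM SET log_destination = 'stderr,csvlog';"
    psql -c "ALTER SYSTEM SET log_directory = 'log';"
    psql -c "ALTER SYSTEM SET log_filename = 'postgresql-%a.log';"
    pg_ctl restart -D "$PGDATA"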
What to log?
Settings
  message level: log_min_messages
  long command execution time: log_min_duration_statement
  command execution time: log_duration
  application name: application_name
  checkpoints: log_checkpoints
  connections and disconnections: log_connections, log_disconnections
  long lock waits: log_lock_waits
  executed commands: log_statement
  temporary file usage: log_temp_files
  etc.
A lot of useful information can be output to the server message log.
By default, almost all of this output is disabled so that logging does not become a
bottleneck for the disk subsystem. The administrator must decide what
information is important, provide the disk space needed to store it, and
evaluate the impact on overall system performance.
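A possible starting point is sketched below; the threshold values are arbitrary examples, not recommendations:

    psql -c "ALTER SYSTEM SET log_min_duration_statement = '1s';"
    psql -c "ALTER SYSTEM SET log_checkpoints = on;"
    psql -c "ALTER SYSTEM SET log_connections = on;"
    psql -c "ALTER SYSTEM SET log_disconnections = on;"
    psql -c "ALTER SYSTEM SET log_lock_waits = on;"
    psql -c "ALTER SYSTEM SET log_temp_files = 0;"    # log every temporary file
    psql -c "SELECT pg_reload_conf();"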
Log file rotation
By the message collector
  file name mask: log_filename
  rotation time (minutes): log_rotation_age
  rotation file size (KB): log_rotation_size
  overwrite existing files: log_truncate_on_rotation = on
  different file name masks and rotation times allow for different schemes:
    'postgresql-%H.log', '1h': 24 files a day
    'postgresql-%a.log', '1d': 7 files a week
External tools
  the logrotate system utility
If all the log output goes into a single file, sooner or later the file will grow to
an unmanageable size, making administration and analysis highly
inconvenient. Therefore, a log rotation scheme is usually employed.
The message collector has its own rotation tools. Some of the parameters
that configure them are listed on the slide.
The log_filename parameter can specify not just a name, but a file name
mask using designated date and time characters.
The log_rotation_age parameter determines how long a file is used before the
log switches to a new one, and log_rotation_size sets the file size at which
the switch happens.
The log_truncate_on_rotation flag determines whether existing files are
overwritten or appended to.
Different rotation schemes can be defined by using various file name mask
and switch time combinations.
Alternatively, external rotation management tools can be used, such as the
logrotate system utility (on Ubuntu it is configured through the
/etc/logrotate.d/postgresql-common file).
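For example, the second scheme from the slide (one file per day of the week, overwritten a week later) could be configured roughly like this:

    psql -c "ALTER SYSTEM SET log_filename = 'postgresql-%a.log';"
    psql -c "ALTER SYSTEM SET log_rotation_age = '1d';"
    psql -c "ALTER SYSTEM SET log_rotation_size = 0;"    # rotate by time only
    psql -c "ALTER SYSTEM SET log_truncate_on_rotation = on;"
    psql -c "SELECT pg_reload_conf();"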
Log analysis
OS tools
grep, awk...
Special analysis tools
pgBadger (requires a certain log configuration)
There are different ways to analyze logs.
You can search for certain information using OS tools or specially designed
scripts.
The de facto standard for log analysis is pgBadger
(https://github.com/dalibo/pgbadger), but it imposes certain requirements on
the contents of the log.
In particular, log messages must be in English.
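Two sketches, assuming the collector writes plain-text logs under $PGDATA/log and that pgbadger is installed with a log_line_prefix it understands:

    # a quick search with OS tools
    grep -E 'ERROR|FATAL|PANIC' "$PGDATA"/log/postgresql-*.log | tail

    # an HTML report built by pgBadger
    pgbadger --outfile report.html "$PGDATA"/log/postgresql-*.log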
External monitoring
Universal monitoring systems
Zabbix, Munin, Cacti...
cloud-based: Okmeter, NewRelic, Datadog...
PostgreSQL monitoring systems
PGObserver
PostgreSQL Workload Analyzer (PoWA)
Open PostgreSQL Monitoring (OPM)
pg_profile, pgpro_pwr
etc.
In practice, for any serious environment, you need a full-fledged monitoring
system that collects various metrics from both PostgreSQL and the
operating system, stores the history of these metrics, displays them as
readable graphs, notifies when certain metrics reach certain thresholds, etc.
PostgreSQL does not come with such a system by itself; it only provides the
means by which such information can be acquired, and we have already
covered them. Therefore, full-scale monitoring requires an external system.
There are quite a few such systems on the market. Some are universal and
come with PostgreSQL plugins or settings. These include Zabbix, Munin,
Cacti, cloud services such as Okmeter, NewRelic, Datadog, and others.
There are also systems specifically designed for PostgreSQL: PGObserver,
PoWA, OPM, etc. The pg_profile extension allows you to build snapshots of
statistics and compare them, identifying resource-intensive operations and
their dynamics. pgpro_pwr is its extended, commercially available version.
An incomplete but representative list of monitoring systems can be viewed
Takeaways
Monitoring collects data on server operations
both from the operating system
and from the database point of view
PostgreSQL provides collected statistics
and the server message log
Full-scale monitoring requires an external system
Practice
1. In a new database, create a table, insert several rows, and then
delete all rows.
Look at the table access statistics and check the values
(n_tup_ins, n_tup_del, n_live_tup, n_dead_tup) against your
activity.
Perform a vacuum, check the statistics again and compare with
the previous figures.
2. Create a deadlock with two transactions.
See what information is recorded in the server message log.
2. A deadlock is a situation where two (or more) transactions are each waiting
for the other to complete first. Unlike a normal lock, a deadlock cannot resolve
itself, and the DBMS has to break it by forcibly aborting one of the
transactions.
The easiest way to reproduce a deadlock is on a table with two rows. The
first transaction changes (and locks) the first row, and the second one locks
the second row. Then the first transaction tries to change the second row,
discovers that it’s locked, and starts waiting. And then the second
transaction tries to change the first row, and also waits for the lock to be
released.
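A minimal sketch of this scenario from the shell; the table and column names are arbitrary, and one of the two sessions will be aborted with a deadlock error, whose details then appear in the server log:

    psql -c "CREATE TABLE d(id integer PRIMARY KEY, n integer);
             INSERT INTO d VALUES (1, 0), (2, 0);"

    # session 1 (background): lock row 1, pause, then try to lock row 2
    psql -c "BEGIN;
             UPDATE d SET n = n + 1 WHERE id = 1;
             SELECT pg_sleep(2);
             UPDATE d SET n = n + 1 WHERE id = 2;
             COMMIT;" &

    # session 2: lock row 2, pause, then try to lock row 1
    psql -c "BEGIN;
             UPDATE d SET n = n + 1 WHERE id = 2;
             SELECT pg_sleep(2);
             UPDATE d SET n = n + 1 WHERE id = 1;
             COMMIT;"
    wait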
Practice+
1. Install the pg_stat_statements extension.
Execute several queries.
See what information gets into the pg_stat_statements view.
1. To install the extension, in addition to executing the CREATE
EXTENSION command, you need to change the value of the
shared_preload_libraries parameter and restart the server.
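A sketch of these steps, assuming psql connects as a superuser and PGDATA points to the data directory; note that ALTER SYSTEM replaces any previous value of shared_preload_libraries:

    psql -c "ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';"
    pg_ctl restart -D "$PGDATA"
    psql -c "CREATE EXTENSION pg_stat_statements;"

    # run a few queries, then look at what has been collected
    psql -c "SELECT calls, rows, left(query, 60) AS query
             FROM pg_stat_statements ORDER BY calls DESC LIMIT 10;"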