User Tools

Site Tools


dba:postgresql:basic_monitorization

Postgresql: Basic Monitorization [ESP]

Monitorización recomendada (aka DBA monitorization)

Queries que se han de lanzar desde el sistema de monitorización para gestionar eventos “críticos”.

Alta Prioridad -> 24*7

Estas queries significarían un CRITICAL (24×7)

Locks

Bloqueos dentro de la base de datos (un update bloquea otro):

snippet.sql
SELECT blocked_locks.pid     AS blocked_pid,
         blocked_activity.usename  AS blocked_user,
         blocking_locks.pid     AS blocking_pid,
         blocking_activity.usename AS blocking_user,
         blocked_activity.query    AS blocked_statement,
         blocking_activity.query   AS current_statement_in_blocking_process,
         blocked_activity.application_name AS blocked_application,
         blocking_activity.application_name AS blocking_application
   FROM  pg_catalog.pg_locks         blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks         blocking_locks 
        ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
   WHERE NOT blocked_locks.GRANTED;

TXID Wraparound

Hay que montorizar por umbrales (típica monitorización con OK/WARNING/CRITICAL):

snippet.sql
WITH max_age AS (
    SELECT 2000000000 AS max_old_xid
        , setting AS autovacuum_freeze_max_age
        FROM pg_catalog.pg_settings
        WHERE name = 'autovacuum_freeze_max_age' )
, per_database_stats AS (
    SELECT datname
        , m.max_old_xid::INT
        , m.autovacuum_freeze_max_age::INT
        , age(d.datfrozenxid) AS oldest_current_xid
    FROM pg_catalog.pg_database d
    JOIN max_age m ON (TRUE)
    WHERE d.datallowconn )
SELECT MAX(oldest_current_xid) AS oldest_current_xid
    , MAX(ROUND(100*(oldest_current_xid/max_old_xid::FLOAT))) AS percent_towards_wraparound
    , MAX(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::FLOAT))) AS percent_towards_emergency_autovac
FROM per_database_stats

Excelente explicación del problema aquí.

Postgres Backend Connections

Hay que montorizar por umbrales (típica monitorización con OK/WARNING/CRITICAL):

Número de conexiones totales

snippet.sql
SELECT SUM(numbackends) FROM pg_stat_database;

Número de conexiones detallado por bd

snippet.sql
SELECT
    COUNT(datid) AS CURRENT,
    (
        SELECT
            setting AS mc
        FROM
            pg_settings
        WHERE
            name = 'max_connections') AS mc,
    d.datname
FROM
    pg_database d
    LEFT JOIN pg_stat_activity s ON (s.datid = d.oid)
GROUP BY
    2,
    3
ORDER BY
    datname
;

Query time

Segundos de las queries en ejecución (típica monitorización con OK/WARNING/CRITICAL):

snippet.sql
SELECT
    datname,
    datid,
    pid AS pid,
    usename,
    client_addr,
    query AS query,
    state AS state,
    CASE WHEN client_port < 0 THEN
        0
    ELSE
        client_port
    END AS client_port,
    COALESCE(ROUND(EXTRACT(epoch FROM now() - query_start)), 0) AS seconds
FROM
    pg_stat_activity
WHERE (query_start IS NOT NULL
    AND (state NOT LIKE 'idle%'
        OR state IS NULL))
AND usename <> 'barman'
ORDER BY
    query_start,
    pid DESC;

Media Prioridad -> 8*5

Alertas que se deberían monitorzar 8×5 para su revisión como una tarea del día a día del DBA.

VACUUM

Básico para saber si hay alguna tabla de la que no se esté haciendo vacuum.
Esta query devuelve las tablas de las que no se ha hecho vacuum en los últimos 7 días.

snippet.sql
SELECT
    schemaname,
    relname,
    n_dead_tup,
    n_live_tup,
    autovacuum_count,
    last_vacuum,
    last_autovacuum,
    last_autoanalyze,
    last_analyze
FROM
    pg_stat_all_tables
WHERE
    n_live_tup > 0
AND schemaname NOT IN ('pg_catalog', 'information_schema', 'partman', 'cron')
AND last_vacuum>NOW() - INTERVAL '7 days' ;

Para una alerta bastaría con saber schemaname.relname.
Puede que la tabla tenga algún problema a analizar si el vacuum está programado y no se está realizando.

Failed cron jobs

Solo en caso de ejecutar tareas programadas con la extensión pg_cron.
La query se ha de ejecutar sobre la BD configurada en el parámetro cron.database_name de la extensión:

snippet.sql
SELECT * FROM cron.job_run_details WHERE STATUS = 'failed';
dba/postgresql/basic_monitorization.txt · Last modified: 2023/10/23 07:59 by dodger