User Tools

Site Tools


dba:postgresql:a_tale_of_postgresql

A Thale(s) of PostgreSql

setup

Master

postgresql.conf

Sample main config file:

postgresql.conf
listen_addresses = '*'		# what IP address(es) to listen on;
port = 5432				# (change requires restart)
max_connections = 100			# (change requires restart)
shared_buffers = 128MB			# min 128kB
dynamic_shared_memory_type = windows	# the default is the first option
wal_level = replica			# minimal, replica, or logical
wal_log_hints = on			# also do full page writes of non-critical updates
max_wal_size = 1GB
min_wal_size = 80MB
archive_mode = on		# enables archiving; off, on, or always
archive_command = 'copy "%p" "\\\\server02\\\archivelog\\%f"'
archive_timeout = 300		# force a logfile segment switch after this
max_wal_senders = 3		# max number of walsender processes
wal_keep_segments = 64		# in logfile segments; 0 disables
hot_standby = on			# "off" disallows queries during recovery
hot_standby_feedback = on		# send info from standby to prevent
log_destination = 'stderr'		# Valid values are combinations of
logging_collector = on		# Enable capturing of stderr and csvlog
log_directory = 'log'			# directory where log files are written,
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'	# log file name pattern,
log_file_mode = 0640			# creation mode for log files,
log_timezone = 'Europe/London'
datestyle = 'iso, dmy'
timezone = 'Europe/London'
lc_messages = 'Portuguese_Portugal.1252'			# locale for system error message
lc_monetary = 'Portuguese_Portugal.1252'			# locale for monetary formatting
lc_numeric = 'Portuguese_Portugal.1252'			# locale for number formatting
lc_time = 'Portuguese_Portugal.1252'				# locale for time formatting
default_text_search_config = 'pg_catalog.portuguese'

Config values that should be changed or (at least) readen:

  • listen_address and port
  • max_connections
  • archive_command
  • ALL logging parameters (log_*)
  • localization parameters


pg_hba.conf

Authentication and authorization file:

pg_hba.conf
host    all             all             127.0.0.1/32            trust
# next line MUST NOT be used on production
host    all             all             all            trust
host    all             all             ::1/128                 trust
# next line MUST NOT be used on production
host    replication     rep_user        all                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
# next line MUST NOT be used on production
host    replication     all             all			            trust

Our suggestion is use either password or md5 authentication and allow connection only for the server pairs (that is Appserver1 and Appserver2).

Replication and failover process

Slave

re-instantiate

1st instantiation/re-instantiate:

cd \
cd APIS\Postgre12\bin
.\pg_basebackup.exe --host=10.6.7.1 --port=5432 --username=postgres  --pgdata=C:\APIS\Postgres12\data -P -v -R -X stream

The above command explained:

-h specifies the IP address of the CentOS Master server (replace 10.6.7.1 with IP Address of your Slave server)
-D specifies the data directory on Slave server, usually: C:\APIS\Postgres12\data
-U specifies the connection user (we use postgres, user, here, but you can and should create a dedicated user name. I used ‘'postgres'’ user to illustrate the steps, albeit if this is not a production server and you don’t mind it, you can set it up this way)
-P show progress reporting
-v use verbose mode
-R enable the creation of recovery config or in other words, this appends connection settings to postgresql.auto.conf and creates a standby.signal file in the data folder of your server
-X include the WAL files (write-ahead log files) in the backup. We use the '‘stream'’ option here to stream the WAL during the backup process

Startup

This should be use the 1st time or for testing, then you can start the slave with the Windows Services tool:

.\pg_ctl.exe -D c:\APIS\Postgres12\data start

Check the replication in the slave:

SELECT pg_is_in_recovery();

It should report t.

SELECT * FROM pg_stat_wal_receiver;

It should give you information about the streaming replication.
Check the replication in the master:

SELECT * FROM pg_stat_replication ;

Sample failover

Check the instances:

postgres@psql-instance-02 ~/12/data/log $ psql 
psql (12.8)
Type "help" for help.
 
postgres=# SELECT pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
(1 row)

And:

postgres@psql-instance-03 ~/12/data/log $ psql 
psql (12.8)
Type "help" for help.
 
postgres=# SELECT pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 f
(1 row)

hardcore kill the primary:

postgres@psql-instance-03 ~/12/data/log $ ps waux |egrep "/usr/pgsql-12/bin/postgres" | grep -v grep
postgres  31541  0.0  0.2 397492 17536 ?        Ss   Sep22   0:05 /usr/pgsql-12/bin/postgres -D /var/lib/pgsql/12/data
postgres@psql-instance-03 ~/12/data/log $ kill -9 31541
postgres@psql-instance-03 ~/12/data/log $ ps waux |egrep "/usr/pgsql-12/bin/postgres" | grep -v grep

Check that the standby has lost connectivity:

2021-09-27 10:52:11.247 CEST [39480] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "10.90.50.123" and accepting
                TCP/IP connections on port 5432?

Make it a new primary:

postgres@psql-instance-02 ~/12/data/log $ psql 
psql (12.8)
Type "help" for help.
 
postgres=# SELECT pg_promote();
 pg_promote 
------------
 t
(1 row)
 
postgres=# SELECT pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 f
(1 row)
 
 
postgres=# SELECT pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/16000800
(1 row)
 
postgres=# insert into t1(a) values('THIS IS THE NEW PRIMARY') ;
INSERT 0 1
postgres=# insert into t1(a) values(now()) ;
INSERT 0 1
postgres=# select * from t1 ;
               a               
-------------------------------
 mongui
 THIS IS THE NEW PRIMARY
 THIS IS THE NEW PRIMARY
 2021-09-22 14:00:34.907455+02
 2021-09-22 14:01:30.690242+02
 2021-09-22 14:02:19.782789+02
 THIS IS THE NEW PRIMARY
 2021-09-27 10:54:36.314229+02
(8 rows)

Make the old primary a slave from the new primary:

postgres@psql-instance-03 ~/12/data $ touch standby.signal
postgres@psql-instance-03 ~/12/data $ pg_ctl -D /var/lib/pgsql/12/data -l logfile start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start.... done
server started
postgres@psql-instance-03 ~/12/data $ psql
psql (12.8)
Type "help" for help.
 
postgres=# SELECT pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
(1 row)

using pg_rewing

In the case of you have 2 timelines after a failover, you should use pg_rewind instead of a whole re-instantiation with pg_basebackup.
Syntax is quite simple:

.\pg_rewind.exe -D ..\data --source-server="user=postgres host=10.6.7.2 port=5432" -P --debug

It will check from differences physically in the datafiles and just modify data blocks that had changed.
Then you can start the slave again.
Destination server MUST BE STOPPED!

Sample code for auto-recover

let PGVERSION=12
 
if [ $PGVERSION -ge 12 ]; then
RECOVERYCONF=${FAILED_NODE_PGDATA}/myrecovery.conf
else
RECOVERYCONF=${FAILED_NODE_PGDATA}/recovery.conf
fi
 
## If Standby is running, synchronize it with the new Primary.
if [ $? -eq 0 ]; then
 
    logger -i -p local1.info follow_master.sh: pg_rewind for $FAILED_NODE_ID
 
    # Create replication slot "${FAILED_NODE_HOST}"
    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
        ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_create_physical_replication_slot('${SLOT_NAME}');\"
    "
 
    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
 
        set -o errexit
 
        ${PGHOME}/bin/pg_ctl -w -m f -D ${FAILED_NODE_PGDATA} stop
 
        cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_ID} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${SLOT_NAME}'
EOT
 
        if [ ${PGVERSION} -ge 12 ]; then
            touch ${FAILED_NODE_PGDATA}/standby.signal
        else
            echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
        fi
 
        ${PGHOME}/bin/pg_rewind -D ${FAILED_NODE_PGDATA} --source-server=\"user=postgres host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT}\"
 
    "
    if [ $? -ne 0 ]; then
        logger -i -p local1.error follow_master.sh: end: pg_rewind failed. Try pg_basebackup.
 
        ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
 
            set -o errexit
 
            # Execute pg_basebackup
            rm -rf ${FAILED_NODE_PGDATA}
            rm -rf ${ARCHIVEDIR}/*
            ${PGHOME}/bin/pg_basebackup -h ${NEW_MASTER_NODE_HOST} -U $REPLUSER -p ${NEW_MASTER_NODE_PORT} -D ${FAILED_NODE_PGDATA} -X stream
 
            if [ ${PGVERSION} -ge 12 ]; then
                sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
                       -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${FAILED_NODE_PGDATA}/postgresql.conf
            fi
 
            cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_ID} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${SLOT_NAME}'
EOT
 
            if [ ${PGVERSION} -ge 12 ]; then
                touch ${FAILED_NODE_PGDATA}/standby.signal
            else
                echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
            fi
        "
 
        if [ $? -ne 0 ]; then
            # drop replication slot
            ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
                ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${SLOT_NAME}')\"
            "
 
            logger -i -p local1.error follow_master.sh: end: pg_basebackup failed
            exit 1
        fi
    fi
 
    # start Standby node on ${FAILED_NODE_HOST}
    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
            postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool $PGHOME/bin/pg_ctl -l /dev/null -w -D ${FAILED_NODE_PGDATA} start
 
    # If start Standby successfully, attach this node
    if [ $? -eq 0 ]; then
 
        # Run pcp_attact_node to attach Standby node to Pgpool-II.
        ${PGPOOL_PATH}/pcp_attach_node -w -h localhost -U $PCP_USER -p ${PCP_PORT} -n ${FAILED_NODE_ID}
 
        if [ $? -ne 0 ]; then
                logger -i -p local1.error follow_master.sh: end: pcp_attach_node failed
                exit 1
        fi
 
    # If start Standby failed, drop replication slot "${FAILED_NODE_HOST}"
    else
 
        ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool \
        "${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${SLOT_NAME}')\""
 
        logger -i -p local1.error follow_master.sh: end: follow master command failed
        exit 1
    fi
 
else
    logger -i -p local1.info follow_master.sh: failed_nod_id=${FAILED_NODE_ID} is not running. skipping follow master command
    exit 0
fi
dba/postgresql/a_tale_of_postgresql.txt · Last modified: 2022/02/11 11:36 by 127.0.0.1