Table of Contents
A Thale(s) of PostgreSql
setup
Master
postgresql.conf
Sample main config file:
- postgresql.conf
listen_addresses = '*' # what IP address(es) to listen on; port = 5432 # (change requires restart) max_connections = 100 # (change requires restart) shared_buffers = 128MB # min 128kB dynamic_shared_memory_type = windows # the default is the first option wal_level = replica # minimal, replica, or logical wal_log_hints = on # also do full page writes of non-critical updates max_wal_size = 1GB min_wal_size = 80MB archive_mode = on # enables archiving; off, on, or always archive_command = 'copy "%p" "\\\\server02\\\archivelog\\%f"' archive_timeout = 300 # force a logfile segment switch after this max_wal_senders = 3 # max number of walsender processes wal_keep_segments = 64 # in logfile segments; 0 disables hot_standby = on # "off" disallows queries during recovery hot_standby_feedback = on # send info from standby to prevent log_destination = 'stderr' # Valid values are combinations of logging_collector = on # Enable capturing of stderr and csvlog log_directory = 'log' # directory where log files are written, log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern, log_file_mode = 0640 # creation mode for log files, log_timezone = 'Europe/London' datestyle = 'iso, dmy' timezone = 'Europe/London' lc_messages = 'Portuguese_Portugal.1252' # locale for system error message lc_monetary = 'Portuguese_Portugal.1252' # locale for monetary formatting lc_numeric = 'Portuguese_Portugal.1252' # locale for number formatting lc_time = 'Portuguese_Portugal.1252' # locale for time formatting default_text_search_config = 'pg_catalog.portuguese'
Config values that should be changed or (at least) readen:
listen_address
andport
max_connections
archive_command
- ALL logging parameters (
log_*
) - localization parameters
pg_hba.conf
Authentication and authorization file:
- pg_hba.conf
host all all 127.0.0.1/32 trust # next line MUST NOT be used on production host all all all trust host all all ::1/128 trust # next line MUST NOT be used on production host replication rep_user all trust host replication all 127.0.0.1/32 trust host replication all ::1/128 trust # next line MUST NOT be used on production host replication all all trust
Our suggestion is use either password
or md5
authentication and allow connection only for the server pairs (that is Appserver1 and Appserver2).
Replication and failover process
Slave
re-instantiate
1st instantiation/re-instantiate:
cd \ cd APIS\Postgre12\bin .\pg_basebackup.exe --host=10.6.7.1 --port=5432 --username=postgres --pgdata=C:\APIS\Postgres12\data -P -v -R -X stream
The above command explained:
-h | specifies the IP address of the CentOS Master server (replace 10.6.7.1 with IP Address of your Slave server) |
-D | specifies the data directory on Slave server, usually: C:\APIS\Postgres12\data |
-U | specifies the connection user (we use postgres, user, here, but you can and should create a dedicated user name. I used ‘'postgres'’ user to illustrate the steps, albeit if this is not a production server and you don’t mind it, you can set it up this way) |
-P | show progress reporting |
-v | use verbose mode |
-R | enable the creation of recovery config or in other words, this appends connection settings to postgresql.auto.conf and creates a standby.signal file in the data folder of your server |
-X | include the WAL files (write-ahead log files) in the backup. We use the '‘stream'’ option here to stream the WAL during the backup process |
Startup
This should be use the 1st time or for testing, then you can start the slave with the Windows Services tool:
.\pg_ctl.exe -D c:\APIS\Postgres12\data start
Check the replication in the slave:
SELECT pg_is_in_recovery();
It should report t
.
SELECT * FROM pg_stat_wal_receiver;
It should give you information about the streaming replication.
Check the replication in the master:
SELECT * FROM pg_stat_replication ;
Sample failover
Check the instances:
postgres@psql-instance-02 ~/12/data/log $ psql psql (12.8) Type "help" for help. postgres=# SELECT pg_is_in_recovery(); pg_is_in_recovery ------------------- t (1 row)
And:
postgres@psql-instance-03 ~/12/data/log $ psql psql (12.8) Type "help" for help. postgres=# SELECT pg_is_in_recovery(); pg_is_in_recovery ------------------- f (1 row)
hardcore kill the primary:
postgres@psql-instance-03 ~/12/data/log $ ps waux |egrep "/usr/pgsql-12/bin/postgres" | grep -v grep postgres 31541 0.0 0.2 397492 17536 ? Ss Sep22 0:05 /usr/pgsql-12/bin/postgres -D /var/lib/pgsql/12/data postgres@psql-instance-03 ~/12/data/log $ kill -9 31541 postgres@psql-instance-03 ~/12/data/log $ ps waux |egrep "/usr/pgsql-12/bin/postgres" | grep -v grep
Check that the standby has lost connectivity:
2021-09-27 10:52:11.247 CEST [39480] FATAL: could not connect to the primary server: could not connect to server: Connection refused Is the server running on host "10.90.50.123" and accepting TCP/IP connections on port 5432?
Make it a new primary:
postgres@psql-instance-02 ~/12/data/log $ psql psql (12.8) Type "help" for help. postgres=# SELECT pg_promote(); pg_promote ------------ t (1 row) postgres=# SELECT pg_is_in_recovery(); pg_is_in_recovery ------------------- f (1 row) postgres=# SELECT pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/16000800 (1 row) postgres=# insert into t1(a) values('THIS IS THE NEW PRIMARY') ; INSERT 0 1 postgres=# insert into t1(a) values(now()) ; INSERT 0 1 postgres=# select * from t1 ; a ------------------------------- mongui THIS IS THE NEW PRIMARY THIS IS THE NEW PRIMARY 2021-09-22 14:00:34.907455+02 2021-09-22 14:01:30.690242+02 2021-09-22 14:02:19.782789+02 THIS IS THE NEW PRIMARY 2021-09-27 10:54:36.314229+02 (8 rows)
Make the old primary a slave from the new primary:
postgres@psql-instance-03 ~/12/data $ touch standby.signal postgres@psql-instance-03 ~/12/data $ pg_ctl -D /var/lib/pgsql/12/data -l logfile start pg_ctl: another server might be running; trying to start server anyway waiting for server to start.... done server started postgres@psql-instance-03 ~/12/data $ psql psql (12.8) Type "help" for help. postgres=# SELECT pg_is_in_recovery(); pg_is_in_recovery ------------------- t (1 row)
using pg_rewing
In the case of you have 2 timelines after a failover, you should use pg_rewind
instead of a whole re-instantiation with pg_basebackup
.
Syntax is quite simple:
.\pg_rewind.exe -D ..\data --source-server="user=postgres host=10.6.7.2 port=5432" -P --debug
It will check from differences physically in the datafiles and just modify data blocks that had changed.
Then you can start the slave again.
Destination server MUST BE STOPPED!
Sample code for auto-recover
let PGVERSION=12 if [ $PGVERSION -ge 12 ]; then RECOVERYCONF=${FAILED_NODE_PGDATA}/myrecovery.conf else RECOVERYCONF=${FAILED_NODE_PGDATA}/recovery.conf fi ## If Standby is running, synchronize it with the new Primary. if [ $? -eq 0 ]; then logger -i -p local1.info follow_master.sh: pg_rewind for $FAILED_NODE_ID # Create replication slot "${FAILED_NODE_HOST}" ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool " ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_create_physical_replication_slot('${SLOT_NAME}');\" " ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool " set -o errexit ${PGHOME}/bin/pg_ctl -w -m f -D ${FAILED_NODE_PGDATA} stop cat > ${RECOVERYCONF} << EOT primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_ID} passfile=''/var/lib/pgsql/.pgpass''' recovery_target_timeline = 'latest' restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p' primary_slot_name = '${SLOT_NAME}' EOT if [ ${PGVERSION} -ge 12 ]; then touch ${FAILED_NODE_PGDATA}/standby.signal else echo \"standby_mode = 'on'\" >> ${RECOVERYCONF} fi ${PGHOME}/bin/pg_rewind -D ${FAILED_NODE_PGDATA} --source-server=\"user=postgres host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT}\" " if [ $? -ne 0 ]; then logger -i -p local1.error follow_master.sh: end: pg_rewind failed. Try pg_basebackup. ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool " set -o errexit # Execute pg_basebackup rm -rf ${FAILED_NODE_PGDATA} rm -rf ${ARCHIVEDIR}/* ${PGHOME}/bin/pg_basebackup -h ${NEW_MASTER_NODE_HOST} -U $REPLUSER -p ${NEW_MASTER_NODE_PORT} -D ${FAILED_NODE_PGDATA} -X stream if [ ${PGVERSION} -ge 12 ]; then sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \ -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${FAILED_NODE_PGDATA}/postgresql.conf fi cat > ${RECOVERYCONF} << EOT primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_ID} passfile=''/var/lib/pgsql/.pgpass''' recovery_target_timeline = 'latest' restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p' primary_slot_name = '${SLOT_NAME}' EOT if [ ${PGVERSION} -ge 12 ]; then touch ${FAILED_NODE_PGDATA}/standby.signal else echo \"standby_mode = 'on'\" >> ${RECOVERYCONF} fi " if [ $? -ne 0 ]; then # drop replication slot ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool " ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${SLOT_NAME}')\" " logger -i -p local1.error follow_master.sh: end: pg_basebackup failed exit 1 fi fi # start Standby node on ${FAILED_NODE_HOST} ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool $PGHOME/bin/pg_ctl -l /dev/null -w -D ${FAILED_NODE_PGDATA} start # If start Standby successfully, attach this node if [ $? -eq 0 ]; then # Run pcp_attact_node to attach Standby node to Pgpool-II. ${PGPOOL_PATH}/pcp_attach_node -w -h localhost -U $PCP_USER -p ${PCP_PORT} -n ${FAILED_NODE_ID} if [ $? -ne 0 ]; then logger -i -p local1.error follow_master.sh: end: pcp_attach_node failed exit 1 fi # If start Standby failed, drop replication slot "${FAILED_NODE_HOST}" else ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool \ "${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${SLOT_NAME}')\"" logger -i -p local1.error follow_master.sh: end: follow master command failed exit 1 fi else logger -i -p local1.info follow_master.sh: failed_nod_id=${FAILED_NODE_ID} is not running. skipping follow master command exit 0 fi