dba:postgresql:a_tale_of_postgresql_the_second
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
dba:postgresql:a_tale_of_postgresql_the_second [2023/10/24 09:21] – created dodger | dba:postgresql:a_tale_of_postgresql_the_second [2023/10/25 06:45] (current) – removed dodger | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | # A Thale(s) of PostgreSql (the 2nd) | ||
- | |||
- | # Servers | ||
- | |||
- | | ip | role | | ||
- | |----|------| | ||
- | | `10.3.0.3` | primary | | ||
- | | `10.6.1.3` | backup | | ||
- | |||
- | # The problem | ||
- | |||
- | Just open the log give us information about the error: | ||
- | ``` | ||
- | The network name cannot be found. | ||
- | 2023-10-19 09: | ||
- | 2023-10-19 09: | ||
- | 2023-10-19 09: | ||
- | ``` | ||
- | |||
- | The command executed by `archive_command` is failing. | ||
- | The `archive_command` on PostgreSql main configuration file `postgresql.conf` is set as: | ||
- | ```config | ||
- | archive_command = 'copy " | ||
- | ``` | ||
- | |||
- | Additionally, | ||
- | ``` | ||
- | The network name cannot be found. | ||
- | ``` | ||
- | |||
- | Without connecting again, in my oppinion, the ip address: `10.6.1.3` (which is the _backup_ server), either is *not* accessible via windows shared drive or the shared drive `\\10.6.1.3\archivelog` is full (0% space available). | ||
- | |||
- | Anyway, the result of the `archive_command` is _ERROR_. And the default behaviour of PostgreSql in such case is *DON' | ||
- | |||
- | PostgreSql [Documentation](https:// | ||
- | > While designing your archiving setup, consider what will happen if the archive command fails repeatedly because some aspect requires operator intervention or the archive runs out of space. For example, this could occur if you write to tape without an autochanger; | ||
- | |||
- | |||
- | That's the cause of the problems you're facing. As the result of `archive_command` fail, WAL are not being automatically deleted then the disk will run out of space. PostgreSql is waiting for manual intervention so `archive_command` ends with _OK_ and it can continue deleting old WAL files. | ||
- | |||
- | |||
- | |||
- | # Solutions | ||
- | |||
- | ## Solution 1: Solve the connectivity problems | ||
- | |||
- | First solution should be fast and doesn' | ||
- | You have to check: | ||
- | * Connectivity from _primary_ to the shared Network drive: `\\10.6.1.3\archivelog` | ||
- | * Disk space available on the _backup_ server in the underlying drive of the shared network drive `\\10.6.1.3\archivelog` | ||
- | |||
- | ### Additional actions on this solution | ||
- | |||
- | To avoid possible future failures. | ||
- | |||
- | Add check to the monitorization tool to check: | ||
- | * Connectivity from _primary_ to the shared Network drive | ||
- | * Disk space available on the _backup_ server in the underlying drive of the shared network drive | ||
- | |||
- | |||
- | ## Solution 2: Change archive_command | ||
- | |||
- | This solution will involve PostgreSql server to be restarted to apply changes. | ||
- | |||
- | Modify `archive_command` to copy WAL files locally: | ||
- | ```conf | ||
- | archive_command = 'copy " | ||
- | ``` | ||
- | Create the destination directory: | ||
- | ```powershell | ||
- | mkdir C: | ||
- | ``` | ||
- | And restart PostgreSql. | ||
- | |||
- | *Note*: This solution is not recommended because you won't have cross-repliaction. That is, in our initial recomendations we suggest that _primary_ server copies the WAL files to the _backup_ server to be able to execute a Point-in-time-recover (PITR), that is a *best-practice*. | ||
- | |||
- | |||
- | # Thales Hyderabad info | ||
- | |||
- | ``` | ||
- | Archive: | ||
- | Written using Zippy | ||
- | Length | ||
- | --------- | ||
- | 9315434 | ||
- | 5903497 | ||
- | --------- | ||
- | | ||
- | ``` | ||
- | |||
- | |||
dba/postgresql/a_tale_of_postgresql_the_second.1698139289.txt.gz · Last modified: 2023/10/24 09:21 by dodger