PostgreSQL HA: Streaming Replication with Automatic Failover

Introduction

High availability (HA) is a foundational requirement for production PostgreSQL deployments. In this post I'll walk you through a practical, production-ready approach to PostgreSQL HA using streaming replication for data continuity plus automatic failover for resilient operations. You'll get configuration examples, commands, and recommended settings so your database cluster can survive failures with minimal downtime.

Why streaming replication + automatic failover?

Streaming replication continuously ships WAL (Write-Ahead Log) records from a primary to one or more replicas, keeping them within moments of the primary (asynchronously by default, or synchronously if configured). Automatic failover tools such as Patroni or repmgr remove human intervention from the process of promoting a replica when the primary fails, restoring write capability quickly and consistently.

Architecture overview

A common HA architecture includes:

  • Primary node (read/write)
  • One or more replicas (read-only, warm or hot)
  • Distributed consensus/leader election layer (etcd/consul/Zookeeper)
  • Failover manager (Patroni or repmgrd) which integrates with the consensus layer

Prerequisites

  • PostgreSQL 12+ installed on all nodes
  • Network connectivity between nodes on the PostgreSQL port (default 5432)
  • An etcd/consul cluster or equivalent for leader election (for Patroni)
  • A PostgreSQL superuser for configuration tasks

Step-by-step setup

1. Configure the primary

Edit postgresql.conf on the primary and set replication-related parameters:

wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/wal_archive/%f && cp %p /var/lib/postgresql/wal_archive/%f'
synchronous_commit = on
synchronous_standby_names = '*'
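The archive_command above copies a WAL segment only when the destination file does not already exist, so a retried archive call can never overwrite a previously archived segment. A minimal local sketch of that guard, using a temp directory and a dummy file instead of real WAL segments:

```shell
#!/usr/bin/env bash
# Sketch of the guard pattern used in archive_command:
#   test ! -f <dst> && cp <src> <dst>
archive_dir=$(mktemp -d)
wal_file=$(mktemp)
seg="$archive_dir/000000010000000000000001"

echo "segment-contents" > "$wal_file"

# First attempt: destination is absent, so the copy runs.
test ! -f "$seg" && cp "$wal_file" "$seg"

# Simulate a retry after the segment was already archived: the guard
# fails, cp never runs, and the archived copy is left untouched.
echo "changed" > "$wal_file"
test ! -f "$seg" && cp "$wal_file" "$seg" || echo "already archived, skipping"

cat "$seg"   # still the original contents
```

Note that PostgreSQL retries archive_command on a non-zero exit status, which is why the `test` guard matters: a blind `cp` after a partially completed attempt could clobber a good archive.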

Allow the replication user in pg_hba.conf (adjust CIDR to your network):

# allow replication connections from replicas
host  replication  replicator  10.0.0.0/24  md5

Create a replication role:

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'strong_password';

2. Take a base backup for the replica

The simplest way to provision a replica is to use pg_basebackup:

pg_basebackup -h primary.example.com -D /var/lib/postgresql/12/main -U replicator -v -P --wal-method=stream

For PostgreSQL 12 and later, after the base backup completes, create an empty file named standby.signal inside the replica's $PGDATA and add the primary connection info to postgresql.conf or postgresql.auto.conf (alternatively, pass -R to pg_basebackup and it will create standby.signal and write primary_conninfo for you):

# in postgresql.conf on the replica
primary_conninfo = 'host=primary.example.com port=5432 user=replicator password=strong_password'

Start PostgreSQL on the replica and confirm replication with:

SELECT client_addr, state, sync_state FROM pg_stat_replication;
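pg_stat_replication also exposes LSN columns (sent_lsn, replay_lsn, and friends), and replication lag in bytes is just the difference between two LSNs. In practice PostgreSQL's pg_wal_lsn_diff() computes this for you; as a self-contained sketch, an LSN such as 0/3000060 encodes a 64-bit byte position as high32/low32 in hex:

```shell
#!/usr/bin/env bash
# Convert a PostgreSQL LSN ('high/low' in hex) to an absolute byte position.
lsn_to_bytes() {
  local high=${1%%/*} low=${1##*/}
  echo $(( 16#$high * 4294967296 + 16#$low ))   # high * 2^32 + low
}

# Byte lag between e.g. the primary's pg_current_wal_lsn() and a
# replica's replay_lsn from pg_stat_replication.
lag_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

lag_bytes "0/3000060" "0/3000000"   # prints 96 (0x60 bytes behind)
```

Feed real LSNs into a function like this (or just query pg_wal_lsn_diff directly) and alert when the lag exceeds your tolerance.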

Create a replication slot on the primary to prevent WAL from being removed before replicas consume it:

SELECT * FROM pg_create_physical_replication_slot('replica_slot_1');

Configure the replica to use the slot by setting primary_slot_name = 'replica_slot_1' in postgresql.conf (or postgresql.auto.conf) alongside primary_conninfo, or let Patroni manage slots for you.
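The trade-off with slots is that they retain WAL until it is consumed, so a long-disconnected replica can fill the primary's disk. A query like the following (pg_wal_lsn_diff is available since PostgreSQL 10) shows how much WAL each slot is holding back:

```sql
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;
```

Alert when retained_bytes keeps growing, or when active is false for a slot that should have a live consumer.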

Automatic failover with Patroni

Patroni is a lightweight orchestrator that uses a distributed consensus store (etcd/consul) to manage leader election and coordinates PostgreSQL configuration changes for failover. Below is a minimal Patroni YAML snippet for a node:

scope: postgres_cluster
namespace: /service/
name: node1
etcd:
  host: 10.0.0.1:2379
bootstrap:
  dcs:
    ttl: 30
  initdb:
    - encoding: UTF8
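The snippet above is intentionally minimal; a working node also needs restapi and postgresql sections. A fuller sketch (hosts, paths, and passwords here are placeholders for illustration):

```yaml
scope: postgres_cluster
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.11:8008

etcd:
  host: 10.0.0.1:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    postgresql:
      use_slots: true

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.11:5432
  data_dir: /var/lib/postgresql/12/main
  authentication:
    replication:
      username: replicator
      password: strong_password
    superuser:
      username: postgres
      password: admin_password
```

With use_slots enabled in the DCS section, Patroni creates and drops physical replication slots for members automatically, which replaces the manual slot management shown earlier.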

Start Patroni as a system service on each node. Patroni will handle base backups (using pg_basebackup), create standby signal files, manage replication slots, and perform automatic failover using the consensus store.
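A minimal systemd unit for running Patroni (the binary and config paths are assumptions; adjust them to your installation):

```ini
[Unit]
Description=Patroni PostgreSQL HA manager
After=network-online.target

[Service]
Type=simple
User=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni.yml
# Do not kill the postgres children when Patroni itself is restarted.
KillMode=process
Restart=on-failure

[Install]
WantedBy=multi-user.target
```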

Testing failover and recovery

1. Simulate primary failure (stop Patroni or PostgreSQL on primary):

systemctl stop patroni

2. Patroni will detect the failure and elect a new leader. Verify the new primary:

patronictl -c /etc/patroni.yml list

3. When the old primary returns, Patroni will rejoin it as a replica (unless configured otherwise).
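To script step 2 of the test, you can pull the leader's name out of the patronictl list table. The exact table layout varies across Patroni versions, so the parsing below is a sketch against an assumed pipe-separated layout:

```shell
#!/usr/bin/env bash
# Print the member name from the row whose Role column says "Leader",
# given `patronictl -c /etc/patroni.yml list` style output on stdin.
find_leader() {
  awk -F'|' '/Leader/ { gsub(/ /, "", $2); print $2 }'
}

# Illustrative output captured after a failover (node2 was promoted):
sample='| node1 | 10.0.0.10 | Replica | running |
| node2 | 10.0.0.11 | Leader  | running |'

echo "$sample" | find_leader   # prints node2
```

In a real test harness you would loop on `patronictl ... list | find_leader` until a new leader name appears, then assert that writes succeed against it.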

Operational tips and best practices

  • Use multiple replicas across different availability zones to avoid correlated failures.
  • Monitor replication lag via pg_stat_replication and alerts for WAL accumulation.
  • Secure replication users and network traffic (use TLS for Patroni and PostgreSQL connections where possible).
  • Test failover regularly in a staging environment to verify application resilience.
  • Consider WAL archiving (WAL-G / WAL-E) for PITR and longer retention.

When to choose managed solutions

Running and operating HA PostgreSQL with streaming replication and Patroni is reliable, but it requires operational maturity. If you prefer a managed approach, platforms like Worlber Carbonate or Quick Deploy provide managed PostgreSQL and DaaS options that include encryption, backup, and HA orchestration handled for you. For teams that want transparent data encryption in addition to HA, Worlber's PGEE integrates with PostgreSQL deployments to secure data at rest.

Conclusion

Implementing streaming replication plus an automatic failover manager such as Patroni gives you a robust, production-ready PostgreSQL HA solution. Use replication slots and WAL archiving to protect against data loss, test failovers frequently, and monitor replication lag closely. If you need help designing or operating PostgreSQL HA in Saudi Arabia or the Middle East, Worlber's team of senior DBAs can help accelerate the journey from architecture to production.


Ready to Transform Your Database Infrastructure?

Worlber helps companies across Saudi Arabia and the Middle East build reliable, secure, and cost-effective database solutions.

📧 Email us: contactus@worlber.com

🌐 Visit: worlber.sa/contact

📞 Call: Talk to our database experts today

Worlber - Your trusted PostgreSQL partner in the Middle East
