Postgres Ops

Running PostgreSQL in production: install, configure, authenticate, tune, find slow queries, vacuum sensibly, replicate, fail over, and upgrade across majors without drama.

If you only remember six things
  • shared_buffers ≈ 25% of RAM, effective_cache_size ≈ 60-75%. These are separate numbers; don't confuse them.
  • max_connections is a cap, not a target. Put pgbouncer in front and keep the DB number small.
  • pg_stat_activity and pg_stat_statements answer "what is this database doing right now" and "what does it do on average".
  • Autovacuum is not optional. If you've turned it off, you will learn about XID wraparound the hard way.
  • Streaming replication needs a replication role, a slot, and matching pg_hba.conf. Logical replication needs a publication and a subscription.
  • pg_upgrade --link between majors takes seconds; test the path in a throwaway copy first.

Install & initdb

On RHEL-family systems use the PGDG repository rather than the AppStream version — it stays current and supports side-by-side majors.

sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo dnf -qy module disable postgresql
sudo dnf install -y postgresql16-server postgresql16-contrib

sudo /usr/pgsql-16/bin/postgresql-16-setup initdb
sudo systemctl enable --now postgresql-16

The contrib package ships extensions you will almost certainly want later: pg_stat_statements, pgcrypto, postgres_fdw.

Data directory defaults to /var/lib/pgsql/16/data/. If you mean to put it on a dedicated volume, pass PGSETUP_INITDB_OPTIONS="-D /srv/pgdata" and edit systemctl edit postgresql-16 to match before you run initdb.

postgresql.conf: the knobs that matter

Defaults are conservative enough to start on a Raspberry Pi. On real hardware, touch these six:

ParameterTypical valueNotes
shared_buffers25% of RAMPostgres's own page cache. More is not always better past ~40%.
effective_cache_size60-75% of RAMA hint to the planner, not an allocation. Tells it "this much memory is realistically available for caching".
work_mem16-64 MBPer-operation. Two sorts in one query = 2×. Many connections = many ×.
maintenance_work_mem512 MB - 2 GBVACUUM, CREATE INDEX, ALTER TABLE. Can be generous — one process at a time typically.
max_connections100-200Not a scaling dial. See next section.
wal_compressiononCheap CPU for cheap disk on a busy cluster.
checkpoint_timeout15minPaired with max_wal_size. Smoother checkpoints, bigger WAL.
# /var/lib/pgsql/16/data/postgresql.conf (excerpt)
listen_addresses = '*'
max_connections = 200
shared_buffers = 8GB
effective_cache_size = 24GB
work_mem = 32MB
maintenance_work_mem = 1GB
wal_level = replica
max_wal_size = 8GB
min_wal_size = 2GB
wal_compression = on
checkpoint_timeout = 15min
random_page_cost = 1.1           # SSD; leave at 4.0 for spinning disks
effective_io_concurrency = 200   # SSD
log_min_duration_statement = 500 # log everything slower than 500ms
log_line_prefix = '%m [%p] %u@%d '
shared_preload_libraries = 'pg_stat_statements'

After editing, some parameters need a restart (shared_buffers, max_connections, shared_preload_libraries); others take SELECT pg_reload_conf();. Check with:

SELECT name, context FROM pg_settings WHERE name IN
  ('shared_buffers','work_mem','max_connections','log_min_duration_statement');
-- context = 'postmaster' means restart; 'sighup' means reload.

max_connections vs pgbouncer

Each Postgres connection is a process with its own memory footprint (8-20 MB before any query runs). Setting max_connections = 2000 to "scale" your app tier will starve the box before any query runs.

The real answer is a pool between the app and the database:

# /etc/pgbouncer/pgbouncer.ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction          # sessions for long-lived, transaction for web apps
max_client_conn = 2000           # the app sees this
default_pool_size = 25           # Postgres sees roughly this per db/user
reserve_pool_size = 5
server_idle_timeout = 600
transaction pooling breaks anything that relies on session state: SET, LISTEN/NOTIFY, advisory locks spanning transactions, prepared statements (pre-PG14). If your app uses them, use session mode or move those features out.

pg_hba.conf authentication

Read top to bottom. First matching row wins. Order matters.

# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   all             postgres                                peer
local   all             all                                     scram-sha-256
host    all             all             127.0.0.1/32            scram-sha-256
host    all             all             10.0.0.0/16             scram-sha-256
hostssl appdb           appuser         0.0.0.0/0               scram-sha-256
host    replication     repl            10.0.1.0/24             scram-sha-256

Roles, users, grants

-- A "user" is just a role with LOGIN.
CREATE ROLE app_rw LOGIN PASSWORD 'redacted';
CREATE ROLE app_ro LOGIN PASSWORD 'redacted';
CREATE ROLE app_group NOLOGIN;

-- Grant table/privilege membership via a group role, not to individuals.
GRANT app_group TO app_rw, app_ro;

-- The app database, owned by a dedicated role (never postgres).
CREATE DATABASE appdb OWNER app_group;

\c appdb
REVOKE ALL ON SCHEMA public FROM PUBLIC;
GRANT  USAGE ON SCHEMA public TO app_group;
GRANT  SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_rw;
GRANT  SELECT                          ON ALL TABLES IN SCHEMA public TO app_ro;

-- And for tables that don't exist yet:
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_rw;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO app_ro;
Never let the app connect as postgres. A compromised app process with superuser becomes a compromised database cluster.

Finding slow queries

Right now

SELECT pid, now() - query_start AS runtime, state, wait_event_type, wait_event,
       left(query, 120) AS query
FROM   pg_stat_activity
WHERE  state != 'idle'
ORDER  BY runtime DESC NULLS LAST
LIMIT  20;

-- Kill politely (cancel current statement, keep connection):
SELECT pg_cancel_backend(12345);
-- Kill hard (terminate backend):
SELECT pg_terminate_backend(12345);

On average

-- Needs shared_preload_libraries = 'pg_stat_statements' and a restart.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT substr(query,1,80) AS q,
       calls,
       round(total_exec_time::numeric, 0)  AS total_ms,
       round(mean_exec_time::numeric, 1)   AS mean_ms,
       rows
FROM   pg_stat_statements
ORDER  BY total_exec_time DESC
LIMIT  20;

Actually reading a plan

EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
-- ANALYZE actually runs the query; BUFFERS shows hit vs read (cache vs disk).
-- A sequential scan on a 10-row lookup table is fine. On a 10M-row table it's a bug.

ANALYZE, VACUUM, autovacuum

Postgres uses MVCC: updates leave dead row versions behind. VACUUM reclaims them; ANALYZE refreshes the planner's statistics. Autovacuum runs both, conservatively, on a schedule driven by per-table thresholds.

-- Inspect: what's old and bloated?
SELECT relname,
       n_live_tup, n_dead_tup,
       last_autovacuum, last_autoanalyze
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC
LIMIT  20;

On hot tables (write-heavy, millions of rows), global defaults are too lazy. Tune per-table:

ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.02,   -- vacuum at 2% dead tuples, not 20%
  autovacuum_analyze_scale_factor = 0.01,
  autovacuum_vacuum_cost_limit = 2000      -- let it go faster
);
XID wraparound. Postgres uses 32-bit transaction IDs. If autovacuum never catches up, the cluster refuses writes to protect data. Monitor SELECT datname, age(datfrozenxid) FROM pg_database;. Anything above 1 billion is a warning; 2 billion is an outage.

Replication: streaming and logical

Streaming (physical) replication

Byte-for-byte copy, entire cluster, read-only replica. The standard HA primitive.

-- On the primary:
CREATE ROLE repl WITH REPLICATION LOGIN PASSWORD 'redacted';
-- postgresql.conf
--   wal_level = replica
--   max_wal_senders = 10
--   max_replication_slots = 10
-- pg_hba.conf:
--   host  replication  repl  10.0.1.0/24  scram-sha-256
SELECT pg_create_physical_replication_slot('replica1');
# On the replica, as the postgres user:
sudo -u postgres pg_basebackup \
  -h primary.db -U repl \
  -D /var/lib/pgsql/16/data \
  -Fp -Xs -P -R --slot=replica1
# The -R flag writes these for you into postgresql.auto.conf:
primary_conninfo = 'host=primary.db user=repl password=... application_name=replica1'
primary_slot_name = 'replica1'
# and creates an empty standby.signal file.
-- On the primary, inspect replication state:
SELECT client_addr, state, sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM   pg_stat_replication;

Logical replication

Table-level, SQL-level replay. Use for cross-version migrations, table subsets, or moving to a different schema.

-- Publisher (primary):
ALTER SYSTEM SET wal_level = 'logical';  -- needs restart
CREATE PUBLICATION pub_orders FOR TABLE orders, order_items;

-- Subscriber (other cluster):
CREATE SUBSCRIPTION sub_orders
  CONNECTION 'host=primary.db dbname=appdb user=repl password=...'
  PUBLICATION pub_orders;

-- Monitor:
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;
Logical replication replays DML, not DDL. Run schema changes on both sides before you apply them to the app.

Failover & HA

Manual promotion

sudo -u postgres /usr/pgsql-16/bin/pg_ctl promote -D /var/lib/pgsql/16/data
# equivalently:
sudo -u postgres psql -c "SELECT pg_promote();"

Manual promotion is a decision a human should make when they understand why the primary is gone. Automated promotion is what you want for production, but it requires fencing, quorum, and a leader election you can trust.

ToolNicheGotchas
Patronietcd/consul/k8s-backed, widely used, good integration with HAProxyNeeds a solid DCS; split-brain if DCS flaps
repmgrLighter, no DCS, fine for small fleetsWitness node essential to avoid split-brain
pg_auto_failoverOpinionated, monitor+keeper modelOne monitor is a SPOF unless you HA it

Major version upgrades

Minor upgrades (16.3 → 16.4) are package upgrades and a restart. Major upgrades (15 → 16) change on-disk format. Three options:

  1. pg_dumpall | psql — simple, slow, requires downtime proportional to data size.
  2. pg_upgrade --link — seconds to minutes, hard-links data files into the new cluster.
  3. Logical replication — near-zero downtime, works cross-version, more moving parts.
# pg_upgrade with hard links. Both major versions must be installed.
sudo systemctl stop postgresql-15 postgresql-16

sudo -u postgres /usr/pgsql-16/bin/pg_upgrade \
  --old-bindir=/usr/pgsql-15/bin \
  --new-bindir=/usr/pgsql-16/bin \
  --old-datadir=/var/lib/pgsql/15/data \
  --new-datadir=/var/lib/pgsql/16/data \
  --link \
  --check          # dry-run first; remove --check to actually migrate

sudo systemctl start postgresql-16
sudo -u postgres /usr/pgsql-16/bin/vacuumdb --all --analyze-in-stages
--link means the old cluster is not startable once upgrade succeeds — the files are now owned by the new cluster. Take a filesystem snapshot or backup first.

Common gotchas

SymptomLikely causeFix
"FATAL: sorry, too many clients"App not using a poolpgbouncer in transaction mode
Sudden massive WAL growth, disk fillsUnused replication slot kept laggingSELECT pg_drop_replication_slot('name');
Queries slow after a big importPlanner stats are staleANALYZE table; or wait for autovacuum
Seq scan where an index should applyrandom_page_cost still 4.0 on SSDLower to 1.1
"canceling statement due to conflict with recovery" on standbyLong read conflicts with replayhot_standby_feedback = on or max_standby_streaming_delay
Locale/collation warning after restoreOS libc upgraded, indexes need rebuildREINDEX DATABASE appdb;

See also: Postgres Backup, Backup & Restore.