Postgres Ops
shared_buffers≈ 25% of RAM,effective_cache_size≈ 60-75%. These are separate numbers; don't confuse them.max_connectionsis a cap, not a target. Put pgbouncer in front and keep the DB number small.pg_stat_activityandpg_stat_statementsanswer "what is this database doing right now" and "what does it do on average".- Autovacuum is not optional. If you've turned it off, you will learn about XID wraparound the hard way.
- Streaming replication needs a
replicationrole, a slot, and matchingpg_hba.conf. Logical replication needs a publication and a subscription. pg_upgrade --linkbetween majors takes seconds; test the path in a throwaway copy first.
Install & initdb
On RHEL-family systems use the PGDG repository rather than the AppStream version — it stays current and supports side-by-side majors.
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo dnf -qy module disable postgresql
sudo dnf install -y postgresql16-server postgresql16-contrib
sudo /usr/pgsql-16/bin/postgresql-16-setup initdb
sudo systemctl enable --now postgresql-16
The contrib package ships extensions you will almost certainly want later: pg_stat_statements, pgcrypto, postgres_fdw.
/var/lib/pgsql/16/data/. If you mean to put it on a dedicated volume, pass PGSETUP_INITDB_OPTIONS="-D /srv/pgdata" and edit systemctl edit postgresql-16 to match before you run initdb.
postgresql.conf: the knobs that matter
Defaults are conservative enough to start on a Raspberry Pi. On real hardware, touch these six:
| Parameter | Typical value | Notes |
|---|---|---|
shared_buffers | 25% of RAM | Postgres's own page cache. More is not always better past ~40%. |
effective_cache_size | 60-75% of RAM | A hint to the planner, not an allocation. Tells it "this much memory is realistically available for caching". |
work_mem | 16-64 MB | Per-operation. Two sorts in one query = 2×. Many connections = many ×. |
maintenance_work_mem | 512 MB - 2 GB | VACUUM, CREATE INDEX, ALTER TABLE. Can be generous — one process at a time typically. |
max_connections | 100-200 | Not a scaling dial. See next section. |
wal_compression | on | Cheap CPU for cheap disk on a busy cluster. |
checkpoint_timeout | 15min | Paired with max_wal_size. Smoother checkpoints, bigger WAL. |
# /var/lib/pgsql/16/data/postgresql.conf (excerpt)
listen_addresses = '*'
max_connections = 200
shared_buffers = 8GB
effective_cache_size = 24GB
work_mem = 32MB
maintenance_work_mem = 1GB
wal_level = replica
max_wal_size = 8GB
min_wal_size = 2GB
wal_compression = on
checkpoint_timeout = 15min
random_page_cost = 1.1 # SSD; leave at 4.0 for spinning disks
effective_io_concurrency = 200 # SSD
log_min_duration_statement = 500 # log everything slower than 500ms
log_line_prefix = '%m [%p] %u@%d '
shared_preload_libraries = 'pg_stat_statements'
After editing, some parameters need a restart (shared_buffers, max_connections, shared_preload_libraries); others take SELECT pg_reload_conf();. Check with:
SELECT name, context FROM pg_settings WHERE name IN
('shared_buffers','work_mem','max_connections','log_min_duration_statement');
-- context = 'postmaster' means restart; 'sighup' means reload.
max_connections vs pgbouncer
Each Postgres connection is a process with its own memory footprint (8-20 MB before any query runs). Setting max_connections = 2000 to "scale" your app tier will starve the box before any query runs.
The real answer is a pool between the app and the database:
# /etc/pgbouncer/pgbouncer.ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction # sessions for long-lived, transaction for web apps
max_client_conn = 2000 # the app sees this
default_pool_size = 25 # Postgres sees roughly this per db/user
reserve_pool_size = 5
server_idle_timeout = 600
SET, LISTEN/NOTIFY, advisory locks spanning transactions, prepared statements (pre-PG14). If your app uses them, use session mode or move those features out.
pg_hba.conf authentication
Read top to bottom. First matching row wins. Order matters.
# TYPE DATABASE USER ADDRESS METHOD
local all postgres peer
local all all scram-sha-256
host all all 127.0.0.1/32 scram-sha-256
host all all 10.0.0.0/16 scram-sha-256
hostssl appdb appuser 0.0.0.0/0 scram-sha-256
host replication repl 10.0.1.0/24 scram-sha-256
localis Unix-socket,hostis TCP (plaintext or TLS),hostsslrequires TLS.peertrusts the OS username (only forlocal). Thepostgressuperuser row above is what letssudo -u postgres psqlwork without a password.md5is legacy — usescram-sha-256everywhere. Setpassword_encryption = scram-sha-256inpostgresql.confand have users re-set their passwords once.
Roles, users, grants
-- A "user" is just a role with LOGIN.
CREATE ROLE app_rw LOGIN PASSWORD 'redacted';
CREATE ROLE app_ro LOGIN PASSWORD 'redacted';
CREATE ROLE app_group NOLOGIN;
-- Grant table/privilege membership via a group role, not to individuals.
GRANT app_group TO app_rw, app_ro;
-- The app database, owned by a dedicated role (never postgres).
CREATE DATABASE appdb OWNER app_group;
\c appdb
REVOKE ALL ON SCHEMA public FROM PUBLIC;
GRANT USAGE ON SCHEMA public TO app_group;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_rw;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_ro;
-- And for tables that don't exist yet:
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_rw;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO app_ro;
postgres. A compromised app process with superuser becomes a compromised database cluster.
Finding slow queries
Right now
SELECT pid, now() - query_start AS runtime, state, wait_event_type, wait_event,
left(query, 120) AS query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY runtime DESC NULLS LAST
LIMIT 20;
-- Kill politely (cancel current statement, keep connection):
SELECT pg_cancel_backend(12345);
-- Kill hard (terminate backend):
SELECT pg_terminate_backend(12345);
On average
-- Needs shared_preload_libraries = 'pg_stat_statements' and a restart.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT substr(query,1,80) AS q,
calls,
round(total_exec_time::numeric, 0) AS total_ms,
round(mean_exec_time::numeric, 1) AS mean_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Actually reading a plan
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
-- ANALYZE actually runs the query; BUFFERS shows hit vs read (cache vs disk).
-- A sequential scan on a 10-row lookup table is fine. On a 10M-row table it's a bug.
ANALYZE, VACUUM, autovacuum
Postgres uses MVCC: updates leave dead row versions behind. VACUUM reclaims them; ANALYZE refreshes the planner's statistics. Autovacuum runs both, conservatively, on a schedule driven by per-table thresholds.
-- Inspect: what's old and bloated?
SELECT relname,
n_live_tup, n_dead_tup,
last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
On hot tables (write-heavy, millions of rows), global defaults are too lazy. Tune per-table:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.02, -- vacuum at 2% dead tuples, not 20%
autovacuum_analyze_scale_factor = 0.01,
autovacuum_vacuum_cost_limit = 2000 -- let it go faster
);
SELECT datname, age(datfrozenxid) FROM pg_database;. Anything above 1 billion is a warning; 2 billion is an outage.
Replication: streaming and logical
Streaming (physical) replication
Byte-for-byte copy, entire cluster, read-only replica. The standard HA primitive.
-- On the primary:
CREATE ROLE repl WITH REPLICATION LOGIN PASSWORD 'redacted';
-- postgresql.conf
-- wal_level = replica
-- max_wal_senders = 10
-- max_replication_slots = 10
-- pg_hba.conf:
-- host replication repl 10.0.1.0/24 scram-sha-256
SELECT pg_create_physical_replication_slot('replica1');
# On the replica, as the postgres user:
sudo -u postgres pg_basebackup \
-h primary.db -U repl \
-D /var/lib/pgsql/16/data \
-Fp -Xs -P -R --slot=replica1
# The -R flag writes these for you into postgresql.auto.conf:
primary_conninfo = 'host=primary.db user=repl password=... application_name=replica1'
primary_slot_name = 'replica1'
# and creates an empty standby.signal file.
-- On the primary, inspect replication state:
SELECT client_addr, state, sync_state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;
Logical replication
Table-level, SQL-level replay. Use for cross-version migrations, table subsets, or moving to a different schema.
-- Publisher (primary):
ALTER SYSTEM SET wal_level = 'logical'; -- needs restart
CREATE PUBLICATION pub_orders FOR TABLE orders, order_items;
-- Subscriber (other cluster):
CREATE SUBSCRIPTION sub_orders
CONNECTION 'host=primary.db dbname=appdb user=repl password=...'
PUBLICATION pub_orders;
-- Monitor:
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;
Failover & HA
Manual promotion
sudo -u postgres /usr/pgsql-16/bin/pg_ctl promote -D /var/lib/pgsql/16/data
# equivalently:
sudo -u postgres psql -c "SELECT pg_promote();"
Manual promotion is a decision a human should make when they understand why the primary is gone. Automated promotion is what you want for production, but it requires fencing, quorum, and a leader election you can trust.
| Tool | Niche | Gotchas |
|---|---|---|
| Patroni | etcd/consul/k8s-backed, widely used, good integration with HAProxy | Needs a solid DCS; split-brain if DCS flaps |
| repmgr | Lighter, no DCS, fine for small fleets | Witness node essential to avoid split-brain |
| pg_auto_failover | Opinionated, monitor+keeper model | One monitor is a SPOF unless you HA it |
Major version upgrades
Minor upgrades (16.3 → 16.4) are package upgrades and a restart. Major upgrades (15 → 16) change on-disk format. Three options:
pg_dumpall | psql— simple, slow, requires downtime proportional to data size.pg_upgrade --link— seconds to minutes, hard-links data files into the new cluster.- Logical replication — near-zero downtime, works cross-version, more moving parts.
# pg_upgrade with hard links. Both major versions must be installed.
sudo systemctl stop postgresql-15 postgresql-16
sudo -u postgres /usr/pgsql-16/bin/pg_upgrade \
--old-bindir=/usr/pgsql-15/bin \
--new-bindir=/usr/pgsql-16/bin \
--old-datadir=/var/lib/pgsql/15/data \
--new-datadir=/var/lib/pgsql/16/data \
--link \
--check # dry-run first; remove --check to actually migrate
sudo systemctl start postgresql-16
sudo -u postgres /usr/pgsql-16/bin/vacuumdb --all --analyze-in-stages
--link means the old cluster is not startable once upgrade succeeds — the files are now owned by the new cluster. Take a filesystem snapshot or backup first.
Common gotchas
| Symptom | Likely cause | Fix |
|---|---|---|
| "FATAL: sorry, too many clients" | App not using a pool | pgbouncer in transaction mode |
| Sudden massive WAL growth, disk fills | Unused replication slot kept lagging | SELECT pg_drop_replication_slot('name'); |
| Queries slow after a big import | Planner stats are stale | ANALYZE table; or wait for autovacuum |
| Seq scan where an index should apply | random_page_cost still 4.0 on SSD | Lower to 1.1 |
| "canceling statement due to conflict with recovery" on standby | Long read conflicts with replay | hot_standby_feedback = on or max_standby_streaming_delay |
| Locale/collation warning after restore | OS libc upgraded, indexes need rebuild | REINDEX DATABASE appdb; |
See also: Postgres Backup, Backup & Restore.