LVM Thin & Snapshots

Thin pools, thin volumes, overprovisioning math, autoextend, snapshot-based backups, rollback, and the failure modes that cause data loss.

Thin LVM heuristics

Overprovision with a plan, not by accident. Track actual usage (data_percent) and metadata (metadata_percent) on every pool.
Metadata exhaustion = data loss. Watch metadata_percent at least as closely as data_percent. Metadata is roughly 0.1–0.5% of data in normal use; grow it before it fills.
Autoextend is a safety belt, not a plan. thin_pool_autoextend_threshold = 80, thin_pool_autoextend_percent = 20 in /etc/lvm/lvm.conf.
Thick snapshots age into silent invalidation when their CoW area fills. Thin snapshots don't, but they do share metadata with the parent pool.
Snapshots are not backups. They protect against wrong-delete and enable consistent reads for backup tooling; they do not survive loss of the underlying VG.

On this page

Thin pool vs regular LV
Creating a thin pool and thin volumes
Overprovisioning math
Monitoring: lvs and systemd
Autoextend configuration
Thick snapshots
Thin snapshots
Snapshot-based backups
Rollback procedure
Online extension
Metadata repair
Gotchas and failure modes
Troubleshooting
Cross-reference

Thin pool vs regular LV

Regular (thick) LV: Allocates all its extents from the VG at create time. Snapshots reserve a separate copy-on-write area sized at create time.
Thin pool: A special LV containing a data sub-LV and a metadata sub-LV. Thin volumes carved from it allocate chunks only on first write.
Thin volume: A virtual-sized LV backed by a thin pool. Its virtual size can exceed the pool's physical size; real blocks are materialised on demand.
Chunk size: The allocation unit for the pool (64 KiB is the minimum; when you do not specify one, LVM picks a value based on pool size; 256 KiB or more is common for VMs/databases). Bigger chunks = less metadata, more internal fragmentation.

Use thin LVM when you want fast cheap snapshots (CI, dev environments, container overlays), when you genuinely cannot size volumes up-front, or when you need many similar volumes (VM images). Prefer thick LVs for single-purpose hosts where the cost of a pool exhaustion incident outweighs the flexibility.

Creating a thin pool and thin volumes

pvcreate /dev/sdb /dev/sdc
vgcreate data /dev/sdb /dev/sdc

# Create a 500G thin pool with explicit metadata size and chunk size
lvcreate -L 500G -T data/thinpool \
  --poolmetadatasize 1G \
  --chunksize 256K

# Carve a 200G thin volume (virtual)
lvcreate -V 200G -T data/thinpool -n app
mkfs.xfs /dev/data/app
mkdir -p /srv/app
mount /dev/data/app /srv/app

# Carve another, overprovisioned (200G + 400G virtual on a 500G pool)
lvcreate -V 400G -T data/thinpool -n db
mkfs.xfs /dev/data/db
mkdir -p /srv/db
mount /dev/data/db /srv/db

lvs -a -o +chunk_size,metadata_percent,data_percent

Sample lvs -a output:

LV               VG   Attr       LSize   Pool     Origin Data%  Meta%  Chunk
thinpool         data twi-aotz-- 500.00g                 0.05   0.20   256.00k
[thinpool_tdata] data Twi-ao---- 500.00g                                     0
[thinpool_tmeta] data ewi-ao----   1.00g                                     0
app              data Vwi-aotz-- 200.00g thinpool        0.11                0
db               data Vwi-aotz-- 400.00g thinpool        0.00                0

Chunk-size picking. 64 KiB works for many small files; 256 KiB or 512 KiB is better for VM images and databases. You cannot change chunk size after the pool exists — plan it.
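To make the chunk-size trade-off concrete, here is a minimal sketch of the rough estimate given in the kernel's thin-provisioning documentation: about 48 bytes of metadata per data chunk, with a 2 MiB floor. The function name `thin_meta_estimate_mib` is hypothetical, and the result ignores the extra metadata that snapshots consume, so treat it as a lower bound rather than a sizing rule.

```shell
# Rough metadata estimate: 48 bytes per data chunk, never below 2 MiB.
# Usage: thin_meta_estimate_mib <pool_size_GiB> <chunk_size_KiB>
# (hypothetical helper; integer inputs only)
thin_meta_estimate_mib() (
  pool_gib=$1
  chunk_kib=$2
  chunks=$(( pool_gib * 1024 * 1024 / chunk_kib ))   # number of data chunks
  bytes=$(( chunks * 48 ))                           # ~48 bytes of metadata each
  mib=$(( (bytes + 1048575) / 1048576 ))             # round up to whole MiB
  if [ "$mib" -lt 2 ]; then mib=2; fi                # floor of 2 MiB
  echo "$mib"
)
```

For the 500G pool above, `thin_meta_estimate_mib 500 256` prints 94 and `thin_meta_estimate_mib 500 64` prints 375, so quadrupling the chunk size cuts the baseline metadata to roughly a quarter. The 1G metadata LV in the example leaves generous room for snapshots on top of that baseline.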

Overprovisioning math

Overprovisioning is the ratio of the sum of thin-volume virtual sizes to the pool's physical data size. A pool with 500 GiB of data and 2 TiB of thin volumes is about 4× overprovisioned (2048 / 500 ≈ 4.1).

Term	Symbol	Notes
Pool data size	`D`	From `lvs`: the `_tdata` LV size
Pool metadata size	`M`	From `lvs`: the `_tmeta` LV size (typ. 0.1–0.5% of D)
Sum of thin virtual sizes	`V`	`lvs --noheadings --units g --nosuffix -o lv_size -S 'segtype=thin' data \| awk '{s+=$1} END{print s}'` (GiB, thin volumes and thin snapshots only)
Overprovision ratio	`V/D`	> 1 means overprovisioned; 1.5–3× is common
Used data	`U`	`data_percent * D / 100`
Used metadata	`Um`	`metadata_percent * M / 100`

Rule-of-thumb sizing:

Metadata: start at 1 GiB per TiB of data with 256 KiB chunks. Grow when metadata_percent crosses 50%.
Data: keep data_percent comfortably below autoextend threshold (typically 80%). Treat 90% as page-the-oncall.
Overprovision: 1.5–3× is common. Beyond that, you must have both monitoring and headroom in the VG to extend into.

Do not overprovision without free extents in the VG. Autoextend cannot grow a pool past the VG's free space. The moment the VG is full, you depend on throttling writes (impossible in most situations) to avoid exhaustion.
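The table's formulas are simple enough to script. The sketch below assumes sizes are already available as whole GiB (for example from `lvs --units g --nosuffix`); the helper names `overprovision_ratio` and `used_gib` are hypothetical.

```shell
# V/D from the table: sum of thin virtual sizes over pool data size.
# Usage: overprovision_ratio <pool_data_GiB> <thin_virtual_GiB>...
overprovision_ratio() (
  d=$1
  shift
  v=0
  for size in "$@"; do v=$(( v + size )); done       # V = sum of virtual sizes
  awk -v v="$v" -v d="$d" 'BEGIN { printf "%.2f\n", v / d }'
)

# U = data_percent * D / 100 (works the same for Um with metadata values).
# Usage: used_gib <percent> <size_GiB>
used_gib() (
  awk -v p="$1" -v s="$2" 'BEGIN { printf "%.2f\n", p * s / 100 }'
)
```

With the volumes created earlier, `overprovision_ratio 500 200 400` prints 1.20, and a pool reporting 80% data usage holds `used_gib 80 500` = 400.00 GiB of real blocks.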

Monitoring: lvs and systemd

# Headline numbers
lvs -a -o +metadata_percent,data_percent,chunk_size

# Full state: pool vs origins, attributes, flags
lvs -a -o lv_name,lv_attr,lv_size,pool_lv,origin,data_percent,metadata_percent

# Also check the health flag (9th attr character):
# D = out of data space, M = metadata read-only, F = failed
lvs --noheadings -o lv_name,lv_attr | awk 'substr($2, 9, 1) ~ /[DMF]/'

dmsetup status                                # kernel view of thin targets
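The attr string packs several flags into fixed positions, which is easy to misread in a terminal. As a small illustration, the hypothetical helper below decodes only the two positions that matter most for thin pools: character 1 (volume type) and character 9 (health), following the bit descriptions in lvs(8). It is a sketch for reading output, not a replacement for `lvs` itself.

```shell
# Decode type (1st char) and health (9th char) of an lv_attr string.
# Usage: describe_thin_attr <lv_attr>     (hypothetical helper)
describe_thin_attr() (
  attr=$1
  type_char=$(printf '%s' "$attr" | cut -c1)
  health_char=$(printf '%s' "$attr" | cut -c9)
  case "$type_char" in
    t) kind="thin pool" ;;
    V) kind="thin volume" ;;
    T) kind="thin pool data" ;;
    e) kind="pool metadata" ;;
    *) kind="other" ;;
  esac
  case "$health_char" in
    D) health="out of data space" ;;
    M) health="metadata read-only" ;;
    F) health="failed" ;;
    -) health="ok" ;;
    *) health="flag $health_char" ;;
  esac
  echo "$kind: $health"
)
```

For example, `describe_thin_attr twi-aotzD-` prints `thin pool: out of data space`, while the healthy pool from the sample output, `twi-aotz--`, prints `thin pool: ok`.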

LVM ships an lvm2-monitor service that registers monitored LVs with dmeventd, the daemon that reacts to device-mapper events from the kernel and enforces autoextend thresholds. Keep it enabled on every host:

systemctl enable --now lvm2-monitor
systemctl status lvm2-monitor
journalctl -u dm-event -u lvm2-monitor --since today

A simple host-level alerting loop (Prometheus node_exporter textfile, cron, or Ansible-delivered timer):

#!/bin/bash
# /usr/local/sbin/lvm-thin-probe.sh
set -euo pipefail
lvs --noheadings --units b -o vg_name,lv_name,lv_attr,data_percent,metadata_percent 2>/dev/null \
  | awk '$3 ~ /^t/ {
        gsub("%", "", $4); gsub("%", "", $5);
        if ($4+0 > 85 || $5+0 > 50)
          printf "ALERT %s/%s data=%s%% meta=%s%%\n", $1, $2, $4, $5
    }'

Autoextend configuration

Autoextend grows the pool (data and/or metadata) when usage crosses a threshold. Configure it in /etc/lvm/lvm.conf inside the activation { } section. A threshold of 100, which is the upstream default and what RHEL-family hosts typically ship, disables autoextend regardless of the percent setting, so change it.

# /etc/lvm/lvm.conf (activation {})

activation {
    thin_pool_autoextend_threshold = 80
    thin_pool_autoextend_percent   = 20

    # Optional: metadata pool auto-extend (modern LVM)
    # thin_pool_autoextend_threshold applies to both data and metadata
    # when the pool is monitored.
}

lvmconfig --type current activation/thin_pool_autoextend_threshold
lvmconfig --type current activation/thin_pool_autoextend_percent

# Per-pool overrides (not common, but possible via lvchange)
lvchange --errorwhenfull y data/thinpool     # fail I/O instead of hanging when pool full
lvchange --monitor y data/thinpool           # ensure dmeventd is watching

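--errorwhenfull y is safer than the default. By default a full pool queues writes, so the writing process hangs in D state; the kernel only gives up and fails the queued I/O after the dm-thin no_space_timeout (60 seconds unless changed). With --errorwhenfull y the kernel returns ENOSPC to the filesystem immediately, which is usually handled more gracefully than a long freeze.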
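To see what a threshold/percent pair means in gigabytes, the sketch below models autoextend under a simplifying assumption: each step grows the pool by `percent` of its current size, and steps repeat until usage is at or below the threshold or the VG runs out of free space. The function name `autoextend_plan` is hypothetical and real dmeventd behaviour may differ in rounding and timing; use it for capacity planning, not as a description of LVM internals.

```shell
# Simplified autoextend model (whole GiB, integer arithmetic).
# Usage: autoextend_plan <pool_GiB> <used_GiB> <threshold_%> <percent> <vg_free_GiB>
# Prints "<final_pool_GiB> ok" or "<pool_GiB> vg-full".
autoextend_plan() (
  size=$1 used=$2 threshold=$3 percent=$4 vg_free=$5
  # Extend while usage is above the threshold.
  while [ $(( used * 100 )) -gt $(( threshold * size )) ]; do
    grow=$(( size * percent / 100 ))
    if [ "$grow" -lt 1 ]; then grow=1; fi
    if [ "$grow" -gt "$vg_free" ]; then
      echo "$size vg-full"            # no room left: the pool cannot grow
      exit 1
    fi
    size=$(( size + grow ))
    vg_free=$(( vg_free - grow ))
  done
  echo "$size ok"
)
```

With the 80/20 settings above, `autoextend_plan 500 410 80 20 300` prints `600 ok`, whereas the same pool in a VG with only 50 GiB free prints `500 vg-full`. A threshold of 100 never triggers a step, which is why that setting amounts to autoextend being off.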

Thick snapshots

A thick snapshot allocates a fixed CoW region. Once that region fills, the snapshot is invalidated (unusable). They are cheap for short-lived consistency points (backup of a quiesced DB) but expensive to keep open on busy volumes.

# Create: -s for snapshot, -L for CoW area, -n for name
lvcreate -s -L 20G -n db-snap /dev/data/db-thick

# Check CoW usage
lvs -a -o +origin,data_percent | grep db-snap

# When done, or once the snapshot has been invalidated, remove it
# (an invalid snapshot is not cleaned up automatically)
lvremove -f /dev/data/db-snap

Sizing the CoW: during the snapshot's lifetime you need one chunk per modified block in the origin. For a database taking 5 minutes of backup with ~5% of blocks changing, 5–10% of origin size is usually safe. Too small = invalidation; too big = wasted VG space.
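The sizing rule can be written down as arithmetic. This is a minimal sketch assuming you can estimate the fraction of origin blocks that change during the snapshot's lifetime; the helper name `cow_size_gib` and the default safety factor of 2 are illustrative choices, picked so that the 5% example lands at the top of the 5–10% range quoted above.

```shell
# CoW area = origin size * expected change % * safety factor, rounded up.
# Usage: cow_size_gib <origin_GiB> <change_percent> [safety_factor]
cow_size_gib() (
  origin=$1
  change_pct=$2
  factor=${3:-2}                      # assumed default headroom: 2x
  echo $(( (origin * change_pct * factor + 99) / 100 ))
)
```

For a 400 GiB origin with about 5% churn, `cow_size_gib 400 5` prints 40, which would be passed as `-L 40G` to `lvcreate -s`.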

Thin snapshots

Thin snapshots are first-class thin LVs sharing chunks with the origin. They cost almost nothing at create time; writes to either origin or snapshot allocate from the pool. They do not invalidate when "full" — they share the same space accounting as the pool.

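# Create (writable by default, and flagged to skip activation)
lvcreate -s -n app-snap-preupgrade /dev/data/app

# Read-only snapshot: -pr sets the LV permission at create time.
# Thin snapshots carry the activation-skip flag, so activate with -K.
lvcreate -s -pr -n app-ro-snap /dev/data/app
lvchange -ay -K data/app-ro-snap
mkdir -p /mnt/app-snap
# XFS: nouuid because the mounted origin has the same UUID,
# norecovery because a read-only LV cannot replay the log
mount -o ro,nouuid,norecovery /dev/data/app-ro-snap /mnt/app-snap

lvs -a -o +origin,data_percent,metadata_percent | grep app
lvremove -f /dev/data/app-snap-preupgrade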

Read-only at mount ≠ read-only LV. Mounting with -o ro is a filesystem layer; the LV itself may still accept writes. Use lvcreate -s -pr (permissions read-only) for immutable snapshots.

Snapshot-based backups

The value of a snapshot for backup is a stable point-in-time image while the application keeps running. The pattern:

Quiesce the application just enough for a consistent on-disk state (DB: FLUSH TABLES WITH READ LOCK, pg_start_backup (pg_backup_start since PostgreSQL 15) / low-level backup mode, or a filesystem freeze with fsfreeze).
Create a snapshot.
Release the application.
Mount the snapshot read-only, stream to backup target, unmount, remove the snapshot.

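#!/bin/bash
# /usr/local/sbin/thin-backup.sh
set -euo pipefail
: "${BACKUP_REPO:?set BACKUP_REPO to the restic repository}"
VG=data
ORIGIN=db
SNAP="${ORIGIN}-bak-$(date +%Y%m%dT%H%M%S)"
MOUNT="/mnt/${SNAP}"

# Freeze only while the snapshot is created; thaw even if lvcreate fails
fsfreeze -f "/srv/${ORIGIN}"
trap 'fsfreeze -u "/srv/${ORIGIN}"' EXIT
lvcreate -s -pr -n "$SNAP" "/dev/${VG}/${ORIGIN}"
fsfreeze -u "/srv/${ORIGIN}"
trap - EXIT

# Thin snapshots skip activation by default; -K overrides the flag
lvchange -ay -K "${VG}/${SNAP}"
mkdir -p "$MOUNT"
# Origin is XFS: nouuid (same UUID as the mounted origin), norecovery (read-only LV)
mount -o ro,nouuid,norecovery "/dev/${VG}/${SNAP}" "$MOUNT"

restic -r "$BACKUP_REPO" backup --tag "lvm-snap" "$MOUNT"

umount "$MOUNT"
rmdir "$MOUNT"
lvremove -fy "/dev/${VG}/${SNAP}"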

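XFS and norecovery. Mounting an XFS snapshot containing an unclean log requires -o ro,norecovery. Since snapshots of a live filesystem almost always have a dirty log, this flag is routine. While the origin is mounted, add nouuid as well, because the snapshot carries the same filesystem UUID. Ext4's equivalent of norecovery is -o ro,noload.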
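Because the right options depend on the filesystem, a backup script that handles more than one origin can pick them from the detected type. The helper below is a minimal sketch; `snapshot_mount_opts` is a hypothetical name, and the fallback of plain `ro` for other filesystems is an assumption you should check per filesystem.

```shell
# Map a filesystem type to read-only snapshot mount options.
# Usage: snapshot_mount_opts <fstype>     (hypothetical helper)
snapshot_mount_opts() (
  case "$1" in
    xfs)       echo "ro,nouuid,norecovery" ;;  # same UUID as origin, no log replay
    ext3|ext4) echo "ro,noload" ;;             # skip journal replay
    *)         echo "ro" ;;                    # assumption: nothing extra needed
  esac
)

# Example use inside a backup script (not executed here):
#   fstype=$(blkid -o value -s TYPE "/dev/${VG}/${SNAP}")
#   mount -o "$(snapshot_mount_opts "$fstype")" "/dev/${VG}/${SNAP}" "$MOUNT"
```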

Rollback procedure

Rolling back an origin to a snapshot is called merge in LVM. It works for both thick and thin snapshots.

# Create a pre-change snapshot
lvcreate -s -n app-pre /dev/data/app

# ... change goes wrong ...

# Unmount the origin first (rollback can't merge while it's open read-write)
umount /srv/app
lvconvert --merge /dev/data/app-pre

# If the origin is open, the merge is deferred to next activation.
# For thin volumes, you often have to deactivate + reactivate:
lvchange -an data/app
lvchange -ay data/app
mount /dev/data/app /srv/app

After the merge completes, the snapshot LV is removed automatically.
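You can monitor a thick-snapshot merge with lvs -a -o +data_percent: the merging snapshot's Data% falls as chunks are copied back to the origin. A thin merge only swaps metadata, so it completes almost immediately once the origin is reactivated.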
Rolling back a thin origin does not free pool space that was shared with other snapshots — accounting gets subtle; watch data_percent.

Online extension

# Grow the pool's data area by 200G
lvextend -L +200G data/thinpool

# Grow the pool's metadata area by 512M
lvextend --poolmetadatasize +512M data/thinpool

# Grow a thin volume and then the filesystem in one step
lvextend -L +50G --resizefs /dev/data/app        # ext4/xfs both supported

# Or step-by-step
lvextend -L +50G /dev/data/app
xfs_growfs /srv/app
# or: resize2fs /dev/data/app

# After extending, verify
lvs -a -o +data_percent,metadata_percent
df -h /srv/app

XFS cannot shrink. Growing is online and trivial; shrinking an XFS filesystem requires xfsdump/mkfs.xfs/xfsrestore. Size with some thought; overprovisioning hides sizing mistakes but doesn't fix them.

Metadata repair

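If a pool's metadata is corrupt (unexpected power loss, kernel bug, storage problem), LVM's thin_check run fails at activation and the pool stays inactive until it is repaired. Messages such as thin-pool: no free metadata space or Thin metadata device has insufficient space point at exhaustion rather than corruption: try extending the metadata LV first (see Online extension) and fall back to repair only if that does not bring the pool back.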

# Deactivate the pool (unmount and deactivate its thin volumes first)
lvchange -an data/app data/db
lvchange -an data/thinpool

# Dump and repair metadata (modern LVM wraps thin_repair for you)
lvconvert --repair data/thinpool

# After repair, activate and inspect
lvchange -ay data/thinpool
lvs -a -o +metadata_percent,data_percent
dmesg | grep -i thin

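--repair does a round trip through a spare metadata LV. Keep at least a metadata LV's worth of free extents in the VG or the repair cannot run. After a successful repair the previous metadata is kept as data/thinpool_meta0; remove it once the pool has been verified.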

Gotchas and failure modes

Metadata exhaustion: the #1 cause of unrecoverable thin pools. Autoextend helps but only if the VG has free extents. If metadata fills and you cannot extend in time, the pool may need an offline thin_repair, with a real risk of data loss.
Data exhaustion with default behaviour: writes queue and processes hang in D state until the pool is extended or the kernel's no-space timeout expires. If the systemd journal lives on the same pool, you lose logging and SSH responsiveness at the same moment.
Discards propagation: deleting files returns chunks to the pool only when the filesystem sends discards (periodic fstrim, or -o discard at mount) and the pool's discards mode is not ignore (check with lvs -o +discards). issue_discards in lvm.conf is a different knob: it only controls discards sent to the PVs when an LV is removed or reduced.
Snapshot churn on CI pools: hundreds of short-lived thin snapshots can saturate metadata faster than data. Size metadata for the workload, not just the data volume.
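Encryption layers: dm-crypt drops discards by default. With LUKS on top of a thin volume, deletes never reach the pool unless discard is set in /etc/crypttab. With LUKS below LVM, the pool still reclaims chunks, but the discards are not passed down to the physical device without that option.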
Cloning a VM with thin LV inside a thin LV: double CoW blows up amplification. Prefer file-based VM images on thick XFS for simple setups, or a single layer of thin if you want snapshots.
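Mixed chunk sizes: chunk size is a property of the pool, fixed at creation, and every thin volume and snapshot in the pool uses it. Workloads that need different chunk sizes need separate pools, so plan chunk size per pool up-front.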
RAID underneath: a thin pool on top of mdraid or lvmraid is fine; note that resync traffic counts against physical I/O but not against data_percent.

Troubleshooting

Symptom	Cause	Fix
Writes hang; `dmesg` says `thin-pool: reached low water mark`	Pool near exhaustion	`lvextend -L +N data/thinpool`; ensure autoextend is enabled; consider `--errorwhenfull y`
`thin-pool: no free metadata space`	Metadata full	`lvextend --poolmetadatasize +1G data/thinpool`; if too late, offline `lvconvert --repair`
Pool attr shows `D` (out of data space) or `M` (metadata read-only)	Kernel has flagged the pool unhealthy	Extend the relevant area; `lvchange --monitor y` after
Autoextend never fires	`lvm2-monitor` disabled, or thresholds at defaults (100%)	`systemctl enable --now lvm2-monitor`; set `thin_pool_autoextend_threshold = 80`
Snapshot activation fails after reboot	Merge was pending; origin was active at boot	`lvchange -an` the origin, then `lvchange -ay`; `lvs -a` should show the snap gone
XFS snapshot mount fails with `needs recovery`	Dirty log, as expected	Mount with `-o ro,norecovery`; never run recovery on a snapshot you plan to discard
Deleted files don't shrink `data_percent`	Discards not propagated	`fstrim -av`; check the pool's mode with `lvs -o +discards` (must not be `ignore`); verify LUKS `discard` if present
Rollback merge refuses to start	Origin is mounted; merge deferred	Unmount origin, `lvchange -an`, `lvchange -ay`; merge completes during activation
Chunk size "too small" warning at pool creation	Virtual size vs chunk size exceeds metadata limits	Pick a larger chunk (256K/512K) or a larger metadata LV; reconsider overprovision ratio
After snapshot removed, space still held	Filesystem never trimmed; chunks retained pending discard	`fstrim -v /mountpoint`; confirm `lvs -a -o +data_percent` drops afterwards

Cross-reference

LVM — PV/VG/LV fundamentals and thick-LV workflow.
Backup & Restore — where snapshot-based flows fit into your overall backup plan.
Postgres backup and MySQL backup — quiescing DBs before taking snapshots.
sysctl tuning — vm.dirty_*, elevator, and I/O knobs that interact with thin pools under load.
SSSD — unrelated, but often deployed on the same EL hosts that carry LVM thin pools.
DR Runbook Template — incorporate pool-exhaustion recovery into your DR plan.