LVM Thin & Snapshots

Thin pools, thin volumes, overprovisioning math, autoextend, snapshot-based backups, rollback, and the failure modes that cause data loss.

Thin LVM heuristics
  • Overprovision with a plan, not by accident. Track actual usage (data_percent) and metadata (metadata_percent) on every pool.
  • Metadata exhaustion = data loss. Watch metadata_percent at least as closely as data_percent. Metadata is roughly 0.1–0.5% of data in normal use; grow it before it fills.
  • Autoextend is a safety belt, not a plan. Set thin_pool_autoextend_threshold = 80 and thin_pool_autoextend_percent = 20 in /etc/lvm/lvm.conf, and keep free extents in the VG for it to use.
  • Thick snapshots age into silent invalidation when their CoW area fills. Thin snapshots don't, but they do share metadata with the parent pool.
  • Snapshots are not backups. They protect against wrong-delete and enable consistent reads for backup tooling; they do not survive loss of the underlying VG.

Thin pool vs regular LV

Regular (thick) LV
Allocates all its extents from the VG at create time. Snapshots reserve a separate copy-on-write area sized at create time.
Thin pool
A special LV containing a data sub-LV and a metadata sub-LV. Thin volumes carved from it allocate chunks only on first write.
Thin volume
A virtual-sized LV backed by a thin pool. Its virtual size can exceed the pool's physical size; real blocks are materialised on demand.
Chunk size
The allocation unit for the pool (default 64 KiB; 256 KiB or more is common for VMs/databases). Bigger chunks = less metadata, more internal fragmentation.

Use thin LVM when you want fast cheap snapshots (CI, dev environments, container overlays), when you genuinely cannot size volumes up-front, or when you need many similar volumes (VM images). Prefer thick LVs for single-purpose hosts where the cost of a pool exhaustion incident outweighs the flexibility.

Creating a thin pool and thin volumes

pvcreate /dev/sdb /dev/sdc
vgcreate data /dev/sdb /dev/sdc

# Create a 500G thin pool with explicit metadata size and chunk size
lvcreate -L 500G -T data/thinpool \
  --poolmetadatasize 1G \
  --chunksize 256K

# Carve a 200G thin volume (virtual)
lvcreate -V 200G -T data/thinpool -n app
mkfs.xfs /dev/data/app
mkdir -p /srv/app
mount /dev/data/app /srv/app

# Carve another, overprovisioned
lvcreate -V 400G -T data/thinpool -n db
mkfs.xfs /dev/data/db
mkdir -p /srv/db
mount /dev/data/db /srv/db

lvs -a -o +chunk_size,metadata_percent,data_percent

Sample lvs -a output:

LV              VG   Attr       LSize   Pool     Origin Data%  Meta%  Chunk
thinpool        data twi-aotz-- 500.00g                0.05   0.20   256.00k
[thinpool_tdata] data Twi-ao---- 500.00g                                   0
[thinpool_tmeta] data ewi-ao----   1.00g                                   0
app             data Vwi-aotz-- 200.00g thinpool        0.11                0
db              data Vwi-aotz-- 400.00g thinpool        0.00                0
Chunk-size picking. 64 KiB works for many small files; 256 KiB or 512 KiB is better for VM images and databases. You cannot change chunk size after the pool exists — plan it.
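
The chunk-size versus metadata trade-off can be estimated before the pool is created. The sketch below uses the rule of thumb from the kernel's thin-provisioning documentation (48 bytes of metadata per data chunk, with a 2 MiB floor). The function name thin_meta_estimate_bytes is made up for illustration, and snapshot-heavy pools need more than this estimate; thin_metadata_size from thin-provisioning-tools gives more exact figures.

```shell
#!/bin/bash
# Hypothetical helper: rough thin-pool metadata size in bytes.
# Rule of thumb (kernel thin-provisioning docs): 48 * data_size / chunk_size,
# rounded up to 2 MiB if smaller. Snapshot-heavy pools need more.
thin_meta_estimate_bytes() {
  local data_bytes=$1 chunk_bytes=$2
  local est=$(( 48 * data_bytes / chunk_bytes ))
  local floor=$(( 2 * 1024 * 1024 ))
  if [ "$est" -lt "$floor" ]; then est=$floor; fi
  echo "$est"
}

# 500 GiB pool at 64 KiB vs 256 KiB chunks
thin_meta_estimate_bytes $(( 500 * 1024 ** 3 )) $(( 64 * 1024 ))    # 393216000 (375 MiB)
thin_meta_estimate_bytes $(( 500 * 1024 ** 3 )) $(( 256 * 1024 ))   # 98304000 (about 94 MiB)
```

Quadrupling the chunk size cuts the estimate by four, which is the "bigger chunks = less metadata" trade-off in numbers.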

Overprovisioning math

Overprovisioning is the ratio of the sum of thin-volume virtual sizes to the pool's physical data size. A pool with 500 GiB of data and 2 TiB of thin volumes is roughly 4× overprovisioned (2048 / 500 ≈ 4.1).

Term | Symbol | Notes
Pool data size | D | From lvs: the _tdata LV size
Pool metadata size | M | From lvs: the _tmeta LV size (typ. 0.1–0.5% of D)
Sum of thin virtual sizes | V | awk '{s+=$4} END{print s}' across thin volumes (run lvs with --units g so all sizes share a unit)
Overprovision ratio | V/D | > 1 means overprovisioned; 2–3× typical
Used data | U | data_percent * D / 100
Used metadata | Um | metadata_percent * M / 100
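
The quantities above can be computed from lvs rows in one pass. This is a minimal sketch, assuming a single thin pool per VG and input rows of the form lv_name, lv_attr, size in bytes, data_percent; the function name thin_overprovision is hypothetical.

```shell
#!/bin/bash
# Sketch: overprovision ratio V/D and used bytes U from lvs-style rows.
# Input rows (whitespace-separated): lv_name lv_attr lv_size_bytes data_percent
# Assumed source of the rows on a live system:
#   lvs --noheadings --units b --nosuffix -o lv_name,lv_attr,lv_size,data_percent data
# Assumes one thin pool per VG; with several pools, the last one wins.
thin_overprovision() {
  awk '
    $2 ~ /^t/ { D = $3; used = $4 * $3 / 100 }   # thin pool row: attr starts with t
    $2 ~ /^V/ { V += $3 }                         # thin volume rows: attr starts with V
    END {
      if (D == 0) { print "no thin pool found"; exit 1 }
      printf "D=%.0f V=%.0f ratio=%.2f used=%.0f\n", D, V, V / D, used
    }'
}

# Sample pool from earlier in this page: 500G pool, 200G + 400G thin volumes
printf '%s\n' \
  'thinpool twi-aotz-- 536870912000 0.05' \
  'app Vwi-aotz-- 214748364800 0.11' \
  'db Vwi-aotz-- 429496729600 0.00' | thin_overprovision   # ratio=1.20
```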

Rule-of-thumb sizing:

Do not overprovision without free extents in the VG. Autoextend cannot grow a pool past the VG's free space. The moment the VG is full, you depend on throttling writes (impossible in most situations) to avoid exhaustion.

Monitoring: lvs and systemd

# Headline numbers
lvs -a -o +metadata_percent,data_percent,chunk_size

# Full state: pool vs origins, attributes, flags
lvs -a -o lv_name,lv_attr,lv_size,pool_lv,origin,data_percent,metadata_percent

# Check the health flag (9th lv_attr character): D = pool out of data space, M = metadata read-only
lvs --noheadings -o lv_name,lv_attr | awk 'substr($2, 9, 1) ~ /[DM]/'

dmsetup status                                # kernel view of thin targets

LVM ships an lvm2-monitor service that registers pools with dmeventd, and dmeventd is what enforces the autoextend thresholds. Keep it enabled on every host:

systemctl enable --now lvm2-monitor
systemctl status lvm2-monitor
journalctl -u dm-event -u lvm2-monitor --since today

A simple host-level alerting loop (Prometheus node_exporter textfile, cron, or Ansible-delivered timer):

#!/bin/bash
# /usr/local/sbin/lvm-thin-probe.sh
set -euo pipefail
lvs --noheadings --units b -o vg_name,lv_name,lv_attr,data_percent,metadata_percent 2>/dev/null \
  | awk '$3 ~ /^t/ {
        gsub("%", "", $4); gsub("%", "", $5);
        if ($4+0 > 85 || $5+0 > 50)
          printf "ALERT %s/%s data=%s%% meta=%s%%\n", $1, $2, $4, $5
    }'
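
For the node_exporter textfile route, the same lvs rows can be turned into gauge lines. A minimal sketch: the metric names lvm_thin_pool_data_percent and lvm_thin_pool_metadata_percent, the function name, and the output path are assumptions rather than an established exporter convention.

```shell
#!/bin/bash
# Sketch: convert lvs rows into Prometheus text-format gauges.
# Expects rows of: vg_name lv_name lv_attr data_percent metadata_percent
# Metric names below are illustrative assumptions.
lvm_thin_metrics() {
  awk '$3 ~ /^t/ {
    printf "lvm_thin_pool_data_percent{vg=\"%s\",lv=\"%s\"} %s\n", $1, $2, $4
    printf "lvm_thin_pool_metadata_percent{vg=\"%s\",lv=\"%s\"} %s\n", $1, $2, $5
  }'
}

# Assumed usage (path depends on the textfile collector's configured directory;
# write to a temp file, then rename so the collector never reads a partial file):
#   lvs --noheadings -o vg_name,lv_name,lv_attr,data_percent,metadata_percent \
#     | lvm_thin_metrics > /var/lib/node_exporter/lvm_thin.prom.tmp \
#     && mv /var/lib/node_exporter/lvm_thin.prom.tmp /var/lib/node_exporter/lvm_thin.prom
```

Alerting thresholds then live in Prometheus rules instead of in the probe script.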

Autoextend configuration

Autoextend grows the pool (data and/or metadata) when usage crosses a threshold. Configure it in the activation { } section of /etc/lvm/lvm.conf. The shipped default threshold is 100, which disables autoextend regardless of the percent value, so change it.

# /etc/lvm/lvm.conf (activation {})

activation {
    thin_pool_autoextend_threshold = 80
    thin_pool_autoextend_percent   = 20

    # Optional: metadata pool auto-extend (modern LVM)
    # thin_pool_autoextend_threshold applies to both data and metadata
    # when the pool is monitored.
}

# Confirm what lvm.conf currently sets
lvmconfig --type current activation/thin_pool_autoextend_threshold
lvmconfig --type current activation/thin_pool_autoextend_percent

# Per-pool overrides (not common, but possible via lvchange)
lvchange --errorwhenfull y data/thinpool     # fail I/O instead of hanging when pool full
lvchange --monitor y data/thinpool           # ensure dmeventd is watching
--errorwhenfull y is often the safer choice. By default, writes to a full pool are queued (the process hangs in D state) until space is added or the dm-thin no_space_timeout expires (60 seconds by default), after which they fail. With --errorwhenfull y the kernel returns ENOSPC to the filesystem immediately, which is usually handled more gracefully than a stall.
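
To make the threshold/percent arithmetic concrete, here is a simplified model of what autoextend does to a pool as it fills. It is a sketch, not dmeventd's actual logic: sizes are whole GiB, the function name autoextend_steps is hypothetical, and real extensions are rounded to physical extents.

```shell
#!/bin/bash
# Simplified model of thin-pool autoextend (not dmeventd's actual code).
# Args: pool size, used space, threshold %, extend %, VG free space (whole GiB).
autoextend_steps() {
  local size=$1 used=$2 threshold=$3 percent=$4 vg_free=$5 grow
  if [ "$threshold" -ge 100 ] || [ "$percent" -le 0 ]; then
    echo "autoextend disabled"
    return 0
  fi
  # Extend while usage is at or above the threshold
  while [ $(( used * 100 )) -ge $(( size * threshold )) ]; do
    grow=$(( size * percent / 100 ))
    if [ "$grow" -le 0 ] || [ "$grow" -gt "$vg_free" ]; then
      echo "stuck size=${size} used=${used} need=${grow} vg_free=${vg_free}"
      return 1
    fi
    size=$(( size + grow ))
    vg_free=$(( vg_free - grow ))
    echo "extend +${grow} -> size=${size} vg_free=${vg_free}"
  done
  echo "ok size=${size} used=${used}"
}

autoextend_steps 500 400 80 20 200          # one 100 GiB extension, pool settles at 600 GiB
autoextend_steps 500 450 80 20 50 || true   # VG too small: the pool cannot grow
```

The second call shows why free extents in the VG matter: the settings are correct, but there is nothing left to extend into.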

Thick snapshots

A thick snapshot allocates a fixed CoW region. Once that region fills, the snapshot is invalidated (unusable). They are cheap for short-lived consistency points (backup of a quiesced DB) but expensive to keep open on busy volumes.

# Create: -s for snapshot, -L for CoW area, -n for name
lvcreate -s -L 20G -n db-snap /dev/data/db-thick

# Check CoW usage
lvs -a -o +origin,data_percent | grep db-snap

# When done, or once the snapshot has been invalidated, remove it
lvremove -f /dev/data/db-snap

Sizing the CoW: during the snapshot's lifetime you need one chunk per modified block in the origin. For a database taking 5 minutes of backup with ~5% of blocks changing, 5–10% of origin size is usually safe. Too small = invalidation; too big = wasted VG space.
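
The sizing rule can be written as a small calculation. A sketch only: cow_size_gib is a hypothetical helper, the change percentage is your own estimate, and the default 2× safety multiplier simply mirrors the "5–10% for ~5% change" guidance above.

```shell
#!/bin/bash
# Hypothetical helper: CoW area size in whole GiB, rounded up.
# Args: origin size (GiB), expected % of blocks changed during the snapshot's
# lifetime, safety multiplier (default 2).
cow_size_gib() {
  local origin_gib=$1 change_pct=$2 safety=${3:-2}
  echo $(( (origin_gib * change_pct * safety + 99) / 100 ))
}

cow_size_gib 400 5      # 40
cow_size_gib 150 5 1    # 8

# Assumed usage when creating the snapshot:
#   lvcreate -s -L "$(cow_size_gib 400 5)G" -n db-snap /dev/data/db-thick
```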

Thin snapshots

Thin snapshots are first-class thin LVs sharing chunks with the origin. They cost almost nothing at create time; writes to either origin or snapshot allocate from the pool. They do not invalidate when "full" — they share the same space accounting as the pool.

# Create
lvcreate -s -n app-snap-preupgrade /dev/data/app

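# Read-only snapshot (thin snapshots are writable by default; -pr makes the LV itself read-only)
lvcreate -s -pr -n app-ro-snap /dev/data/app
# Thin snapshots are created with the activation-skip flag; -K overrides it
lvchange -ay -K data/app-ro-snap
mkdir -p /mnt/app-snap
# XFS: nouuid because the origin (same UUID) is mounted, norecovery because the LV is read-only
mount -o ro,nouuid,norecovery /dev/data/app-ro-snap /mnt/app-snap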

lvs -a -o +origin,data_percent,metadata_percent | grep app
lvremove -f /dev/data/app-snap-preupgrade
Read-only at mount ≠ read-only LV. Mounting with -o ro is a filesystem layer; the LV itself may still accept writes. Use lvcreate -s -pr (permissions read-only) for immutable snapshots.

Snapshot-based backups

The value of a snapshot for backup is a stable point-in-time image while the application keeps running. The pattern:

  1. Quiesce the application just enough for a consistent on-disk state (DB: FLUSH TABLES WITH READ LOCK; PostgreSQL low-level backup mode via pg_start_backup, renamed pg_backup_start in PostgreSQL 15; or a filesystem freeze with fsfreeze).
  2. Create a snapshot.
  3. Release the application.
  4. Mount the snapshot read-only, stream to backup target, unmount, remove the snapshot.
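#!/bin/bash
# /usr/local/sbin/thin-backup.sh
set -euo pipefail
: "${BACKUP_REPO:?set BACKUP_REPO to the restic repository}"
VG=data
ORIGIN=db
SNAP="${ORIGIN}-bak-$(date +%Y%m%dT%H%M%S)"
MOUNT="/mnt/${SNAP}"

cleanup() {
  # Best-effort teardown; runs on success and on failure
  fsfreeze -u "/srv/${ORIGIN}" 2>/dev/null || true   # no-op if already thawed
  if mountpoint -q "$MOUNT"; then umount "$MOUNT"; fi
  if [ -d "$MOUNT" ]; then rmdir "$MOUNT"; fi
  lvremove -fy "${VG}/${SNAP}" 2>/dev/null || true
}
trap cleanup EXIT

# Freeze only for the instant the snapshot is taken
fsfreeze -f "/srv/${ORIGIN}"
lvcreate -s -pr -n "$SNAP" "${VG}/${ORIGIN}"
fsfreeze -u "/srv/${ORIGIN}"

# Thin snapshots carry the activation-skip flag; -K overrides it
lvchange -ay -K "${VG}/${SNAP}"
mkdir -p "$MOUNT"
# XFS origin: norecovery for the dirty log, nouuid because the origin is mounted (ext4: ro,noload)
mount -o ro,norecovery,nouuid "/dev/${VG}/${SNAP}" "$MOUNT"

restic -r "$BACKUP_REPO" backup --tag "lvm-snap" "$MOUNT"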
XFS and norecovery. Mounting an XFS snapshot that contains an unclean log requires -o ro,norecovery, plus nouuid while the origin (which has the same filesystem UUID) is still mounted. Since snapshots of a live filesystem almost always have a dirty log, these flags are routine. Ext4's equivalent is -o ro,noload.

Rollback procedure

Rolling back an origin to a snapshot is called merge in LVM. It works for both thick and thin snapshots.

# Create a pre-change snapshot
lvcreate -s -n app-pre /dev/data/app

# ... change goes wrong ...

# Unmount the origin first (rollback can't merge while it's open read-write)
umount /srv/app
lvconvert --merge /dev/data/app-pre

# If the origin is open, the merge is deferred to next activation.
# For thin volumes, you often have to deactivate + reactivate:
lvchange -an data/app
lvchange -ay data/app
mount /dev/data/app /srv/app
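
Whether a merge starts immediately or is deferred depends on whether the origin is open, which lvs reports in the sixth lv_attr character. A small sketch; merge_would_defer is a hypothetical helper, and for thin volumes the snapshot itself must also be closed, which this check does not cover.

```shell
#!/bin/bash
# Sketch: decide from an lv_attr string whether a merge would be deferred.
# Character 6 of lv_attr is 'o' when the device is open (for example, mounted).
merge_would_defer() {
  local attr=$1
  [ "${attr:5:1}" = "o" ]
}

# Assumed usage against a real origin:
#   attr=$(lvs --noheadings -o lv_attr data/app | tr -d ' ')
#   if merge_would_defer "$attr"; then echo "unmount first, or expect a deferred merge"; fi
merge_would_defer "Vwi-aotz--" && echo "origin open: merge deferred until next activation"
merge_would_defer "Vwi-a-tz--" || echo "origin closed: merge can start now"
```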

Online extension

# Grow the pool's data area by 200G
lvextend -L +200G data/thinpool

# Grow the pool's metadata area by 512M
lvextend --poolmetadatasize +512M data/thinpool

# Grow a thin volume and then the filesystem in one step
lvextend -L +50G --resizefs /dev/data/app        # ext4/xfs both supported

# Or step-by-step
lvextend -L +50G /dev/data/app
xfs_growfs /srv/app
# or: resize2fs /dev/data/app

# After extending, verify
lvs -a -o +data_percent,metadata_percent
df -h /srv/app
XFS cannot shrink. Growing is online and trivial; shrinking an XFS filesystem requires xfsdump/mkfs.xfs/xfsrestore. Size with some thought; overprovisioning hides sizing mistakes but doesn't fix them.

Metadata repair

If a pool's metadata is corrupt (unexpected power loss, kernel bug, storage problem) or has run out of space, LVM may refuse to activate the pool. Exhaustion shows up as messages such as thin-pool: no free metadata space or Thin metadata device has insufficient space.

# Deactivate the pool (unmount and deactivate its thin volumes first)
lvchange -an data/thinpool

# Dump and repair metadata (modern LVM wraps thin_repair for you)
lvconvert --repair data/thinpool

# After repair, activate and inspect
lvchange -ay data/thinpool
lvs -a -o +metadata_percent,data_percent
dmesg | grep -i thin

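--repair writes the repaired metadata to a spare metadata LV and swaps it in; the old metadata is kept as an LV named thinpool_meta0 until you remove it. Keep at least one metadata-size worth of free extents in the VG or the repair cannot run.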
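
The free-extent requirement is easy to check before an incident. A minimal sketch, assuming the metadata LV size and the VG free space are both available in bytes; repair_headroom_ok is a hypothetical helper and the lvs/vgs invocations in the comments are one plausible way to gather the inputs.

```shell
#!/bin/bash
# Hypothetical pre-flight check: does the VG have room for the spare metadata LV?
# Args: metadata LV size and VG free space, both in bytes.
repair_headroom_ok() {
  local meta_bytes=$1 vg_free_bytes=$2
  if [ "$vg_free_bytes" -ge "$meta_bytes" ]; then
    echo "ok: ${vg_free_bytes} bytes free, ${meta_bytes} needed"
  else
    echo "short by $(( meta_bytes - vg_free_bytes )) bytes"
    return 1
  fi
}

# Assumed way to gather the inputs on a live system:
#   meta=$(lvs -a --noheadings --units b --nosuffix -o lv_size data/thinpool_tmeta | tr -d ' ')
#   free=$(vgs --noheadings --units b --nosuffix -o vg_free data | tr -d ' ')
#   repair_headroom_ok "$meta" "$free"
repair_headroom_ok $(( 1024 ** 3 )) $(( 4 * 1024 ** 3 ))
repair_headroom_ok $(( 1024 ** 3 )) $(( 512 * 1024 ** 2 )) || true
```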

Gotchas and failure modes

Troubleshooting

Symptom | Cause | Fix
Writes hang; dmesg says thin-pool: reached low water mark | Pool near exhaustion | lvextend -L +N data/thinpool; ensure autoextend is enabled; consider --errorwhenfull y
thin-pool: no free metadata space | Metadata full | lvextend --poolmetadatasize +1G data/thinpool; if too late, offline lvconvert --repair
Pool attr shows D (out of data space) or M (metadata read-only) | Pool flagged unhealthy | Extend the relevant area; lvchange --monitor y after
Autoextend never fires | lvm2-monitor disabled, or threshold left at the default (100) | systemctl enable --now lvm2-monitor; set thin_pool_autoextend_threshold = 80
Snapshot activation fails after reboot | Merge was pending; origin was active at boot | lvchange -an the origin, then lvchange -ay; lvs -a should show the snap gone
XFS snapshot mount fails with needs recovery | Dirty log, as expected | Mount with -o ro,norecovery; never run recovery on a snapshot you plan to discard
Deleted files don't shrink data_percent | Discards not propagated | fstrim -av (or mount with discard); check that the pool's discards mode is not ignore (lvs -o +discards); verify LUKS discard if present
Rollback merge refuses to start | Origin is mounted; merge deferred | Unmount origin, lvchange -an, lvchange -ay; merge completes during activation
Chunk size "too small" warning at pool creation | Virtual size vs chunk size exceeds metadata limits | Pick a larger chunk (256K/512K) or a larger metadata LV; reconsider overprovision ratio
After snapshot removed, space still held | Filesystem never trimmed; chunks retained pending discard | fstrim -v /mountpoint; confirm lvs -a -o +data_percent drops afterwards

Cross-reference