Ansible Performance
- pipelining = true + SSH ControlMaster → often 40–60% faster on large fleets.
- gathering = smart + fact caching → fact collection drops from seconds per run to one-time per TTL.
- Raise forks from the default 5 to something that matches your runner (20–50 is typical).
- Replace with_items over hundreds of packages with one module call that takes a list.
- Profile before you optimise: ANSIBLE_CALLBACKS_ENABLED=profile_tasks points at the real hot spots.
Where time actually goes
Before you tune anything, know what you are tuning. A plain ansible-playbook site.yml against 200 hosts spends time on:
1. SSH handshake per task (worst case): TCP + key auth + SSH subsystem start. Order of 200–500 ms each.
2. Python interpreter startup on the target for every module invocation. 100–300 ms.
3. Module payload transfer: serialise module + args, write to tmpfile, execute, read back JSON. Dominated by disk/network on slow hosts.
4. Fact gathering: runs setup, which is a big module collecting hundreds of facts. 1–5 s per host.
5. Executor synchronisation: the controller waits for all hosts at each task (linear strategy).
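Pipelining cuts the extra SSH operations in (1) and the temp-file shuffling in (3) for most tasks; it does not remove the remote Python startup in (2). ControlMaster reuses the SSH connection, which removes the repeated handshakes in (1). Fact caching eliminates (4) on subsequent runs. Strategy changes affect (5). Mitogen attacks (2) and (3).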
Fact gathering and caching
By default, every play starts by running the setup module against every host. That's 1–5 s per host, and on a cold run most of it is wasted work.
Gather less
# At play level: use only the subsets you need
- hosts: web
  gather_facts: true
  gather_subset:
    - '!all'
    - '!min'
    - network
    - distribution
    - os_family
Subsets: all, min (the cheapest mandatory set), hardware, network, virtual, facter, ohai, distribution, pkg_mgr, service_mgr, python, system, user. Prefix a subset with ! to exclude.
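To see what a given subset actually buys you, one option is to gather explicitly with the setup module and count what came back. A minimal sketch, assuming the same web group as above; the task names are illustrative only:

```yaml
- hosts: web
  gather_facts: false          # skip the implicit setup run
  tasks:
    - name: Gather the minimal set plus network facts only
      ansible.builtin.setup:
        gather_subset:
          - min
          - network

    - name: Report how many top-level facts were collected
      ansible.builtin.debug:
        msg: "{{ ansible_facts | length }} top-level facts for {{ inventory_hostname }}"
```

Running this once with each candidate subset, with profile_tasks enabled, gives a rough cost-per-subset picture for your own hosts.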
Gather smart
gathering = smart tells Ansible: "If we already have facts for this host and they haven't expired, don't re-gather them." Combined with caching:
# ansible.cfg
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/facts-cache
fact_caching_timeout = 7200
| Backend | When | Setup cost |
|---|---|---|
| jsonfile | Single-controller, one engineer, laptop/runner | Zero, just a directory |
| redis | Multi-runner CI, shared fact cache across jobs | A redis instance; set fact_caching_connection = host:port:db |
| memcached | Same as redis; older Ansible deployments | Memcached; fact_caching_connection = server:port |
| yaml | When you want to cat the cache; debug | Zero; slower to read than jsonfile for large fleets |
| mongodb | Very large fleets, centralised analysis | Mongo instance; overkill for most shops |
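Cached facts can go stale inside the TTL, for example right after a play changes something the facts describe. One way to handle that, sketched here on the assumption that a cache backend is configured as above, is to re-run setup explicitly after the change. The hostname task is only a stand-in for any change that affects facts:

```yaml
- hosts: web
  become: true
  tasks:
    - name: Change something that facts describe (illustrative)
      ansible.builtin.hostname:
        name: "{{ inventory_hostname_short }}"

    - name: Re-gather so later plays do not read stale values from the cache
      ansible.builtin.setup:
```

If you would rather drop the cached entry than refresh it, the clear_facts action of ansible.builtin.meta is the usual alternative.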
Gather not at all
For plays that don't touch facts (pure file pushes to a known host set), turn gathering off entirely:
- hosts: edge
  gather_facts: false
  tasks:
    - ansible.builtin.copy:
        src: files/hosts.allow
        dest: /etc/hosts.allow
        mode: '0644'
A 200-host play with gather_facts: false can be twice as fast as the same play with gathering left on, and you only pay for what you actually need.
SSH: pipelining, ControlMaster, ControlPersist
Pipelining
Without pipelining, Ansible SSHes to the host, mkdirs a temp dir, scps the module there, runs it, removes it. With pipelining, the module is streamed over the SSH pipe and executed by the remote Python directly — one round trip instead of four.
# ansible.cfg
[defaults]
pipelining = true
Pipelining needs requiretty off in the target /etc/sudoers. On modern RHEL/Debian it is off by default, but locked-down images (CIS, STIG baselines) sometimes set it. Symptom: sudo: sorry, you must have a tty to run sudo. Fix: Defaults !requiretty in sudoers, or disable pipelining for that host.
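If you cannot change sudoers on a hardened image, pipelining can be switched off for just those hosts through the ansible_pipelining connection variable. A sketch of a YAML inventory, assuming a hypothetical group called hardened and made-up host names:

```yaml
all:
  children:
    hardened:
      hosts:
        stig01.example.com:
        stig02.example.com:
      vars:
        ansible_pipelining: false   # these hosts keep requiretty, so no pipelining
    web:
      hosts:
        web01.example.com:          # inherits pipelining = true from ansible.cfg
```

The rest of the fleet keeps the faster path; only the hardened group pays the temp-file cost.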
ControlMaster / ControlPersist
OpenSSH's connection multiplexing. The first task opens the SSH connection; subsequent tasks against the same host reuse it. Without it, every task is a fresh handshake.
# ansible.cfg
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o PreferredAuthentications=publickey
control_path_dir = ~/.ansible/cp
The socket goes under control_path_dir. On long-path-sensitive systems (macOS with encrypted home dirs, deep NFS) override it: some setups break when the socket path exceeds ~100 chars.
Typical improvement on a 20-task, 50-host play: wall time drops from ~8 min to ~3 min just from pipelining + ControlMaster.
Strategy plugins: linear, free, host_pinned
The strategy plugin controls when hosts proceed to the next task.
| Strategy | Behaviour | Use when |
|---|---|---|
| linear (default) | All hosts run task N, wait, then all run N+1. Log output is ordered. | Small fleets; plays with serial/max_fail_percentage; when you need coordinated rollouts. |
| free | Each host races to the end independently. Fast on hosts that finish quickly, slow hosts don't block the others. | Large heterogeneous fleets where tasks take varying time; no cross-host handlers or dependencies. |
| host_pinned | Like free, but each worker fork "owns" a host and finishes everything for it before picking up the next. Fewer connections in flight at once. | Large fleets with strict rate limits (bastions, cloud API quotas). |
| mitogen_linear / mitogen_free / mitogen_host_pinned | The Mitogen variants, see below. | When you've installed the Mitogen strategy plugin. |
- hosts: many_hosts
  strategy: free
  tasks:
    - ansible.builtin.package:
        name: htop
        state: present
With free, task results land in the order hosts finish, not the order in the play. If you grep logs for task names, that's fine; if a human is reading over your shoulder, warn them.
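For the coordinated-rollout case in the table, linear is usually paired with serial and max_fail_percentage. A rough sketch, assuming a hypothetical web group and a service named myapp; the batch size and failure threshold are placeholders to tune:

```yaml
- hosts: web
  strategy: linear            # the default, stated here for clarity
  serial: "25%"               # roll through the group a quarter at a time
  max_fail_percentage: 10     # abort the rollout if more than 10% of a batch fails
  tasks:
    - name: Restart the application (hypothetical service name)
      ansible.builtin.service:
        name: myapp
        state: restarted
      become: true
```

Each batch still runs task by task in lockstep, which is what keeps the log readable during a rollout.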
Tuning forks
forks is the number of hosts being acted on in parallel. Default is 5 — way too low for modern work.
# ansible.cfg
[defaults]
forks = 30
Or per-run: ansible-playbook -f 50 site.yml.
How to pick a number
- Controller-bound: one Python process per fork, each running Jinja/facts/callbacks. 50 forks needs ~2–3 GB RAM on the controller.
- Target-bound: SSH connections and remote Python. If the network is the bottleneck, more forks = more contention.
- External APIs (cloud modules that delegate_to: localhost): forks multiply your API rate. A 100-fork play with 100 hosts each making 10 API calls is 1000 API calls in a burst.
Start at forks = 20, profile, go from there. Past ~50 the wins flatten on most runners.
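When only one task is rate-sensitive, the task-level throttle keyword can cap concurrency for that task without lowering forks for the whole run. A sketch under assumptions: the app group and the api.example.com endpoint are hypothetical, and the limit of 5 is arbitrary:

```yaml
- hosts: app
  gather_facts: false
  tasks:
    - name: Register each host with a rate-limited API (hypothetical endpoint)
      ansible.builtin.uri:
        url: "https://api.example.com/register/{{ inventory_hostname }}"
        method: POST
      delegate_to: localhost
      throttle: 5               # at most 5 hosts run this task at once

    - name: Everything else still runs at full forks
      ansible.builtin.ping:
```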
async / poll for long tasks
By default, a task blocks until complete. For long tasks — backups, large downloads, migrations — use async to fire and return:
# Kick off, don't wait
- name: Start the backup
  ansible.builtin.command: /usr/local/bin/slow-backup
  async: 3600   # max runtime in seconds
  poll: 0       # don't poll; return immediately
  register: backup

# Do other work here that doesn't depend on the backup...

- name: Wait for backup to finish
  ansible.builtin.async_status:
    jid: "{{ backup.ansible_job_id }}"
  register: backup_result
  until: backup_result.finished
  retries: 120
  delay: 30
Patterns:
- poll: 0 = fire-and-forget; check later with async_status.
- poll: 5 = poll every 5s, block until done. Use when you want async's long timeout but synchronous flow.
- async on a handler: occasionally useful for long-running restarts; set poll: to a sane value or the handler "completes" before the restart is done.
- Several long tasks on one host: start each with poll: 0, then run a single async_status loop over the registered jids. Effectively gives you per-host task parallelism.
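The parallel-jobs pattern can be sketched as follows. The two job paths are hypothetical, and the retries/delay values are simply chosen so the wait matches the 1800-second async limit:

```yaml
- name: Start several long jobs at once on each host
  ansible.builtin.command: "{{ item }}"
  loop:
    - /usr/local/bin/job-a      # hypothetical commands
    - /usr/local/bin/job-b
  async: 1800                   # each job may run up to 30 minutes
  poll: 0                       # return immediately with a job id
  register: jobs

- name: Wait for every job to finish
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ jobs.results }}"    # one registered result per started job
  register: job_result
  until: job_result.finished
  retries: 60
  delay: 30
```

The wait loop checks the jobs one after another, but since they all started together the total wait is roughly the duration of the slowest job.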
Loops done right
A loop: calls the module once per item. For a small list that's fine; for 300 packages it's 300 module invocations.
Bad: module-per-item
- name: Install packages (slow)
  ansible.builtin.package:
    name: "{{ item }}"
    state: present
  loop: "{{ packages }}"
Good: one module call with a list
- name: Install packages (one call)
  ansible.builtin.package:
    name: "{{ packages }}"   # most package modules accept a list
    state: present
This works for package, apt, dnf, yum and pip. Modules such as firewalld and user take one item per call, so they still need loop:. Check each module's documentation before assuming it accepts a list.
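When the package names come from more than one variable, they can be merged into a single list before the call. A small sketch; base_packages and extra_packages are hypothetical variable names, defined inline here only to keep the example self-contained:

```yaml
- name: Install base and role-specific packages in one call
  ansible.builtin.package:
    name: "{{ (base_packages + extra_packages) | unique }}"
    state: present
  vars:
    base_packages: [htop, curl]
    extra_packages: [curl, jq]    # the duplicate curl is removed by unique
```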
When loop is unavoidable
Some modules really do need one call per item (e.g. authorized_key). Keep them, but consider:
- Render a file once with template and push that, rather than looping.
- Use assemble to concatenate fragments from a directory: idempotent and fast.
- For lineinfile over many lines, use blockinfile or (better) a full template.
Prefer loop: in new code. with_items still works but lint will flag it; don't mix styles in one role.
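As an illustration of swapping a lineinfile loop for one blockinfile call, here is a sketch that writes a set of sysctl entries in a single task. The sysctl_settings dictionary and the target path are assumptions made up for the example:

```yaml
- name: Manage many settings as one block instead of looping lineinfile
  ansible.builtin.blockinfile:
    path: /etc/sysctl.d/90-tuning.conf    # hypothetical target file
    create: true
    mode: '0644'
    block: |
      {% for key, value in sysctl_settings | dictsort %}
      {{ key }} = {{ value }}
      {% endfor %}
  vars:
    sysctl_settings:                      # hypothetical values
      net.core.somaxconn: 1024
      vm.swappiness: 10
```

One module invocation replaces one per line, and the managed block is rewritten only when its rendered content changes.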
Delegation patterns
run_once + delegate_to
"Run this task once across the whole batch, on a specific host." Classic uses: DB migrations, cache purges, LB config writes.
- name: Run DB migrations from the primary only
  ansible.builtin.command: /usr/local/bin/myapp migrate
  run_once: true
  delegate_to: "{{ groups['db'] | first }}"
Without run_once, every host in the play would run it. Without delegate_to, run_once would run it on "a" host but you don't control which.
Fan-out then fan-in
# The name is quoted because an unquoted "Per-host: ..." is a YAML syntax error
- name: "Per-host: compute something"
  ansible.builtin.command: /usr/local/bin/measure
  register: measurement

- name: Aggregate on the controller
  ansible.builtin.debug:
    msg: "Total: {{ ansible_play_hosts | map('extract', hostvars, ['measurement','stdout']) | map('int') | sum }}"
  run_once: true
  delegate_to: localhost
Delegated facts
By default, facts gathered by a delegated task are assigned to the current host (inventory_hostname), not to the host that actually produced them. To attribute them to the delegate instead:
- ansible.builtin.setup:
  delegate_to: bastion.example.com
  delegate_facts: true   # store the facts under bastion.example.com, not under the current host
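A common use of delegate_facts is collecting facts for hosts that are not part of the current play. A sketch, assuming hypothetical app and db groups in the inventory:

```yaml
- hosts: app
  gather_facts: false
  tasks:
    - name: Gather facts from every db host and file them under that host
      ansible.builtin.setup:
      delegate_to: "{{ item }}"
      delegate_facts: true
      loop: "{{ groups['db'] }}"
      run_once: true              # one pass over the db group is enough

    - name: Read a db host's facts from an app host
      ansible.builtin.debug:
        msg: "First db host runs {{ hostvars[groups['db'] | first].ansible_facts.distribution }}"
```

Without delegate_facts: true, each loop iteration would overwrite the app host's own facts with those of the db host.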
Mitogen
Mitogen is a third-party strategy plugin that replaces Ansible's default executor. It runs a persistent Python interpreter on the target and multiplexes module calls over a single connection, sidestepping pipelining and fork overhead. Reported wins: 2–4x on CPU-bound plays, sometimes more on huge fleets.
pip install mitogen
# ansible.cfg
[defaults]
strategy = mitogen_linear
strategy_plugins = /path/to/site-packages/ansible_mitogen/plugins/strategy
- Mitogen is unofficial. Every Ansible upgrade can break it; pin the combo.
- Certain modules and features (async, some connection plugins, network_cli) are incompatible.
- Security model differs: Mitogen's long-lived remote Python changes what "clean teardown after each task" means.
- Stack traces on failure are harder to read.
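Because strategy can also be set per play, one hedge against the incompatibilities listed above is to keep Mitogen as the default in ansible.cfg and fall back to the stock linear strategy for plays that rely on async. A sketch reusing the backup command from earlier; whether a given play really needs the fallback is something to test against your pinned versions:

```yaml
- hosts: db
  strategy: linear        # stock strategy for this play only
  tasks:
    - name: Start the backup in the background
      ansible.builtin.command: /usr/local/bin/slow-backup
      async: 3600
      poll: 0
      register: backup
```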
Measuring: profile_tasks, profile_roles, timer
Ansible ships callback plugins that print per-task / per-role timings. Enable one and rerun.
# ansible.cfg
[defaults]
callbacks_enabled = profile_tasks, profile_roles, timer
Or per-run:
ANSIBLE_CALLBACKS_ENABLED=profile_tasks,timer \
ansible-playbook -i inventories/dev site.yml
What each callback shows
| Callback | Shows | Useful for |
|---|---|---|
| timer | Total wall-clock for the whole run | Baseline for regression tracking |
| profile_tasks | Per-task duration, sorted longest-first at end of run | Finding the one task eating 90% of the time |
| profile_roles | Per-role totals | Which role is the bottleneck (roll up tasks) |
Example tail of a run with profile_tasks:
Saturday 12 April 2026 14:02:41 +0000 (0:00:00.018) 0:03:12.447 *****
===============================================================================
db : Run migration ----------------------------------------------------- 42.13s
app : Install npm deps ------------------------------------------------- 28.94s
Gathering Facts -------------------------------------------------------- 14.22s
app : Compile assets --------------------------------------------------- 11.00s
nginx : Reload nginx (handler) ----------------------------------------- 03.51s
...
Now you know where to spend time. 42 s on migrations? Probably fine, that is real work. 14 s on gathering? Switch on fact caching. 28 s on npm? Cache node_modules or use a prebuilt image.
Per-host vs wall-clock
profile_tasks reports wall-clock for each task across all hosts. With strategy: linear that's the max over the batch (slowest host determines it). With strategy: free it's meaningless as a per-host signal. For per-host debugging, run against one host at a time and diff the timings.
Checklist
- [ ] pipelining = true in ansible.cfg
- [ ] ssh_args includes ControlMaster=auto ControlPersist=60s
- [ ] forks raised to at least 20, sized to controller RAM and runner network
- [ ] gathering = smart and a fact_caching backend
- [ ] gather_facts: false on plays that don't need facts
- [ ] gather_subset used to trim fact collection where facts are needed
- [ ] No loop: over hundreds of packages; feed the module a list
- [ ] Long tasks use async + async_status
- [ ] One-off coordination uses run_once: true + delegate_to
- [ ] callbacks_enabled = profile_tasks, timer at least in CI
- [ ] Wall-clock baseline tracked over time; regressions investigated
Related reading: Ansible Best Practices, Error Handling, Ansible Debugging, Inventory Patterns.