Ansible Performance

Where Ansible spends time, and the settings that actually move the needle: fact gathering, pipelining, SSH multiplexing, strategy plugins, forks, async, delegation, and profiling.

The top five wins
  • pipelining = true + SSH ControlMaster → often 40–60% faster on large fleets.
  • gathering = smart + fact caching → fact collection drops from seconds per host on every run to once per cache TTL.
  • Raise forks from the default 5 to something that matches your runner (20–50 is typical).
  • Replace with_items over hundreds of packages with one module call that takes a list.
  • Profile before you optimise: ANSIBLE_CALLBACKS_ENABLED=profile_tasks points at the real hot spots.

Where time actually goes

Before you tune anything, know what you are tuning. A plain ansible-playbook site.yml against 200 hosts spends time on:

  1. SSH handshake per task (worst case): TCP + key auth + SSH subsystem start. Order of 200–500 ms each.
  2. Python interpreter startup on the target for every module invocation. 100–300 ms.
  3. Module payload transfer: serialise module + args, write to tmpfile, execute, read back JSON. Dominated by disk/network on slow hosts.
  4. Fact gathering — runs setup, which is a big module collecting hundreds of facts. 1–5 s per host.
  5. Executor synchronisation — the controller waits for all hosts at each task (linear strategy).

Pipelining removes most of the extra SSH operations and the temp-file transfer behind (1) and (3). ControlMaster reuses the SSH connection, which removes the repeated handshake in (1). Fact caching eliminates (4) on subsequent runs. Strategy changes affect (5). Mitogen attacks (2) and (3).

Fact gathering and caching

By default, every play starts by running the setup module against every host. That's 1–5 s per host, and most of it is wasted work when the play reads only a handful of facts.

Gather less

# At play level — use only the subsets you need
- hosts: web
  gather_facts: true
  gather_subset:
    - '!all'
    - '!min'
    - network
    - distribution
    - os_family

Subsets: all, min (the cheapest mandatory set), hardware, network, virtual, facter, ohai, distribution, pkg_mgr, service_mgr, python, system, user. Prefix a subset with ! to exclude.

Gather smart

gathering = smart tells Ansible: "If we already have facts for this host and they haven't expired, don't re-gather them." Combined with caching:

# ansible.cfg
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/facts-cache
fact_caching_timeout = 7200

Backend | When | Setup cost
jsonfile | Single-controller, one engineer, laptop/runner | Zero: just a directory
redis | Multi-runner CI, shared fact cache across jobs | A redis instance; set fact_caching_connection = host:port:db
memcached | Same as redis; older Ansible deployments | Memcached; fact_caching_connection = server:port
yaml | When you want to cat the cache; debug | Zero; slower to read than jsonfile for large fleets
mongodb | Very large fleets, centralised analysis | Mongo instance; overkill for most shops
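One practical payoff of a populated cache is that a play can read facts about hosts it never contacts in the current run. The sketch below assumes fact caching is enabled and that the web hosts were gathered within the TTL; the group names lb and web are hypothetical, and the lookup fails for any host that has no cached facts or no default IPv4 route.

```yaml
# Assumption: caching is configured in ansible.cfg and the cache already
# holds facts for every host in the (hypothetical) "web" group.
- hosts: lb
  gather_facts: false          # nothing is gathered in this play
  tasks:
    - name: List backend addresses from cached facts
      ansible.builtin.debug:
        msg: "{{ groups['web'] | map('extract', hostvars, ['ansible_facts', 'default_ipv4', 'address']) | list }}"
```

If the cache may be cold, run a gathering play against the web group first rather than relying on this lookup.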

Gather not at all

For plays that don't touch facts (pure file pushes to a known host set), turn gathering off entirely:

- hosts: edge
  gather_facts: false
  tasks:
    - ansible.builtin.copy:
        src: files/hosts.allow
        dest: /etc/hosts.allow
        mode: '0644'

A 200-host play with gather_facts: false can be twice as fast as the same play with gathering enabled, and you only pay for what you actually need.
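When only one late task needs a fact, gathering can stay off at play level and setup can be called explicitly just before the fact is used. A minimal sketch, reusing the edge play from above; the final debug task is purely illustrative:

```yaml
- hosts: edge
  gather_facts: false
  tasks:
    - name: Push the file (needs no facts)
      ansible.builtin.copy:
        src: files/hosts.allow
        dest: /etc/hosts.allow
        mode: '0644'

    - name: Gather only the minimal fact set, right before it is needed
      ansible.builtin.setup:
        gather_subset:
          - '!all'             # leaves just the mandatory "min" subset

    - name: Use a fact from the minimal set
      ansible.builtin.debug:
        msg: "OS family: {{ ansible_facts['os_family'] }}"
```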

SSH: pipelining, ControlMaster, ControlPersist

Pipelining

Without pipelining, Ansible SSHes to the host, mkdirs a temp dir, scps the module there, runs it, removes it. With pipelining, the module is streamed over the SSH pipe and executed by the remote Python directly — one round trip instead of four.

# ansible.cfg
[defaults]
pipelining = true

Note (requiretty): Pipelining needs requiretty off in the target's /etc/sudoers. On modern RHEL/Debian it is off by default, but locked-down images (CIS, STIG baselines) sometimes set it. Symptom: sudo: sorry, you must have a tty to run sudo. Fix: Defaults !requiretty in sudoers, or disable pipelining for that host.
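If changing sudoers is not an option, pipelining can be switched off for just the affected hosts through the ansible_pipelining connection variable. The inventory layout, group name, and host names below are hypothetical:

```yaml
# inventory/hosts.yml (hypothetical file and group names)
all:
  children:
    hardened:
      hosts:
        stig01.example.com:
        stig02.example.com:
      vars:
        ansible_pipelining: false   # these hosts fall back to temp-file transfer
```

The rest of the fleet keeps the global pipelining = true setting.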

ControlMaster / ControlPersist

OpenSSH's connection multiplexing. The first task opens the SSH connection; subsequent tasks against the same host reuse it. Without it, every task is a fresh handshake.

# ansible.cfg
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o PreferredAuthentications=publickey
control_path_dir = ~/.ansible/cp

The socket goes under control_path_dir. On long-path-sensitive systems (macOS with encrypted home dirs, deep NFS) override it: some setups break when the socket path exceeds ~100 chars.

Typical improvement on a 20-task, 50-host play: wall time drops from ~8 min to ~3 min just from pipelining + ControlMaster.

Strategy plugins: linear, free, host_pinned

The strategy plugin controls when hosts proceed to the next task.

Strategy | Behaviour | Use when
linear (default) | All hosts run task N, wait, then all run N+1. Log output is ordered. | Small fleets; plays with serial/max_fail_percentage; when you need coordinated rollouts.
free | Each host races to the end independently. Fast on hosts that finish quickly, slow hosts don't block the others. | Large heterogeneous fleets where tasks take varying time; no cross-host handlers or dependencies.
host_pinned | Like free, but each worker fork "owns" a host and finishes everything for it before picking up the next. Fewer connections in flight at once. | Large fleets with strict rate limits (bastions, cloud API quotas).
mitogen_linear / mitogen_free / mitogen_host_pinned | The Mitogen variants, see below. | When you've installed the Mitogen strategy plugin.

- hosts: many_hosts
  strategy: free
  tasks:
    - ansible.builtin.package:
        name: htop
        state: present

Output interleaves. With free, task results land in the order hosts finish, not the order in the play. If you grep logs for task names, that's fine; if a human is reading over your shoulder, warn them.
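For contrast, here is a sketch of the coordinated-rollout case where linear is the better fit. The batch size, failure threshold, and service name are illustrative assumptions, not recommendations:

```yaml
- hosts: web
  strategy: linear             # the default, stated here for clarity
  serial: "25%"                # work through the group a quarter at a time
  max_fail_percentage: 10      # abort if more than 10% of a batch fails
  tasks:
    - name: Restart the application in the current batch
      ansible.builtin.service:
        name: myapp            # hypothetical service name
        state: restarted
```

Each batch finishes every task before the next batch starts, which is the ordering guarantee free gives up.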

Tuning forks

forks is the number of hosts being acted on in parallel. Default is 5 — way too low for modern work.

# ansible.cfg
[defaults]
forks = 30

Or per-run: ansible-playbook -f 50 site.yml.

How to pick a number

Start at forks = 20, profile, go from there. Past ~50 the wins flatten on most runners.

async / poll for long tasks

By default, a task blocks until complete. For long tasks — backups, large downloads, migrations — use async to fire and return:

# Kick off, don't wait
- name: Start the backup
  ansible.builtin.command: /usr/local/bin/slow-backup
  async: 3600        # max runtime in seconds
  poll: 0            # don't poll; return immediately
  register: backup

# Do other work here that doesn't depend on the backup...

- name: Wait for backup to finish
  ansible.builtin.async_status:
    jid: "{{ backup.ansible_job_id }}"
  register: backup_result
  until: backup_result.finished
  retries: 120
  delay: 30

Patterns:

Parallel tasks on one host. Fire N tasks with poll: 0, then run a single async_status task that loops over the registered job IDs. This effectively gives you per-host task parallelism.
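The parallel-tasks pattern might look like this in practice. The job paths, timeout, and retry budget are hypothetical and should be sized to the real workload:

```yaml
- name: Start several long jobs without waiting
  ansible.builtin.command: "{{ item }}"
  async: 1800                  # each job may run for up to 30 minutes
  poll: 0                      # return immediately
  loop:
    - /usr/local/bin/job-a     # hypothetical commands
    - /usr/local/bin/job-b
    - /usr/local/bin/job-c
  register: started_jobs

- name: Wait for every job to finish
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ started_jobs.results }}"
  loop_control:
    label: "{{ item.item }}"   # show the command instead of the whole result
  register: job_result
  until: job_result.finished
  retries: 60
  delay: 30
```

The jobs run concurrently on the host; the second task then checks them one after another, so the total wait is roughly the duration of the slowest job.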

Loops done right

A loop: calls the module once per item. For a small list that's fine; for 300 packages it's 300 module invocations.

Bad: module-per-item

- name: Install packages (slow)
  ansible.builtin.package:
    name: "{{ item }}"
    state: present
  loop: "{{ packages }}"

Good: one module call with a list

- name: Install packages (one call)
  ansible.builtin.package:
    name: "{{ packages }}"     # most package modules accept a list
    state: present

This works for package, apt, dnf, yum, and pip, whose name parameter accepts a list. Modules such as firewalld and user take a single item per call, so they still need loop:. Check each module's docs before assuming list support.

When loop is unavoidable

Some modules really do need one call per item (e.g. authorized_key). Keep them, but consider:

with_items is legacy. Modern Ansible uses loop:. with_items still works but lint will flag it; don't mix styles in one role.
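One detail to watch when converting: with_items flattens one level of nested lists, while loop: does not, so a direct swap can change behaviour. A sketch of the equivalent forms, assuming base_packages and extra_packages are list variables defined elsewhere (both names are hypothetical):

```yaml
# Legacy form: with_items merges the two lists into one flat sequence.
- name: Show each package name (legacy syntax)
  ansible.builtin.debug:
    msg: "{{ item }}"
  with_items:
    - "{{ base_packages }}"
    - "{{ extra_packages }}"

# loop: form: flatten explicitly, otherwise each item would be a whole list.
- name: Show each package name (loop syntax)
  ansible.builtin.debug:
    msg: "{{ item }}"
  loop: "{{ [base_packages, extra_packages] | flatten(levels=1) }}"
```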

Delegation patterns

run_once + delegate_to

"Run this task once across the whole batch, on a specific host." Classic uses: DB migrations, cache purges, LB config writes.

- name: Run DB migrations from the primary only
  ansible.builtin.command: /usr/local/bin/myapp migrate
  run_once: true
  delegate_to: "{{ groups['db'] | first }}"

Without run_once, the task would execute once for every host in the play, each time on the delegate. Without delegate_to, run_once runs it on the first host of the current batch, which is rarely the host you want.

Fan-out then fan-in

- name: "Per-host: compute something"    # quoted because the name contains ": "
  ansible.builtin.command: /usr/local/bin/measure
  register: measurement

- name: Aggregate on the controller
  ansible.builtin.debug:
    msg: "Total: {{ ansible_play_hosts | map('extract', hostvars, ['measurement','stdout']) | map('int') | sum }}"
  run_once: true
  delegate_to: localhost

Delegated facts

By default, facts gathered by a delegated task are assigned to the current host (inventory_hostname), not to the delegate that actually produced them. To store them under the delegate instead, set delegate_facts:

- ansible.builtin.setup:
  delegate_to: bastion.example.com
  delegate_facts: true          # store the facts under bastion.example.com, not the current host
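A common use of delegate_facts is collecting facts for hosts that are not part of the current play. With delegate_facts: true, each result is stored under the delegated host, so later tasks can read it through hostvars. The group names app_servers and dbservers are hypothetical:

```yaml
- hosts: app_servers
  gather_facts: false
  tasks:
    - name: Gather facts from the database servers
      ansible.builtin.setup:
      delegate_to: "{{ item }}"
      delegate_facts: true     # facts land under each db host, not the app host
      loop: "{{ groups['dbservers'] }}"
      run_once: true           # one pass is enough for the whole play
```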

Mitogen

Mitogen is a third-party strategy plugin that replaces Ansible's default executor. It runs a persistent Python interpreter on the target and multiplexes module calls over a single connection, sidestepping pipelining and fork overhead. Reported wins: 2–4x on CPU-bound plays, sometimes more on huge fleets.

pip install mitogen

# ansible.cfg
[defaults]
strategy = mitogen_linear
strategy_plugins = /path/to/site-packages/ansible_mitogen/plugins/strategy

Caveats.
  • Mitogen is unofficial. Every Ansible upgrade can break it; pin the combo.
  • Certain modules and features (async, some connection plugins, network_cli) are incompatible.
  • Security model differs: Mitogen's long-lived remote Python changes what "clean teardown after each task" means.
  • Stack traces on failure are harder to read.

Use it if you've measured a real problem that pipelining + ControlMaster didn't solve. Otherwise, the stock stack is fine.

Measuring: profile_tasks, profile_roles, timer

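Callback plugins print per-task and per-role timings. profile_tasks, profile_roles, and timer live in the ansible.posix collection, which the full ansible package bundles. Enable one and rerun.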

# ansible.cfg
[defaults]
callbacks_enabled = profile_tasks, profile_roles, timer

Or per-run:

ANSIBLE_CALLBACKS_ENABLED=profile_tasks,timer \
  ansible-playbook -i inventories/dev site.yml

What each callback shows

Callback | Shows | Useful for
timer | Total wall-clock for the whole run | Baseline for regression tracking
profile_tasks | Per-task duration, sorted longest-first at end of run | Finding the one task eating 90% of the time
profile_roles | Per-role totals | Which role is the bottleneck (roll up tasks)

Example tail of a run with profile_tasks:

Saturday 12 April 2026  14:02:41 +0000 (0:00:00.018)       0:03:12.447 *****
===============================================================================
db : Run migration ----------------------------------------------------- 42.13s
app : Install npm deps ------------------------------------------------- 28.94s
Gathering Facts -------------------------------------------------------- 14.22s
app : Compile assets --------------------------------------------------- 11.00s
nginx : Reload nginx (handler) ----------------------------------------- 03.51s
...

Now you know where to spend time. 42 s on migrations? Probably fine: it's real work. 14 s on gathering? Switch on fact caching. 28 s on npm? Cache node_modules or use a prebuilt image.

Per-host vs wall-clock

profile_tasks reports wall-clock for each task across all hosts. With strategy: linear that's the max over the batch (slowest host determines it). With strategy: free it's meaningless as a per-host signal. For per-host debugging, run against one host at a time and diff the timings.

Checklist

Related reading: Ansible Best Practices, Error Handling, Ansible Debugging, Inventory Patterns.