Ansible Performance

Where Ansible spends time, and the settings that actually move the needle: fact gathering, pipelining, SSH multiplexing, strategy plugins, forks, async, delegation, and profiling.

The top five wins

pipelining = true + SSH ControlMaster → often 40–60% faster on large fleets.
gathering = smart + fact caching → fact collection drops from seconds per run to one-time per TTL.
Raise forks from the default 5 to something that matches your runner (20–50 is typical).
Replace with_items over hundreds of packages with one module call that takes a list.
Profile before you optimise: ANSIBLE_CALLBACKS_ENABLED=profile_tasks points at the real hot spots.

On this page

Where time actually goes
Fact gathering and caching
SSH: pipelining, ControlMaster, ControlPersist
Strategy plugins: linear, free, host_pinned
Tuning forks
async / poll for long tasks
Loops done right
Delegation patterns
Mitogen
Measuring: profile_tasks, profile_roles, timer
Checklist

Where time actually goes

Before you tune anything, know what you are tuning. A plain ansible-playbook site.yml against 200 hosts spends time on:

1. SSH handshake per task (worst case): TCP + key auth + SSH subsystem start. Order of 200–500 ms each.
2. Python interpreter startup on the target for every module invocation. 100–300 ms.
3. Module payload transfer: serialise module + args, write to tmpfile, execute, read back JSON. Dominated by disk/network on slow hosts.
4. Fact gathering: runs setup, which is a big module collecting hundreds of facts. 1–5 s per host.
5. Executor synchronisation: the controller waits for all hosts at each task (linear strategy).

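Pipelining cuts the per-task SSH operations in (1) and removes the temp-file transfer in (3). ControlMaster reuses the SSH connection, so the handshake in (1) is paid once per host instead of once per task. Fact caching eliminates (4) on subsequent runs. Strategy changes affect (5). Mitogen attacks (2) and (3).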

Fact gathering and caching

By default, every play starts by running the setup module against every host. That's 1–5 s per host, and most of it is wasted work when the play reads only a handful of facts.

Gather less

# At play level — use only the subsets you need
- hosts: web
  gather_facts: true
  gather_subset:
    - '!all'
    - '!min'
    - network
    - distribution
    - os_family

Subsets: all, min (the cheapest mandatory set), hardware, network, virtual, facter, ohai, distribution, pkg_mgr, service_mgr, python, system, user. Prefix a subset with ! to exclude.

Gather smart

gathering = smart tells Ansible: "If we already have facts for this host and they haven't expired, don't re-gather them." Combined with caching:

# ansible.cfg
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/facts-cache
fact_caching_timeout = 7200

Backend	When	Setup cost
`jsonfile`	Single-controller, one engineer, laptop/runner	Zero — just a directory
`redis`	Multi-runner CI, shared fact cache across jobs	A redis instance; set `fact_caching_connection = host:port:db`
`memcached`	Same as redis; older Ansible deployments	Memcached; `fact_caching_connection = server:port`
`yaml`	When you want to cat the cache; debug	Zero; slower to read than jsonfile for large fleets
`mongodb`	Very large fleets, centralised analysis	Mongo instance; overkill for most shops

Gather not at all

For plays that don't touch facts (pure file pushes to a known host set), turn gathering off entirely:

- hosts: edge
  gather_facts: false
  tasks:
    - ansible.builtin.copy:
        src: files/hosts.allow
        dest: /etc/hosts.allow
        mode: '0644'

A 200-host play with gather_facts: false can be twice as fast as the same play with gathering on, because you only pay for what you actually need.
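
When only one or two tasks need facts, a middle ground is to keep gather_facts: false and call the setup module yourself, restricted to the subset you need. A minimal sketch, reusing the edge group from above; the debug task is only there to show the facts being read:

```yaml
- hosts: edge
  gather_facts: false
  tasks:
    - name: Collect only distribution facts, right before they are needed
      ansible.builtin.setup:
        gather_subset:
          - '!all'
          - '!min'
          - distribution

    - name: Read the facts that were just gathered
      ansible.builtin.debug:
        msg: "{{ ansible_facts['distribution'] }} {{ ansible_facts['distribution_version'] }}"
```

Tasks before the setup call run without any fact-gathering cost; anything that reads ansible_facts has to come after it.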

SSH: pipelining, ControlMaster, ControlPersist

Pipelining

Without pipelining, Ansible SSHes to the host, mkdirs a temp dir, scps the module there, runs it, removes it. With pipelining, the module is streamed over the SSH pipe and executed by the remote Python directly — one round trip instead of four.

# ansible.cfg
[defaults]
pipelining = true

requiretty. Pipelining needs requiretty off in the target /etc/sudoers. On modern RHEL/Debian it is off by default, but locked-down images (CIS, STIG baselines) sometimes set it. Symptom: sudo: sorry, you must have a tty to run sudo. Fix: Defaults !requiretty in sudoers, or disable pipelining for that host.
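
The sudoers fix can itself be pushed with Ansible, as long as that one play runs with pipelining off. A sketch under assumptions: the group name hardened, the account name deploy, and the drop-in file name are all hypothetical, and your baseline may forbid sudoers drop-ins entirely.

```yaml
# Bootstrap play: pipelining is disabled here so sudo still works
# on hosts where requiretty is currently set.
- hosts: hardened                  # hypothetical group of locked-down hosts
  become: true
  gather_facts: false
  vars:
    ansible_pipelining: false      # connection variable, applies to this play only
  tasks:
    - name: Exempt the automation account from requiretty
      ansible.builtin.copy:
        dest: /etc/sudoers.d/90-ansible-pipelining   # hypothetical file name
        content: "Defaults:deploy !requiretty\n"     # 'deploy' is an assumed user
        owner: root
        group: root
        mode: '0440'
        validate: /usr/sbin/visudo -cf %s            # refuse to install a broken file
```

Scoping the exemption to one account keeps the rest of the baseline intact; later plays against the same hosts can then run with pipelining on.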

ControlMaster / ControlPersist

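OpenSSH's connection multiplexing. The first task opens the SSH connection; subsequent tasks against the same host reuse it. Without it, every task is a fresh handshake. Ansible's default ssh_args already enable ControlMaster=auto with ControlPersist=60s, but setting ssh_args yourself replaces that default, so keep both options whenever you override it.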

# ansible.cfg
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o PreferredAuthentications=publickey
control_path_dir = ~/.ansible/cp

The socket goes under control_path_dir. On long-path-sensitive systems (macOS with encrypted home dirs, deep NFS) override it: some setups break when the socket path exceeds ~100 chars.

Typical improvement on a 20-task, 50-host play: wall time drops from ~8 min to ~3 min just from pipelining + ControlMaster.

Strategy plugins: linear, free, host_pinned

The strategy plugin controls when hosts proceed to the next task.

Strategy	Behaviour	Use when
`linear` (default)	All hosts run task N, wait, then all run N+1. Log output is ordered.	Small fleets; plays with `serial`/`max_fail_percentage`; when you need coordinated rollouts.
`free`	Each host races to the end independently. Fast on hosts that finish quickly, slow hosts don't block the others.	Large heterogeneous fleets where tasks take varying time; no cross-host handlers or dependencies.
`host_pinned`	Like `free`, but each worker fork "owns" a host and finishes everything for it before picking up the next. Fewer connections in flight at once.	Large fleets with strict rate limits (bastions, cloud API quotas).
`mitogen_linear` / `mitogen_free` / `mitogen_host_pinned`	The Mitogen variants, see below.	When you've installed the Mitogen strategy plugin.

- hosts: many_hosts
  strategy: free
  tasks:
    - ansible.builtin.package:
        name: htop
        state: present

Output interleaves. With free, task results land in the order hosts finish, not the order in the play. If you grep logs for task names, that's fine; if a human is reading over your shoulder, warn them.
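
For the coordinated-rollout case in the linear row of the table, the play keywords might be combined roughly like this. The package name myapp is hypothetical, and the percentages are placeholders rather than recommendations:

```yaml
- hosts: web
  strategy: linear            # the default, stated here for clarity
  serial: "25%"               # work through the group a quarter at a time
  max_fail_percentage: 10     # stop the play if more than 10% of a batch fails
  tasks:
    - name: Deploy the new release
      ansible.builtin.package:
        name: myapp           # hypothetical package name
        state: latest
```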

Tuning forks

forks is the number of hosts being acted on in parallel. Default is 5 — way too low for modern work.

# ansible.cfg
[defaults]
forks = 30

Or per-run: ansible-playbook -f 50 site.yml.

How to pick a number

Controller-bound: one Python process per fork, each running Jinja/facts/callbacks. 50 forks needs ~2–3 GB RAM on the controller.
Target-bound: SSH connections and remote Python. If the network is the bottleneck, more forks = more contention.
External APIs (cloud modules that delegate_to: localhost): forks multiply your API rate. A 100-fork play with 100 hosts, each making 10 API calls, issues 1,000 API calls with up to 100 in flight at once.

Start at forks = 20, profile, go from there. Past ~50 the wins flatten on most runners.
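
If one API-heavy task is the only reason to keep forks low, the throttle task keyword is one way to cap concurrency for that task alone while the rest of the play uses the full fork count. A sketch, with a hypothetical endpoint and payload:

```yaml
- hosts: all
  gather_facts: false
  tasks:
    - name: Register each host with the cloud API, at most 5 at a time
      ansible.builtin.uri:
        url: "https://api.example.com/register"   # hypothetical endpoint
        method: POST
        body_format: json
        body:
          host: "{{ inventory_hostname }}"
      delegate_to: localhost
      throttle: 5             # cap on concurrent hosts for this task only
```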

async / poll for long tasks

By default, a task blocks until complete. For long tasks — backups, large downloads, migrations — use async to fire and return:

# Kick off, don't wait
- name: Start the backup
  ansible.builtin.command: /usr/local/bin/slow-backup
  async: 3600        # max runtime in seconds
  poll: 0            # don't poll; return immediately
  register: backup

# Do other work here that doesn't depend on the backup...

- name: Wait for backup to finish
  ansible.builtin.async_status:
    jid: "{{ backup.ansible_job_id }}"
  register: backup_result
  until: backup_result.finished
  retries: 120
  delay: 30

Patterns:

poll: 0 = fire-and-forget; check later with async_status.
poll: 5 = poll every 5s, block until done. Use when you want async's long timeout but synchronous flow.
async on a handler: occasionally useful for long-running restarts; set poll: to a sane value or the handler "completes" before the restart is done.

Parallel tasks on one host. Fire N tasks with poll: 0, then run a single async_status task that loops over the registered job IDs. Effectively gives you per-host task parallelism.
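
A sketch of that pattern. The three commands are hypothetical stand-ins for whatever long-running jobs you have; retries and delay are sized so that 60 x 30 s matches the 1800 s async limit.

```yaml
- name: Start several long jobs at once on each host
  ansible.builtin.command: "{{ item }}"
  loop:
    - /usr/local/bin/reindex        # hypothetical commands
    - /usr/local/bin/compact
    - /usr/local/bin/warm-cache
  async: 1800                       # max runtime per job, in seconds
  poll: 0                           # start the job and move on
  register: jobs

- name: Wait for every job to finish
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ jobs.results }}"        # one registered result per started job
  loop_control:
    label: "{{ item.item }}"        # show the command, not the whole result
  register: job_status
  until: job_status.finished
  retries: 60
  delay: 30
```

The wait task checks the jobs one after another, but since they are all already running, the total wait is roughly the duration of the slowest job rather than the sum.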

Loops done right

A loop: calls the module once per item. For a small list that's fine; for 300 packages it's 300 module invocations.

Bad: module-per-item

- name: Install packages (slow)
  ansible.builtin.package:
    name: "{{ item }}"
    state: present
  loop: "{{ packages }}"

Good: one module call with a list

- name: Install packages (one call)
  ansible.builtin.package:
    name: "{{ packages }}"     # most package modules accept a list
    state: present

This works for package, apt, dnf, yum, and pip, whose name parameter accepts a list. Modules such as firewalld and user handle one item per call, so they still need a loop; check the docs per module.

When loop is unavoidable

Some modules really do need one call per item (e.g. authorized_key). Keep them, but consider:

Render a file once with template and push that, rather than looping.
Use assemble to concatenate fragments from a directory — idempotent and fast.
For lineinfile over many lines, use blockinfile or (better) a full template.
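
As an illustration of the "render once" idea, the sketch below writes a whole file in one task instead of one lineinfile call per line. The sysctl_settings variable, its values, and the destination file are hypothetical; for anything longer, a real template file is usually easier to maintain than inline content.

```yaml
- hosts: all
  become: true
  gather_facts: false
  vars:
    sysctl_settings:                # hypothetical data
      net.core.somaxconn: 1024
      vm.swappiness: 10
  tasks:
    - name: Write every setting in a single module call
      ansible.builtin.copy:
        dest: /etc/sysctl.d/90-tuning.conf
        mode: '0644'
        content: |
          {% for key, value in sysctl_settings | dictsort %}
          {{ key }} = {{ value }}
          {% endfor %}
```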

with_items is legacy. Modern Ansible uses loop:. with_items still works but lint will flag it; don't mix styles in one role.

Delegation patterns

run_once + delegate_to

"Run this task once across the whole batch, on a specific host." Classic uses: DB migrations, cache purges, LB config writes.

- name: Run DB migrations from the primary only
  ansible.builtin.command: /usr/local/bin/myapp migrate
  run_once: true
  delegate_to: "{{ groups['db'] | first }}"

Without run_once, every host in the play would run it. Without delegate_to, run_once executes the task on the first host of the current batch, which is rarely the host you actually want. With serial, "once" means once per batch.

Fan-out then fan-in

- name: "Per-host: compute something"    # quoted: an unquoted ': ' inside the name is a YAML error
  ansible.builtin.command: /usr/local/bin/measure
  register: measurement

- name: Aggregate on the controller
  ansible.builtin.debug:
    msg: "Total: {{ ansible_play_hosts | map('extract', hostvars, ['measurement','stdout']) | map('int') | sum }}"
  run_once: true
  delegate_to: localhost

Delegated facts

By default, facts gathered by a delegated task are assigned to the current host (inventory_hostname), not to the host that actually produced them. To store them under the delegated host instead:

- ansible.builtin.setup:
  delegate_to: bastion.example.com
  delegate_facts: true          # facts land in hostvars['bastion.example.com'], not on the current host
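
A common use of delegate_facts is collecting facts for hosts that are not part of the current play. The sketch below assumes an inventory with hypothetical groups app and db, and assumes the db hosts have a default IPv4 route. With delegate_facts: true, each db host's facts are stored under that host's own hostvars entry, which the second task then reads.

```yaml
- hosts: app
  gather_facts: false
  tasks:
    - name: Gather network facts from every db host, once
      ansible.builtin.setup:
        gather_subset:
          - '!all'
          - '!min'
          - network
      delegate_to: "{{ item }}"
      delegate_facts: true          # store the facts under each db host
      loop: "{{ groups['db'] }}"
      run_once: true                # no need to repeat this from every app host

    - name: Use a db host's address on the app hosts
      ansible.builtin.debug:
        msg: "Primary DB is at {{ hostvars[groups['db'] | first]['ansible_facts']['default_ipv4']['address'] }}"
```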

Mitogen

Mitogen is a third-party strategy plugin that replaces Ansible's default executor. It runs a persistent Python interpreter on the target and multiplexes module calls over a single connection, sidestepping pipelining and fork overhead. Reported wins: 2–4x on CPU-bound plays, sometimes more on huge fleets.

pip install mitogen

# ansible.cfg
[defaults]
strategy = mitogen_linear
strategy_plugins = /path/to/site-packages/ansible_mitogen/plugins/strategy

Caveats.

Mitogen is unofficial. Every Ansible upgrade can break it; pin the combo.
Certain modules and features (async, some connection plugins, network_cli) are incompatible.
Security model differs: Mitogen's long-lived remote Python changes what "clean teardown after each task" means.
Stack traces on failure are harder to read.

Use it if you've measured a real problem that pipelining + ControlMaster didn't solve. Otherwise, the stock stack is fine.
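
Because strategy is also a play keyword, one way to limit exposure is to leave the stock strategy as the default and opt individual plays in. A sketch, assuming strategy_plugins in ansible.cfg already points at the Mitogen plugin directory, that no strategy default is set there, and that the group and package names are placeholders:

```yaml
# Only this play uses Mitogen; every other play keeps the stock strategy.
- hosts: big_fleet                  # hypothetical group
  strategy: mitogen_linear
  tasks:
    - name: A plain module call that benefits from the persistent interpreter
      ansible.builtin.package:
        name: htop
        state: present
```

This keeps plays that rely on features from the caveats list on the stock executor.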

Measuring: profile_tasks, profile_roles, timer

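Callback plugins from the ansible.posix collection (bundled with the ansible package, a separate install on top of bare ansible-core) print per-task and per-role timings. Enable one and rerun.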

# ansible.cfg
[defaults]
callbacks_enabled = profile_tasks, profile_roles, timer

Or per-run:

ANSIBLE_CALLBACKS_ENABLED=profile_tasks,timer \
  ansible-playbook -i inventories/dev site.yml

What each callback shows

Callback	Shows	Useful for
`timer`	Total wall-clock for the whole run	Baseline for regression tracking
`profile_tasks`	Per-task duration, sorted longest-first at end of run	Finding the one task eating 90% of the time
`profile_roles`	Per-role totals	Which role is the bottleneck (roll up tasks)

Example tail of a run with profile_tasks:

Sunday 12 April 2026  14:02:41 +0000 (0:00:00.018)       0:03:12.447 *****
===============================================================================
db : Run migration ----------------------------------------------------- 42.13s
app : Install npm deps ------------------------------------------------- 28.94s
Gathering Facts -------------------------------------------------------- 14.22s
app : Compile assets --------------------------------------------------- 11.00s
nginx : Reload nginx (handler) ----------------------------------------- 03.51s
...

Now you know where to spend time. 42 s on migrations? Probably fine: it's real work. 14 s on gathering? Switch on fact caching. 28 s on npm? Cache node_modules or use a prebuilt image.

Per-host vs wall-clock

profile_tasks reports wall-clock for each task across all hosts. With strategy: linear that's the max over the batch (slowest host determines it). With strategy: free it's meaningless as a per-host signal. For per-host debugging, run against one host at a time and diff the timings.

Checklist

[ ] pipelining = true in ansible.cfg
[ ] ssh_args includes -o ControlMaster=auto -o ControlPersist=60s
[ ] forks raised to at least 20, sized to controller RAM and runner network
[ ] gathering = smart and a fact_caching backend
[ ] gather_facts: false on plays that don't need facts
[ ] gather_subset used to trim fact collection where facts are needed
[ ] No loop: over hundreds of packages — feed the module a list
[ ] Long tasks use async + async_status
[ ] One-off coordination uses run_once: true + delegate_to
[ ] callbacks_enabled = profile_tasks, timer at least in CI
[ ] Wall-clock baseline tracked over time; regressions investigated