Ansible Testing

How to catch Ansible bugs before a customer does: syntax, lint, YAML style, Molecule scenarios, collection sanity, and wiring it all into CI.

The test pyramid for Ansible
  • yamllint — formatting. Fast. Runs in under a second.
  • ansible-playbook --syntax-check — the YAML is a valid playbook.
  • ansible-lint — Ansible-specific semantic rules (idempotency smells, deprecated syntax).
  • --check --diff — dry-run against a real inventory; catches missing vars and no-longer-valid templates.
  • Molecule — spins up a container/VM and actually converges the role. The only layer that proves idempotency.
  • ansible-test sanity — for collections, mandatory before publishing to Galaxy.

Run the first four on every MR. Run Molecule on role changes. Run ansible-test sanity on tag releases of a collection.
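For a local pre-push run of the first four gates in the same cheapest-first order, a small shell function is enough. The sketch below assumes `run_gates` as a hypothetical helper name. It runs each command string with sh -c and stops at the first failure, so a YAML typo never waits behind a slower gate.

```shell
# run_gates CMD...: run each command string in order, stop at the first
# failure and return non-zero. (Hypothetical helper, not part of Ansible.)
run_gates() {
  local gate
  for gate in "$@"; do
    printf '==> %s\n' "$gate"
    if ! sh -c "$gate"; then
      printf 'FAILED: %s\n' "$gate" >&2
      return 1
    fi
  done
}

# Usage, assuming the inventory and playbook paths used on this page:
#   run_gates "yamllint ." \
#             "ansible-playbook -i inventories/dev site.yml --syntax-check" \
#             "ansible-lint" \
#             "ansible-playbook -i inventories/dev site.yml --check --diff"
```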

Syntax check

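--syntax-check parses every referenced file and resolves roles and static imports (import_playbook, import_tasks, import_role), but executes nothing. It is the cheapest gate you have.

ansible-playbook -i inventories/dev site.yml --syntax-check

Things it catches:
  • YAML that does not parse (bad indentation, an unquoted colon in a value).
  • Play or task keywords Ansible does not recognise, and module names it cannot resolve (a typo, or a collection that is not installed).
  • Roles and statically imported files that do not exist.

Things it does not catch:
  • Undefined variables and Jinja errors; nothing is templated.
  • Bad module arguments; modules validate their parameters only when they run.
  • Missing files behind dynamic include_tasks or include_role.
  • Anything that depends on the state of the target hosts.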

Gotcha: --syntax-check does not honour --tags. It always resolves everything. If a role is referenced but broken, syntax-check will fail even if you would never run that role.

--check --diff semantics (and limits)

--check runs the playbook in no-op mode. Each module reports what it would change. --diff prints file-level diffs for tasks that support it (copy, template, lineinfile, blockinfile).

ansible-playbook -i inventories/dev site.yml --check --diff
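You can turn a dry run into a number that a CI job can report by totalling the PLAY RECAP at the end of the output. The sketch below assumes plain, uncoloured output (for example with ANSIBLE_NOCOLOR=1) and the usual `host : ok=N changed=N ...` recap layout. `recap_totals` is a hypothetical helper name.

```shell
# recap_totals: read ansible-playbook output on stdin and sum the changed=,
# failed= and unreachable= counters from the PLAY RECAP section.
# (Hypothetical helper; assumes colour is off so the counters are plain text.)
recap_totals() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && /: ok=[0-9]/ {
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "changed")     changed     += kv[2]
        if (kv[1] == "failed")      failed      += kv[2]
        if (kv[1] == "unreachable") unreachable += kv[2]
      }
    }
    END { printf "changed=%d failed=%d unreachable=%d\n", changed, failed, unreachable }
  '
}
```

Usage: pipe the dry run through tee into a log file, for example ANSIBLE_NOCOLOR=1 ansible-playbook -i inventories/dev site.yml --check --diff | tee check.log, then run recap_totals < check.log. A non-zero changed= count against dev is drift worth reading in the diff above it.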

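What --check actually does

  • Modules that support check mode work out what they would change and report it, without touching the host.
  • Modules that do not support check mode are skipped.
  • ansible.builtin.command and ansible.builtin.shell do not run at all, because Ansible cannot predict what an arbitrary command would change.
  • check_mode: false on a task makes it run for real even under --check; check_mode: true forces a task into check mode on a normal run.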

Making a command run in check-mode

- name: Verify the service is healthy (read-only)
  ansible.builtin.command: systemctl is-active nginx
  check_mode: false      # run even in --check
  changed_when: false    # never report a change
  register: svc
  failed_when: svc.rc != 0

When --check lies

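A task that consumes the registered result of an earlier command or shell task is running blind in check mode. Those modules do not execute under --check, so the registered variable carries no real output (stdout is missing or empty, depending on the ansible-core version). The dependent task then either errors or quietly takes a different branch than a real run would. The canonical trap: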

- ansible.builtin.command: /usr/bin/find-something
  register: finding

- ansible.builtin.template:
    src: config.j2
    dest: /etc/app.conf
  when: finding.stdout == "expected"

In --check, finding.stdout is undefined → the when: fails noisily. Either add check_mode: false to the command, or write the when: defensively: when: finding.stdout | default('') == "expected".

Pattern: run --check --diff in CI against dev, not prod. Dev is close enough in shape to reveal problems, and nothing is on the line if the dry run accidentally changes something (e.g. a task marked check_mode: false that is less read-only than its author thought).

yamllint

Catches formatting drift. Run it first — it is the fastest and the most deterministic. Install and run:

pip install yamllint
yamllint .

Configure it with a .yamllint in the repo root:

# .yamllint
---
extends: default
rules:
  line-length:
    max: 160
    level: warning
  truthy:
    allowed-values: ['true', 'false']
    check-keys: false       # Ansible has "yes"/"no" keys in historical playbooks
  comments:
    min-spaces-from-content: 1
  indentation:
    spaces: 2
    indent-sequences: true
    check-multi-line-strings: false
  braces:
    max-spaces-inside: 1    # YAML flow mappings; quoted Jinja such as "{{ var }}" is not checked
  octal-values:
    forbid-implicit-octal: true
    forbid-explicit-octal: false
ignore: |
  .venv/
  molecule/*/.ansible/
  collections/
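
Tip: the two rules that bite new users are truthy (don't use bare yes/no in modern playbooks) and octal-values (an unquoted mode: 0644 is an implicit octal; quote it as '0644'). A bare mode: 644 is worse. YAML reads it as the decimal integer 644, which as permission bits is octal 1204, and yamllint does not flag it. ansible-lint's risky-octal rule does.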
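The arithmetic behind the octal-values tip is easy to check from a shell, because printf reads a leading 0 as octal, just as YAML 1.1 does. `as_octal` and `as_decimal` are hypothetical helper names, and nothing here calls Ansible.

```shell
# as_octal N: print the decimal integer N in octal, i.e. the permission bits
# that a bare `mode: 644` (the YAML integer 644) really requests.
as_octal() { printf '%o\n' "$1"; }

# as_decimal LITERAL: print an octal literal such as 0644 as a decimal integer,
# i.e. the value an unquoted `mode: 0644` becomes under YAML 1.1.
as_decimal() { printf '%d\n' "$1"; }

as_octal 644     # prints 1204: sticky bit, owner -w-, group ---, other r--
as_decimal 0644  # prints 420, the integer value of rw-r--r--
```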

ansible-lint

ansible-lint understands Ansible semantics. It knows that command: without creates: is probably non-idempotent, that when: should not use {{ }}, and that shell is rarely what you actually want.

pip install "ansible-lint>=24.0"
ansible-lint                # lint cwd
ansible-lint playbooks/site.yml roles/
ansible-lint --profile production   # strict profile

Profiles

ansible-lint ships with ordered profiles. Each profile includes every rule in the lower ones.

Profile    | For                               | Example rules turned on
---------- | --------------------------------- | -----------------------
min        | Experiments, throwaway code       | syntax-check, schema
basic      | A role you expect humans to read  | name[missing], var-naming, jinja[spacing], partial-become
moderate   | Roles going into production       | name[casing], name[template]
safety     | Anything touching prod systems    | risky-file-permissions, risky-octal, risky-shell-pipe
shared     | Collections published to Galaxy   | galaxy, no-changed-when, no-handler
production | The full bar                      | Everything, including fqcn (force FQCNs)
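When you adopt ansible-lint on an existing repo, one way to pick a starting profile is to walk the ladder from min upward and stop at the first profile that fails. The sketch below assumes `highest_passing_profile` as a hypothetical helper name. It relies only on ansible-lint exiting non-zero when a profile reports errors.

```shell
# highest_passing_profile [LINT_CMD...]: run the lint command with each
# profile from lowest to highest and print the strictest one that passes
# ("none" if even min fails). Defaults to plain `ansible-lint`.
# Each profile includes the ones below it, so the first failure ends the walk.
highest_passing_profile() {
  local best p
  [ "$#" -gt 0 ] || set -- ansible-lint
  best=none
  for p in min basic moderate safety shared production; do
    if "$@" --profile "$p" >/dev/null 2>&1; then
      best=$p
    else
      break
    fi
  done
  printf '%s\n' "$best"
}
```

Usage: highest_passing_profile ansible-lint --offline. Put the result in .ansible-lint as profile:, then raise it one step at a time.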

Configuration file

# .ansible-lint
---
profile: production

exclude_paths:
  - .cache/
  - collections/
  - molecule/*/files/

# Rules you genuinely want off, not ones you're "getting to later"
skip_list:
  - experimental          # skip rules tagged experimental
  - yaml[line-length]     # already handled by yamllint

# Rules demoted from error to warning
warn_list:
  - fqcn[action-core]

# Opt-in rules are off in every profile; list them here to turn them on
enable_list:
  - no-log-password

# Offline mode speeds CI up and stops calls to Galaxy
offline: true

# Mock undefined roles/modules when linting a partial checkout
mock_roles:
  - company.shared.base
mock_modules:
  - company.shared.proprietary_module

Common rules and how to fix them

Rule                   | Meaning                                                         | Fix
---------------------- | --------------------------------------------------------------- | ---
name[missing]          | Task has no name:                                               | Add a name. Every task.
name[casing]           | Name should start with a capital letter                         | name: Install nginx, not install nginx
fqcn[action-core]      | Using copy: instead of ansible.builtin.copy:                    | Use FQCN everywhere
no-changed-when        | command/shell without changed_when: or creates:                 | Add one; the task is probably non-idempotent without it
no-handler             | Task runs on when: result is changed instead of from a handler  | Move it to handlers/main.yml and notify: it
risky-file-permissions | mode: missing on a task that creates files                      | Set an explicit mode ('0644' with quotes)
risky-shell-pipe       | shell: with a pipe but no pipefail                              | Use args: executable: /bin/bash and set -o pipefail;
var-naming             | Var not snake_case, or role var not role-prefixed               | Rename; see variables
jinja[spacing]         | {{var}} instead of {{ var }}                                    | Add spaces inside braces
partial-become         | Task sets become_user without become: true                      | Add both or neither

Ignoring a specific violation

Inline per task — prefer this over adding to skip_list:

- name: Rotate a secret with the vendor CLI
  # Reviewed 2026-04: the vendor CLI requires a shell redirect and pipe
  ansible.builtin.shell: /usr/local/bin/vendor rotate 2>&1 | tee -a /var/log/vendor-rotate.log  # noqa: risky-shell-pipe
  register: rotation
  changed_when: "'rotated' in rotation.stdout"

Ignores in .ansible-lint-ignore — a file of <path> <rule> lines — are for rules you genuinely cannot fix in one commit (e.g. a long legacy role). Treat them as a bug list, not a parking lot.
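To keep .ansible-lint-ignore honest as a bug list, it helps to see which rules account for most of the debt. The sketch below assumes only the <path> <rule> layout described above and skips comment lines that start with #. `lint_debt` is a hypothetical helper name.

```shell
# lint_debt [FILE]: count ignore entries per rule, most-ignored rule first.
# Reads .ansible-lint-ignore by default; the second column is the rule id.
lint_debt() {
  awk 'NF >= 2 && $1 !~ /^#/ { print $2 }' "${1:-.ansible-lint-ignore}" \
    | sort | uniq -c | sort -rn
}
```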

Molecule

Molecule is the only layer that proves a role works (as opposed to "parses" or "has no smells"). It creates an ephemeral container/VM, runs the role against it, then asserts on the result.

pip install "molecule>=6" "molecule-plugins[docker]>=23" ansible-core
cd roles/nginx
molecule init scenario default -d docker

This creates roles/nginx/molecule/default/ with:

molecule/default/
├── converge.yml       # the play that applies the role
├── molecule.yml       # driver, platforms, provisioner, verifier config
├── verify.yml         # assertions after convergence
└── create.yml / destroy.yml   # optional, driver-specific

molecule.yml — the important knobs

---
driver:
  name: docker           # docker | podman | delegated | vagrant

platforms:
  - name: rocky9
    image: rockylinux:9
    pre_build_image: true
    command: /sbin/init              # required for systemd inside a container
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    cgroupns_mode: host
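  - name: debian12
    image: debian:12                # stock image lacks systemd (/sbin/init); swap in one that ships it
    command: /sbin/init
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    cgroupns_mode: host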

provisioner:
  name: ansible
  config_options:
    defaults:
      interpreter_python: auto_silent
      callbacks_enabled: profile_tasks
  inventory:
    group_vars:
      all:
        nginx_listen_port: 8080

verifier:
  name: ansible          # use ansible tasks in verify.yml; 'testinfra' is legacy

scenario:
  test_sequence:
    - dependency
    - cleanup
    - destroy
    - syntax
    - create
    - prepare
    - converge
    - idempotence         # ← the one that catches bad tasks
    - verify
    - cleanup
    - destroy

Drivers at a glance

Driver    | Good for                                             | Caveats
--------- | ---------------------------------------------------- | -------
docker    | Fast iteration, stateless roles, CI                  | systemd needs privileged and init; no kernel changes; no firewalld in some images
podman    | Rootless CI, RHEL-ish shops                          | cgroups v2 quirks; need --systemd=always-equivalent
delegated | You manage the target yourself (existing VM, cloud)  | You write create.yml/destroy.yml; most flexible, least magic
vagrant   | Real VMs, kernel-level tests                         | Slow; needs VirtualBox/libvirt locally; rarely used in CI

Verifier with ansible

# molecule/default/verify.yml
---
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Gather service facts
      ansible.builtin.service_facts:

    - name: Assert service is active
      ansible.builtin.assert:
        that:
          - ansible_facts.services['nginx.service'].state == 'running'

    - name: Fetch the index page
      ansible.builtin.uri:
        url: http://localhost:8080/
        status_code: 200
        return_content: true
      register: home

    - name: Assert the template rendered our marker
      ansible.builtin.assert:
        that: "'managed by ansible' in home.content"

Running scenarios

molecule test                  # full sequence above, for the default scenario
molecule converge              # create + apply, leave container up for debugging
molecule login -h rocky9       # exec into the running container
molecule verify                # re-run verify only
molecule destroy               # teardown
molecule test -s upgrade       # a different scenario

Multiple scenarios

One scenario per interesting behaviour: default, upgrade, tls, cluster. Share a molecule/shared/ directory of helper plays — molecule.yml can reference them via provisioner.playbooks.converge.
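With several scenarios per role, something has to enumerate them. The sketch below treats every molecule/<name>/ directory that contains a molecule.yml as a scenario, so a helper-only molecule/shared/ directory is skipped. `list_scenarios` is a hypothetical helper name.

```shell
# list_scenarios ROLE_DIR: print one Molecule scenario name per line, i.e.
# every molecule/<name>/ directory in the role that contains a molecule.yml.
list_scenarios() {
  local f
  for f in "$1"/molecule/*/molecule.yml; do
    [ -e "$f" ] || continue          # the glob matched nothing
    basename "$(dirname "$f")"
  done
}

# Usage:
#   for s in $(list_scenarios roles/nginx); do
#     (cd roles/nginx && molecule test -s "$s") || exit 1
#   done
```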

ansible-test sanity for collections

If you ship a collection (see Ansible Collection), ansible-test enforces what Galaxy and Automation Hub require. Run from the root of the collection (the directory containing galaxy.yml).

# from inside ~/.ansible/collections/ansible_collections/myorg/mycoll
ansible-test sanity --docker default -v
ansible-test units --docker default -v       # pytest against plugins/
ansible-test integration --docker default -v # runs tests/integration/targets/ in containers

Sanity runs a bundle of sub-tests: pep8, pylint, validate-modules (checks DOCUMENTATION, EXAMPLES, RETURN blocks), import, yamllint, and more. Skip a test only by adding it to tests/sanity/ignore-<version>.txt:

# tests/sanity/ignore-2.17.txt
plugins/modules/legacy_foo.py validate-modules:invalid-documentation
plugins/modules/legacy_foo.py pylint:disallowed-name

Testing inventory filters and plugins

Inventory bugs are the worst bugs — you run the play against the wrong hosts. Sanity-check before you commit:

# What does my pattern actually resolve to?
ansible -i inventories/prod web --list-hosts
ansible -i inventories/prod 'web:&eu:!canary' --list-hosts

# Dump the full tree
ansible-inventory -i inventories/prod --graph
ansible-inventory -i inventories/prod --host web01.example.com

# Machine-readable for scripts
ansible-inventory -i inventories/prod --list | jq '._meta.hostvars["web01.example.com"].ansible_host'

Assert group membership in CI before you let a merge land. A shell one-liner will do:

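set -euo pipefail
expected=$(sort -u tests/expected-web-hosts.txt)
# --graph prints hosts as "|--host" (nested as "|  |--host" under child groups)
# and groups as "|--@group:"; keep only the hosts, de-duplicated.
actual=$(ansible-inventory -i inventories/prod --graph web \
         | sed -n 's/.*|--\([^@].*\)$/\1/p' | sort -u)
diff <(echo "$expected") <(echo "$actual")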
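Host extraction is the fragile part of that check, and you can test it without a live inventory. The sketch below assumes the usual --graph layout, in which hosts follow a |-- marker and group names follow |--@. `graph_hosts` is a hypothetical helper name, and the sample hosts are made up.

```shell
# graph_hosts: read `ansible-inventory --graph` output on stdin and print each
# host once. Nested lines look like "  |  |--web01", groups like "  |--@eu:".
graph_hosts() {
  sed -n 's/.*|--\([^@].*\)$/\1/p' | sort -u
}

# Example with captured output (hypothetical hosts); prints
# web01.example.com and web02.example.com, one per line.
printf '%s\n' '@web:' '  |--@eu:' '  |  |--web01.example.com' '  |--web02.example.com' \
  | graph_hosts
```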

For dynamic inventory plugins (AWS, GCP, Proxmox), run ansible-inventory --graph against a known-stable account/tag set and snapshot the output. A diff in review is a human checkpoint on whether a cloud filter change is intentional.
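Snapshotting can be as small as one function. It diffs a command's output against a committed file and, on request, rewrites that file so the change lands in the MR for a human to review. In the sketch below, `compare_snapshot` and the `UPDATE_SNAPSHOT` variable are hypothetical names, as are the paths in the usage line.

```shell
# compare_snapshot SNAPSHOT_FILE CMD [ARGS...]
# Runs CMD and diffs its stdout against SNAPSHOT_FILE; non-zero exit on drift.
# With UPDATE_SNAPSHOT=1 the file is rewritten instead (commit and review it).
compare_snapshot() {
  local snapshot tmp status
  snapshot=$1
  shift
  tmp=$(mktemp) || return 1
  if ! "$@" > "$tmp"; then          # the command itself failed
    rm -f "$tmp"
    return 1
  fi
  if [ "${UPDATE_SNAPSHOT:-0}" = 1 ]; then
    cat "$tmp" > "$snapshot"
    rm -f "$tmp"
    return 0
  fi
  status=0
  diff -u "$snapshot" "$tmp" || status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage (hypothetical paths):
#   compare_snapshot tests/snapshots/aws.graph \
#     ansible-inventory -i inventories/aws --graph
```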

GitLab CI skeleton

Glue all of the above into .gitlab-ci.yml. Stages run fastest-first so a formatting bug fails in 10 seconds, not 10 minutes. See GitLab CI/CD for the wider picture.

# .gitlab-ci.yml
stages:
  - lint
  - check
  - molecule
  - publish

default:
  image: python:3.12-slim
  before_script:
    - pip install --quiet
        "ansible-core>=2.17"
        "ansible-lint>=24"
        "yamllint>=1.35"
    - ansible-galaxy collection install -r requirements.yml -p collections/

yamllint:
  stage: lint
  script:
    - yamllint .

ansible-lint:
  stage: lint
  script:
    - ansible-lint --offline --profile production

syntax-check:
  stage: check
  script:
    - ansible-playbook -i inventories/dev site.yml --syntax-check

check-diff-dev:
  stage: check
  script:
    - ansible-playbook -i inventories/dev site.yml --check --diff
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

molecule:
  stage: molecule
  image: quay.io/ansible/molecule:latest
  services:
    - docker:dind
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""    # disable dind's default TLS so it listens on plain 2375
  parallel:
    matrix:
      - ROLE: [nginx, postgresql, app]
  script:
    - cd roles/$ROLE
    - molecule test
  rules:
    - changes:
        - roles/$ROLE/**/*

collection-sanity:
  stage: check
  image: quay.io/ansible/ansible-test:latest
  script:
    - cd collections/ansible_collections/myorg/mycoll
    - ansible-test sanity --docker default
  rules:
    - changes:
        - "collections/**/*"

Pattern: let Molecule jobs run in parallel across roles with GitLab's parallel: matrix:. A single molecule test takes 30–90 seconds; ten of them in parallel on shared runners finish far sooner than ten run back to back in one job.
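For a local run, or a CI system without GitLab's rules: changes:, you can derive the same "only the roles that changed" filter from git. The sketch below assumes roles live at roles/<name>/ and that origin/main is the target branch. `changed_roles` is a hypothetical helper name.

```shell
# changed_roles: read changed file paths on stdin (one per line) and print
# each role name under roles/ once. Files directly in roles/ are ignored.
changed_roles() {
  awk -F/ '$1 == "roles" && NF >= 3 { print $2 }' | sort -u
}

# Usage (skips roles deleted in the branch):
#   for role in $(git diff --name-only origin/main...HEAD | changed_roles); do
#     [ -d "roles/$role" ] || continue
#     (cd "roles/$role" && molecule test) || exit 1
#   done
```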

Troubleshooting

Symptom | Likely cause | Fix
------- | ------------ | ---
yaml[truthy] on yes/no | Legacy booleans | Use true/false, or relax the rule in .yamllint
ERROR! Syntax Error while loading YAML. mapping values are not allowed | Unquoted colon inside a string | Quote the value: msg: "error: foo"
fqcn[action-core] on every task | Bare module names | Prefix with ansible.builtin. (bulk-replace in an editor)
risky-file-permissions | copy/template with no mode: | Set mode: '0644' (quoted, leading zero)
no-changed-when on a command | Task has no notion of "did anything change" | Add creates:, or changed_when: on a stdout match
Molecule: Failed to start Docker | DinD not privileged | Set privileged = true on the runner; add services: [docker:dind], DOCKER_HOST=tcp://docker:2375 and DOCKER_TLS_CERTDIR=""
Molecule: container exits immediately | Image has no /sbin/init | Use an image that ships systemd, or switch command: to sleep infinity for non-service roles
Molecule: idempotence fails on a template | Template output not deterministic (timestamp, dict ordering) | Sort keys with Jinja's dictsort filter; remove timestamps from rendered content
Molecule: idempotence fails on a command | Missing changed_when:/creates: | Fix the task; this is the test working as designed
--check fails with "undefined variable" on a registered var | The earlier command was skipped in check mode | Add check_mode: false on the source task or default('') on the consumer
ansible-test sanity: validate-modules:missing-main-call | Module never calls main() | Add the canonical if __name__ == '__main__': main() entrypoint (see custom modules)
ansible-lint silent on role with obvious smells | Lint run from wrong cwd; path excluded | Run from repo root; check exclude_paths in .ansible-lint
CI lint passes, local lint fails | Version drift | Pin versions in CI's before_script and in a dev requirements-dev.txt

Related reading: Ansible Best Practices, Ansible Debugging, Ansible Deploy Flow, GitLab MR review.