Ansible Testing
- yamllint — formatting. Fast. Runs in under a second.
- ansible-playbook --syntax-check — the YAML is a valid playbook.
- ansible-lint — Ansible-specific semantic rules (idempotency smells, deprecated syntax).
- --check --diff — dry-run against a real inventory; catches missing vars and no-longer-valid templates.
- Molecule — spins up a container/VM and actually converges the role. The only layer that proves idempotency.
- ansible-test sanity — for collections, mandatory before publishing to Galaxy.
Run the first four on every MR. Run Molecule on role changes. Run ansible-test sanity on tag releases of a collection.
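The ordering above can be wired into a small local or CI helper. This is a minimal sketch, not a prescribed tool: the function name run_fast_gates is hypothetical, and the defaults inventories/dev and site.yml are assumptions borrowed from the examples used elsewhere on this page.

```shell
# Hypothetical helper: run the four fast gates in order and stop at the
# first one that fails. Usage: run_fast_gates [inventory] [playbook]
run_fast_gates() {
  inventory="${1:-inventories/dev}"   # assumed default
  playbook="${2:-site.yml}"           # assumed default

  echo "==> yamllint" &&
    yamllint . &&
  echo "==> syntax-check" &&
    ansible-playbook -i "$inventory" "$playbook" --syntax-check &&
  echo "==> ansible-lint" &&
    ansible-lint &&
  echo "==> check-diff" &&
    ansible-playbook -i "$inventory" "$playbook" --check --diff
}
```

Calling run_fast_gates with no arguments uses the assumed defaults; the function's exit status is that of the first failing gate, so it can be used directly as a pre-push hook or CI script step.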
Syntax check
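--syntax-check parses the playbook and resolves roles and static imports (import_tasks, import_role, import_playbook), but executes nothing. Dynamic includes (include_tasks, include_role) are only resolved at run time, so their contents are not checked. It is the cheapest gate you have.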
ansible-playbook -i inventories/dev site.yml --syntax-check
Things it catches:
- YAML parse errors (indentation, unquoted colons, tab characters).
- Unknown top-level keys (task: instead of tasks:).
- Missing roles and statically imported files: paths are resolved.
- when:/loop: mistyped as with: inside a task block.
Things it does not catch:
- Undefined variables — Jinja is not rendered until execution.
- Module argument errors — modules are not loaded.
- Missing hosts in inventory — the inventory is parsed but not evaluated against patterns.
--syntax-check does not honour --tags. It always resolves everything. If a role is referenced but broken, syntax-check will fail even if you would never run that role.
--check --diff semantics (and limits)
--check runs the playbook in no-op mode. Each module reports what it would change. --diff prints file-level diffs for tasks that support it (copy, template, lineinfile, blockinfile).
ansible-playbook -i inventories/dev site.yml --check --diff
What --check actually does
- Real modules (copy, template, package, service, user, etc.) honour check-mode and do not mutate.
- shell and command are skipped by default in check-mode. Their register variables are empty, so downstream when: clauses may behave oddly.
- Handlers are notified but do not actually run (the triggering task did not really change anything).
Making a command run in check-mode
- name: Verify the service is healthy (read-only)
  ansible.builtin.command: systemctl is-active nginx
  check_mode: false     # run even in --check
  changed_when: false   # never report a change
  register: svc
  failed_when: svc.rc != 0
When --check lies
A task that depends on the result of an earlier command gets a skipped result in check-mode: the registered variable exists but carries none of the usual keys (stdout, rc). The dependent task can then error out, or take a different branch than it would in a real run. The canonical trap:
- ansible.builtin.command: /usr/bin/find-something
  register: finding
- ansible.builtin.template:
    src: config.j2
    dest: /etc/app.conf
  when: finding.stdout == "expected"
In --check, finding.stdout is undefined → the when: fails noisily. Either add check_mode: false to the command, or write the when: defensively: when: finding.stdout | default('') == "expected".
Run --check --diff in CI against dev, not prod. Dev is close enough in shape to reveal problems, but nothing is on the line if the dry-run accidentally converges (e.g. a task marked check_mode: false that turns out not to be read-only).
yamllint
Catches formatting drift. Run it first — it is the fastest and the most deterministic. Install and run:
pip install yamllint
yamllint .
Configure it with a .yamllint in the repo root:
# .yamllint
---
extends: default
rules:
  line-length:
    max: 160
    level: warning
  truthy:
    allowed-values: ['true', 'false']
    check-keys: false   # Ansible has "yes"/"no" keys in historical playbooks
  comments:
    min-spaces-from-content: 1
  indentation:
    spaces: 2
    indent-sequences: true
    check-multi-line-strings: false
  braces:
    max-spaces-inside: 1   # Jinja {{ var }} has a space
  octal-values:
    forbid-implicit-octal: true
    forbid-explicit-octal: false
ignore: |
  .venv/
  molecule/*/.ansible/
  collections/
Two rules matter most for Ansible: truthy (don't use bare yes/no in modern playbooks) and octal-values (a file mode of 644 without the leading 0 or quotes is a bug: it is decimal, not octal).
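To see why a bare integer file mode is a bug, it helps to look at the number itself. YAML reads an unquoted 644 as the decimal integer 644, and the octal form of that integer is what ends up as permission bits. The snippet below is only an illustration using the standard printf utility; nothing in it is Ansible-specific.

```shell
# A bare `mode: 644` is parsed as decimal 644.
# Printing that number in octal shows the permission bits it really encodes.
printf 'decimal 644 as octal: %o\n' 644      # prints: decimal 644 as octal: 1204

# The intended mode, octal 0644, is a different number entirely.
printf 'octal 0644 as decimal: %d\n' 0644    # prints: octal 0644 as decimal: 420
```

Mode 1204 is nothing like rw-r--r--, which is why the mode should be written as '0644' in quotes.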
ansible-lint
ansible-lint understands Ansible semantics. It knows that command: without creates: is probably non-idempotent, that when: should not use {{ }}, and that shell is rarely what you actually want.
pip install "ansible-lint>=24.0"
ansible-lint # lint cwd
ansible-lint playbooks/site.yml roles/
ansible-lint --profile production # strict profile
Profiles
ansible-lint ships with ordered profiles. Each profile includes every rule in the lower ones.
| Profile | For | Example rules turned on |
|---|---|---|
| min | Experiments, throwaway code | Syntax, basic YAML |
| basic | A role you expect humans to read | name[missing], risky-file-permissions |
| moderate | Roles going into production | no-changed-when, var-naming, jinja[spacing] |
| safety | Anything touching prod systems | no-handler, risky-shell-pipe, partial-become |
| shared | Collections published to Galaxy | galaxy, meta-no-info |
| production | The full bar | Everything, including fqcn (force FQCNs) |
Configuration file
# .ansible-lint
---
profile: production
exclude_paths:
  - .cache/
  - collections/
  - molecule/*/files/
# Rules you genuinely want off, not ones you're "getting to later"
skip_list:
  - experimental          # skip rules tagged experimental
  - yaml[line-length]     # already handled by yamllint
# Rules demoted from error to warning
warn_list:
  - fqcn[action-core]
# Opt-in rules that are off by default and must be enabled explicitly
enable_list:
  - no-log-password
# Offline mode speeds CI up and stops calls to Galaxy
offline: true
# Mock undefined roles/modules when linting a partial checkout
mock_roles:
  - company.shared.base
mock_modules:
  - company.shared.proprietary_module
Common rules and how to fix them
| Rule | Meaning | Fix |
|---|---|---|
| name[missing] | Task has no name: | Add a name. Every task. |
| name[casing] | Name should start with a capital letter | name: Install nginx, not install nginx |
| fqcn[action-core] | Using copy: instead of ansible.builtin.copy: | Use FQCN everywhere |
| no-changed-when | command/shell without changed_when: or creates: | Add one; the task is probably non-idempotent without it |
| no-handler | Task does restart-like work outside a handler | Move to handlers/main.yml, notify: it |
| risky-file-permissions | mode: missing | Set an explicit mode ('0644' with quotes) |
| risky-shell-pipe | shell: with \| without pipefail | Use args: executable: /bin/bash and set -o pipefail; |
| var-naming | Var not snake_case, or role var not role-prefixed | Rename; see variables |
| jinja[spacing] | {{var}} instead of {{ var }} | Add spaces inside braces |
| partial-become | Task sets become_user without become: true | Add both or neither |
Ignoring a specific violation
Inline per task — prefer this over adding to skip_list:
- name: Rotate a secret with the vendor CLI  # noqa: risky-shell-pipe
  # vendor CLI requires a shell redirect, reviewed 2026-04
  ansible.builtin.command: /usr/local/bin/vendor rotate
  register: rotation
  changed_when: "'rotated' in rotation.stdout"
Ignores in .ansible-lint-ignore — a file of <path> <rule> lines — are for rules you genuinely cannot fix in one commit (e.g. a long legacy role). Treat them as a bug list, not a parking lot.
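One way to keep the ignore file honest is to summarise it as a per-rule count and watch that number in review. The helper below is a sketch under assumptions: lint_debt is a hypothetical name, and it relies only on the `<path> <rule>` line format described above, skipping blank lines and lines that start with #.

```shell
# Hypothetical helper: summarise .ansible-lint-ignore as a per-rule debt count.
# Output is "<count> <rule>", largest count first.
lint_debt() {
  ignore_file="${1:-.ansible-lint-ignore}"
  awk '!/^[ \t]*(#|$)/ { count[$2]++ }
       END { for (rule in count) printf "%d %s\n", count[rule], rule }' "$ignore_file" |
    sort -k1,1nr -k2,2
}
```

Running lint_debt in the repo root prints one line per ignored rule. A shrinking list means the bug list is being worked; a growing one means it has become a parking lot.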
Molecule
Molecule is the only layer that proves a role works (as opposed to "parses" or "has no smells"). It creates an ephemeral container/VM, runs the role against it, then asserts on the result.
pip install "molecule>=6" "molecule-plugins[docker]>=23" ansible-core
cd roles/nginx
molecule init scenario default -d docker
This creates roles/nginx/molecule/default/ with:
molecule/default/
├── converge.yml # the play that applies the role
├── molecule.yml # driver, platforms, provisioner, verifier config
├── verify.yml # assertions after convergence
└── create.yml / destroy.yml # optional, driver-specific
molecule.yml — the important knobs
---
driver:
  name: docker              # docker | podman | delegated | vagrant
platforms:
  - name: rocky9
    image: rockylinux:9
    pre_build_image: true
    command: /sbin/init     # required for systemd inside a container
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    cgroupns_mode: host
  - name: debian12
    image: debian:12
    command: /sbin/init
    privileged: true
provisioner:
  name: ansible
  config_options:
    defaults:
      interpreter_python: auto_silent
      callbacks_enabled: profile_tasks
  inventory:
    group_vars:
      all:
        nginx_listen_port: 8080
verifier:
  name: ansible             # use ansible tasks in verify.yml; 'testinfra' is legacy
scenario:
  # No lint step here: Molecule 5+ dropped it, so run ansible-lint as its own CI job
  test_sequence:
    - dependency
    - cleanup
    - destroy
    - syntax
    - create
    - prepare
    - converge
    - idempotence           # ← the one that catches bad tasks
    - verify
    - cleanup
    - destroy
Drivers at a glance
| Driver | Good for | Caveats |
|---|---|---|
| docker | Fast iteration, stateless roles, CI | systemd needs privileged and init; no kernel changes; no firewalld in some images |
| podman | Rootless CI, RHEL-ish shops | cgroups v2 quirks; need --systemd=always-equivalent |
| delegated | You manage the target yourself (existing VM, cloud) | You write create.yml/destroy.yml; most flexible, least magic |
| vagrant | Real VMs, kernel-level tests | Slow; needs VirtualBox/libvirt locally; rarely used in CI |
Verifier with ansible
# molecule/default/verify.yml
---
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Check nginx is running
      ansible.builtin.service_facts:
    - name: Assert service is active
      ansible.builtin.assert:
        that:
          - ansible_facts.services['nginx.service'].state == 'running'
    - name: Fetch the index page
      ansible.builtin.uri:
        url: http://localhost:8080/
        status_code: 200
        return_content: true
      register: home
    - name: Assert the template rendered our marker
      ansible.builtin.assert:
        that: "'managed by ansible' in home.content"
Running scenarios
molecule test # full sequence above, for the default scenario
molecule converge # create + apply, leave container up for debugging
molecule login -h rocky9 # exec into the running container
molecule verify # re-run verify only
molecule destroy # teardown
molecule test -s upgrade # a different scenario
Multiple scenarios
One scenario per interesting behaviour: default, upgrade, tls, cluster. Share a molecule/shared/ directory of helper plays — molecule.yml can reference them via provisioner.playbooks.converge.
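With several scenarios per role, a small loop saves typing one molecule test -s per scenario. This is a sketch, assuming the molecule/<scenario>/molecule.yml layout shown earlier; test_all_scenarios is a hypothetical name. A directory counts as a scenario only if it contains molecule.yml, so a helper directory such as molecule/shared/ is skipped.

```shell
# Hypothetical helper: run `molecule test` for every scenario of a role.
# Returns non-zero if any scenario fails, but still runs the remaining ones.
test_all_scenarios() {
  role_dir="${1:-.}"
  failed=0
  for scenario_file in "$role_dir"/molecule/*/molecule.yml; do
    [ -f "$scenario_file" ] || continue   # glob matched nothing
    scenario=$(basename "$(dirname "$scenario_file")")
    echo "==> molecule test -s $scenario"
    (cd "$role_dir" && molecule test -s "$scenario") || failed=1
  done
  return "$failed"
}
```

For example, test_all_scenarios roles/nginx runs each scenario in turn. Continuing after a failure is a deliberate choice here, so that one run reports every broken scenario.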
ansible-test sanity for collections
If you ship a collection (see Ansible Collection), ansible-test enforces what Galaxy and Automation Hub require. Run from the root of the collection (the directory containing galaxy.yml).
# from inside ~/.ansible/collections/ansible_collections/myorg/mycoll
ansible-test sanity --docker default -v
ansible-test units --docker default -v        # pytest against tests/unit/
ansible-test integration --docker default -v  # runs tests/integration/targets/ in containers
Sanity runs a bundle of sub-tests: pep8, pylint, validate-modules (checks DOCUMENTATION, EXAMPLES, RETURN blocks), import, yamllint, and more. Skip a test only by adding it to tests/sanity/ignore-<version>.txt:
# tests/sanity/ignore-2.17.txt
plugins/modules/legacy_foo.py validate-modules:invalid-documentation
plugins/modules/legacy_foo.py pylint:disallowed-name
Testing inventory filters and plugins
Inventory bugs are the worst bugs — you run the play against the wrong hosts. Sanity-check before you commit:
# What does my pattern actually resolve to?
ansible -i inventories/prod web --list-hosts
ansible -i inventories/prod 'web:&eu:!canary' --list-hosts
# Dump the full tree
ansible-inventory -i inventories/prod --graph
ansible-inventory -i inventories/prod --host web01.example.com
# Machine-readable for scripts
ansible-inventory -i inventories/prod --list | jq '._meta.hostvars["web01.example.com"].ansible_host'
Assert group membership in CI before you let a merge land. A shell one-liner will do:
set -euo pipefail
expected=$(sort tests/expected-web-hosts.txt)
# --graph prints hosts as "  |--web01": strip the tree prefix, skip "@group" lines
actual=$(ansible-inventory -i inventories/prod --graph web \
  | sed -n 's/^[ |]*|--\([^@].*\)$/\1/p' | sort)
diff <(echo "$expected") <(echo "$actual")
For dynamic inventory plugins (AWS, GCP, Proxmox), run ansible-inventory --graph against a known-stable account/tag set and snapshot the output. A diff in review is a human checkpoint on whether a cloud filter change is intentional.
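The snapshot approach can be sketched as a single function. The function name and the default snapshot path tests/inventory-graph.snapshot are assumptions for illustration; only ansible-inventory --graph and diff -u are real commands.

```shell
# Hypothetical helper: compare the live inventory graph with a committed snapshot.
# Prints a unified diff and returns non-zero when they differ.
check_inventory_snapshot() {
  inventory="$1"
  snapshot="${2:-tests/inventory-graph.snapshot}"   # assumed location
  ansible-inventory -i "$inventory" --graph | diff -u "$snapshot" -
}
```

To accept an intentional change, regenerate the snapshot by redirecting ansible-inventory -i inventories/prod --graph into the snapshot file and commit it, so the diff shows up in review.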
GitLab CI skeleton
Glue all of the above into .gitlab-ci.yml. Stages run fastest-first so a formatting bug fails in 10 seconds, not 10 minutes. See GitLab CI/CD for the wider picture.
# .gitlab-ci.yml
stages:
  - lint
  - check
  - molecule
  - publish
default:
  image: python:3.12-slim
  before_script:
    - pip install --quiet
      "ansible-core>=2.17"
      "ansible-lint>=24"
      "yamllint>=1.35"
    - ansible-galaxy collection install -r requirements.yml -p collections/
yamllint:
  stage: lint
  script:
    - yamllint .
ansible-lint:
  stage: lint
  script:
    - ansible-lint --offline --profile production
syntax-check:
  stage: check
  script:
    - ansible-playbook -i inventories/dev site.yml --syntax-check
check-diff-dev:
  stage: check
  script:
    - ansible-playbook -i inventories/dev site.yml --check --diff
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
molecule:
  stage: molecule
  image: quay.io/ansible/molecule:latest
  services:
    - docker:dind
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""   # plain TCP on 2375; dind defaults to TLS on 2376
  parallel:
    matrix:
      - ROLE: [nginx, postgresql, app]
  script:
    - cd roles/$ROLE
    - molecule test
  rules:
    - changes:
        - roles/$ROLE/**/*
collection-sanity:
  stage: check
  image: quay.io/ansible/ansible-test:latest
  script:
    - cd collections/ansible_collections/myorg/mycoll
    - ansible-test sanity --docker default
  rules:
    - changes:
        - "collections/**/*"
Fan Molecule out with parallel: matrix:. A single molecule test takes 30–90 seconds; ten in parallel on shared runners finish far sooner than the same ten run one after another.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| yaml[truthy] on yes/no | Legacy booleans | Use true/false, or relax the rule in .yamllint |
| ERROR! Syntax Error while loading YAML. mapping values are not allowed | Unquoted colon inside a string | Quote the value: msg: "error: foo" |
| fqcn[action-core] on every task | Bare module names | Prefix with ansible.builtin. (bulk-replace in an editor) |
| risky-file-permissions | copy/template with no mode: | Set mode: '0644' (quoted, leading zero) |
| no-changed-when on a command | Task has no notion of "did anything change" | Add creates:, or changed_when: on a stdout match |
| Molecule: Failed to start Docker | DinD not privileged | Add services: [docker:dind] and DOCKER_HOST=tcp://docker:2375 |
| Molecule: container exits immediately | Image has no /sbin/init | Use an image tagged with systemd, or switch command: to sleep infinity for non-service roles |
| Molecule: idempotence fails on a template | Template output not deterministic (timestamp, dict ordering) | Sort keys in Jinja ({{ d \| dictsort }}); remove timestamps from rendered content |
| Molecule: idempotence fails on a command | Missing changed_when:/creates: | Fix the task: this is the test working as designed |
| --check fails with "undefined variable" on a registered var | The earlier command was skipped in check-mode | Add check_mode: false on the source task or default('') on the consumer |
| ansible-test sanity: validate-modules:missing-main | Module missing def main(): | Add the canonical entrypoint (see custom modules) |
| ansible-lint silent on role with obvious smells | Lint run from wrong cwd; path excluded | Run from repo root; check exclude_paths in .ansible-lint |
| CI lint passes, local lint fails | Version drift | Pin versions in CI's before_script and in a dev requirements-dev.txt |
Related reading: Ansible Best Practices, Ansible Debugging, Ansible Deploy Flow, GitLab MR review.