Learn Ansible — hands-on tutorial
- Spin up two Linux hosts. Anything works: two cloud VMs, two LXC containers, or two VMs in Vagrant/libvirt. They need SSH and python3.
- On each target host: a user with passwordless sudo and your SSH public key in ~/.ssh/authorized_keys.
- On your workstation: python3 -m pip install --user ansible ansible-lint yamllint 'molecule[ansible]' mitogen (mitogen is optional; none of the labs below require it).
- Work in a git repository from Lab 1. Every lab is a commit. You will want to diff.
- After each lab there is a "What could go wrong?" box listing the most common real-world failures. Read it even if nothing went wrong for you.
Lab 1 — One-file playbook
Goal: install nginx on one target host and serve a page that names the host.
Project layout (this is all you need):
learn-ansible/
├── ansible.cfg
├── inventory.ini
└── site.yml
# ansible.cfg
[defaults]
inventory = ./inventory.ini
host_key_checking = False
stdout_callback = yaml
forks = 10
# inventory.ini
web1 ansible_host=10.0.0.11 ansible_user=ansible
# site.yml
- name: Serve a single page from web1
  hosts: web1
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Drop our index page
      ansible.builtin.copy:
        dest: /usr/share/nginx/html/index.html
        content: "Hello from {{ inventory_hostname }}\n"
        mode: '0644'

    - name: Ensure nginx is running
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
Run it and verify:
ansible-playbook site.yml
curl http://10.0.0.11
# Hello from web1
Commit it: git add . && git commit -m "lab 1: single-host nginx".
What could go wrong?
- Failed to connect to the host via ssh: your inventory user is wrong, the key is not installed on the target, or ansible_host is unreachable. Test with plain ssh ansible@10.0.0.11 before blaming Ansible.
- sudo: a password is required: become needs passwordless sudo, or you need to pass --ask-become-pass.
- /usr/bin/python3: not found on older targets: add ansible_python_interpreter=/usr/libexec/platform-python (RHEL 8) or install python3 out of band.
- curl returns the stock nginx welcome page instead of "Hello from web1": on Debian/Ubuntu the default site serves /var/www/html, not /usr/share/nginx/html, so point dest there.
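Rather than chasing these failures one at a time, you can run a small preflight playbook that probes SSH, Python and become in a single pass. The following is a minimal sketch. The file name preflight.yml and the task and variable names are illustrative and are not part of the lab's commits.

```yaml
# preflight.yml (hypothetical helper; not one of the lab's commits)
- name: Preflight checks for every inventory host
  hosts: all
  gather_facts: false
  tasks:
    - name: SSH login and remote Python both work
      ansible.builtin.ping:

    - name: Passwordless sudo works
      ansible.builtin.command: id -u
      become: true
      register: preflight_uid
      changed_when: false   # read-only probe

    - name: Become really switched to root
      ansible.builtin.assert:
        that:
          - preflight_uid.stdout == "0"
        fail_msg: "become ran, but not as root"
```

Run it with ansible-playbook preflight.yml. If ping fails, look at SSH or Python. If the sudo task fails, look at the become setup. That tells you which bullet above applies.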
Lab 2 — Inventory with groups and group_vars
Add a second host, group them, and pull values out of the playbook into per-group variables.
learn-ansible/
├── ansible.cfg
├── inventory/
│ ├── hosts.ini
│ └── group_vars/
│ ├── all.yml
│ └── web.yml
└── site.yml
# ansible.cfg (updated)
[defaults]
inventory = ./inventory
host_key_checking = False
stdout_callback = yaml
forks = 10
# inventory/hosts.ini
[web]
web1 ansible_host=10.0.0.11
web2 ansible_host=10.0.0.12
[web:vars]
ansible_user=ansible
# inventory/group_vars/all.yml
site_operator: "ops@example.com"
# inventory/group_vars/web.yml
http_port: 80
index_body: |
  Hello from {{ inventory_hostname }} (operator: {{ site_operator }})
# site.yml (now uses vars)
- name: Serve a page from the web group
  hosts: web
  become: true
  tasks:
    - ansible.builtin.package: { name: nginx, state: present }
    - ansible.builtin.copy:
        dest: /usr/share/nginx/html/index.html
        content: "{{ index_body }}"
        mode: '0644'
    - ansible.builtin.service: { name: nginx, state: started, enabled: true }
ansible-inventory --graph
# @all:
# |--@ungrouped:
# |--@web:
# | |--web1
# | |--web2
ansible-playbook site.yml
curl http://10.0.0.11 http://10.0.0.12
Anything a human should be able to change without editing a role goes in group_vars. Role defaults (the values a user may override) go in roles/<name>/defaults/main.yml, and values internal to the role go in roles/<name>/vars/main.yml (next lab).
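The same lookup goes one level further down: for a single host, a host_vars file beats group_vars. The sketch below assumes a hypothetical inventory/host_vars/web2.yml that marks web2 as a canary.

```yaml
# inventory/host_vars/web2.yml (hypothetical per-host override)
# Inventory host_vars outrank group_vars/web.yml, which outranks group_vars/all.yml,
# so only web2 renders this body; web1 keeps the group value.
index_body: |
  Hello from {{ inventory_hostname }}, the canary (operator: {{ site_operator }})
```

ansible-inventory --host web2 prints the variables merged for that host. It is a quick way to confirm which value won.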
What could go wrong?
- You put vars in the [web:vars] section of the inventory instead of group_vars/web.yml. That is fine for one thing (like ansible_user above), but it quickly becomes a mess. Keep ini for hostnames, YAML for vars.
- You named the file group_vars/web/main.yml with no web.yml. Ansible supports a directory per group too; both styles are fine, but pick one.
See also: Project Structure, Variable Precedence.
Lab 3 — Refactor into a role
Time to split things up. Roles are the unit of reuse.
learn-ansible/
├── ansible.cfg
├── inventory/ ...
├── roles/
│ └── web/
│ ├── defaults/main.yml
│ ├── handlers/main.yml
│ ├── meta/main.yml
│ ├── tasks/main.yml
│ ├── templates/index.html.j2
│ └── vars/main.yml
└── site.yml
# roles/web/defaults/main.yml
# Safe-to-override defaults. These are the role's public interface.
web_http_port: 80
web_index_body: "Hello from {{ inventory_hostname }}"
web_server_tokens: "off"
# roles/web/vars/main.yml
# Internal, high-precedence vars the user should not override.
_web_package_name: "nginx"
_web_service_name: "nginx"
_web_docroot: "/usr/share/nginx/html"
# roles/web/tasks/main.yml
- name: Install nginx
  ansible.builtin.package:
    name: "{{ _web_package_name }}"
    state: present

- name: Render the index page
  ansible.builtin.template:
    src: index.html.j2
    dest: "{{ _web_docroot }}/index.html"
    mode: '0644'
  notify: reload nginx

- name: Ensure nginx is running
  ansible.builtin.service:
    name: "{{ _web_service_name }}"
    state: started
    enabled: true
# roles/web/handlers/main.yml
- name: reload nginx
  ansible.builtin.service:
    name: "{{ _web_service_name }}"
    state: reloaded
# roles/web/templates/index.html.j2
<h1>{{ web_index_body }}</h1>
<p>Operator: {{ site_operator | default('ops@example.com') }}</p>
<p>Port: {{ web_http_port }}</p>
# roles/web/meta/main.yml
galaxy_info:
  author: you
  description: Minimal nginx with a single index page.
  license: MIT
  min_ansible_version: "2.14"
dependencies: []
# site.yml
- hosts: web
  become: true
  roles:
    - web
ansible-playbook site.yml --check --diff
ansible-playbook site.yml
Internal vars start with an underscore (_web_package_name) so a reader instantly knows "do not override this from outside". Public vars use the web_ prefix. This convention is covered in Best Practices & Refactoring.
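This split decides where an override can live. The sketch below uses the lab's own variable names and assumes you override from the inventory's group_vars/web.yml. The /srv/www path in the comment is only an illustration.

```yaml
# inventory/group_vars/web.yml (added lines)
# Public interface: inventory group_vars outrank role defaults, so this wins.
web_index_body: "Hello from {{ inventory_hostname }} (via group_vars)"

# Internal: role vars/main.yml outranks inventory group_vars, so a line like
#   _web_docroot: /srv/www
# placed here would be silently ignored. That is the point of putting it in vars/.
```

Remove the override again before Lab 4, which changes the default directly and expects to see the diff.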
Lab 4 — Handlers and templates
Goal: change the page body, observe the handler firing, and learn the flush_handlers escape hatch.
Edit roles/web/defaults/main.yml:
web_index_body: "Hello, version 2"
ansible-playbook site.yml --diff
# TASK [web : Render the index page] ********
# --- /usr/share/nginx/html/index.html
# +++ /usr/share/nginx/html/index.html
# - <h1>Hello from web1</h1>
# + <h1>Hello, version 2</h1>
# RUNNING HANDLER [web : reload nginx] ********
Now add a smoke test that must run after the reload but within the same play. Use meta: flush_handlers to run the pending handlers first:
# roles/web/tasks/main.yml (add at end)
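- name: Flush handlers so the reload happens before the smoke test
  ansible.builtin.meta: flush_handlers

- name: Smoke test the page
  ansible.builtin.uri:
    # Use the inventory address: "web1" may not resolve from the control node.
    url: "http://{{ ansible_host | default(inventory_hostname) }}:{{ web_http_port }}/"
    return_content: true
  register: smoke
  delegate_to: localhost
  become: false

- name: Show what the server returned
  ansible.builtin.debug:
    # regex_search with a group argument returns a list, hence | first.
    # Single YAML quotes keep "\\1" intact for Jinja, which unescapes it to \1.
    msg: '{{ smoke.content | regex_search("<h1>(.+)</h1>", "\\1") | first }}'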
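On a slow host the smoke test can occasionally race the reload. In that case the uri task can poll instead of failing on the first attempt. Below is a hedged variant of the task above. The retry count and delay are arbitrary choices.

```yaml
# roles/web/tasks/main.yml (variant of the smoke test; retry values are illustrative)
- name: Smoke test the page, retrying while nginx settles
  ansible.builtin.uri:
    url: "http://{{ ansible_host | default(inventory_hostname) }}:{{ web_http_port }}/"
    return_content: true
  register: smoke
  until: smoke.status == 200   # a refused connection reports status -1 and is retried
  retries: 5
  delay: 2
  delegate_to: localhost
  become: false
```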
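What could go wrong?
- Two handlers with the same name silently collapse into one. Use listen: to group them safely.
- A handler on a dead service ("service not found") errors; make sure the install task ran before the notify.
- Multi-role plays: if a task in role B needs a handler notified in role A to have run already, add an explicit meta: flush_handlers (for example as the last task of role A), because handlers otherwise wait until the end of the play section. force_handlers: true at play level solves a different problem: it makes already-notified handlers run even if a later task fails.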
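Here is a sketch of listen:. It assumes a hypothetical topic named "nginx config changed". Both handlers subscribe to that topic, so a single notify triggers a config check and then the reload. The handler names are illustrative.

```yaml
# roles/web/handlers/main.yml (variant; topic and handler names are illustrative)
- name: Validate nginx config
  ansible.builtin.command: nginx -t
  changed_when: false
  listen: nginx config changed

- name: Reload nginx after validation
  ansible.builtin.service:
    name: "{{ _web_service_name }}"
    state: reloaded
  listen: nginx config changed

# In tasks/main.yml, notify the topic instead of a handler name:
#   notify: nginx config changed
```

Handlers run in the order they are defined in the handlers file, not in the order they were notified. The validation therefore always runs first. If nginx -t fails, the host is marked failed and the reload never happens.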
See also: Handlers & Templates, Jinja2.
Lab 5 — Secrets with ansible-vault
Add a secret (a fake API key) and make the role write it into a file.
mkdir -p inventory/group_vars/web
ansible-vault create inventory/group_vars/web/vault.yml
# editor opens; paste:
# ---
# web_api_key: "supersekret-abc123"
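group_vars/web/ is now a directory, so retire group_vars/web.yml (pick one style, as Lab 2 warned) and keep the plain values in group_vars/web/main.yml next to the vault file. Reference the vaulted value from there: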
# inventory/group_vars/web/main.yml
http_port: 80
# Point at the vaulted value via a simple-name indirection:
api_key: "{{ web_api_key }}"
# roles/web/tasks/main.yml (add)
- name: Install the API key for nginx
  ansible.builtin.copy:
    dest: /etc/nginx/api.key
    content: "{{ api_key }}"
    owner: root
    group: root
    mode: '0600'
  no_log: true
ansible-playbook site.yml --ask-vault-pass
# or
echo 'mypass' > ~/.vault-pass && chmod 600 ~/.vault-pass
ansible-playbook site.yml --vault-password-file ~/.vault-pass
For several environments, keep one vault per environment (inventory/<env>/group_vars/web/vault.yml), each encrypted with a different password. The prod vault password lives somewhere a CI runner can read; the dev vault password can live on disk.
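The Ansible documentation's tips suggest a common variation on the indirection in group_vars/web/main.yml. Every vaulted name gets a vault_ prefix, and the plain-text alias carries the role's public name. The sketch below shows that convention. The lab itself keeps the web_api_key/api_key pair.

```yaml
# inventory/group_vars/web/vault.yml (shown decrypted; it is encrypted on disk)
vault_web_api_key: "supersekret-abc123"

# inventory/group_vars/web/main.yml (plain text, so grep can find every key name)
web_api_key: "{{ vault_web_api_key }}"
```

With this naming, grep -r web_api_key inventory/ finds the plain alias without decrypting anything, and the role consumes web_api_key instead of api_key.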
Lab 6 — Idempotency lab
We will deliberately write a non-idempotent task and then fix it. Add this to roles/web/tasks/main.yml:
- name: (BAD) append a line to nginx.conf every run
  ansible.builtin.shell: echo "# run marker" >> /etc/nginx/nginx.conf
ansible-playbook site.yml # changed
ansible-playbook site.yml # STILL CHANGED — this is the bug
ssh ansible@10.0.0.11 "grep -c '# run marker' /etc/nginx/nginx.conf"   # grows each run
Replace the (BAD) task with one of three idiomatic fixes. Pick the one that fits the situation:
# Fix 1: use the real module. 90% of the time this is the answer.
- name: Ensure marker is present exactly once
  ansible.builtin.lineinfile:
    path: /etc/nginx/nginx.conf
    line: "# run marker"
    state: present

# Fix 2: if you genuinely must shell out, make idempotency explicit:
# the script reports whether it changed anything, and changed_when reads that.
- name: Ensure marker is present exactly once (shell)
  ansible.builtin.shell: |
    if grep -qxF "# run marker" /etc/nginx/nginx.conf; then
      echo "marker already present"
    else
      echo "# run marker" >> /etc/nginx/nginx.conf
      echo "marker added"
    fi
  register: marker
  changed_when: "'marker added' in marker.stdout"
  # ^ this gets fiddly; that's exactly why fix 1 exists

# Fix 3: creates / removes. shell and command skip the task when the
# `creates` path already exists (or when the `removes` path is missing).
- name: Write marker file once
  ansible.builtin.shell: echo "# run marker" > /etc/nginx/conf.d/marker
  args:
    creates: /etc/nginx/conf.d/marker
# (ansible.builtin.copy with content: would also be idempotent here, by checksum)
Now run ansible-playbook site.yml twice and expect changed=0 on the second run. That is the definition of idempotent and the goal for every task you write.
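What could go wrong?
- Using shell/command where a real module exists is the #1 source of non-idempotence. ansible-lint (next lab) catches this.
- register + changed_when: false "fixes" the lint but leaves the actual bug. Fix the task, not the report.
- The most reliable detector is a second real run: any task that still reports changed is not idempotent. --check --diff helps for modules that support check mode, but shell and command tasks without creates/removes are skipped in check mode, so it will not flag the (BAD) task above.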
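changed_when: false is the right tool in exactly one case: a command that only reads state. Below is a sketch of that legitimate use. The register name is illustrative.

```yaml
- name: Read the installed nginx version
  ansible.builtin.command: nginx -v
  register: web_nginx_version
  changed_when: false   # read-only: this command never modifies the host
  check_mode: false     # safe to run for real even under --check

- name: Show it
  ansible.builtin.debug:
    msg: "{{ web_nginx_version.stderr }}"   # nginx -v writes to stderr
```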
See also: Best Practices — Idempotency.
Lab 7 — CI with ansible-lint and check-mode
Add a .gitlab-ci.yml that runs yamllint → ansible-lint → syntax-check on every push, plus a manual check-mode job against the dev environment on merge requests.
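# .gitlab-ci.yml
stages: [lint, check, dryrun]

image: python:3.12-slim

before_script:
  - pip install --quiet 'ansible==9.*' 'ansible-lint==24.*' 'yamllint==1.*'

yamllint:
  stage: lint
  script:
    - yamllint .

ansible-lint:
  stage: lint
  script:
    - ansible-lint

syntax-check:
  stage: check
  script:
    - ansible-playbook site.yml --syntax-check -i inventory/hosts.ini

check-mode:
  stage: dryrun
  rules:
    # Offer the job on merge request pipelines only, as a manual button.
    # allow_failure keeps the unclicked job from blocking the pipeline.
    - if: $CI_MERGE_REQUEST_IID
      when: manual
      allow_failure: true
  script:
    # python:*-slim images ship without an SSH client; Ansible needs one.
    - apt-get update -qq && apt-get install -y -qq openssh-client
    # DEV_SSH_KEY is a File-type variable, so it expands to a path.
    - mkdir -p ~/.ssh && cp "$DEV_SSH_KEY" ~/.ssh/id_ed25519 && chmod 600 ~/.ssh/id_ed25519
    - ssh-keyscan 10.0.0.11 10.0.0.12 >> ~/.ssh/known_hosts
    - echo "$VAULT_PASS" > vp && chmod 600 vp
    - ansible-playbook site.yml --check --diff --vault-password-file vp
# .yamllint (repo root)
extends: default
rules:
  line-length: {max: 160}
  truthy: {allowed-values: ['true', 'false']}
  # The playbooks use flow maps such as { name: nginx, state: present }.
  braces: {max-spaces-inside: 1}
  # Settings ansible-lint expects from a yamllint config it shares.
  comments: {min-spaces-from-content: 1}
  comments-indentation: false
  octal-values: {forbid-implicit-octal: true, forbid-explicit-octal: true}
# .ansible-lint (repo root)
exclude_paths:
  - .git/
  - .venv/
skip_list:
  - yaml[line-length]
In GitLab → Settings → CI/CD → Variables, add $VAULT_PASS as a masked variable and $DEV_SSH_KEY as a File-type variable (GitLab cannot mask a multi-line private key). Now every push is linted and syntax-checked automatically, and on a merge request anyone can click "check-mode" to dry-run against dev.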
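ansible-lint also ships rule profiles (min, basic, moderate, safety, shared, production). Pinning one makes the bar explicit in review. Below is a hedged sketch of the same file with a profile added. Which level to start from is a team choice, and stricter profiles may flag names used in earlier labs.

```yaml
# .ansible-lint (repo root), with an explicit profile
profile: moderate   # illustrative choice; ratchet toward "production" over time
exclude_paths:
  - .git/
  - .venv/
skip_list:
  - yaml[line-length]
```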
See also: CI for Ansible, GitLab CI/CD Pipelines, Ansible Testing.
Lab 8 — Capstone: multi-role stack
Build a tiny three-tier app: nginx reverse-proxying to gunicorn, backed by postgresql. Tags let us apply just one layer at a time. Environments (dev, prod) share roles but differ in inventory/vars.
learn-ansible/
├── ansible.cfg
├── site.yml
├── inventories/
│ ├── dev/
│ │ ├── hosts.ini
│ │ └── group_vars/...
│ └── prod/
│ ├── hosts.ini
│ └── group_vars/...
├── roles/
│ ├── base/ (timezone, ntp, sshd hardening)
│ ├── db/ (postgresql)
│ ├── app/ (gunicorn + systemd unit)
│ └── proxy/ (nginx as the edge)
└── .gitlab-ci.yml
# site.yml
- hosts: all
  become: true
  roles:
    - { role: base, tags: ['base'] }

- hosts: db
  become: true
  roles:
    - { role: db, tags: ['db'] }

- hosts: app
  become: true
  roles:
    - { role: app, tags: ['app'] }

- hosts: proxy
  become: true
  roles:
    - { role: proxy, tags: ['proxy', 'edge'] }
# Apply everything to dev
ansible-playbook -i inventories/dev site.yml
# Only roll out the app layer
ansible-playbook -i inventories/dev site.yml --tags app
# Only roll out the edge
ansible-playbook -i inventories/prod site.yml --tags edge
# Single host (e.g. prod 2 of 5)
ansible-playbook -i inventories/prod site.yml --limit proxy2
For rolling changes in prod, gate the play with serial:
# one play from site.yml, prod-specific
- hosts: app
  become: true
  serial: "25%"              # update 25% of app hosts at a time
  max_fail_percentage: 10    # bail if more than 10% of that batch fails
  roles:
    - { role: app, tags: ['app'] }
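serial also accepts a list of batch sizes, which lets you start with a canary: one host first, then a percentage per batch. Below is a sketch of the same prod play. The batch sizes are illustrative.

```yaml
- hosts: app
  become: true
  serial:
    - 1        # a single canary host first
    - "25%"    # then 25% of the app hosts per batch; the last size repeats
  max_fail_percentage: 10
  roles:
    - { role: app, tags: ['app'] }
```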
Dev and prod run the same roles, so a change under roles/app/ that alters dev behaviour is next in line for prod. Keep inventories thin.
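In concrete terms, a thin inventory holds the hosts plus the handful of values that truly differ per environment. Below is a sketch with hypothetical variable names for the app role. The lab does not define these names.

```yaml
# inventories/dev/group_vars/app.yml (hypothetical variable names)
app_gunicorn_workers: 2
app_log_level: debug
---
# inventories/prod/group_vars/app.yml
app_gunicorn_workers: 8
app_log_level: info
```

Everything else, including the defaults for these variables, stays in roles/app/defaults/main.yml. Both environments then run identical role code.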
Where to go next
- Best Practices & Refactoring — the rules you were nudged toward above, expanded with refactoring recipes and anti-patterns.
- Project Structure — the full production repo layout.
- Variable Precedence — the chain from role defaults to --extra-vars.
- Testing — Molecule, ansible-test, mock inventories.
- Error Handling — block/rescue/always, failure budgets.
- Performance — fact caching, pipelining, strategies, forks.
- Ansible Collection — when to go from roles to a full collection.