Terraform Basics
- State is the source of truth. If you lose it, Terraform forgets every resource it manages, so plan accordingly (remote backend, versioning, locking).
- Pin your Terraform version in required_version and your providers in required_providers. Floating versions have broken production before and will again.
- terraform plan is read-only. Run it in CI on every MR, even without apply. Drift shows up there.
- A module is just a directory. Don't over-engineer: inline resources until a second caller needs them.
- lifecycle { prevent_destroy = true } is the cheapest way to avoid a rm-rf-prod moment on RDS / S3 buckets.
- Terraform is for long-lived infrastructure. App deploys, queue messages, short-lived resources: use the right tool instead.
Anatomy of a config
A Terraform project is one or more .tf files in a directory. Canonical split:
project/
├── versions.tf # terraform {} block: required_version, required_providers
├── providers.tf # provider "aws" {} etc.
├── variables.tf # variable "foo" {}
├── main.tf # resources
├── outputs.tf # output "bar" {}
└── terraform.tfvars # values for variables (gitignored if sensitive)
The filenames are convention — Terraform concatenates every .tf in the directory before parsing. Order within a file is irrelevant; Terraform builds a dependency graph from references.
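A minimal sketch of the ordering point, using the built-in terraform_data resource (available since Terraform 1.4) so that no provider is needed. The resource names app and network are made up for illustration.

```hcl
# Declared first, but created second: its input references the resource below.
resource "terraform_data" "app" {
  input = "app on ${terraform_data.network.output}"
}

# Declared second, but created first: nothing here depends on "app".
resource "terraform_data" "network" {
  input = "network-ready"
}
```

Swapping the two blocks, or moving one into another .tf file in the same directory, should produce the same plan, because the reference is what defines the order.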
Providers
# versions.tf
terraform {
  required_version = "~> 1.9"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.60"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.40"
    }
  }
}

# providers.tf
provider "aws" {
  region = "eu-west-1"
  # Auth comes from the environment: AWS_PROFILE or AWS_ACCESS_KEY_ID/…
  # Never put static keys in code.
}

provider "aws" {
  alias  = "useast1"
  region = "us-east-1" # Second provider instance (e.g. for ACM + CloudFront)
}
- Pin by minor (~> 5.60 means >= 5.60.0, < 6.0.0). Pinning the major only is asking for a breaking change on a Tuesday.
- Auth via environment. Static credentials in provider blocks end up in git history, saved plan files, and the operator's shell history.
- Aliases let you target multiple regions/accounts from one config: set provider = aws.useast1 on the resource.
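As a sketch of the alias in use, the resource below selects the useast1 provider instance defined above. The resource name cdn and the domain www.example.com are placeholders, not part of the original config.

```hcl
# CloudFront only accepts ACM certificates issued in us-east-1,
# so this resource uses the aliased provider instead of the default one.
resource "aws_acm_certificate" "cdn" {
  provider          = aws.useast1
  domain_name       = "www.example.com" # placeholder domain
  validation_method = "DNS"
}
```

Resources without a provider argument keep using the default (unaliased) aws provider in eu-west-1.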
Resources and data sources
A resource is something Terraform creates and manages. A data source is something Terraform reads but does not own.
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.small"
  tags = {
    Name      = "web-01"
    ManagedBy = "terraform"
  }
}
Reference syntax:
- resource_type.name.attr for a resource (aws_instance.web.id)
- data.resource_type.name.attr for a data source
- var.foo, local.bar, module.baz.out for variables, locals, module outputs
Always tag (or label) everything with ManagedBy = "terraform". The tag warns anyone in the console that the resource is owned by code. When someone changes a managed resource by hand, it silently diverges from state, and the next terraform apply either reverts the change or leaves you reconciling it.
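One way to apply the tag everywhere on AWS is the provider's default_tags block. This is a sketch of the default provider block from providers.tf with that block added, not an additional provider; it only affects resources that support tags.

```hcl
provider "aws" {
  region = "eu-west-1"

  # Merged into the tags of every taggable resource this provider creates.
  default_tags {
    tags = {
      ManagedBy = "terraform"
    }
  }
}
```

Tags set on an individual resource are merged with these, and a resource-level tag with the same key takes priority.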
Variables and precedence
variable "region" {
  type        = string
  description = "AWS region for primary resources."
  default     = "eu-west-1"
}

variable "instance_count" {
  type    = number
  default = 1
  validation {
    condition     = var.instance_count >= 1 && var.instance_count <= 10
    error_message = "instance_count must be 1..10."
  }
}

variable "db_password" {
  type      = string
  sensitive = true # redacts in plan/apply output and outputs
}
Variable precedence, lowest to highest (higher wins):
1. default in the variable block
2. Environment variables: TF_VAR_foo=bar
3. terraform.tfvars
4. *.auto.tfvars (alphabetical)
5. -var-file=… and -var 'foo=bar' on the CLI, in the order given
Gitignore any .tfvars file containing secrets, or keep secrets out of .tfvars entirely and use TF_VAR_db_password from a CI secret store.
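A worked example of the override chain for a single variable. The file name prod.auto.tfvars and all region values are hypothetical; the .tfvars contents and the CLI call are shown as comments so the block stays one valid file.

```hcl
# variables.tf: the default is used only if nothing else sets the value.
variable "region" {
  type    = string
  default = "eu-west-1"
}

# terraform.tfvars: overrides the default.
#   region = "eu-central-1"

# prod.auto.tfvars (hypothetical file name): overrides terraform.tfvars.
#   region = "eu-north-1"

# CLI flag: overrides every file above.
#   terraform plan -var 'region=us-east-1'
```

With all four present, var.region should resolve to us-east-1; remove the CLI flag and it falls back to the value in prod.auto.tfvars.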
Outputs
output "web_ip" {
  value       = aws_instance.web.public_ip
  description = "Public IPv4 of the web instance."
}

output "db_password" {
  value     = aws_db_instance.main.password
  sensitive = true # redacts in CLI output; still plaintext in state
}
Outputs are the public interface of a root module or a child module. They are also how you move values between state files (terraform_remote_state data source, or — better — published to a central parameter store).
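A sketch of reading another state's outputs with terraform_remote_state. It assumes the network state lives in the S3 bucket and key used elsewhere in this guide and that it declares an output named vpc_id; both are assumptions for illustration.

```hcl
# Read-only view of the outputs of another root module's state.
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "mycorp-tfstate-prod"
    key    = "platform/network.tfstate"
    region = "eu-west-1"
  }
}

locals {
  # Assumes the network config declares: output "vpc_id" { ... }
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```

The reader needs read access to the whole remote state file, not just the outputs, which is one reason a central parameter store can be the better option.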
State and backends
State is a JSON file mapping Terraform's view of the world to the real resources. Local state (terraform.tfstate) is fine for learning; for anything real you want a remote backend that offers three things:
- Durability: the file lives somewhere versioned and backed up.
- Locking: only one apply at a time against the same state.
- Access control: who can read (and therefore see secrets in) state.
AWS: S3 + DynamoDB
terraform {
  backend "s3" {
    bucket         = "mycorp-tfstate-prod"
    key            = "platform/network.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "tf-lock" # provides state locking
    encrypt        = true
  }
}
Bucket versioning on, SSE-KMS, IAM read restricted to your CI role and the platform team. The DynamoDB table has a single LockID string key.
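The bucket and lock table themselves might be defined roughly as follows. This is a sketch that assumes they live in a separate bootstrap config (they must exist before the backend can be initialised) and that the AWS-managed KMS key is acceptable; the resource labels tfstate and tf_lock are made up.

```hcl
resource "aws_s3_bucket" "tfstate" {
  bucket = "mycorp-tfstate-prod"

  lifecycle {
    prevent_destroy = true
  }
}

# Keep old state versions so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  versioning_configuration {
    status = "Enabled"
  }
}

# SSE-KMS; with no key ID given, S3 uses the AWS-managed key.
resource "aws_s3_bucket_server_side_encryption_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Lock table: a single string partition key named LockID.
resource "aws_dynamodb_table" "tf_lock" {
  name         = "tf-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

The IAM restrictions are left out here because they depend on how your CI role and platform team are set up.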
GCS
terraform {
  backend "gcs" {
    bucket = "mycorp-tfstate-prod"
    prefix = "platform/network" # object-level lock is native in GCS
  }
}
Terraform Cloud / HCP Terraform
terraform {
  cloud {
    organization = "mycorp"
    workspaces {
      name = "platform-network-prod"
    }
  }
}
Gives you state, locking, remote runs, policy checks (Sentinel/OPA), and a UI. The free tier covers small teams.
GitLab-managed HTTP backend
terraform {
  backend "http" {
    address        = "https://gitlab.example.com/api/v4/projects/42/terraform/state/prod"
    lock_address   = "https://gitlab.example.com/api/v4/projects/42/terraform/state/prod/lock"
    unlock_address = "https://gitlab.example.com/api/v4/projects/42/terraform/state/prod/lock"
    lock_method    = "POST"
    unlock_method  = "DELETE"
    retry_wait_min = "5"
  }
}
Auth via TF_HTTP_USERNAME and TF_HTTP_PASSWORD (a personal access token or a project job token). GitLab stores state per project with native locking.
Modules
A module is any directory with .tf files. There are two kinds by role:
- Root module: the directory you run terraform apply in.
- Child module: called from another module via module "x" { source = "..." }.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.8" # registry, semver-pinned

  name            = "prod"
  cidr            = "10.20.0.0/16"
  azs             = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
  private_subnets = ["10.20.1.0/24", "10.20.2.0/24", "10.20.3.0/24"]
  public_subnets  = ["10.20.101.0/24", "10.20.102.0/24", "10.20.103.0/24"]
}

module "app" {
  source = "git::ssh://git@gitlab.example.com/infra/modules/app.git//terraform?ref=v1.4.2"
  vpc_id = module.vpc.vpc_id
}
Sources:
- Terraform Registry (namespace/name/provider), public or private. Always pin version.
- Git (git::ssh://… or git::https://…). Pin with ?ref=v1.4.2 (a tag, not a branch).
- Local path (./modules/foo) for modules in the same repo.
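A minimal sketch of a local-path module and its caller, shown as one listing with the file boundaries marked in comments. The module path ./modules/bucket, the variable name, and the output arn are hypothetical.

```hcl
# --- modules/bucket/main.tf (hypothetical child module) ---
variable "name" {
  type        = string
  description = "Bucket name."
}

resource "aws_s3_bucket" "this" {
  bucket = var.name
}

# Outputs are the only values a caller can read from the module.
output "arn" {
  value = aws_s3_bucket.this.arn
}

# --- main.tf in the root module ---
module "logs" {
  source = "./modules/bucket" # local path: no version argument
  name   = "mycorp-logs-prod"
}

output "logs_bucket_arn" {
  value = module.logs.arn
}
```

Local modules are versioned with the repo itself, which is why the version argument only applies to registry sources.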
Plan, apply, drift
terraform init # download providers, set up backend
terraform plan -out=tfplan
terraform apply tfplan # apply exactly what was planned, no surprises
terraform destroy # tear down everything in this state
Drift is when the real world differs from state. The canonical check is terraform plan on an unchanged config:
terraform plan -detailed-exitcode
# 0 = no diff; 1 = error; 2 = non-empty diff (drift or a pending change)
Run this on a schedule in CI and alert on exit code 2. The most common causes of drift are humans clicking in the console, auto-scaling changes the provider's resource doesn't track well, and IAM bits that other roles mutate.
fmt, validate, workspaces
terraform fmt -recursive # canonical whitespace; run as a pre-commit hook
terraform validate # syntactic + basic semantic checks
terraform providers # list the providers each module requires
terraform graph | dot -Tpng > graph.png
Workspaces let one backend store multiple states:
terraform workspace new staging
terraform workspace select staging
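# In HCL, reference the current workspace name as terraform.workspace (or "${terraform.workspace}" inside a string)
For long-lived environments, prefer separate directories (envs/prod/, envs/dev/) with separate state files. Workspaces are good for ephemeral per-PR stacks.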
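A sketch of using the workspace name inside a config so that per-PR stacks do not collide. The local names and the instance sizes are illustrative assumptions.

```hcl
locals {
  # "default" is the workspace every config starts in.
  is_default = terraform.workspace == "default"

  # Prefix resource names with the workspace, e.g. "web-staging".
  name_prefix = "web-${terraform.workspace}"

  # Hypothetical sizing rule: smaller instances outside the default workspace.
  instance_type = local.is_default ? "t3.small" : "t3.micro"
}

output "name_prefix" {
  value = local.name_prefix
}
```

After terraform workspace select staging, the output would be web-staging; each workspace keeps its own state under the same backend.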
Lifecycle
# All four arguments are shown together for reference; a real resource rarely needs more than one or two.
resource "aws_s3_bucket" "logs" {
  bucket = "mycorp-logs-prod"
  lifecycle {
    prevent_destroy       = true                        # plan fails if it would destroy this
    create_before_destroy = true                        # build the replacement before deleting the old one
    ignore_changes        = [tags["LastTouchedBy"]]     # stop chasing a field another system owns
    replace_triggered_by  = [aws_launch_template.web.latest_version]
  }
}
- prevent_destroy: belt-and-braces for databases, buckets, DNS zones. To actually delete, you edit the code first.
- create_before_destroy: for resources that must keep existing while they are replaced (launch templates, IAM roles still in use). Terraform creates the replacement, repoints references, then deletes the old one. Both copies exist briefly, so names must not collide (name_prefix helps).
- ignore_changes: for attributes a different system writes (autoscaling-driven counts, labels from an operator). Without this, every apply fights the other system.
- replace_triggered_by: force a replace when a dependency changes (e.g. bumping an AMI forces new instances).
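A more realistic pairing of two of these arguments, as a sketch: a launch template replaced with create_before_destroy, and an auto-scaling group whose desired_capacity is left to the scaling policy. The variables ami_id and private_subnet_ids are hypothetical inputs.

```hcl
variable "ami_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

resource "aws_launch_template" "web" {
  name_prefix   = "web-" # generated suffix, so old and new can coexist
  image_id      = var.ami_id
  instance_type = "t3.small"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "web" {
  name_prefix         = "web-"
  min_size            = 1
  max_size            = 4
  desired_capacity    = 1
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  lifecycle {
    # The scaling policy owns this value after creation.
    ignore_changes = [desired_capacity]
  }
}
```

Without ignore_changes, every apply would try to reset the group to one instance, undoing whatever the scaling policy had decided.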
When NOT to use Terraform
| Scenario | Why Terraform is wrong | What to use |
|---|---|---|
| App deployments (Docker image rollouts) | Terraform wants to own a resource's lifecycle; frequent redeploys thrash state and locks. | CD tool (ArgoCD, Flux, GitLab CI docker push + restart). |
| Short-lived resources (per-PR envs of hundreds of objects) | State grows faster than you can clean it up. | A thin script + the cloud CLI, or ephemeral workspaces torn down on PR close. |
| In-container config (install packages, copy files) | Terraform is about resources, not OS state. | Ansible (best practices), Packer (golden images). |
| Data mutation (DB rows, queue messages) | Terraform diff-and-apply against row-level data is a bad fit and dangerous. | Migrations (Flyway, sqitch, schema-diff tools). |
| Cluster-managed objects (Kubernetes Pods, HPAs that auto-scale) | Terraform will fight the controller constantly. | Manifests via GitOps; let Kubernetes own Kubernetes. |
Next: Terraform + Cloudflare for a concrete real project, and Packer for the OS-image side of IaC.