Containers 101

A container is a Linux process with namespaces and cgroups, not a tiny VM. Images, registries, OCI, layers, union filesystems, rootless, the security boundary, image digests, and a minimal Dockerfile.

If you only remember six things
  • A container is a process the kernel lies to about what it can see and use. That is namespaces (what it sees) plus cgroups (how much it gets).
  • Image ≠ container. Image is the filesystem + metadata on disk; container is a running (or stopped) instance of it.
  • latest is a lie. Pin by digest (@sha256:…) in anything you care about.
  • Containers share the host kernel. They are a boundary, not a wall. If you need a wall, use a VM.
  • Rootless is the default you should be starting from in 2026, not an advanced setting.
  • Every RUN, COPY, and ADD in a Dockerfile creates a layer. Order them by churn rate: slow-changing things first, source code last.

What a container actually is

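A container is not a virtual machine. It is a regular Linux process that the kernel has been asked to isolate using two features: namespaces, which limit what the process can see (PIDs, mounts, hostname, network, IPC, users), and cgroups, which limit how much CPU, memory, and I/O it gets.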

That is it. There is no hypervisor. There is no "container daemon" in the kernel. When you run docker run alpine sh, you get a process that is still running on your host's kernel — it just thinks it's alone on a very small machine.

You can prove this to yourself without a container runtime:

# On any Linux host with util-linux:
sudo unshare --pid --fork --mount-proc --uts --net --ipc /bin/bash
# You are now in a new PID, mount, UTS, net, and IPC namespace.
# ps -ef      -> shows only bash
# hostname    -> changeable without affecting host
# ip addr     -> shows only lo (no interfaces until you build one)

That shell is, in every sense that matters, a container. A container runtime like Podman or Docker wraps this with an image, a working directory, cgroup limits, a writeable layer, optional networking, and a lifecycle API, but the underlying kernel mechanism is the same unshare call shown above.
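Namespaces are visible from userspace, so you can check which ones a process belongs to. The sketch below assumes a Linux host with /proc mounted; ns_id and same_ns are helper names made up for this example, not part of any tool.

```shell
# Print the identity of one namespace for a process, e.g. "pid:[4026531836]".
# Two processes share a namespace exactly when these strings are equal.
ns_id() {
  readlink "/proc/$1/ns/$2"
}

# Succeeds when two PIDs share the given namespace type.
same_ns() {
  [ "$(ns_id "$1" "$3")" = "$(ns_id "$2" "$3")" ]
}

# List the namespaces this shell lives in.
for t in pid mnt uts net ipc user cgroup; do
  printf '%-6s %s\n' "$t" "$(ns_id "$$" "$t")"
done

# The cgroup this shell is accounted to
# (one line per hierarchy; a single "0::" line on cgroup v2).
cat /proc/self/cgroup
```

Run the same loop inside the unshare shell and the pid, mnt, uts, net, and ipc values should differ from the ones your normal shell reports, while user and cgroup stay the same because the command did not unshare them.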

Contrast with VMs. A VM has its own kernel and its own hardware abstractions (virtual CPU, NIC, disk). Containers share the host kernel. That is what makes them fast to start and cheap to run, and it is also the source of every meaningful security caveat below.

Image vs container vs registry

The three things people confuse the most:

Thing     | What it is | Lives on
----------|------------|---------
Image     | An immutable, content-addressed filesystem + JSON metadata (entrypoint, env, exposed ports). | Disk (local cache) or a registry.
Container | A running (or stopped) instance of an image, plus a writeable top layer and runtime config. | The host where it was created.
Registry  | A content-addressed store that serves images over HTTPS using the OCI distribution spec. | A URL. Docker Hub, GHCR, ECR, a self-hosted Harbor, a local registry:2.

Think of it the way you'd think of executables: the image is the binary on disk, the container is the running process, and the registry is the package mirror that shipped you the binary.

OCI: the standard that makes it all interchangeable

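The Open Container Initiative (OCI) publishes the specs that make "containers" a portable concept. Three documents matter: the Image Spec (what an image is: a manifest, a config, and layers), the Runtime Spec (how a runtime turns an unpacked image into a running process), and the Distribution Spec (the HTTP API a registry uses to serve images).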

If a tool says "OCI-compatible", it means: any registry can serve its images, any runtime can run them, and any build tool can produce them. This is why Podman can pull a Docker-built image from Docker Hub and run it under crun. They are not Docker images; they are OCI images that Docker also happens to produce.
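To make "OCI image" less abstract, here is roughly what a manifest looks like. The media types are the ones the OCI image spec defines; the digests and sizes are placeholders invented for this example, and layer_digests is a throwaway helper, not a real tool. It uses grep and sed only to stay dependency-free; a real script would use a JSON parser.

```shell
# Build 64-character placeholder digests; these are NOT real blobs.
fake() { printf "sha256:%s" "$(printf '%064d' 0 | tr 0 "$1")"; }

# A hand-written manifest: one config blob plus an ordered list of layers.
manifest="$(mktemp)"
cat > "$manifest" <<EOF
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": { "mediaType": "application/vnd.oci.image.config.v1+json", "digest": "$(fake a)", "size": 1024 },
  "layers": [
    { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "digest": "$(fake b)", "size": 3500000 },
    { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "digest": "$(fake c)", "size": 12000 }
  ]
}
EOF

# Print the layer digests in order (relies on one layer per line, as above).
layer_digests() {
  grep 'image\.layer\.' "$1" | sed -E 's/.*"digest": *"([^"]+)".*/\1/'
}

layer_digests "$manifest"
```

Everything a runtime needs is reachable from this one document: the config blob holds the entrypoint and environment, and the layers list says which tarballs to fetch and in what order to stack them.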

Layers, union filesystems, and caching

An image is a stack of tarballs called layers. Each layer adds or removes files relative to the layer below it. At runtime, a union filesystem (usually overlayfs) merges them into a single view and adds one writeable layer on top — that layer is the container's ephemeral state.

┌──────────────────────────┐  writeable layer (container)
├──────────────────────────┤  COPY app.jar       (image layer, your code)
├──────────────────────────┤  RUN apk add curl   (image layer)
├──────────────────────────┤  FROM alpine:3.20   (image layer, base OS)
└──────────────────────────┘
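The merge rule is simple enough to imitate with plain directories. The sketch below is a toy model, not overlayfs: union_lookup is a made-up helper that walks layers from top to bottom and treats a .wh.<name> file as a deletion marker, which is how OCI layer tarballs record removed files (overlayfs itself uses a different on-disk representation).

```shell
# Resolve a path against a stack of layer directories, topmost first.
# Prints the layer that provides the file; returns 1 if the path is
# absent or hidden by a whiteout marker in a higher layer.
union_lookup() {
  local path="$1" dir base layer
  shift
  dir="$(dirname "$path")"
  base="$(basename "$path")"
  for layer in "$@"; do
    if [ -e "${layer}/${dir}/.wh.${base}" ]; then
      return 1              # deleted here; lower layers are not consulted
    fi
    if [ -e "${layer}/${path}" ]; then
      printf '%s\n' "$layer"
      return 0
    fi
  done
  return 1
}

# Three throwaway directories standing in for layers.
root="$(mktemp -d)"
mkdir -p "$root/base/etc" "$root/app/etc" "$root/rw/etc"
echo "from base" > "$root/base/etc/motd"
echo "from base" > "$root/base/etc/hosts"
echo "from app"  > "$root/app/etc/motd"     # overrides the base copy
: > "$root/rw/etc/.wh.hosts"                # the container deleted /etc/hosts

union_lookup etc/motd  "$root/rw" "$root/app" "$root/base"
union_lookup etc/hosts "$root/rw" "$root/app" "$root/base" || echo "etc/hosts: not visible"
```

The first lookup resolves to the app layer because it sits above the base; the second fails because the writeable layer hides the file, even though the base layer still contains it.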

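Layers are content-addressed by the SHA-256 of their tarball, which means identical layers are stored and transferred only once, no matter how many images share them, and a layer you already have never needs to be pulled again.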
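A small sketch of what content addressing buys you, assuming GNU coreutils' sha256sum is installed. The files are stand-ins for layer tarballs and blob_digest is a helper name invented here.

```shell
# Content address of a blob: algorithm name plus the hex SHA-256 of its bytes.
blob_digest() {
  printf 'sha256:%s\n' "$(sha256sum < "$1" | cut -d' ' -f1)"
}

work="$(mktemp -d)"
printf 'same bytes\n'  > "$work/layer-a.tar"
cp "$work/layer-a.tar" "$work/layer-b.tar"    # identical content, different name
printf 'other bytes\n' > "$work/layer-c.tar"

blob_digest "$work/layer-a.tar"
blob_digest "$work/layer-b.tar"   # same digest as layer-a, so it is the same blob
blob_digest "$work/layer-c.tar"   # different bytes, different digest
```

The name of the file plays no part in the address. Two images that start from the same base therefore point at the same layer blobs, and a registry or local store only needs one copy.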

Rootless, and why it matters

"Rootless" means the container runtime itself, and the container processes, run as an unprivileged user on the host. It uses user namespaces to give the container an ID range that looks like root inside but is a regular user outside:

# Inside the container:
$ id
uid=0(root) gid=0(root)

# On the host, the same process:
$ ps -eo user,pid,cmd | grep myapp
alice   31234   /usr/bin/myapp    # not root on the host

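The mapping is configured in /etc/subuid and /etc/subgid. A user gets an allocated range of sub-UIDs (typically 65,536 per user). In the usual rootless setup, container UID 0 maps to the user's own host UID (which is why the process above shows up as alice), and container UIDs 1 and up map into the sub-UID range, starting at, say, host UID 100000.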
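The arithmetic is worth seeing once. This sketch assumes the common rootless layout, where container UID 0 is the invoking user and the sub-UID range covers container UIDs from 1 upward; other mappings are possible. The user alice, her UID 1000, and the subuid entry are hypothetical values, and map_uid is a helper written for this example.

```shell
# Translate a container UID to a host UID under a typical rootless mapping:
#   container 0        -> the invoking user's own UID
#   container 1..count -> sub_start .. sub_start + count - 1
map_uid() {
  local cuid="$1" user_uid="$2" sub_start="$3" sub_count="$4"
  if [ "$cuid" -eq 0 ]; then
    echo "$user_uid"
  elif [ "$cuid" -ge 1 ] && [ "$cuid" -le "$sub_count" ]; then
    echo $(( sub_start + cuid - 1 ))
  else
    echo "container UID $cuid is unmapped" >&2
    return 1
  fi
}

# Hypothetical /etc/subuid entry "alice:100000:65536", with alice at UID 1000.
map_uid 0     1000 100000 65536   # 1000
map_uid 1     1000 100000 65536   # 100000
map_uid 65536 1000 100000 65536   # 165535
```

On a live system, cat /proc/self/uid_map inside the container shows the mapping the kernel is actually using.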

Why you want this. If a container process escapes its namespaces, it lands as an unprivileged host user, not as host root. That is a meaningful defence in depth and it costs you almost nothing. Podman is rootless by default. Docker has a rootless mode; use it unless you have a specific reason not to.

The security boundary (honestly)

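A shared kernel is a shared attack surface. A kernel bug that lets an in-container process break isolation affects every container on the host. That is not hypothetical; it happens. The mitigations are layered, and typically include running rootless, dropping Linux capabilities, keeping the runtime's seccomp filter enabled, and confining the process with SELinux or AppArmor.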

Containers do not replace VMs for hostile multi-tenancy. If you are running untrusted customer code, use a VM boundary (Firecracker, Kata Containers, a real hypervisor). Containers are great for isolating your own software from itself.
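Some of the kernel-level hardening applied to a process can be read back from /proc. The sketch below assumes a Linux kernel recent enough to expose the Seccomp and NoNewPrivs fields in /proc/<pid>/status; hardening_summary is a helper name invented for this example.

```shell
# Summarise hardening-related fields from a /proc/<pid>/status file.
# Defaults to the current process; pass a path to inspect another file.
hardening_summary() {
  local status="${1:-/proc/self/status}"
  awk '
    $1 == "CapEff:"     { print "cap_eff=" $2 }        # effective capability bitmask (hex)
    $1 == "NoNewPrivs:" { print "no_new_privs=" $2 }   # 1 = cannot gain privileges via exec
    $1 == "Seccomp:"    { m = "unknown"
                          if ($2 == 0) m = "disabled"
                          if ($2 == 1) m = "strict"
                          if ($2 == 2) m = "filter"
                          print "seccomp=" m }
  ' "$status"
}

hardening_summary
```

Run it on the host and again inside a container (for example via the engine's exec command) and compare: a smaller capability mask and a seccomp mode of filter are signs that the runtime's defaults are in effect.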

Digests vs tags: latest is a lie

A tag (myimage:1.2.3) is a mutable pointer. The registry owner can repoint it tomorrow. A digest (myimage@sha256:abc123…) is a hash of the image's content, so the same digest always refers to exactly the same bytes; change the image and the digest changes with it.

# Get the current digest for a tag
docker buildx imagetools inspect nginx:1.27-alpine

# Or after pulling:
docker inspect --format '{{index .RepoDigests 0}}' nginx:1.27-alpine

# Pin by digest in production:
# FROM nginx@sha256:1234abcd...
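If you want to enforce pinning, a check like the following could run in CI. It is a rough sketch: unpinned_from is a hypothetical helper, it only looks at FROM lines, and it does not handle every Dockerfile feature (build arguments in FROM, for instance).

```shell
# Print "<line number>: <image>" for every FROM that is not pinned by digest.
# Skips "scratch" and references to earlier build stages (FROM <stage-name>).
unpinned_from() {
  awk '
    toupper($1) == "FROM" {
      ref = ($2 ~ /^--platform=/) ? $3 : $2
      pinned = (ref ~ /@sha256:[0-9a-f]+$/)
      if (!pinned && ref != "scratch" && !(ref in stages)) print NR ": " ref
      # Remember "AS <name>" so later stages may refer to it.
      for (i = 2; i < NF; i++) if (toupper($i) == "AS") stages[$(i + 1)] = 1
    }
  ' "$1"
}

# Demo against a throwaway Dockerfile.
df="$(mktemp)"
cat > "$df" <<'EOF'
FROM golang:1.22-alpine AS build
RUN go build ./...
FROM gcr.io/distroless/static:nonroot
COPY --from=build /out/app /app
FROM build AS test
EOF

unpinned_from "$df"
```

For the demo file this reports lines 1 and 3. Line 5 is skipped because build names an earlier stage, not an image in a registry.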

A minimal Dockerfile and the rules it obeys

# Pin a specific base by tag; pin by digest for prod.
FROM python:3.12-slim-bookworm

# Create a non-root user early so subsequent steps can chown to it.
RUN groupadd --system app && useradd --system --gid app --home-dir /app app

WORKDIR /app

# Dependencies first — they change least often, maximising layer cache hits.
COPY --chown=app:app requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Source code last — it changes on every build.
COPY --chown=app:app src/ ./src/

USER app
EXPOSE 8080

# Exec form (no shell) is required for correct signal handling.
ENTRYPOINT ["python", "-m", "src.app"]
CMD ["--port", "8080"]
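Why the ordering in the Dockerfile above matters can be shown with a toy model of the build cache. This is a simplification, not how any builder actually computes its keys: step_key and build_keys are invented for the example, and the strings passed in stand for file contents. The property it models is the real one, though: a step's cache key depends on everything before it, so a change invalidates that step and every step after it.

```shell
# A step's key hashes its parent's key, the instruction text, and
# (for COPY) the bytes being copied. Assumes sha256sum is installed.
step_key() {
  local parent="$1" instruction="$2" content="${3:-}"
  printf '%s\n%s\n%s' "$parent" "$instruction" "$content" | sha256sum | cut -c1-12
}

# Print one key per step. Args: requirements.txt contents, source contents.
build_keys() {
  local k
  k="$(step_key ""   "FROM python:3.12-slim-bookworm")";       echo "$k"
  k="$(step_key "$k" "COPY requirements.txt ." "$1")";         echo "$k"
  k="$(step_key "$k" "RUN pip install -r requirements.txt")";  echo "$k"
  k="$(step_key "$k" "COPY src/ ./src/" "$2")";                echo "$k"
}

before="$(build_keys "deps v1" "code v1")"
after="$(build_keys "deps v1" "code v2")"     # only the source changed

echo "$before"
echo "---"
echo "$after"     # first three keys match the first run; only the last differs
```

Swap the two COPY steps and a one-line code change would alter the key of the pip install step too, forcing a full dependency reinstall on every build.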

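The rules this Dockerfile follows:
  • Pin the base image: by tag at minimum, by digest for production.
  • Create a non-root user and switch to it with USER, so the application does not run as root inside the container.
  • Copy and install dependencies before the source code, so the dependency layers stay cached across code changes.
  • Install with --no-cache-dir so pip's download cache does not end up in a layer.
  • Use the exec form of ENTRYPOINT so the application receives signals directly; CMD supplies default arguments that can be overridden at run time.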

Multi-stage builds

Multi-stage builds let you use a heavy toolchain to produce a binary, then copy only the binary into a small runtime image. The intermediate layers never ship to production.

# --- build stage ---------------------------------------------------
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /out/app ./cmd/app

# --- runtime stage -------------------------------------------------
FROM gcr.io/distroless/static:nonroot
COPY --from=build /out/app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]

The resulting image is the Go binary plus the distroless base (~2 MB). The Go toolchain, module cache, and source tree never leave the build stage. This is the pattern for Go, Rust, Java (build on JDK, run on JRE), C/C++, and most compiled languages.

Target a stage. docker build --target build . stops at the build stage — useful for running tests against the build image in CI without rebuilding.

Docker and Podman side-by-side

Podman is a near drop-in replacement for the Docker CLI. Most commands are identical; the defaults differ (Podman is rootless and daemonless).

Task                         | Docker                         | Podman
-----------------------------|--------------------------------|-------
Run an image interactively   | docker run --rm -it alpine sh  | podman run --rm -it alpine sh
Build from a Dockerfile      | docker build -t myapp .        | podman build -t myapp .
List running containers      | docker ps                      | podman ps
List images                  | docker images                  | podman images
Pull by digest               | docker pull nginx@sha256:…     | podman pull nginx@sha256:…
Log in to a registry         | docker login ghcr.io           | podman login ghcr.io
Exec into a container        | docker exec -it web sh         | podman exec -it web sh
Stop and remove              | docker rm -f web               | podman rm -f web
Inspect JSON                 | docker inspect web             | podman inspect web
View a container's resources | docker stats                   | podman stats
Map a port                   | -p 8080:80                     | -p 8080:80 (rootless can't bind <1024 without capabilities)
Generate systemd units       | (third party)                  | podman generate systemd, or Quadlet
Auto-update                  | (Watchtower, third party)      | podman auto-update + label
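Because the command lines match so closely, a script can often pick whichever engine is installed and carry on. A minimal sketch: container_engine is a helper name invented here, and preferring Podman over Docker is an arbitrary choice for the example.

```shell
# Print the first container engine found on PATH, preferring Podman.
# Returns 1 if neither is installed.
container_engine() {
  local candidate
  for candidate in podman docker; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no container engine found" >&2
  return 1
}

# Show the command without running it, so the sketch is safe to paste.
if engine="$(container_engine)"; then
  echo "would run: $engine run --rm -it alpine sh"
fi
```

This works for the commands in the table; anything engine-specific, such as systemd unit generation, still needs its own branch.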

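The two big behavioural differences are the defaults already mentioned: Podman is daemonless (there is no long-running background service that owns your containers, as dockerd does for Docker), and Podman is rootless by default (Docker's rootless mode is opt-in).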

Next up: Podman basics for the daily workflow, Docker Compose for multi-container local dev, and Kubernetes Light when one host isn't enough.