Kubernetes Light

The Kubernetes you actually use day-to-day: kubectl, the three workload kinds, the four service types, probes that don't lie, requests vs limits, ConfigMaps, Secrets, Ingress, and a debug checklist for pods that won't start.

If you only remember six things

A Pod is the unit of scheduling. You almost never create bare pods yourself; a controller (Deployment, StatefulSet, DaemonSet, Job) creates them for you.
readiness controls traffic; liveness controls restarts; startup controls the initial grace window. Wire them up separately — not all three to the same endpoint.
requests schedule the pod; limits cap it at runtime. If they are equal for both CPU and memory on every container, the pod gets Guaranteed QoS and is the last to be evicted under node pressure.
Secrets are not encrypted by default; they are stored base64-encoded in etcd. Turn on encryption-at-rest, or integrate an external secrets manager.
Memorise one Deployment YAML, one Service, and one ConfigMap. 80% of what you write is variation on those three.
When a pod is broken, the debug order is: kubectl get pods → describe → logs (and --previous) → events → exec. Never start with exec.

On this page

kubectl: the daily commands
Deployments vs StatefulSets vs DaemonSets
Services: ClusterIP, NodePort, LoadBalancer, headless
Probes: liveness, readiness, startup
Resources: requests, limits, QoS
ConfigMaps and Secrets
Ingress basics
The minimum YAML you should memorise
Debug checklist when a pod won't start

kubectl: the daily commands

# Context / namespace
kubectl config get-contexts
kubectl config use-context prod
kubectl config set-context --current --namespace=app

# What's running
kubectl get pods -o wide                      # current namespace, plus node and pod IP
kubectl get pods -A                           # all namespaces
kubectl get deploy,svc,ingress                # multi-kind list
kubectl get events --sort-by=.lastTimestamp   # what just happened

# Drill into one pod
kubectl describe pod web-7c8f-xyz
kubectl logs web-7c8f-xyz
kubectl logs web-7c8f-xyz -c sidecar          # specific container
kubectl logs web-7c8f-xyz --previous          # previous container (CrashLoopBackOff)
kubectl logs -l app=web --tail=50 --all-containers

# Exec for real debugging
kubectl exec -it web-7c8f-xyz -- sh
kubectl cp web-7c8f-xyz:/var/log/app.log ./app.log   # needs tar inside the container

# Port-forward to a service or a specific pod
kubectl port-forward svc/web 8080:80
kubectl port-forward pod/db-0 5432:5432

# Apply / delete
kubectl apply -f ./manifests/
kubectl apply -k ./overlays/prod/             # kustomize
kubectl delete -f ./manifests/deploy.yaml
kubectl rollout restart deploy/web
kubectl rollout status deploy/web
kubectl rollout undo deploy/web --to-revision=3

Install kubectx/kubens. Switching context and namespace with one short command prevents most of the "oh no I ran it against prod" moments.
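If you script around kubectl, the same guard can be made explicit. The sketch below is one possible approach, not a feature of kubectl or kubectx: it reads the active context with `kubectl config current-context` and refuses to continue when the name is on a protected list. The function names and the `prod` entry (borrowed from the example above) are assumptions.

```python
import subprocess

PROTECTED_CONTEXTS = {"prod"}  # assumption: context names you consider dangerous


def current_context() -> str:
    """Ask kubectl which context is active (requires kubectl on PATH)."""
    result = subprocess.run(
        ["kubectl", "config", "current-context"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def is_protected(context: str, protected=PROTECTED_CONTEXTS) -> bool:
    """True when the context name is on the protected list."""
    return context in protected


def confirm_or_abort(context: str) -> None:
    """Raise SystemExit when the context is protected; otherwise do nothing."""
    if is_protected(context):
        raise SystemExit(f"refusing to run against protected context {context!r}")
```

A deploy script would call `confirm_or_abort(current_context())` before any `apply` or `delete`.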

Deployments vs StatefulSets vs DaemonSets

Controller	What it's for	Pod identity	Storage	When to reach for it
Deployment	Stateless replicas that are interchangeable	Random: `web-7c8f-abc12`	Shared or none	Web apps, APIs, workers. The default.
StatefulSet	Pods that need stable identity or per-pod storage	Ordered: `db-0`, `db-1`, `db-2`	One PVC per replica, retained across restarts	Databases, message brokers, anything that writes to disk and clusters.
DaemonSet	One pod per node	One per node, random suffix: `fluent-bit-x7k2p`	Usually `hostPath`	Log collectors, metrics agents, CNI plugins, node-level storage.
Job / CronJob	Run to completion (once, or on a schedule)	Disposable	Ephemeral	Migrations, backups, scheduled reports.

Rules of thumb:

Default to a Deployment. Only escalate to StatefulSet if the app actually needs stable names or per-pod PVCs.
"We want DNS records per replica" is a StatefulSet + headless Service.
"Exactly one per node" is a DaemonSet; tolerations and nodeSelector/nodeAffinity control which nodes.
A CronJob is a Job with a schedule. Set concurrencyPolicy: Forbid unless overlapping runs are actually safe.
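To make the "DNS records per replica" rule concrete, here is a small sketch that builds the per-pod names a StatefulSet gets when its `serviceName` points at a headless Service. It assumes the default cluster domain `cluster.local`; the function name and the `db`/`app` values are illustrative only.

```python
def statefulset_pod_dns(statefulset, service, namespace, replicas,
                        cluster_domain="cluster.local"):
    """Return the stable per-pod DNS names behind a headless Service.

    Pattern: <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


for name in statefulset_pod_dns("db", "db", "app", replicas=3):
    print(name)
```

A Deployment cannot offer this, because its pod names change on every rollout.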

Services: ClusterIP, NodePort, LoadBalancer, headless

Type	What it does	Reached from
`ClusterIP` (default)	Virtual IP inside the cluster; load-balances across ready pod endpoints	Inside the cluster only
`NodePort`	Same, plus opens a fixed high port on every node	Anything that can reach a node IP + port
`LoadBalancer`	ClusterIP + asks the cloud provider for a real LB with an external IP	Internet (or your load balancer's network)
Headless (`clusterIP: None`)	No VIP; DNS returns one A record per ready pod	Clients that need to talk to specific pods (StatefulSet members)
`ExternalName`	DNS CNAME to an external hostname	Pods inside the cluster that call an external service without hardcoding its hostname

You rarely create a LoadBalancer per app in production — you front everything through one Ingress controller (itself a LoadBalancer) and route by hostname/path.

Probes: liveness, readiness, startup

Three different questions, three different probes:

readinessProbe — "should this pod receive traffic?" Controls Service endpoint membership. A pod can be running but not ready (JVM warming, DB migration running) — don't send it requests yet.
livenessProbe — "is this pod healthy enough to keep alive?" Failure → restart the container. Use sparingly; a liveness probe that flaps will make a running app restart-loop under load.
startupProbe — "has this pod finished starting up?" Until it passes, liveness/readiness don't run. Use for slow-starting apps (legacy Java, anything doing schema migrations on boot).

startupProbe:
  httpGet: { path: /healthz, port: 8080 }
  failureThreshold: 30
  periodSeconds: 5        # up to 150s grace before liveness kicks in

readinessProbe:
  httpGet: { path: /ready, port: 8080 }    # should check DB/cache reachability
  periodSeconds: 5
  failureThreshold: 3

livenessProbe:
  httpGet: { path: /livez, port: 8080 }    # cheap: "am I deadlocked?"
  periodSeconds: 10
  failureThreshold: 3
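The "150s grace" figure in the startup probe is simply failureThreshold × periodSeconds. A rough sketch of that arithmetic for all three probes follows; it ignores timeoutSeconds and scheduling jitter, and the helper name is hypothetical.

```python
def worst_case_window(period_seconds, failure_threshold, initial_delay_seconds=0):
    """Approximate seconds of consecutive failures before the probe gives up."""
    return initial_delay_seconds + period_seconds * failure_threshold


startup = worst_case_window(period_seconds=5, failure_threshold=30)
readiness = worst_case_window(period_seconds=5, failure_threshold=3)
liveness = worst_case_window(period_seconds=10, failure_threshold=3)

print(f"startup grace:        {startup}s")    # 150s before the container is restarted
print(f"removed from traffic: {readiness}s")  # 15s of failed readiness checks
print(f"restarted after:      {liveness}s")   # 30s of failed liveness checks
```

Running the numbers like this helps when tuning: readiness should usually react faster than liveness, so a struggling pod loses traffic before it gets restarted.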

Probes that lie. A liveness probe that checks DB connectivity will restart your pod every time the DB blinks, cascading the outage. Keep /livez local and cheap. Keep /ready honest: it should go false when the DB is unreachable so traffic stops going to that pod.

Resources: requests, limits, QoS

resources:
  requests:
    cpu: "250m"          # 0.25 of a core; used by the scheduler
    memory: "256Mi"
  limits:
    cpu: "1"             # cgroup cap; throttled past this
    memory: "512Mi"      # OOM-killed past this
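The quantities in this block use Kubernetes suffix notation. As a rough illustration, the sketch below converts the common forms into plain numbers. It is not the real parser: it covers only the suffixes listed, and the function names are made up for this example.

```python
_MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,  # binary
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,      # decimal
}


def cpu_millicores(quantity):
    """'250m' -> 250, '1' -> 1000, '0.5' -> 500."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity):
    """'256Mi' -> 268435456; a bare integer is already in bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)


print(cpu_millicores("250m"), cpu_millicores("1"))
print(memory_bytes("256Mi"), memory_bytes("512Mi"))
```

Note that `Mi` (mebibytes) and `M` (megabytes) differ by about 5%, which is enough to matter when a limit is tight.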

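requests are the scheduling contract. The scheduler places the pod only on a node whose allocatable capacity, minus what other pods there have already requested, covers this amount. It compares requests, not live usage. At runtime the CPU request becomes the container's proportional share when the node is contended, so the pod cannot be crowded below it.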
limits are the runtime cap. CPU above limit is throttled (latency spikes); memory above limit is OOM-killed (container dies).
QoS classes are derived:
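- Guaranteed: every container sets CPU and memory requests equal to its limits. Last to be evicted.
- Burstable: at least one container sets a request or limit, but the pod doesn't meet the Guaranteed rule (typically requests < limits). Evicted before Guaranteed under pressure.
- BestEffort: no requests or limits on any container. First to die.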
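The derivation can be sketched in a few lines. This is an illustration of the rules, not the kubelet's code: `containers` is a hypothetical list of dicts shaped like the `resources:` block, and quantities are compared as raw strings, so "1" and "1000m" would count as different here.

```python
def qos_class(containers):
    """Derive the pod QoS class from per-container requests and limits."""
    any_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits") or {}
        # When only a limit is given, Kubernetes defaults the request to it.
        requests = {**limits, **(container.get("requests") or {})}
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


example = {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "1", "memory": "512Mi"},
}
print(qos_class([example]))  # Burstable
```

One container without limits is enough to pull the whole pod out of Guaranteed, which is easy to miss when a sidecar is injected.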

Memory: request == limit. Memory is not compressible, so bursting over request doesn't help; it just makes you OOM-kill prone. Setting them equal gives predictable behaviour. CPU is compressible, so request < limit is fine, but the pod is then Burstable: Guaranteed QoS needs CPU requests equal to limits as well.

ConfigMaps and Secrets

apiVersion: v1
kind: ConfigMap
metadata: { name: app-config }
data:
  LOG_LEVEL: info
  app.yaml: |
    server:
      port: 8080
---
apiVersion: v1
kind: Secret
metadata: { name: db-creds }
type: Opaque
stringData:
  DB_USER: app
  DB_PASS: changeme

# Pod spec fragment: load both as env vars, and mount app.yaml as a file
spec:
  containers:
    - name: app
      envFrom:
        - configMapRef: { name: app-config }
        - secretRef:    { name: db-creds }
      volumeMounts:
        - { name: config, mountPath: /etc/app }
  volumes:
    - name: config
      configMap: { name: app-config, items: [{ key: app.yaml, path: app.yaml }] }

Secrets are base64, not encryption. Anyone with get secret in the namespace, or direct read on etcd, has the plaintext. Mitigations, in order of strength: enable encryption-at-rest for the secrets resource; use the External Secrets Operator or CSI Secret Store with a real secrets manager; use Vault with the Kubernetes auth method and short-lived dynamic credentials.
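To see how little base64 protects, the sketch below round-trips the `DB_PASS` value from the example Secret using only the standard library. The encoded string is what would appear under `data:` if you read the Secret back as YAML.

```python
import base64

# stringData is a write-time convenience; the API stores the value base64-encoded.
encoded = base64.b64encode(b"changeme").decode("ascii")
print(encoded)  # Y2hhbmdlbWU=

# No key is needed to reverse it: this is an encoding, not a cipher.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # changeme
```

Treat any Secret manifest committed to Git as a leaked credential, whether it uses `data` or `stringData`.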

Ingress basics

An Ingress is a declarative HTTP router. It does nothing on its own; an Ingress controller (ingress-nginx, Traefik, HAProxy, or an Envoy-based one such as Contour) has to interpret it. Install one; then Ingress objects work.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  ingressClassName: nginx
  tls:
    - hosts: [app.example.com]
      secretName: app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port: { number: 80 }
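`pathType: Prefix` matches whole path elements, not raw string prefixes, which surprises people who expect `/foo` to match `/foobar`. The sketch below approximates that rule for illustration; real controllers also handle details such as repeated slashes and rule precedence, which are left out here, and the function name is hypothetical.

```python
def prefix_matches(rule_path, request_path):
    """Element-wise prefix match, in the spirit of pathType: Prefix."""
    rule = [part for part in rule_path.split("/") if part]
    request = [part for part in request_path.split("/") if part]
    return request[: len(rule)] == rule


print(prefix_matches("/", "/anything/at/all"))  # True: "/" matches every path
print(prefix_matches("/foo", "/foo/bar"))       # True
print(prefix_matches("/foo", "/foobar"))        # False: "foobar" is another element
```

The comparison is case sensitive, so `/Foo` and `/foo` are different paths.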

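For anything more ambitious than path/host routing (traffic splitting, header-based routing, gRPC), look at the Gateway API. It is the successor to Ingress, and its core resources (GatewayClass, Gateway, HTTPRoute) have been GA since v1.0. It ships as CRDs rather than built-in types, so check that your cluster and controller actually have it installed.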

The minimum YAML you should memorise

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels: { app: web }
spec:
  replicas: 3
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      containers:
        - name: web
          image: ghcr.io/example/web@sha256:abc123...
          ports: [{ containerPort: 8080 }]
          resources:
            requests: { cpu: 100m, memory: 256Mi }   # memory request == limit
            limits:   { cpu: 500m, memory: 256Mi }
          readinessProbe:
            httpGet: { path: /ready, port: 8080 }
          livenessProbe:
            httpGet: { path: /livez, port: 8080 }
          envFrom:
            - configMapRef: { name: web-config }
            - secretRef:    { name: web-secrets }
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector: { app: web }
  ports:
    - { port: 80, targetPort: 8080 }

If you can write that from memory, you can operate most Kubernetes workloads. Everything else is variation.

Debug checklist when a pod won't start

kubectl get pods — what does the STATUS column say?
- Pending — scheduler hasn't placed it. Usually insufficient resources or a nodeSelector nothing matches.
- ContainerCreating — stuck pulling the image, mounting a volume, or waiting on a webhook.
- CrashLoopBackOff — started and died repeatedly. Check logs --previous.
- ImagePullBackOff — can't pull. Wrong registry, missing imagePullSecret, rate-limited by Docker Hub.
- Error / OOMKilled — inspect with describe for last exit code and reason.
kubectl describe pod <name>: the Events section at the bottom usually names the cause, for example "FailedScheduling: 0/3 nodes are available: insufficient memory", "MountVolume.SetUp failed for volume …", "Readiness probe failed: HTTP 500".
kubectl logs <pod> — current container.
kubectl logs <pod> --previous — the container that just crashed; this is where the real error lives in a CrashLoopBackOff.
kubectl get events --sort-by=.lastTimestamp -n <ns>: namespace-wide view (use -A for the whole cluster) for when an image pull failed silently or a webhook rejected the object.
kubectl exec -it <pod> -- sh — last resort. Only useful once the pod stays up long enough. For a crashing pod, launch an ephemeral container with kubectl debug.

kubectl debug -it web-7c8f-xyz \
  --image=busybox:1.36 --target=web -- sh
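The first step of the checklist, reading STATUS and picking the next command, can be written down as a lookup. This is a sketch of the checklist above rather than a tool that exists; the dictionary and function names are hypothetical, and only the kubectl commands already shown on this page are used.

```python
# Pod STATUS -> the first command worth running, following the checklist order.
NEXT_STEP = {
    "Pending": "kubectl describe pod {pod}",            # look for FailedScheduling
    "ContainerCreating": "kubectl describe pod {pod}",  # image pull, volume, webhook
    "CrashLoopBackOff": "kubectl logs {pod} --previous",
    "ImagePullBackOff": "kubectl describe pod {pod}",
    "Error": "kubectl describe pod {pod}",
    "OOMKilled": "kubectl describe pod {pod}",
}


def next_step(status, pod):
    """Suggest a command for a STATUS; fall back to recent events."""
    template = NEXT_STEP.get(status, "kubectl get events --sort-by=.lastTimestamp")
    return template.format(pod=pod)


print(next_step("CrashLoopBackOff", "web-7c8f-xyz"))
```

Notice that `exec` never appears in the table: by the time it is useful, the pod is already running.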

Once the pod is up, avoid debugging inside a running prod pod. Capture artefacts (kubectl cp, kubectl logs) and reproduce the problem locally with kind or in a dev namespace.

Deliberately skipped here: Helm, operators, Istio, ArgoCD. Those are real but not "light". Start with raw YAML (or Kustomize) until you feel the pain they solve.