SSH Certificate Authorities
The authorized_keys problem
- Onboarding: copy a new pubkey to every host the user needs.
- Offboarding: find and remove it from every host — and you will miss one.
- Rotation: "every engineer please re-issue your key" is a three-week ticket.
- Host trust-on-first-use: every engineer blindly types "yes" on first connect.
Why SSH certs
An SSH certificate is a public key plus metadata (principals, validity window, options, serial number) signed by a CA key. OpenSSH has supported certificates since 5.4 (2010), but most shops still do authorized_keys because the docs are thin. It is worth the investment — three changes to sshd_config and one helper script replace your entire key-distribution problem.
| Problem | authorized_keys | SSH CA |
|---|---|---|
| Add a user | Push key to N hosts | Sign one cert; user presents it |
| Remove a user | Remove key from N hosts, pray | Let their cert expire (minutes), or publish KRL |
| Time-bound access | Cron-remove the key? | Built in: -V +8h |
| Per-command restriction | Prefix in authorized_keys, per host | Options baked into the cert |
| Host-identity spoof | TOFU: users say yes | Host cert signed by host CA; clients trust CA once |
Anatomy of an SSH certificate
ssh-keygen -L -f alice-cert.pub
# alice-cert.pub:
# Type: ssh-ed25519-cert-v01@openssh.com user certificate
# Public key: ED25519-CERT SHA256:8j...
# Signing CA: ED25519 SHA256:XY... (using ssh-ed25519)
# Key ID: "alice@example.com"
# Serial: 4712
# Valid: from 2026-04-23T10:00:00 to 2026-04-23T18:00:00
# Principals:
# alice
# ops
# Critical Options:
# source-address 10.10.0.0/16,192.0.2.5
# Extensions:
# permit-pty
# permit-user-rc
Key fields:
- Type: user or host. sshd treats them differently; you cannot use a user cert to authenticate a host or vice versa.
- Key ID: a label for your logs. Use the user's email or the workflow's name. Shows up in LogLevel VERBOSE server logs.
- Serial: an integer you chose at signing time. Essential for KRL revocation.
- Valid from/to: issue-time and expiry. Minutes-to-hours for user certs, days-to-months for host certs.
- Principals: the list of usernames (user cert) or hostnames (host cert) this certificate is valid for.
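- Critical Options / Extensions: what the cert is allowed to do. Critical options (force-command, source-address) restrict it; extensions (such as permit-pty and permit-port-forwarding) grant capabilities. At signing time, -O no-pty or -O no-port-forwarding clears the corresponding extension.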
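The Valid line is printed as local time in a sortable ISO layout, so a wrapper can decide whether a cert needs renewing without any date arithmetic. The following is a minimal sketch that reads real `ssh-keygen -L` output on stdin (without the `#` comment prefix used in the listing above). The helper names `cert_valid_to` and `cert_is_expired` are hypothetical, and the handling of the "forever", "after" and "before" forms is an assumption about the output format.

```shell
# Hypothetical helpers; both read `ssh-keygen -L -f <cert>` output on stdin.

# Print the expiry timestamp from the Valid: line, or "forever" if there is none.
cert_valid_to() {
    awk '$1 == "Valid:" {
        # "Valid: forever" and "Valid: after <time>" have no expiry.
        if ($2 == "forever" || $2 == "after") print "forever"; else print $NF
        exit
    }'
}

# Return 0 if the cert has expired according to the local clock, 1 otherwise.
cert_is_expired() {
    local expiry now
    expiry=$(cert_valid_to)
    [ -n "$expiry" ] || return 0             # no Valid: line, treat as unusable
    [ "$expiry" = "forever" ] && return 1
    now=$(date +%Y-%m-%dT%H:%M:%S)           # same local-time layout as the cert listing
    # The "x" prefix forces expr to compare as strings, not numbers.
    expr "x$now" \> "x$expiry" >/dev/null
}
```

Typical use would be `ssh-keygen -L -f ~/.ssh/alice-cert.pub | cert_is_expired && request-a-new-cert`, where `request-a-new-cert` stands in for whatever your signing workflow is.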
User CA: clients present, servers verify
Users present certificates; servers verify them against a CA key they trust.
1. Generate the user CA (once, offline-ish)
ssh-keygen -t ed25519 -f /etc/ssh/ca/users_ca -C "User CA (do not copy private key)"
chmod 400 /etc/ssh/ca/users_ca
# Public half is /etc/ssh/ca/users_ca.pub — that is what you distribute.
Keep the private key on a signing host (or better: in Vault / an HSM / a YubiKey). It never touches user workstations.
2. Every server trusts it
Put the CA public key on every host and add one line to sshd_config:
install -m 0644 /etc/ssh/ca/users_ca.pub /etc/ssh/ca_users.pub
# /etc/ssh/sshd_config
TrustedUserCAKeys /etc/ssh/ca_users.pub
# Optional, for principal-based access control:
AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
Reload sshd (systemctl reload sshd). Roll this out with Ansible.
3. Sign a user certificate
Alice generates her own keypair (the private key never leaves her laptop). She sends her public key to the signing service, which returns a cert:
ssh-keygen -s /etc/ssh/ca/users_ca \
-I "alice@example.com" \
-n alice,ops \
-V +8h \
-z 4712 \
alice.pub
# -> writes alice-cert.pub next to alice.pub
Alice now has alice (her private key), alice.pub (her public key), and alice-cert.pub (the certificate), all in ~/.ssh/. She runs ssh-add ~/.ssh/alice; ssh-add loads the cert automatically because it sits next to the key as alice-cert.pub.
Flags:
- -s CAKEY: CA private key to sign with.
- -I ID: Key ID (what shows up in logs).
- -n user1,user2: Principals, the usernames the cert can authenticate as.
- -V +8h: Validity window. +8h, 20260423000000:20260430235959, and -5m:+1d are all valid.
- -z SERIAL: Certificate serial number. Needed to revoke a single cert by serial in a KRL. Use a monotonic counter.
- -O OPTION: Pile on source-address=..., force-command=..., no-port-forwarding, no-pty.
4. Principal-based access
A cert with principals alice,ops will authenticate as either user if the target host allows it. Per-host control is AuthorizedPrincipalsFile:
# /etc/ssh/auth_principals/root
admins
ops
# /etc/ssh/auth_principals/alice
alice
ops
Now anyone with principal admins or ops can ssh root@host; anyone with alice or ops can ssh alice@host. Access control becomes "what's in the cert's principal list, matched against auth_principals/<user>". Manage those files via config management, not per-cert.
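The matching rule can be modelled in a few lines of shell, which is handy for checking principals files in CI before they ship. This is a simplified model, not sshd's actual code: it ignores the per-line options that sshd permits in a principals file, and the function name `principal_allowed` is hypothetical.

```shell
# Simplified model: succeed if any principal in the cert appears as a whole
# line in the target user's principals file.
# Usage: principal_allowed "alice,ops" /etc/ssh/auth_principals/root
principal_allowed() {
    local cert_principals principals_file p
    cert_principals=$1
    principals_file=$2
    [ -r "$principals_file" ] || return 1    # no file for this user: nothing can match
    # Turn the comma-separated list into words for the loop.
    for p in $(printf '%s' "$cert_principals" | tr ',' ' '); do
        # -F fixed string, -x whole line, -q quiet
        if grep -Fxq -- "$p" "$principals_file"; then
            return 0
        fi
    done
    return 1
}
```

With the two files above, `principal_allowed "alice,ops" /etc/ssh/auth_principals/root` succeeds because of ops, while `principal_allowed "alice" /etc/ssh/auth_principals/root` fails.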
Host CA: servers present, clients verify
Inverse direction. A host CA signs each server's host key. Clients trust the host CA; the TOFU prompt never appears.
1. Generate the host CA
ssh-keygen -t ed25519 -f /etc/ssh/ca/hosts_ca -C "Host CA"
chmod 400 /etc/ssh/ca/hosts_ca
2. Sign each host's key
On the signing host, with the host's own ssh_host_ed25519_key.pub:
ssh-keygen -s /etc/ssh/ca/hosts_ca \
-I "web-01.prod" \
-h \
-n web-01.prod.example.internal,web-01,10.2.3.4 \
-V +52w \
-z 1001 \
ssh_host_ed25519_key.pub
# writes ssh_host_ed25519_key-cert.pub
-h makes it a host certificate. Principals are the DNS names and IPs clients might use.
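Since a client must match one of the principals, it helps to build the -n list the same way for every host rather than typing it by hand. A minimal sketch, assuming each host is described by its FQDN plus zero or more IPs; `host_principals` is a hypothetical helper name.

```shell
# Hypothetical helper: build the -n list for a host cert from an FQDN and IPs.
# Usage: host_principals web-01.prod.example.internal 10.2.3.4
host_principals() {
    local fqdn short list ip
    fqdn=$1
    shift
    short=${fqdn%%.*}            # "web-01" from "web-01.prod.example.internal"
    list=$fqdn
    # Add the short name only if the input really was a dotted name.
    if [ "$short" != "$fqdn" ]; then
        list="$list,$short"
    fi
    for ip in "$@"; do
        list="$list,$ip"
    done
    printf '%s\n' "$list"
}
```

The result can be passed straight to the signing command, for example `-n "$(host_principals web-01.prod.example.internal 10.2.3.4)"`.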
Drop the cert next to the host key and point sshd at it:
# /etc/ssh/sshd_config
HostKey /etc/ssh/ssh_host_ed25519_key
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub
3. Clients trust the host CA
In ~/.ssh/known_hosts (or /etc/ssh/ssh_known_hosts for the whole machine):
@cert-authority *.prod.example.internal,*.stage.example.internal ssh-ed25519 AAAAC3Nza... HostCA
That one line replaces every host-key fingerprint for matching names. New hosts "just work" the moment they have a cert. Re-imaging a host does not trigger a "REMOTE HOST IDENTIFICATION HAS CHANGED!" warning because the CA identity is what's trusted.
Principals, serials, options
Principals in practice
- User certs: principals = target usernames. Use team names (ops, dba) plus the person's login, not just the login.
- Host certs: principals = valid names/IPs. Include all DNS names and commonly-used IPs; the client must match one.
Options
ssh-keygen -s users_ca -I "ci-deploy@example" \
-n deploy \
-V +15m \
-O force-command="/opt/deploy/run.sh" \
-O no-port-forwarding \
-O no-agent-forwarding \
-O no-pty \
-O source-address="10.20.0.0/16" \
ci_deploy.pub
This cert can be presented from the CI subnet only, can only run /opt/deploy/run.sh, cannot open a shell, expires in 15 minutes. All enforced by sshd, no per-host config.
Serials
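Every signed cert should have a unique serial. Keep a counter on the signing service; persist it. Serials are the most precise way to revoke an individual cert with a KRL. Without unique serials you fall back to revoking by Key ID (which hits every cert issued under that ID), by the fingerprint of the underlying key, or by retiring the whole CA key.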
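A persistent counter can be as small as a file and a lock. The sketch below is one way to do it, not a hardened implementation: the `next_serial` name and the default `SERIAL_FILE` path are assumptions, and a signer that is killed while holding the lock leaves a stale lock directory that has to be removed by hand.

```shell
# Hypothetical helper: hand out the next certificate serial from a counter file.
# SERIAL_FILE is an assumed location; keep it on durable, backed-up storage.
next_serial() {
    local file lock current next
    file="${SERIAL_FILE:-/var/lib/ssh-ca/serial}"
    lock="$file.lock"
    mkdir -p "$(dirname "$file")" || return 1
    # mkdir is atomic, so the lock directory serialises concurrent signers.
    until mkdir "$lock" 2>/dev/null; do sleep 1; done
    current=$(cat "$file" 2>/dev/null || echo 0)
    next=$((current + 1))
    # Write a temp file and rename it so a crash cannot leave a torn counter.
    if printf '%s\n' "$next" > "$file.tmp" && mv "$file.tmp" "$file"; then
        rmdir "$lock"
        printf '%s\n' "$next"
    else
        rmdir "$lock"
        return 1
    fi
}
```

At signing time this slots in as `-z "$(next_serial)"`.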
Short-lived certs via Vault or a signing service
Signing on a dedicated machine is fine for a lab; for production you want an API: the user proves who they are (SSO, OIDC, SAML) and gets back a cert with the right principals and a short TTL.
HashiCorp Vault ssh secrets engine
# One-time setup on Vault
vault secrets enable -path=ssh-client-signer ssh
vault write ssh-client-signer/config/ca generate_signing_key=true
# Publish the CA pubkey to every host's sshd as TrustedUserCAKeys
vault read -field=public_key ssh-client-signer/config/ca \
> /etc/ssh/ca_users.pub
# Define a role: max TTL, allowed principals, options
vault write ssh-client-signer/roles/ops -<<'EOF'
{
"allow_user_certificates": true,
"allowed_users": "*",
"allowed_extensions": "permit-pty,permit-port-forwarding",
"default_extensions": { "permit-pty": "" },
"key_type": "ca",
"default_user": "ops",
"ttl": "8h",
"max_ttl": "24h"
}
EOF
Users authenticate to Vault (OIDC usually), then:
vault write -field=signed_key ssh-client-signer/sign/ops \
public_key=@$HOME/.ssh/id_ed25519.pub \
valid_principals=alice,ops \
> $HOME/.ssh/id_ed25519-cert.pub
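A helper wraps it: vault ssh -role=ops -mode=ca -mount-point=ssh-client-signer user@host signs the key and execs ssh in one go. The -mount-point flag is needed because the engine above is not mounted at the default ssh/ path.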
Rolling your own
For lightweight setups, a small HTTPS service behind your SSO works: it authenticates the user, maps their identity to principals, invokes ssh-keygen -s, returns the signed cert. Pin the CA key in a smartcard slot; log every issuance with serial, key-id, principals, and requester IP.
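The signing step inside such a service might look like the sketch below. It assumes authentication and the identity-to-principals mapping have already happened, and it signs with a CA key file as in step 3 rather than a smartcard. The function name, argument order, and the `CA_KEY`, `CERT_TTL`, `AUDIT_LOG` and `SSH_KEYGEN` variables are all hypothetical; the five-minute backdating of the validity window is an added assumption to absorb clock skew.

```shell
# Sketch of the signing step of a small signing service.
# Usage: sign_user_cert KEY_ID PRINCIPALS SERIAL REQUESTER_IP PUBKEY_FILE
sign_user_cert() {
    local key_id principals serial requester pubkey
    key_id=$1; principals=$2; serial=$3; requester=$4; pubkey=$5
    # SSH_KEYGEN can be overridden, e.g. with a stub in tests.
    "${SSH_KEYGEN:-ssh-keygen}" \
        -s "${CA_KEY:-/etc/ssh/ca/users_ca}" \
        -I "$key_id" \
        -n "$principals" \
        -V "-5m:+${CERT_TTL:-8h}" \
        -z "$serial" \
        "$pubkey" || return 1
    # One line per issuance: time, serial, key-id, principals, requester IP.
    printf '%s serial=%s id=%s principals=%s from=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$serial" "$key_id" "$principals" "$requester" \
        >> "${AUDIT_LOG:-/var/log/ssh-ca/issued.log}"
}
```

Logging only after a successful signature keeps the audit log aligned with the certs that actually exist.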
Revocation with KRLs
A Key Revocation List (KRL) is a binary file listing revoked key fingerprints and/or certificate serials. sshd consults it before accepting a cert.
Building a KRL
# Input file lists what to revoke (plain text)
cat > /etc/ssh/krl.spec <<'EOF'
# Revoke by serial (range or single)
serial: 4712
serial: 5000-5010
# Revoke by cert Key ID
id: dave@example.com
# Revoke by raw key fingerprint (as printed by ssh-keygen -l or in sshd logs)
hash: SHA256:QGc5iw...
EOF
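# Generate a fresh KRL (replaces any existing file).
# -s names the CA that the serial: and id: lines refer to; its public key is enough.
ssh-keygen -kf /etc/ssh/revoked_keys \
-s /etc/ssh/ca/users_ca.pub \
/etc/ssh/krl.spec
# Or update in place: -u merges new entries into the existing KRL
ssh-keygen -u -kf /etc/ssh/revoked_keys \
-s /etc/ssh/ca/users_ca.pub \
/etc/ssh/krl.spec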
# /etc/ssh/sshd_config
RevokedKeys /etc/ssh/revoked_keys
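Distribute /etc/ssh/revoked_keys via config management. sshd reads the file when it checks a key, so replace it atomically (write a temp file, then rename) rather than editing it in place. If RevokedKeys is configured and the file is not readable, sshd refuses public key authentication for all users: fail-closed. Ship the file before you ship the config line, or together.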
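Because a missing or unreadable KRL locks everyone out, the publish step is worth making atomic. A minimal sketch, assuming the new KRL has already been built elsewhere; `install_krl` is a hypothetical helper name.

```shell
# Hypothetical helper: publish a new KRL without a window in which the
# destination is missing or half-written.
# Usage: install_krl /tmp/new_krl /etc/ssh/revoked_keys
install_krl() {
    local src dest tmp
    src=$1
    dest=$2
    # Refuse to publish from a source we cannot read.
    if [ ! -r "$src" ]; then
        echo "install_krl: cannot read $src" >&2
        return 1
    fi
    tmp="$dest.new.$$"
    # Copy next to the destination so the final rename stays on one filesystem.
    cp "$src" "$tmp" && chmod 0644 "$tmp" && mv -f "$tmp" "$dest"
}
```

Before publishing, `ssh-keygen -Q -f <krl> <cert>` can be used to confirm that a given cert is actually listed as revoked.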
Gotchas
- Old OpenSSH versions. Ed25519 keys need OpenSSH 6.5 or later, and rsa-sha2-512 signatures need 7.2 or later. If part of the fleet cannot handle an ed25519 CA, use an RSA CA that signs with rsa-sha2-512, and check the oldest version you must support before picking the CA key type.
- Clock skew. If the signer's clock runs ahead of the target host's, the cert's valid-after time is still in the future there and login fails with "certificate not yet valid". Run chrony (or another NTP client) on every host, and consider backdating the start of the window, e.g. -V -5m:+8h.
- Agent forwarding with short certs. Agent-forwarded identities keep the cert; if it expires mid-chain the next hop fails. Either refresh the cert or make the jump box -A-free and hop with per-hop certs.
- Windows OpenSSH. Supported since Win10 1809. The signing workflow is identical; path separators differ.
- Mixing CA and authorized_keys. Fine during migration. Plan to remove the legacy file; otherwise an old key keeps working past its revocation.
- Host CA without updating all clients. The old host-key fingerprints in users' known_hosts will conflict. Ship a scrub script: remove old entries, add the @cert-authority line.
- Principals vs Match User. Match User in sshd_config selects settings by the login name being requested, not by the identity in the certificate; principals are checked against the cert during authentication. Use principals for the coarse "who gets in as whom" decision.
- Committing the CA private key to Ansible. Do not. Sign in a service; distribute only the public key through config management.
- ssh -i cert.pub user@host does not work. -i takes the private key; OpenSSH picks up the cert automatically if it is at <privkey>-cert.pub.
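For the known_hosts gotcha, a scrub script can be sketched as follows. It is an illustration under assumptions, not a drop-in tool: `scrub_known_hosts` is a hypothetical name, entries are matched by domain suffix only, and hashed entries (HashKnownHosts yes) or entries recorded purely by IP cannot be matched this way and are left in place.

```shell
# Hypothetical scrub: drop plain host-key lines for a domain and make sure the
# @cert-authority line is present exactly once.
# Usage: scrub_known_hosts FILE DOMAIN_SUFFIX CA_LINE
scrub_known_hosts() {
    local file suffix ca_line tmp
    file=$1; suffix=$2; ca_line=$3
    [ -f "$file" ] || : > "$file"
    tmp="$file.tmp.$$"
    # Keep marker lines (@cert-authority, @revoked), comments, blank lines, and
    # any entry whose host field has no name ending in the suffix.
    awk -v suffix="$suffix" '
        /^[@#]/ || NF == 0 { print; next }
        {
            n = split($1, hosts, ",")
            drop = 0
            for (i = 1; i <= n; i++) {
                h = hosts[i]
                sub(/^\[/, "", h)                 # strip the [host]:port wrapper
                sub(/\](:[0-9]+)?$/, "", h)
                if (length(h) >= length(suffix) &&
                    substr(h, length(h) - length(suffix) + 1) == suffix)
                    drop = 1
            }
            if (!drop) print
        }
    ' "$file" > "$tmp" || return 1
    # Append the CA line only if an identical line is not already there.
    grep -Fxq -- "$ca_line" "$tmp" || printf '%s\n' "$ca_line" >> "$tmp"
    mv -f "$tmp" "$file"
}
```

Running it twice leaves the file unchanged the second time, so it is safe to ship through config management.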
See also: SSH Keys, PKI Design, FreeIPA.