Operations

Audit log

Every decision emits one JSON line on stdout (log/slog):

- `certificate_issued`: the matched rule, principals, key ID, TTL, the client key fingerprint, and well-known identity claims (`repository`, `ref`, `run_id`, …).
- `certificate_denied`: a stable machine-readable reason code plus a human-readable detail, with the same identity attributes when the token was verified.

Both carry the `request_id` returned to the caller, so a support request (“my deploy was denied, request_id …”) maps to exactly one audit event. The key ID embeds the repository and run ID, so an sshd log entry on a target server can be traced back to the exact GitHub Actions run.
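As a rough illustration of the two event shapes, the sketch below renders one event of each kind through `log/slog`'s JSON handler. The attribute names (`rule`, `ttl_seconds`, `fingerprint`, `reason`, `detail`), the log levels, and all sample values are assumptions made for the example, not the server's actual schema. Events are rendered into a buffer only so the lines are easy to inspect; a real server would point the handler at `os.Stdout`.

```go
package main

import (
	"bytes"
	"context"
	"log/slog"
	"os"
)

// render pushes one event through slog's JSON handler and returns the line.
// A real server would point the handler at os.Stdout instead of a buffer.
func render(level slog.Level, event string, attrs ...slog.Attr) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.LogAttrs(context.Background(), level, event, attrs...)
	return buf.String()
}

// issuedLine builds a certificate_issued event. Attribute names are assumed.
func issuedLine(requestID, rule, keyID, fingerprint string, principals []string, ttlSeconds int) string {
	return render(slog.LevelInfo, "certificate_issued",
		slog.String("request_id", requestID),
		slog.String("rule", rule),
		slog.Any("principals", principals),
		slog.String("key_id", keyID),
		slog.Int("ttl_seconds", ttlSeconds),
		slog.String("fingerprint", fingerprint),
	)
}

// deniedLine builds a certificate_denied event. Attribute names are assumed.
func deniedLine(requestID, reason, detail string) string {
	return render(slog.LevelWarn, "certificate_denied",
		slog.String("request_id", requestID),
		slog.String("reason", reason),
		slog.String("detail", detail),
	)
}

func main() {
	// All values below are made up for illustration.
	os.Stdout.WriteString(issuedLine("req-123", "deploy-prod", "example-org/app@42",
		"SHA256:example", []string{"deploy"}, 900))
	os.Stdout.WriteString(deniedLine("req-124", "no_rule_matched",
		"no rule matched the token claims"))
}
```

Because the JSON handler escapes newlines inside string values, each event stays on a single line even when the detail text contains line breaks.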

Deny reason codes:

| Reason | Meaning |
| --- | --- |
| `bad_request` | malformed body, body too large, or wrong method |
| `invalid_public_key` | key unparsable, wrong type, or a certificate |
| `missing_token` | no usable `Authorization: Bearer` header |
| `token_invalid` | JWT verification failed (signature, issuer, expiry, …) |
| `no_rule_matched` | deny by default: nothing matched |
| `multiple_rules_matched` | exactly-one-match violated; the detail lists the rules |
| `key_id_invalid` | key ID expansion failed (missing claim, bad characters, too long) |
| `policy_disabled` | emergency stop is active |
| `signing_error` | internal signing failure |

Where the log ends up: journald (systemd), docker compose logs (Compose), CloudWatch Logs (Lambda), Cloud Logging (Cloud Run).
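Wherever the log ends up, the one-JSON-line-per-decision format makes it easy to summarize. The following sketch tallies denials by reason code from log text read on stdin. It assumes the event name is stored under slog's default `msg` key and the reason code under a `reason` key; both key names are assumptions and may differ from the server's real output.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
)

// auditEvent holds the fields this sketch reads; the JSON keys are assumed.
type auditEvent struct {
	Msg    string `json:"msg"`
	Reason string `json:"reason"`
}

// denyCounts tallies certificate_denied events by reason code.
// Lines that are not valid JSON are skipped.
func denyCounts(logText string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(logText))
	for sc.Scan() {
		var ev auditEvent
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue
		}
		if ev.Msg == "certificate_denied" {
			counts[ev.Reason]++
		}
	}
	return counts
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	counts := denyCounts(string(data))

	// Print reasons in a stable, sorted order.
	reasons := make([]string, 0, len(counts))
	for r := range counts {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons)
	for _, r := range reasons {
		fmt.Printf("%s\t%d\n", r, counts[r])
	}
}
```

On a systemd host this could be fed with something like `journalctl -u oidc-ssh-ca -o cat | go run .`, assuming the unit is named as in the reload command.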

Policy reload

SIGHUP reloads the policy file. If the new file is invalid, the server keeps the current policy and logs an error — a broken reload neither stops nor loosens issuance.

```sh
systemctl reload oidc-ssh-ca                  # systemd
docker compose kill -s HUP oidc-ssh-ca        # docker compose
kill -HUP <pid>                               # anywhere else
```

Lambda and Cloud Run have no reload; deploying a new zip / revision is the equivalent, with the same fail-safe (a bad policy fails the new instance, the old one keeps serving).
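The keep-the-current-policy-on-failure behavior of a SIGHUP reload could be sketched as follows. This is a minimal illustration, not the server's actual code: the `Policy` type, the `reload` and `watchSIGHUP` helpers, and the loader functions are all hypothetical, and a real loader would parse and validate the policy file.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// Policy is a stand-in for the parsed policy file; its fields are hypothetical.
type Policy struct {
	Disabled bool
	Rules    []string
}

// errInvalidPolicy is what the placeholder loader returns for a broken file.
var errInvalidPolicy = errors.New("invalid policy")

// current holds the active policy; request handlers would call current.Load().
var current atomic.Pointer[Policy]

// reload swaps in the policy produced by load. On error the active policy
// is left untouched, so a broken file neither stops nor loosens issuance.
func reload(load func() (*Policy, error)) error {
	p, err := load()
	if err != nil {
		return err
	}
	current.Store(p)
	return nil
}

// watchSIGHUP reloads the policy every time the process receives SIGHUP.
func watchSIGHUP(load func() (*Policy, error)) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGHUP)
	for range ch {
		if err := reload(load); err != nil {
			slog.Error("policy reload failed; keeping current policy", "error", err)
			continue
		}
		slog.Info("policy reloaded")
	}
}

func main() {
	// Placeholder loaders; a real one would read and validate policy.yaml.
	good := func() (*Policy, error) { return &Policy{Rules: []string{"deploy-prod"}}, nil }
	bad := func() (*Policy, error) { return nil, errInvalidPolicy }

	if err := reload(good); err != nil { // the initial load must succeed
		slog.Error("initial policy load failed", "error", err)
		os.Exit(1)
	}
	go watchSIGHUP(good)

	// Simulate a broken reload: the error is reported, the old rules remain.
	err := reload(bad)
	fmt.Println(err != nil, len(current.Load().Rules))
}
```

Storing the policy behind an atomic pointer means in-flight requests see either the old policy or the new one, never a half-updated mix.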

Emergency stop

To stop all issuance immediately:

1. Set `disabled: true` at the top of `policy.yaml` and reload. The server answers 503 to every request while staying up. (Platform shortcuts: reserved concurrency 0 on Lambda; removing the `allUsers` invoker binding on Cloud Run.)
2. Wait out `defaults.max_valid_for_seconds` (default 900 s / 15 minutes). After that, no valid certificate exists anywhere, so there is nothing to revoke.
3. Only if the CA key itself may have leaked: remove the CA public key from `TrustedUserCAKeys` on the target servers and rotate the key.
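The first step, answering 503 while the process stays up, could look roughly like the sketch below. It is an illustration under assumptions: the `disabled` flag stands in for the policy's `disabled: true` setting, and the `guard` wrapper, the `issue` handler, the `/sign` path, and the `policy_disabled` response body are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// disabled mirrors the policy's disabled switch; a reload would update it.
var disabled atomic.Bool

// guard answers 503 while the emergency stop is active, otherwise calls next.
func guard(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if disabled.Load() {
			http.Error(w, "policy_disabled", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// issue is a stand-in for the real certificate issuance handler.
func issue(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// handler is the guarded endpoint used in the demo below.
var handler = guard(http.HandlerFunc(issue))

// statusFor sends one request through h and returns the response status code.
func statusFor(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/sign", nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(handler)) // 200: emergency stop off
	disabled.Store(true)
	fmt.Println(statusFor(handler)) // 503: emergency stop on
}
```

Checking the flag per request, rather than shutting the listener down, keeps health checks and logging working while issuance is stopped.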

CA key rotation

TrustedUserCAKeys may list multiple keys, so rotation is zero-downtime:

1. Generate the new CA key; append its public key to the target servers’ `TrustedUserCAKeys` file (both keys are now trusted).
2. Swap the key on the CA and restart.
3. After the old certificates’ TTL has passed, remove the old public key from the servers.