Kubernetes can only heal what it can measure. Teach it your app’s health with liveness, readiness, and startup probes so it restarts what is stuck and routes around what is not ready.
Why: Kubernetes does not know your app is healthy just because the process is running. Probes tell it. A liveness probe answers "is it stuck and in need of a restart?" A readiness probe answers "can it take traffic right now?" A startup probe answers "has it finished booting?" and protects slow starters from the other two. The kubelet runs each probe on its own schedule.
liveness  ──▶ fails ──▶ Kubernetes RESTARTS the container
readiness ──▶ fails ──▶ pod removed from Service endpoints (no traffic)
startup   ──▶ until it passes, liveness & readiness are paused

Why: a process can be running yet deadlocked: up but not working. A liveness probe periodically checks an endpoint; when it fails failureThreshold times in a row, Kubernetes kills and restarts the container. initialDelaySeconds gives the app time to boot before the first check; periodSeconds sets the interval between checks.

spec:
  containers:
    - name: app
      image: myapp:1.0
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
        failureThreshold: 3   # 3 strikes, then restart

Why: during startup, or while a dependency is down, your app is alive but should not receive requests. A readiness probe controls exactly that: while it fails, the pod is pulled from its Service's endpoints, so no traffic reaches it, and the container is not restarted. This is a large part of what makes rolling updates zero-downtime: a new pod receives traffic only after it reports ready.

spec:
  containers:
    - name: app
      image: myapp:1.0
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5   # checked often so traffic gates quickly

Why: a JVM or a service that loads a large model can take a minute or more to start. A liveness probe with a short delay would keep killing it mid-boot. A startup probe runs first and, until it passes, suspends the other two, giving a slow app up to failureThreshold × periodSeconds to come up before normal probing begins.

spec:
  containers:
    - name: app
      image: slow-starter:1.0
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30   # 30 × 10s = up to 5 minutes to boot
        periodSeconds: 10
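
The three probes are usually declared together on the same container. The manifest below is a minimal sketch of that combination, not a recommended configuration: the Pod name, the containerPort entry, and the choice to reuse /healthz for both the startup and liveness checks are assumptions made for illustration, and the timings are simply carried over from the snippets above.

```yaml
# Illustrative sketch: names, paths, port, and timings are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: app                      # hypothetical Pod name
spec:
  containers:
    - name: app
      image: myapp:1.0
      ports:
        - containerPort: 8080
      # Runs first; liveness and readiness stay paused until it succeeds.
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 30     # 30 x 10s = up to 5 minutes to boot
      # After startup: 3 consecutive failures trigger a container restart.
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
      # After startup: gates traffic only, never restarts the container.
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
```

Because the startup probe already covers the boot window, this sketch leaves initialDelaySeconds off the liveness probe. Whether that suits a particular app depends on how it starts, so treat the values as starting points to tune.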