Match capacity to demand automatically — add pod replicas under load with the Horizontal Pod Autoscaler, right-size requests with the VPA, and grow the cluster itself with the Cluster Autoscaler.
Why: scaling happens at three levels, and each solves a different problem. The Horizontal Pod Autoscaler (HPA) adds or removes pod replicas as load changes. The Vertical Pod Autoscaler (VPA) adjusts each pod's CPU/memory requests to match observed usage. The Cluster Autoscaler adds nodes when pods cannot fit and removes nodes that sit idle. In practice you usually run the HPA on any workload with variable load, and add the Cluster Autoscaler when the cluster runs on a cloud provider.
HPA ──▶ more / fewer POD replicas (handle changing traffic)
VPA ──▶ right-size each pod's REQUESTS (stop over/under-provisioning)
Cluster ──▶ more / fewer NODES (make room when pods don't fit)

Why: the HPA watches a metric, most often CPU, and changes the replica count to keep it near a target. Here it holds average CPU at 50% of requested, scaling between 2 and 10 pods. Note: it needs the metrics-server installed and the Deployment must declare CPU requests, or the HPA has nothing to measure against.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

Apply the HPA, then generate load and watch it scale up; stop the load and it scales back down after a cool-off (by default, a five-minute stabilization window). The TARGETS column shows current vs. target utilization, and REPLICAS shows the replica count adjusting in real time.

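As a rough illustration of how REPLICAS follows from TARGETS, the sketch below applies the replica formula given in the Kubernetes HPA documentation, `ceil(currentReplicas * currentMetric / targetMetric)`, together with the default 10% tolerance band. The function name `desired_replicas` and its parameters are hypothetical, chosen only for this example. The real controller also accounts for missing metrics, pods that are not yet ready, and stabilization windows, none of which are modelled here.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=2, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a Utilization target.

    Hypothetical helper for illustration only; not the controller's code.
    Utilization values are percentages of the requested CPU.
    """
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    wanted = math.ceil(current_replicas * current_utilization / target_utilization)
    # Clamp to the minReplicas / maxReplicas bounds from the manifest.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    print(desired_replicas(2, 100, 50))  # 4: CPU at twice the target doubles the pods
    print(desired_replicas(4, 52, 50))   # 4: within tolerance, no change
    print(desired_replicas(4, 200, 50))  # 10: capped at maxReplicas
    print(desired_replicas(6, 10, 50))   # 2: floored at minReplicas
```

With the manifest's values (target 50%, bounds 2 to 10), two pods running at 100% of their CPU request would be scaled to four, which matches what the TARGETS and REPLICAS columns should show once the load generator is running.
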
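# Create the HPA from the manifest (saved as hpa.yaml)
kubectl apply -f hpa.yaml

# Watch it adjust replicas as load changes
kubectl get hpa web --watch

# In another terminal, hammer the Service to drive CPU up
kubectl run load --rm -it --image=busybox -- \
  sh -c "while true; do wget -qO- http://web; done"

Why: the HPA changes how MANY pods run; the VPA changes how BIG they are, recommending or applying better CPU/memory requests so you stop guessing. The VPA is a separate add-on rather than part of core Kubernetes, so it must be installed in the cluster first. The Cluster Autoscaler works with the cloud provider's node groups: when pods stay Pending because no node has room, it adds a node; when nodes sit idle, it removes them. Note: do not let the VPA and the HPA act on the same resource metric (CPU or memory) for one workload. They fight, because the VPA keeps changing the requests that the HPA's utilization percentage is measured against.
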
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"   # apply right-sized requests automatically
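
The Cluster Autoscaler's scale-up trigger described earlier (pods staying Pending because no node has room) comes down to comparing pod requests with the free capacity on each node. The sketch below is a simplified model of that check, not the autoscaler's actual algorithm. The `Node` and `Pod` classes, their fields, and the `unschedulable` function are hypothetical names for this example, and it ignores affinity, taints, and the fact that placing one pod reduces the room left for the next.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_free_m: int    # unreserved CPU, in millicores
    mem_free_mi: int   # unreserved memory, in MiB


@dataclass
class Pod:
    name: str
    cpu_request_m: int   # CPU request, in millicores
    mem_request_mi: int  # memory request, in MiB


def fits(pod, node):
    """True if the pod's requests fit in the node's free capacity."""
    return (pod.cpu_request_m <= node.cpu_free_m
            and pod.mem_request_mi <= node.mem_free_mi)


def unschedulable(pods, nodes):
    """Names of pods that fit on no existing node (these would stay Pending)."""
    return [p.name for p in pods if not any(fits(p, n) for n in nodes)]


if __name__ == "__main__":
    nodes = [Node("n1", 500, 1024), Node("n2", 250, 2048)]
    pods = [Pod("web-a", 400, 512), Pod("web-b", 1000, 512)]
    print(unschedulable(pods, nodes))  # ['web-b']
```

The check uses requests, not live usage. Inflated requests can therefore leave pods Pending and trigger new nodes even when existing nodes are mostly idle, which is one reason right-sizing requests with the VPA and growing the cluster go together.
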