Autoscaling

Match capacity to demand automatically — add pod replicas under load with the Horizontal Pod Autoscaler, right-size requests with the VPA, and grow the cluster itself with the Cluster Autoscaler.

The three kinds of autoscaling

Why: scaling happens at three levels, and each level solves a different problem. The Horizontal Pod Autoscaler (HPA) adds or removes pod replicas as load changes. The Vertical Pod Autoscaler (VPA) adjusts each pod's CPU/memory requests to fit actual usage. The Cluster Autoscaler adds or removes whole nodes when pods cannot fit. In practice, the HPA is the one you run on most workloads, and the Cluster Autoscaler is typically added on cloud clusters, where nodes can be provisioned on demand.

  HPA      ──▶ more / fewer POD replicas       (handle changing traffic)
  VPA      ──▶ right-size each pod's REQUESTS  (stop over/under-provisioning)
  Cluster  ──▶ more / fewer NODES              (make room when pods don't fit)

Horizontal Pod Autoscaler

Why: the HPA watches a metric, most often CPU, and changes the replica count to keep it near a target. Here it holds average CPU at 50% of each pod's CPU request, scaling between 2 and 10 pods. For example, 4 pods averaging 100% against the 50% target scale to ceil(4 × 100 / 50) = 8 pods. Note: it needs the metrics-server installed, and the Deployment's containers must declare CPU requests, or the HPA has nothing to measure against.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
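
Because utilization is measured against the CPU request, the target Deployment has to set one. The manifest below is a minimal sketch of what that might look like. The `app: web` label, the `nginx:1.27` image, the port, and the `200m` request are illustrative assumptions, not values from this course.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # must match scaleTargetRef.name in the HPA
spec:
  # replicas is left unset on purpose so the HPA owns the replica count
  selector:
    matchLabels:
      app: web                   # assumed label
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27      # assumed image, swap in your own
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 200m          # assumed value; 50% utilization = 100m average per pod
```

With a 200m request, the 50% target means the HPA aims for roughly 100m of average CPU usage per pod.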

Watch the HPA react

Apply the HPA, then generate load and watch it scale up; stop the load and it scales back down once the scale-down stabilization window (five minutes by default) has passed. The TARGETS column shows current vs. target utilization, and REPLICAS shows the replica count as it changes.

kubectl apply -f hpa.yaml

# Watch it adjust replicas as load changes

kubectl get hpa web --watch

# In another terminal, hammer the Service to drive CPU up

kubectl run load --rm -it --image=busybox -- \
  sh -c "while true; do wget -qO- http://web; done"
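
The delay before scaling back down can be tuned through the `behavior` field of an `autoscaling/v2` HPA. The fragment below is a sketch to merge into the `spec:` of the `web` HPA. The 300-second window is the Kubernetes default written out explicitly; the one-pod-per-minute policy is an illustrative assumption, not a recommendation.

```yaml
# Fragment: merge into the spec: of the HorizontalPodAutoscaler named web
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes of lower load before shrinking (default)
      policies:
        - type: Pods
          value: 1                      # assumed: remove at most 1 pod...
          periodSeconds: 60             # ...per 60-second period
```

A longer window avoids flapping when traffic is spiky; a shorter one releases capacity sooner.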

Vertical and Cluster autoscaling

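Why: the HPA changes how MANY pods run; the VPA changes how BIG they are, recommending or applying better CPU/memory requests so you stop guessing. The VPA is an add-on rather than part of core Kubernetes, so its components must be installed in the cluster first. The Cluster Autoscaler integrates with the cloud provider's node groups: when pods stay Pending because no node has room, it adds a node; when nodes sit underused and their pods can run elsewhere, it removes them. Note: do not run the VPA and HPA on the same metric (CPU or memory) for one workload. They fight, because the VPA resizes the very requests that the HPA's utilization percentage is measured against.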

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"         # apply right-sized requests automatically
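
One way to keep a VPA out of the HPA's way is to restrict it to memory, leaving CPU to the HPA. The manifest below is a sketch of that idea, meant as an alternative to the manifest above rather than an addition to it. The name `web-memory` and the `minAllowed` / `maxAllowed` bounds are illustrative assumptions.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-memory                 # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"              # recommend only; nothing is changed on the pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"         # apply to every container in the pod
        controlledResources: ["memory"]   # leave CPU to the HPA
        minAllowed:
          memory: 128Mi            # assumed lower bound
        maxAllowed:
          memory: 1Gi              # assumed upper bound
```

With `updateMode: "Off"`, the recommendations appear in the object's status (for example via `kubectl describe vpa web-memory`), so you can review them before switching to an automatic mode.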