Make your app survive traffic spikes and failures: define a VM Scale Set that grows and shrinks automatically with autoscale rules, and spread traffic across instances with a load balancer and health probes.
One VM cannot handle a traffic spike or survive its own failure. The fix: run several identical VMs that a load balancer spreads traffic across, and let a Scale Set keep the right number healthy. Why: your app stays up when a VM dies and grows when demand rises — automatically.
This lesson builds, in order:
1. a VM Scale Set (a fleet of identical VMs from one model)
2. a Load Balancer (created automatically with the scale set)
3. Autoscale rules (change the VM count based on load)
echo "Reuse learn-rg and my-vnet from earlier lessons."
A Virtual Machine Scale Set (VMSS) launches and manages a group of identical VMs from one definition. Why: it is the unit of scaling. You set a capacity and Azure keeps that many instances provisioned; replacing instances that fail health checks additionally requires the scale set's automatic instance repairs feature, which is opt-in. Creating the scale set with a public load balancer wires the front door at the same time.
Create a scale set of 2 VMs behind a new load balancer
az vmss create \
--resource-group learn-rg \
--name web-vmss \
--image Ubuntu2204 \
--vm-sku Standard_B1s \
--instance-count 2 \
--vnet-name my-vnet --subnet public \
--admin-username azureuser --generate-ssh-keys \
--upgrade-policy-mode automatic
The load balancer gives clients one address and spreads requests across healthy instances. A health probe sends an HTTP request to each instance at a regular interval; an instance that stops answering is pulled from rotation. Why: clients never hit a broken VM, and you can replace instances without anyone noticing.
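The rotation logic described above can be sketched as a toy model in plain shell. This is not how Azure implements probing; it only illustrates the idea that an instance stays in rotation while its last probe returned HTTP 200. The function name in_rotation, the instance names, and the status values are all made up for illustration.

```shell
#!/usr/bin/env bash
# Toy model of a health probe (illustrative only, not Azure's implementation).
# Each argument is a hypothetical "instance:last_probe_result" pair.
# An instance stays in rotation only if its last probe returned HTTP 200.

in_rotation() {
  local entry name code
  for entry in "$@"; do
    name=${entry%%:*}   # text before the first colon
    code=${entry##*:}   # text after the last colon
    if [ "$code" = "200" ]; then
      echo "$name"
    fi
  done
}

# Example: vm1 returns 503 and vm3 times out, so only vm0 and vm2 are listed.
in_rotation "vm0:200" "vm1:503" "vm2:200" "vm3:timeout"
```

The real load balancer also applies a probe interval and a failure threshold before removing an instance; the sketch leaves both out to keep the core idea visible.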
Add a health probe on port 80
az network lb probe create --resource-group learn-rg \
--lb-name web-vmssLB --name http-probe \
--protocol Http --port 80 --path /
Add a rule forwarding inbound port 80 to the backend pool, using the probe
az network lb rule create --resource-group learn-rg \
--lb-name web-vmssLB --name http-rule \
--protocol Tcp --frontend-port 80 --backend-port 80 \
--probe-name http-probe \
--backend-pool-name web-vmssLBBEPool
An autoscale setting changes the instance count based on a metric. You set a min, a max, and rules: add a VM when average CPU is high, remove one when it is low. Why: you pay for capacity only when traffic justifies it, and the app keeps up during spikes.
Define the min/max/default for the scale set
az monitor autoscale create --resource-group learn-rg \
--resource web-vmss --resource-type Microsoft.Compute/virtualMachineScaleSets \
--name web-autoscale --min-count 2 --max-count 6 --count 2
Scale OUT by 1 when average CPU exceeds 70%
az monitor autoscale rule create --resource-group learn-rg \
--autoscale-name web-autoscale \
--condition "Percentage CPU > 70 avg 5m" --scale out 1
Scale IN by 1 when average CPU drops below 30%
az monitor autoscale rule create --resource-group learn-rg \
--autoscale-name web-autoscale \
--condition "Percentage CPU < 30 avg 5m" --scale in 1
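To see how the two rules interact with the min and max, here is a minimal sketch in plain shell. It is a simplified model, not the Azure autoscale engine: it treats each input as an already-computed 5-minute average and ignores cooldown periods. The function name next_count and the sample CPU values are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of the autoscale rules above (illustrative only).
# Assumptions: each CPU value is an integer percentage that already represents
# a 5-minute average, and there is no cooldown between scale actions.

next_count() {
  local cpu=$1 count=$2 min=${3:-2} max=${4:-6}
  if [ "$cpu" -gt 70 ] && [ "$count" -lt "$max" ]; then
    count=$((count + 1))   # scale out by 1, never above max
  elif [ "$cpu" -lt 30 ] && [ "$count" -gt "$min" ]; then
    count=$((count - 1))   # scale in by 1, never below min
  fi
  echo "$count"
}

# Walk a made-up series of average CPU readings, starting at the default count.
count=2
for cpu in 85 90 75 50 20 10 5; do
  count=$(next_count "$cpu" "$count")
  echo "avg CPU ${cpu}% -> ${count} instances"
done
```

With these sample readings the count climbs from 2 to 5 during the spike, holds at 5 while CPU sits between the thresholds, then steps back down to 2. The gap between 30% and 70% is what keeps the count from flapping when load hovers near one threshold.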