Azure Monitor — Metrics, Logs & Alerts

See what your infrastructure is doing: read built-in metrics, collect and query logs in a Log Analytics workspace, and raise alerts with action groups when something crosses a threshold.

What Azure Monitor covers

Azure Monitor is the eyes and ears of Azure. It covers three things: metrics (numeric values sampled over time, such as CPU), logs (detailed records you query), and alerts (notifications when a value crosses a threshold you set). Why: without it you are flying blind; you cannot fix what you cannot see.

List the metric names available for a resource (here, a VM)

az monitor metrics list-definitions \
  --resource /subscriptions/SUB_ID/resourceGroups/learn-rg/providers/Microsoft.Compute/virtualMachines/my-vm \
  --query '[].name.value' --output tsv | head
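The long resource ID is repeated in several commands on this page, so it can help to build it once. The following is a minimal sketch, not part of the original lesson: the function names vm_resource_id and list_vm_metric_names are hypothetical helpers, and the second one assumes a logged-in Azure CLI.

```shell
# Build the full resource ID of a VM from its parts.
# Arguments: subscription ID, resource group, VM name.
vm_resource_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachines/%s\n' "$1" "$2" "$3"
}

# List the first few metric names for that VM (same command as above,
# with the resource ID assembled from the three arguments).
list_vm_metric_names() {
  az monitor metrics list-definitions \
    --resource "$(vm_resource_id "$1" "$2" "$3")" \
    --query '[].name.value' --output tsv | head
}
```

One way to call it is list_vm_metric_names "$(az account show --query id --output tsv)" learn-rg my-vm, which reads the subscription ID from the current CLI login instead of typing it by hand.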

Metrics — numbers over time

A metric is a time series, such as CPU percentage, disk IO, or request count. Azure collects platform metrics automatically for most resources, with no setup needed. Why read them: they tell you whether a VM is overloaded or a database is saturated. You can chart them or query exact values.

Read the average CPU of a VM over the last hour (the default time range), in 5-minute intervals

az monitor metrics list \
  --resource /subscriptions/SUB_ID/resourceGroups/learn-rg/providers/Microsoft.Compute/virtualMachines/my-vm \
  --metric "Percentage CPU" \
  --interval PT5M --aggregation Average --output table
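To work with exact values rather than a table, the same command can emit plain timestamp and average pairs. This is a rough sketch under two assumptions: the JMESPath path value[0].timeseries[0].data matches the JSON the CLI returns for a single metric, and empty intervals show up as an empty field or None in tsv output. The names cpu_series and peak_row are hypothetical helpers.

```shell
# Print "timestamp<TAB>average" rows for a VM's CPU metric.
# $1 is the full resource ID of the VM.
cpu_series() {
  az monitor metrics list \
    --resource "$1" \
    --metric "Percentage CPU" \
    --interval PT5M --aggregation Average \
    --query 'value[0].timeseries[0].data[].[timeStamp, average]' \
    --output tsv
}

# Read those rows on stdin and print the row with the highest average.
# Rows whose average is empty or "None" (no sample in that interval) are skipped.
peak_row() {
  awk -F '\t' '
    $2 != "" && $2 != "None" && (max == "" || $2 + 0 > max + 0) { max = $2; row = $0 }
    END { if (row != "") print row }
  '
}
```

Piping one into the other, as in cpu_series "$VM_ID" | peak_row, prints the busiest 5-minute interval of the last hour. VM_ID here stands for whatever variable holds your VM's resource ID.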

Log Analytics — collect and query logs

A Log Analytics workspace is where logs from your resources are collected and queried with KQL (Kusto Query Language). Why: it is the one place to search what actually happened across VMs, apps, and services — far more powerful than reading log files on each server.

Create a workspace

az monitor log-analytics workspace create \
  --resource-group learn-rg --workspace-name learn-logs

Run a KQL query (e.g. ten heartbeat records from connected machines; take returns an arbitrary sample, not necessarily the newest rows)

WS_ID=$(az monitor log-analytics workspace show \
  --resource-group learn-rg --workspace-name learn-logs \
  --query customerId --output tsv)

az monitor log-analytics query --workspace "$WS_ID" \
  --analytics-query "Heartbeat | take 10"
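KQL becomes more useful once you filter and aggregate. As an illustration only, the sketch below asks for the most recent heartbeat per machine in the last hour. It assumes the workspace already has a Heartbeat table, which exists only after at least one machine is connected. LAST_SEEN_QUERY and last_seen are hypothetical names.

```shell
# KQL: for each machine, the time of its latest heartbeat in the last hour,
# newest first.
LAST_SEEN_QUERY='Heartbeat | where TimeGenerated > ago(1h) | summarize LastSeen = max(TimeGenerated) by Computer | order by LastSeen desc'

# Run the query against a workspace; $1 is the workspace (customer) ID.
last_seen() {
  az monitor log-analytics query --workspace "$1" \
    --analytics-query "$LAST_SEEN_QUERY" --output table
}
```

Call it with the workspace ID captured earlier, for example last_seen "$WS_ID". A machine missing from the result has sent no heartbeat in the past hour.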

Action groups — who gets notified

An action group is a reusable list of who or what to notify when an alert fires: email, SMS, a webhook, or an Azure Function. Why separate from the alert: you define "the on-call team" once and attach it to many alerts, so changing the recipients is a one-place edit.

Create an action group that emails the ops team

az monitor action-group create \
  --resource-group learn-rg --name ops-team \
  --short-name ops \
  --action email oncall ops@example.com
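The one-place edit can be sketched with az monitor action-group update and its --add-action option. The helper name add_email_receiver and the receiver details in the usage line are made up for illustration.

```shell
# Add one more email receiver to an existing action group.
# Arguments: resource group, action group name, receiver name, email address.
add_email_receiver() {
  az monitor action-group update \
    --resource-group "$1" --name "$2" \
    --add-action email "$3" "$4"
}
```

For example, add_email_receiver learn-rg ops-team backup backup@example.com would add a second recipient. Every alert that already points at ops-team picks up the change without being edited.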

Alert rules — get told when something is wrong

A metric alert watches a metric and fires (notifying its action group) when it crosses a threshold for a set time. Why: you find out about high CPU or errors before users complain. The condition uses the same metric names you listed earlier.

Alert when a VM's average CPU over a 5-minute window exceeds 80%, checked every minute

az monitor metrics alert create \
  --resource-group learn-rg --name high-cpu \
  --scopes /subscriptions/SUB_ID/resourceGroups/learn-rg/providers/Microsoft.Compute/virtualMachines/my-vm \
  --condition "avg Percentage CPU > 80" \
  --window-size 5m --evaluation-frequency 1m \
  --action ops-team
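To build intuition for a condition like "avg Percentage CPU > 80", here is a local sketch of the comparison: average the samples in the window, then compare against the threshold. It is a simplification for illustration and not how Azure evaluates alerts internally. The function name would_fire is hypothetical.

```shell
# Succeeds (exit status 0) when the mean of the samples is above the threshold.
# $1 is the threshold; the remaining arguments are the samples in the window.
would_fire() {
  threshold=$1
  shift
  printf '%s\n' "$@" |
    awk -v t="$threshold" '{ sum += $1; n++ } END { exit !(n > 0 && sum / n > t) }'
}
```

With five one-minute samples, would_fire 80 95 60 70 75 80 fails because the mean is 76, even though one sample hit 95. This is why an average over a window is less noisy than alerting on a single reading.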