See what your infrastructure is doing: read built-in metrics, collect and query logs in a Log Analytics workspace, and raise alerts with action groups when something crosses a threshold.
Azure Monitor is the eyes and ears of Azure. It has three parts: metrics (numbers over time, such as CPU usage), logs (detailed records you query), and alerts (notifications when a value crosses a threshold). Why: without it you are flying blind; you cannot fix what you cannot see.
List the metric names available for a resource (here, a VM)
az monitor metrics list-definitions \
--resource /subscriptions/SUB_ID/resourceGroups/learn-rg/providers/Microsoft.Compute/virtualMachines/my-vm \
--query '[].name.value' --output tsv | head
A metric is a time series: CPU %, disk IO, request count. Azure publishes many metrics automatically for every resource. Why read them: they tell you whether a VM is overloaded or a database is saturated. You can chart them or query their exact values.
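To make "time series" and aggregation concrete, the sketch below does offline what `--interval PT5M --aggregation Average` in the next command asks Azure to do: group raw samples into 5-minute buckets and average each bucket. It is only an illustration under stated assumptions. The per-minute CPU values are made up, `downsample_avg` is a hypothetical helper, and Azure Monitor performs the real aggregation server-side.

```shell
# Offline illustration only: Azure Monitor aggregates metrics server-side.
# downsample_avg is a hypothetical helper. It reads "minute value" pairs on
# stdin and prints one "bucket_start average" line per bucket of N minutes.
downsample_avg() {
  local bucket_minutes=$1
  awk -v size="$bucket_minutes" '
    BEGIN { size += 0 }
    {
      b = int($1 / size) * size        # start minute of this sample'"'"'s bucket
      sum[b] += $2; n[b]++
      if (!(b in seen)) { seen[b] = 1; order[++k] = b }   # keep bucket order
    }
    END {
      for (i = 1; i <= k; i++) {
        b = order[i]
        printf "%d %.1f\n", b, sum[b] / n[b]
      }
    }'
}

# Ten hypothetical 1-minute samples: minute offset, CPU percent
samples='0 10
1 20
2 30
3 40
4 50
5 70
6 80
7 90
8 85
9 75'

# Two 5-minute buckets: minutes 0-4 and minutes 5-9
printf '%s\n' "$samples" | downsample_avg 5
```

With `--aggregation Maximum` instead of `Average`, Azure reports the highest sample in each bucket. That is often better for spotting short spikes, which an average smooths out.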
Read average CPU of a VM over the last hour
az monitor metrics list \
--resource /subscriptions/SUB_ID/resourceGroups/learn-rg/providers/Microsoft.Compute/virtualMachines/my-vm \
--metric "Percentage CPU" \
--interval PT5M --aggregation Average --offset 1h --output table
A Log Analytics workspace is where logs from your resources are collected and queried with KQL (Kusto Query Language). Why: it is the one place to search what actually happened across VMs, apps, and services, which is far more powerful than reading log files on each server.
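As a taste of KQL beyond a simple `take`, here is a hypothetical query that reports the most recent heartbeat per machine over the last hour. It is kept in a shell variable so that a multi-line query can be passed to `--analytics-query` later. `Heartbeat`, `TimeGenerated`, and `Computer` are standard Log Analytics names. The variable name, the one-hour window, and the `LastSeen` column are illustrative choices.

```shell
# Hypothetical KQL query: latest heartbeat per machine in the last hour.
# A quoted heredoc ('EOF') keeps the pipes and parentheses literal.
HEARTBEAT_QUERY=$(cat <<'EOF'
Heartbeat
| where TimeGenerated > ago(1h)
| summarize LastSeen = max(TimeGenerated) by Computer
| order by LastSeen desc
EOF
)

printf '%s\n' "$HEARTBEAT_QUERY"

# Once the workspace exists and WS_ID holds its customerId:
# az monitor log-analytics query --workspace "$WS_ID" --analytics-query "$HEARTBEAT_QUERY"
```

The query returns one row per machine. A machine missing from the result has not sent a heartbeat in the last hour, which is exactly what you want to notice.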
Create a workspace
az monitor log-analytics workspace create \
--resource-group learn-rg --workspace-name learn-logs
Run a KQL query (e.g. any 10 heartbeat records from machines connected to the workspace)
WS_ID=$(az monitor log-analytics workspace show \
--resource-group learn-rg --workspace-name learn-logs \
--query customerId --output tsv)
# --workspace expects the workspace GUID (customerId), not the workspace name
az monitor log-analytics query --workspace "$WS_ID" \
--analytics-query "Heartbeat | take 10"
An action group is a reusable list of who or what to notify when an alert fires: email, SMS, a webhook, or an Azure Function. Why keep it separate from the alert: you define "the on-call team" once and attach it to many alerts, so changing the recipients is a one-place edit.
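The reuse point can be sketched as a dry run in which several alerts reference one action group by name. The function below only prints `az monitor metrics alert create` commands and does not execute them. The VM names `web-1`, `web-2`, and `db-1` are hypothetical, and `SUB_ID` is the same placeholder used elsewhere in this guide.

```shell
# Dry run: print the commands rather than executing them.
RG=learn-rg
ACTION_GROUP=ops-team   # defined once, referenced by every alert below

# print_alert_cmds is a hypothetical helper: one CPU alert command per VM name.
print_alert_cmds() {
  local vm
  for vm in "$@"; do
    printf 'az monitor metrics alert create --resource-group %s --name high-cpu-%s ' "$RG" "$vm"
    printf -- '--scopes /subscriptions/SUB_ID/resourceGroups/%s/providers/Microsoft.Compute/virtualMachines/%s ' "$RG" "$vm"
    printf -- "--condition 'avg Percentage CPU > 80' --action %s\n" "$ACTION_GROUP"
  done
}

print_alert_cmds web-1 web-2 db-1
```

Every printed alert ends in `--action ops-team`. Changing who gets emailed therefore means editing that one action group, not each alert.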
Create an action group that emails the ops team
az monitor action-group create \
--resource-group learn-rg --name ops-team \
--short-name ops \
--action email oncall ops@example.com
A metric alert watches a metric and fires (notifying its action group) when the metric crosses a threshold for a set period. Why: you find out about high CPU or errors before users complain. The condition uses the same metric names you listed earlier.
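Here is a simplified model of how `--window-size 5m` and `--evaluation-frequency 1m` interact in the command that follows: every minute, average the last five one-minute samples and compare the result with the threshold. This is a sketch for intuition, not Azure's exact evaluation logic. The per-minute readings and the `evaluate_alert` function are hypothetical.

```shell
# Simplified model for illustration only.
# evaluate_alert WINDOW THRESHOLD reads one sample per minute on stdin.
# Once a full window exists, it prints the window average each minute and
# "FIRE" when that average is strictly greater than the threshold.
evaluate_alert() {
  local window=$1 threshold=$2
  awk -v w="$window" -v t="$threshold" '
    BEGIN { w += 0; t += 0 }
    {
      buf[NR % w] = $1 + 0            # ring buffer holding the last w samples
      if (NR < w) next                # not enough data for a full window yet
      sum = 0
      for (i = 0; i < w; i++) sum += buf[i]
      avg = sum / w
      status = (avg > t) ? "FIRE" : "ok"
      printf "minute %d avg %.1f %s\n", NR, avg, status
    }'
}

# Hypothetical per-minute CPU readings, evaluated like "avg ... > 80" over 5m
printf '%s\n' 60 70 85 90 95 88 92 40 30 20 | evaluate_alert 5 80
```

The first full window averages exactly 80.0 and does not fire, because the condition is strictly greater than 80. The 40% reading at minute 8 does not clear the alert immediately either: that window still averages 81.0, and the average only drops below the threshold at minute 9.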
Alert when a VM's average CPU over the last 5 minutes is above 80% (checked every minute)
az monitor metrics alert create \
--resource-group learn-rg --name high-cpu \
--scopes /subscriptions/SUB_ID/resourceGroups/learn-rg/providers/Microsoft.Compute/virtualMachines/my-vm \
--condition "avg Percentage CPU > 80" \
--window-size 5m --evaluation-frequency 1m \
--action ops-team
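The same long resource ID is typed out in three commands above. A small hypothetical helper can assemble it from its parts, following the `/subscriptions/<sub>/resourceGroups/<rg>/providers/<namespace>/<type>/<name>` pattern those commands use.

```shell
# Hypothetical helper: build an Azure resource ID from its parts.
resource_id() {
  local sub=$1 rg=$2 namespace=$3 type=$4 name=$5
  printf '/subscriptions/%s/resourceGroups/%s/providers/%s/%s/%s\n' \
    "$sub" "$rg" "$namespace" "$type" "$name"
}

VM_ID=$(resource_id SUB_ID learn-rg Microsoft.Compute virtualMachines my-vm)
echo "$VM_ID"

# The commands above could then use --resource "$VM_ID" or --scopes "$VM_ID".
```

For an existing VM, `az vm show --resource-group learn-rg --name my-vm --query id --output tsv` prints the ID directly, which avoids typos in the subscription ID.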