See what your infrastructure is doing: read metrics, search and filter logs across services with Cloud Logging, route logs with sinks, and raise alerts when something crosses a threshold.
Google Cloud's operations suite has two halves: Cloud Monitoring (metrics — numbers over time, plus alerts) and Cloud Logging (the searchable record of what happened). Why: without them you are flying blind — you cannot fix what you cannot see.
Logging and Monitoring are on by default. List recent log entries:
gcloud logging read "severity>=WARNING" --limit 10 --freshness 1h
Cloud Logging collects logs from every service into one place that you query with a filter language. Why: instead of SSHing into each VM, you search all logs at once ("errors from this VM in the last hour") with a single query.
Find errors from Compute Engine VMs in the last hour (append AND resource.labels.instance_id="INSTANCE_ID" to the filter to target one VM)
gcloud logging read \
'resource.type="gce_instance" AND severity>=ERROR' \
--limit 20 --freshness 1h \
--format "table(timestamp, resource.labels.instance_id, textPayload)"
Stream logs live as they arrive
gcloud logging tail 'resource.type="cloud_run_revision"'
A log sink exports matching logs to a destination: a Cloud Storage bucket (cheap archive), BigQuery (analysis), or Pub/Sub (real-time processing). Why: you keep logs long-term for compliance, or feed them into dashboards and alerts beyond the default retention.
Archive all WARNING+ logs to a Cloud Storage bucket
gcloud logging sinks create warn-archive \
storage.googleapis.com/learn-uploads-7f3k \
--log-filter "severity>=WARNING"
Grant the sink's writer identity permission to write to the bucket (the create command prints the service account to authorize).
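The grant step can be sketched as follows. This is a minimal example, not the only way to do it: it assumes the sink and bucket names used above, reads the writer identity back from the sink rather than copying it from the create output, and assumes roles/storage.objectCreator is sufficient for your bucket's access model. The variable names are illustrative.

```shell
# Names taken from the sink created above.
SINK="warn-archive"
BUCKET="learn-uploads-7f3k"

if command -v gcloud >/dev/null 2>&1; then
  # writerIdentity already carries the "serviceAccount:" prefix,
  # so it can be passed directly as an IAM member.
  WRITER="$(gcloud logging sinks describe "$SINK" --format 'value(writerIdentity)')"

  # Allow the sink's service account to create objects in the bucket.
  gcloud storage buckets add-iam-policy-binding "gs://$BUCKET" \
    --member "$WRITER" \
    --role roles/storage.objectCreator
else
  echo "gcloud not found; install the Google Cloud CLI first" >&2
fi
```

Until this binding exists, the sink is created but its exports to the bucket fail.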
A metric is a time series — CPU %, request count, log-based counts. Google publishes many automatically for every resource. Why read them: they tell you whether a VM is overloaded or errors are spiking. You can even define a "log-based metric" that counts matching log lines.
List available metric types for Compute Engine
gcloud monitoring metrics-descriptors list \
--filter 'metric.type=starts_with("compute.googleapis.com")' \
--format "value(type)" 2>/dev/null | head
Create a log-based metric counting ERROR log lines
gcloud logging metrics create error_count \
--description "Count of ERROR logs" \
--log-filter "severity>=ERROR"
An alerting policy watches a metric and notifies a channel (email, SMS, Slack, PagerDuty) when it crosses a threshold. Why: you find out about high CPU or a flood of errors before users complain. You first create a notification channel, then the policy that uses it.
Create an email notification channel
gcloud beta monitoring channels create \
--display-name "Ops email" \
--type email \
--channel-labels email_address=ops@example.com
Then create an alerting policy referencing that channel (via a policy JSON file with --policy-from-file), e.g. "VM CPU > 80% for 5m".
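A policy file for the "VM CPU > 80% for 5m" case might look like the sketch below. The file name cpu-policy.json is hypothetical, the JSON follows the Cloud Monitoring AlertPolicy schema as best understood here, and the policies command is shown under the alpha surface; depending on your gcloud version it may be available without the alpha prefix. CPU utilization is reported as a fraction, so 80% is written as 0.8.

```shell
# Write the policy definition (hypothetical file name).
cat > cpu-policy.json <<'EOF'
{
  "displayName": "VM CPU > 80% for 5m",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "CPU utilization above 80%",
      "conditionThreshold": {
        "filter": "metric.type=\"compute.googleapis.com/instance/cpu/utilization\" AND resource.type=\"gce_instance\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.8,
        "duration": "300s",
        "aggregations": [
          { "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN" }
        ]
      }
    }
  ]
}
EOF

if command -v gcloud >/dev/null 2>&1; then
  # Look up the full resource name of the channel created above.
  CHANNEL="$(gcloud beta monitoring channels list \
    --filter 'displayName="Ops email"' \
    --format 'value(name)' | head -n 1)"

  # Create the policy and attach the notification channel.
  gcloud alpha monitoring policies create \
    --policy-from-file cpu-policy.json \
    --notification-channels "$CHANNEL"
else
  echo "gcloud not found; install the Google Cloud CLI first" >&2
fi
```

Keeping the channel out of the JSON and passing it with --notification-channels lets the same file be reused across projects whose channel names differ.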