Keep a busy service healthy. Measure it with metrics (prometheus-client), watch it with Prometheus, trace requests with OpenTelemetry, and survive failure with graceful degradation, rate limiting, and a circuit breaker.
Why: instrumentation means adding small bits of code that measure what your app is doing: how many requests it handles, how long they take, how many fail. You cannot improve or fix what you cannot see. These measurements are called metrics, and prometheus-client is the official Prometheus client library for recording them in Python.

# Install: pip install prometheus-client
# metrics.py: define what you want to measure
from prometheus_client import Counter, Histogram

# A counter only ever goes up, a good fit for "how many requests"
REQUEST_COUNT = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'path', 'status'],
)

# A histogram records a distribution, a good fit for "how long did it take"
REQUEST_DURATION = Histogram(
    'http_request_duration_seconds',
    'Request duration in seconds',
    ['method', 'path'],
)

Why: monitoring is watching those metrics over time and alerting a human when something looks wrong. The usual setup: your app exposes its numbers at a /metrics URL, a tool called Prometheus visits that URL every few seconds to record them, and Grafana draws the graphs and fires alerts (for example, "error rate above 5% for 5 minutes"). Your only job inside the app is to expose /metrics and update the numbers.
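
To make the alert condition concrete, here is a minimal sketch of what a rule like "error rate above 5% for 5 minutes" has to work out from successive scrapes. It is a simplified illustration, not how Prometheus or Grafana evaluate rules internally. The names `Scrape`, `error_ratio`, and `should_alert` are hypothetical, and the sample numbers are made up.

```python
from typing import NamedTuple, Sequence


class Scrape(NamedTuple):
    """One reading of two counters (hypothetical shape for this sketch)."""
    timestamp: float  # seconds since some fixed start
    total: float      # all requests counted so far
    errors: float     # failed requests counted so far


def error_ratio(older: Scrape, newer: Scrape) -> float:
    """Share of requests that failed between two scrapes.

    Counters only go up, so the difference between two readings is the
    number of events that happened in between.
    """
    handled = newer.total - older.total
    if handled <= 0:
        return 0.0  # no traffic in this interval
    return (newer.errors - older.errors) / handled


def should_alert(
    scrapes: Sequence[Scrape],
    threshold: float = 0.05,
    hold_seconds: float = 300.0,
) -> bool:
    """True if the most recent run of breaching intervals has lasted
    at least hold_seconds."""
    breach_started = None
    for older, newer in zip(scrapes, scrapes[1:]):
        if error_ratio(older, newer) > threshold:
            if breach_started is None:
                breach_started = older.timestamp
        else:
            breach_started = None  # recovered, so the timer resets
    if breach_started is None:
        return False
    return scrapes[-1].timestamp - breach_started >= hold_seconds


# Made-up data: one scrape per minute, 100 requests and 10 errors per minute
bad = [Scrape(60.0 * i, 100.0 * i, 10.0 * i) for i in range(7)]
print(should_alert(bad))      # six minutes at a 10% error ratio
print(should_alert(bad[:4]))  # only three minutes of data so far
```

The "for 5 minutes" part matters: a single bad interval resets nothing and pages nobody, which keeps short blips from waking a human.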

# main.py: measure every request and expose the numbers
import time

from fastapi import FastAPI, Request
from fastapi.responses import Response
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST

from metrics import REQUEST_COUNT, REQUEST_DURATION

app = FastAPI()


@app.middleware('http')
async def measure(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    # Note: the raw URL path is used as a label here. Paths that embed IDs
    # (such as /items/123) create a new time series for every distinct path.
    REQUEST_DURATION.labels(request.method, request.url.path).observe(
        time.perf_counter() - start
    )
    REQUEST_COUNT.labels(
        request.method, request.url.path, response.status_code
    ).inc()
    return response


# Prometheus scrapes (visits) this endpoint on a schedule
@app.get('/metrics')
def metrics():
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)

Why: telemetry is data your app emits about itself so you can understand it from the outside. The most useful kind at scale is a trace: a record that follows one request as it hops between services, showing where the time went. OpenTelemetry (often shortened to OTel) is the vendor-neutral standard; it provides instrumentation packages for common libraries, so you get traces without rewriting your code.
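
To make "a record that follows one request" concrete, here is a toy sketch of the data a trace holds: a shared trace ID plus a set of named, timed spans that know their parent. This is not the OpenTelemetry API. `ToyTracer` and the span names are hypothetical, and the sleeps stand in for real work.

```python
import time
import uuid
from contextlib import contextmanager


class ToyTracer:
    """Collects spans for a single request (illustration only)."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex  # shared by every span in the request
        self.finished = []                # completed spans, in the order they finished
        self._open = []                   # names of spans still in progress

    @contextmanager
    def span(self, name):
        parent = self._open[-1] if self._open else None
        record = {'trace_id': self.trace_id, 'name': name, 'parent': parent}
        self._open.append(name)
        start = time.perf_counter()
        try:
            yield record
        finally:
            record['seconds'] = time.perf_counter() - start
            self._open.pop()
            self.finished.append(record)


tracer = ToyTracer()
with tracer.span('GET /product'):
    with tracer.span('db.get_product'):
        time.sleep(0.01)  # stand-in for a database query
    with tracer.span('recommender.fetch'):
        time.sleep(0.02)  # stand-in for a call to another service

for s in tracer.finished:
    print(f"{s['name']:<20} parent={s['parent']} {s['seconds'] * 1000:.1f} ms")
```

Reading the output from the bottom up shows where the request spent its time. A real tracing system adds the part this sketch skips: passing the trace ID between services so their spans join the same trace.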

# Install: pip install opentelemetry-sdk opentelemetry-instrumentation-fastapi
# tracing.py: wire tracing into your FastAPI app
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

trace.set_tracer_provider(TracerProvider())
# ConsoleSpanExporter just prints traces; swap for an OTLP exporter in production
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)


def setup_tracing(app):
    # Auto-creates a trace for every request, so no per-route code is needed
    FastAPIInstrumentor.instrument_app(app)

Why: graceful degradation means that when one piece breaks, the whole app does not. If a non-essential dependency (say, a recommendations service) is down, you return a sensible fallback instead of failing the page. The pattern: wrap the risky call in try/except and always have a plan B.

# A product page that still works when recommendations are down
# (db and recommender stand for your own data-access and client objects)
import logging

logger = logging.getLogger(__name__)


async def get_product_page(product_id: int):
    product = await db.get_product(product_id)  # essential, so let it raise
    try:
        recommendations = await recommender.fetch(product_id)  # nice-to-have
    except Exception as err:
        logger.warning('recommendations unavailable: %s', err)
        recommendations = []  # fall back instead of failing the whole page
    return {'product': product, 'recommendations': recommendations}

Why: throttling (also called rate limiting) caps how many requests one client can make in a window of time. It protects your app from being overwhelmed, whether by a buggy client stuck in a retry loop or an abusive one. slowapi adds rate limiting to FastAPI and returns HTTP 429 (Too Many Requests) once a caller goes over the limit.
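
Before reaching for a library, it may help to see how a limit such as "3 calls per minute per client" can be enforced at all. The sketch below uses a fixed-window counter, one of several possible strategies. It is a hand-rolled illustration, not slowapi's implementation; `FixedWindowLimiter` and the client IDs are hypothetical.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` calls per client in each time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the example can fake time
        self._usage = {}    # client id -> (window number, calls so far)

    def allow(self, client_id):
        window = int(self.clock() // self.window_seconds)
        seen_window, calls = self._usage.get(client_id, (window, 0))
        if seen_window != window:
            calls = 0  # a new window started, so the count resets
        if calls >= self.limit:
            self._usage[client_id] = (window, calls)
            return False  # a web app would answer with HTTP 429 here
        self._usage[client_id] = (window, calls + 1)
        return True


now = [0.0]  # fake clock so the example runs instantly
demo = FixedWindowLimiter(limit=3, window_seconds=60, clock=lambda: now[0])

print([demo.allow('10.0.0.1') for _ in range(4)])  # the fourth call is over the limit
print(demo.allow('10.0.0.2'))                      # other clients have their own budget
now[0] = 61.0
print(demo.allow('10.0.0.1'))                      # next window, budget restored
```

A fixed window is simple but lets a client burst at the boundary between two windows; sliding-window and token-bucket variants smooth that out at the cost of more bookkeeping.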

# Install: pip install slowapi
# main.py: limit how often each client can call an endpoint
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)  # track callers by IP
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)


@app.get('/api/items')
@limiter.limit('100/minute')  # max 100 requests per IP per minute
def list_items(request: Request):
    return {'items': []}

Why: when a downstream service is failing, hammering it with retries makes things worse and ties up your own resources waiting for timeouts. A circuit breaker watches for failures (a failure rate, or a run of consecutive failures) and, once they cross a threshold, "trips": it stops calling the broken service for a while and fails fast (or returns a fallback) instead. After a cool-down it lets one test request through to check whether the service recovered. The circuitbreaker library makes this a one-line decorator.

# Install: pip install circuitbreaker
from circuitbreaker import circuit

# After 5 consecutive failures the circuit "opens" and calls fail fast for
# 30 seconds, instead of hanging while the broken service times out.
# (payment_client stands for your own payment API client)
@circuit(failure_threshold=5, recovery_timeout=30)
def call_payment_api(order):
    return payment_client.charge(order)  # raises on failure


def charge(order):
    try:
        return call_payment_api(order)
    except Exception:
        # Plan B for any failure, including the fast failures raised
        # while the circuit is open
        return {'status': 'queued', 'note': 'payment delayed'}
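
The decorator hides a small state machine: closed (calls pass through), open (calls fail fast), and half-open (one trial call after the cool-down). A minimal single-threaded sketch of that state machine follows. It is an illustration under those assumptions, not the circuitbreaker library's implementation; `SimpleCircuitBreaker`, `CircuitOpenError`, and `flaky` are hypothetical names.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class SimpleCircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock     # injectable so the example can fake time
        self.failures = 0      # consecutive failures seen so far
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return 'closed'
        if self.clock() - self.opened_at >= self.recovery_timeout:
            return 'half-open'  # cool-down over, allow a trial call
        return 'open'

    def call(self, func, *args, **kwargs):
        if self.state == 'open':
            raise CircuitOpenError('failing fast; circuit is open')
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            failed_trial = self.state == 'half-open'
            if failed_trial or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-open after a failed trial
            raise
        self.failures = 0      # any success closes the circuit again
        self.opened_at = None
        return result


def flaky():
    raise ConnectionError('service down')


now = [0.0]  # fake clock so the example runs instantly
breaker = SimpleCircuitBreaker(failure_threshold=2, recovery_timeout=30,
                               clock=lambda: now[0])
for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass
print(breaker.state)               # two failures reached the threshold

now[0] = 31.0
print(breaker.state)               # cool-down elapsed, the next call is the trial
print(breaker.call(lambda: 'ok'))  # the trial call succeeds
print(breaker.state)
```

A production breaker also needs locking so that only one trial call runs at a time under concurrency, which is one reason to prefer a maintained library over a hand-rolled version.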