Zero-glue FastAPI observability with security presets and runtime controls

fastapi-observer

Zero-glue observability for FastAPI.

fastapi-observer gives you structured JSON logs, request correlation, Prometheus metrics, OpenTelemetry tracing, security redaction presets, and runtime controls in one install step and one function call.

Supported Python versions: 3.10 to 3.14


Why This Package Exists

Most FastAPI services eventually need the same observability plumbing:

  • Structured JSON logging
  • Request and trace correlation
  • Metrics for dashboards and alerts
  • OpenTelemetry setup
  • Redaction/sanitization for sensitive data
  • Runtime controls for incident response

Teams usually implement this as custom glue code in every service. That costs engineering time and creates drift between services.

fastapi-observer replaces this repeated wiring with a consistent, secure-by-default setup.


Sponsor

If this library saves you engineering time, you can support maintenance here:

buymeacoffee.com/FYbPCSu


What You Get Immediately

After one call to install_observability():

Capability | Included | Default
Structured JSON logs | Yes | Enabled
Request ID correlation | Yes | Enabled
Trace/span IDs in logs | Yes (with OTel) | Off until OTel enabled
Prometheus /metrics | Yes | Off until metrics_enabled=True
Sensitive-data redaction | Yes | Enabled
Security presets (strict, pci, gdpr) | Yes | Available
Runtime control endpoint | Yes | Off until enabled
Plugin hooks for log enrichment and metrics | Yes | Available

Install

# Core (logging + metrics + security)
pip install fastapi-observer

# Prometheus metrics support
pip install "fastapi-observer[prometheus]"

# OpenTelemetry tracing/logs support
pip install "fastapi-observer[otel]"

# Everything
pip install "fastapi-observer[all]"

Import path:

import fastapiobserver

5-Minute Quick Start

from fastapi import FastAPI
from fastapiobserver import ObservabilitySettings, install_observability

app = FastAPI()

settings = ObservabilitySettings(
    app_name="orders-api",
    service="orders",
    environment="production",
    version="0.1.0",
    metrics_enabled=True,
)

install_observability(app, settings)


@app.get("/orders/{order_id}")
def get_order(order_id: int) -> dict[str, int]:
    return {"order_id": order_id}

Run:

uvicorn main:app --reload

Now you have:

  • Structured request logs on every request
  • Request ID propagation
  • Sanitized event payloads
  • Prometheus metrics at /metrics

Security Defaults and Presets

Default protections

Protection | Default | Why
Body logging | OFF | Avoid leaking request/response secrets
Sensitive key masking | ON | Protect fields like password, token, secret
Sensitive header masking | ON | Protect authorization, cookie, x-api-key
Query string in logged path | Excluded | Prevent accidental token leakage
Request ID trust boundary | Trusted CIDRs only | Prevent spoofed correlation IDs
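
To make the masking defaults concrete, here is a minimal standard-library sketch of the idea. It is not the library's implementation: the function names are hypothetical, and the key and header sets only contain the examples named in the table.

```python
from typing import Any

# Hypothetical lists: only the examples named in the table above.
SENSITIVE_KEYS = {"password", "token", "secret"}
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}
MASK_TEXT = "***"


def mask_payload(value: Any) -> Any:
    """Return a copy of value with sensitive dict keys masked at any depth."""
    if isinstance(value, dict):
        return {
            key: MASK_TEXT if str(key).lower() in SENSITIVE_KEYS else mask_payload(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [mask_payload(item) for item in value]
    return value


def mask_headers(headers: dict[str, str]) -> dict[str, str]:
    """Mask sensitive header values; header names compare case-insensitively."""
    return {
        name: MASK_TEXT if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def path_without_query(raw_path: str) -> str:
    """Drop the query string so tokens passed as query parameters are not logged."""
    return raw_path.split("?", 1)[0]
```

Masking by key name rather than by value keeps the check cheap, at the cost of missing secrets stored under unexpected names. That gap is what the allowlist mode addresses.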

Presets for regulated environments

from fastapiobserver import SecurityPolicy

# Strictest option: drop sensitive values and keep minimal safe headers
strict_policy = SecurityPolicy.from_preset("strict")

# PCI-focused redaction fields
pci_policy = SecurityPolicy.from_preset("pci")

# GDPR-focused hashed PII fields
gdpr_policy = SecurityPolicy.from_preset("gdpr")

Use a preset in installation:

install_observability(app, settings, security_policy=SecurityPolicy.from_preset("pci"))

Allowlist-only logging (audit-style)

If your compliance model is "log only approved fields", use allowlists:

from fastapiobserver import SecurityPolicy

policy = SecurityPolicy(
    header_allowlist=("x-request-id", "content-type", "user-agent"),
    event_key_allowlist=("method", "path", "status_code"),
)
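
As a rough illustration of what allowlist-only filtering means for an event, the sketch below keeps approved keys and drops everything else. The helper name apply_allowlist is hypothetical and the library's actual behavior may differ in detail.

```python
from typing import Any


def apply_allowlist(data: dict[str, Any], allowlist: tuple[str, ...]) -> dict[str, Any]:
    """Keep only keys named in the allowlist (case-insensitive); drop the rest."""
    allowed = {name.lower() for name in allowlist}
    return {key: value for key, value in data.items() if key.lower() in allowed}


event = {
    "method": "GET",
    "path": "/orders/42",
    "status_code": 200,
    "duration_ms": 3.456,
    "client_ip": "10.0.0.1",
}
filtered = apply_allowlist(event, ("method", "path", "status_code"))
```

With the allowlist from the example above, duration_ms and client_ip would not appear in the filtered event.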

Body capture media-type guard

policy = SecurityPolicy(
    log_request_body=True,
    body_capture_media_types=("application/json",),
)

Runtime Control Plane (No Restart)

Use runtime controls when you need higher log verbosity or different trace sampling during an incident.

export OBSERVABILITY_CONTROL_TOKEN="replace-me"

from fastapiobserver import RuntimeControlSettings, install_observability

runtime_control = RuntimeControlSettings(enabled=True)
install_observability(app, settings, runtime_control_settings=runtime_control)

Inspect current runtime values:

curl -X GET http://localhost:8000/_observability/control \
  -H "Authorization: Bearer replace-me"

Update runtime values:

curl -X POST http://localhost:8000/_observability/control \
  -H "Authorization: Bearer replace-me" \
  -H "Content-Type: application/json" \
  -d '{"log_level":"DEBUG","trace_sampling_ratio":0.25}'

What changes immediately:

  • Root logger level (and uvicorn loggers)
  • Dynamic OTel trace sampling ratio
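
The effect of a control-plane update on logging can be pictured with the standard library alone. The sketch below is a hypothetical handler, not the library's endpoint code. It validates a payload shaped like the curl example and applies the log level to the root and uvicorn loggers; the list of uvicorn logger names is an assumption.

```python
import logging

# Assumed logger names; uvicorn uses these names for its own loggers.
UVICORN_LOGGERS = ("uvicorn", "uvicorn.error", "uvicorn.access")


def apply_runtime_update(payload: dict) -> dict:
    """Validate a control payload and apply it; return the values that were applied."""
    applied: dict = {}
    if "log_level" in payload:
        level_name = str(payload["log_level"]).upper()
        level = logging.getLevelName(level_name)
        if not isinstance(level, int):  # unknown names come back as a string
            raise ValueError(f"unknown log level: {level_name}")
        logging.getLogger().setLevel(level)
        for name in UVICORN_LOGGERS:
            logging.getLogger(name).setLevel(level)
        applied["log_level"] = level_name
    if "trace_sampling_ratio" in payload:
        ratio = float(payload["trace_sampling_ratio"])
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("trace_sampling_ratio must be between 0.0 and 1.0")
        # A real implementation would hand the ratio to the active sampler here.
        applied["trace_sampling_ratio"] = ratio
    return applied
```

Rejecting invalid values before applying anything matters during an incident, when a typo in a curl command should not leave the service in a half-updated state.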

OpenTelemetry (Traces + Optional OTLP Logs)

from fastapiobserver import OTelLogsSettings, OTelSettings, install_observability

otel_settings = OTelSettings(
    enabled=True,
    service_name="orders-api",
    service_version="2.0.0",
    environment="production",
    otlp_endpoint="http://localhost:4317",
    protocol="grpc",                  # or "http/protobuf"
    trace_sampling_ratio=1.0,
    extra_resource_attributes={
        "k8s.namespace": "prod",
        "team": "backend",
    },
)

otel_logs_settings = OTelLogsSettings(
    enabled=True,
    logs_mode="both",                 # "local_json", "otlp", or "both"
    otlp_endpoint="http://localhost:4317",
    protocol="grpc",
)

install_observability(
    app,
    settings,
    otel_settings=otel_settings,
    otel_logs_settings=otel_logs_settings,
)

Design details:

  • Reuses an externally configured tracer provider if one already exists.
  • Injects trace IDs into application logs for log-trace correlation.
  • Supports runtime sampling updates through the control plane.
  • Sends OTel logs in OTLP mode with the same sanitization policy.
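
Runtime sampling updates can be understood through a ratio sampler whose threshold is mutable. The sketch below is modeled loosely on trace-ID ratio sampling (compare the low 64 bits of the trace ID against a bound). DynamicRatioSampler is a hypothetical class written for illustration; it is neither the OpenTelemetry SDK sampler nor this library's implementation.

```python
import threading

TRACE_ID_LIMIT = (1 << 64) - 1  # only the low 64 bits of the 128-bit trace ID are compared


class DynamicRatioSampler:
    """Sample a fixed fraction of traces; the fraction can be changed at runtime."""

    def __init__(self, ratio: float = 1.0) -> None:
        self._lock = threading.Lock()
        self._bound = 0
        self.set_ratio(ratio)

    def set_ratio(self, ratio: float) -> None:
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratio must be between 0.0 and 1.0")
        with self._lock:
            self._bound = round(ratio * (TRACE_ID_LIMIT + 1))

    def should_sample(self, trace_id: int) -> bool:
        with self._lock:
            bound = self._bound
        # Deterministic per trace ID, so every service makes the same decision.
        return (trace_id & TRACE_ID_LIMIT) < bound


sampler = DynamicRatioSampler(1.0)
trace_id = int("0af7651916cd43dd8448eb211c80319c", 16)
```

Because the decision depends only on the trace ID and the current ratio, raising the ratio during an incident takes effect for the next trace without rebuilding the tracer provider.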

What install_observability() Wires Up

  1. Structured logging pipeline (JSON formatter + bounded async queue handler).
  2. Metrics backend and /metrics endpoint when metrics are enabled.
  3. OTel tracing setup when OTel is enabled.
  4. Request logging middleware with sanitization and context cleanup.
  5. Runtime control endpoint when runtime control is enabled.

Request path lifecycle (high-level):

Request arrives
  -> request ID / trace context resolved
  -> app handler executes
  -> response classified (ok/client_error/server_error/exception)
  -> payload sanitized by policy
  -> log emitted + metrics recorded
  -> context cleared

Example JSON Log Event

{
  "timestamp": "2026-02-18T10:30:00.000000+00:00",
  "level": "INFO",
  "logger": "fastapiobserver.middleware",
  "message": "request.completed",
  "app_name": "orders-api",
  "service": "orders",
  "environment": "production",
  "version": "0.1.0",
  "log_schema_version": "1.0.0",
  "library": "fastapiobserver",
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "trace_id": "0af7651916cd43dd8448eb211c80319c",
  "span_id": "b7ad6b7169203331",
  "event": {
    "method": "GET",
    "path": "/orders/42",
    "status_code": 200,
    "duration_ms": 3.456,
    "client_ip": "10.0.0.1",
    "error_type": "ok"
  }
}
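
For readers who want to see how a record like this can be produced, here is a minimal JSON formatter built on the standard logging module. It is a sketch of the general technique, not the library's formatter. The class name JsonFormatter and the way the event dict is attached to the record are assumptions, and only a few of the fields above are included.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def __init__(self, static_fields: dict[str, str]) -> None:
        super().__init__()
        self._static_fields = dict(static_fields)

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            **self._static_fields,
        }
        # Assumed convention: structured data travels on the record as "event".
        event = getattr(record, "event", None)
        if event is not None:
            payload["event"] = event
        return json.dumps(payload, default=str)


formatter = JsonFormatter({"app_name": "orders-api", "service": "orders"})
record = logging.LogRecord(
    name="fastapiobserver.middleware",
    level=logging.INFO,
    pathname="example.py",
    lineno=0,
    msg="request.completed",
    args=(),
    exc_info=None,
)
record.event = {"method": "GET", "path": "/orders/42", "status_code": 200}
line = formatter.format(record)
```

One JSON object per line keeps the output easy to ship to Loki or any other line-oriented log collector.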

Production Deployment Guide

This section is deployment-first. A new engineer should be able to ship this stack without reading the source code.

Reference architecture

flowchart LR
  A["FastAPI services (fastapi-observer)"] --> C["OTel Collector"]
  C --> D["Tempo (traces)"]
  C --> E["Loki (logs)"]
  A --> F["Prometheus (/metrics scrape)"]
  F --> G["Grafana"]
  D --> G
  E --> G

Minimal collector config

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    limit_mib: 512
    spike_limit_mib: 128
    check_interval: 5s
  batch:
    send_batch_size: 512
    timeout: 5s

exporters:
  otlphttp/tempo:
    endpoint: http://tempo:4318
  otlphttp/loki:
    endpoint: http://loki:3100/otlp

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/tempo]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/loki]

Rollout strategy

  1. Baseline current service SLOs before migration (latency, error rate, availability).
  2. Enable fastapi-observer in one service with conservative settings (no body capture).
  3. Run a canary rollout (5-10% of traffic) and compare latency p95, 5xx rate, and log/trace pipeline health.
  4. Expand rollout to all replicas/services after 24-48h stable canary.
  5. Enable advanced controls in phases: security presets, allowlists, runtime control plane, OTLP logs mode.

Failure modes and expected behavior

Failure mode | Expected behavior | Immediate action
OTel Collector down | App still serves traffic; local logs still available if OTEL_LOGS_MODE=both | Fail over Collector or temporarily switch to local_json mode
Tempo down | Traces unavailable; logs/metrics continue | Restore Tempo, keep incident correlation via logs
Loki down | Logs unavailable in Grafana; metrics/traces continue | Restore Loki, use app stdout logs temporarily
Prometheus down | No metrics/alerts; app traffic unaffected | Restore Prometheus and alertmanager path
High cardinality on paths | Prometheus pressure increases | Use route templates and exclude noisy paths
Spoofed forwarded headers | Incorrect client IP/request ID trust | Tighten OBS_TRUSTED_CIDRS and proxy chain config

SLO and alert checklist

Recommended SLOs:

  • Availability: >= 99.9% over 30 days
  • p95 latency: < 500ms for core APIs
  • 5xx rate: < 1% per service
  • Error-budget burn alerting: fast burn (1h), slow burn (6h)
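
To turn these targets into numbers, the arithmetic below uses the common SRE definitions of error budget and burn rate. The function names are hypothetical, and the example error ratio is illustrative rather than measured.

```python
def error_budget_minutes(slo: float, window_days: int) -> float:
    """Minutes of full unavailability allowed by an availability SLO over a window."""
    return (1.0 - slo) * window_days * 24 * 60


def burn_rate(observed_error_ratio: float, slo: float) -> float:
    """How many times faster than allowed the error budget is being consumed."""
    return observed_error_ratio / (1.0 - slo)


# 99.9% over 30 days leaves about 43.2 minutes of error budget.
budget = error_budget_minutes(0.999, 30)

# A sustained 1% error ratio against a 99.9% SLO burns budget about 10x too fast.
rate = burn_rate(0.01, 0.999)
```

A burn rate of 1 means the budget would be used up at the end of the window. Fast-burn alerts typically fire on a high burn rate over a short window, and slow-burn alerts on a lower burn rate over a longer one.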

Starter alert queries:

# 5xx rate per service (5 minutes)
sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (service)

# p95 latency per service
histogram_quantile(
  0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
)

# Traffic drop detection
sum(rate(http_requests_total[5m])) by (service)

Incident playbook (first 15 minutes)

  1. Confirm blast radius in Grafana: affected services, status codes, latency shifts, deployment changes.
  2. Increase signal quality without restart: use runtime control plane to raise log level and tracing sample ratio.
  3. Identify dependency failures: check Collector, Loki, Tempo, Prometheus health and ingestion queues.
  4. Mitigate: roll back latest app change, scale affected service, or disable expensive capture options.
  5. Verify recovery: p95 + 5xx return to baseline, trace volume normalized, alert clears.

Kubernetes quickstart (copy/paste)

Use the bundled manifests:

kubectl kustomize --load-restrictor=LoadRestrictionsNone examples/k8s | kubectl apply -f -
kubectl -n observability rollout status deployment/app-a
kubectl -n observability rollout status deployment/app-b
kubectl -n observability rollout status deployment/app-c
kubectl -n observability rollout status deployment/otel-collector
kubectl -n observability rollout status deployment/prometheus
kubectl -n observability rollout status deployment/loki
kubectl -n observability rollout status deployment/tempo
kubectl -n observability rollout status deployment/grafana
kubectl -n observability rollout status deployment/traffic-generator
kubectl -n observability port-forward svc/grafana 3000:3000

Open http://localhost:3000.
Full guide: kubernetes.md


Examples

The examples/ directory contains runnable demos:

Example | What it shows
basic_app.py | Minimal setup and request logging
security_presets_app.py | Preset-based security policy
allowlist_app.py | Allowlist-only sanitization
otel_app.py | OTel tracing and resource attributes
k8s/ | Kubernetes-native stack with Prometheus + Loki + Tempo + Grafana
full_stack/ | Docker Compose stack: 3 FastAPI services + Grafana + Prometheus + Loki + Tempo

Run an example:

uvicorn examples.basic_app:app --reload

Dashboard Screenshots (Full-Stack Demo)

From examples/full_stack, these are real Grafana views generated by fastapi-observer telemetry:

Overview panels (latency heatmap, route throughput, errors, CPU/memory):

[Screenshot: FastAPI Observer dashboard overview]

Percentiles, request rate, and structured JSON logs in Loki:

[Screenshot: FastAPI Observer dashboard logs and percentiles]


Environment Variables

The library supports configuration from code and env vars. Below are the most relevant env vars by area.

Identity and logging

Variable | Default | Description
APP_NAME | app | Namespace for app-level identity
SERVICE_NAME | api | Service label for logs/metrics
ENVIRONMENT | development | Environment label
APP_VERSION | 0.0.0 | Service version
LOG_LEVEL | INFO | Root log level
LOG_DIR | - | Optional file log directory
LOG_QUEUE_MAX_SIZE | 10000 | Max in-memory records in core log queue
LOG_QUEUE_OVERFLOW_POLICY | drop_oldest | Queue overflow behavior: drop_oldest, drop_newest, block
LOG_QUEUE_BLOCK_TIMEOUT_SECONDS | 1.0 | Timeout used by block policy before dropping newest
REQUEST_ID_HEADER | x-request-id | Incoming request ID header
RESPONSE_REQUEST_ID_HEADER | x-request-id | Response request ID header

Metrics

Variable | Default | Description
METRICS_ENABLED | false | Enable metrics backend
METRICS_PATH | /metrics | Metrics endpoint path
METRICS_EXCLUDE_PATHS | /metrics,/health,/healthz,/docs,/openapi.json | Skip metrics for noisy endpoints
METRICS_EXEMPLARS_ENABLED | false | Enable exemplars where supported
METRICS_FORMAT | negotiate | prometheus, openmetrics, or negotiate

Security and trust boundary

Variable | Default | Description
OBS_REDACTION_PRESET | - | strict, pci, gdpr
OBS_REDACTED_FIELDS | built-in list | CSV keys to redact
OBS_REDACTED_HEADERS | built-in list | CSV headers to redact
OBS_REDACTION_MODE | mask | mask, hash, drop
OBS_MASK_TEXT | *** | Mask replacement text
OBS_LOG_REQUEST_BODY | false | Enable request body logging
OBS_LOG_RESPONSE_BODY | false | Enable response body logging
OBS_MAX_BODY_LENGTH | 256 | Max captured body bytes
OBS_HEADER_ALLOWLIST | - | CSV headers allowed in logs
OBS_EVENT_KEY_ALLOWLIST | - | CSV event keys allowed in logs
OBS_BODY_CAPTURE_MEDIA_TYPES | - | CSV allowed media types for body capture
OBS_TRUSTED_PROXY_ENABLED | true | Enable trusted-proxy policy
OBS_TRUSTED_CIDRS | RFC1918 + loopback | CSV trusted CIDRs
OBS_HONOR_FORWARDED_HEADERS | false | Trust forwarded headers

Notes:

  • OBS_HEADER_ALLOWLIST, OBS_EVENT_KEY_ALLOWLIST, and OBS_BODY_CAPTURE_MEDIA_TYPES accept none, null, or unset to clear values.

OpenTelemetry tracing/log export

Variable | Default | Description
OTEL_ENABLED | false | Enable tracing instrumentation
OTEL_SERVICE_NAME | SERVICE_NAME | OTel service name override
OTEL_SERVICE_VERSION | APP_VERSION | OTel service version override
OTEL_ENVIRONMENT | ENVIRONMENT | OTel environment override
OTEL_EXPORTER_OTLP_ENDPOINT | - | OTLP endpoint
OTEL_EXPORTER_OTLP_PROTOCOL | grpc | grpc or http/protobuf
OTEL_TRACE_SAMPLING_RATIO | 1.0 | Initial trace sampling ratio
OTEL_EXTRA_RESOURCE_ATTRIBUTES | - | CSV key=value pairs
OTEL_EXCLUDED_URLS | auto-derived | CSV excluded paths for tracing
OTEL_LOGS_ENABLED | false | Enable OTLP log export
OTEL_LOGS_MODE | local_json | local_json, otlp, both
OTEL_LOGS_ENDPOINT | - | OTLP logs endpoint
OTEL_LOGS_PROTOCOL | grpc | grpc or http/protobuf

Runtime control plane

Variable | Default | Description
OBS_RUNTIME_CONTROL_ENABLED | false | Enable runtime control endpoint
OBS_RUNTIME_CONTROL_PATH | /_observability/control | Control endpoint path
OBS_RUNTIME_CONTROL_TOKEN_ENV_VAR | OBSERVABILITY_CONTROL_TOKEN | Name of env var containing bearer token
OBSERVABILITY_CONTROL_TOKEN | - | Bearer token value used for auth

Optional Logtail sink

Variable | Default | Description
LOGTAIL_ENABLED | false | Enable Better Stack Logtail sink
LOGTAIL_SOURCE_TOKEN | - | Logtail source token
LOGTAIL_BATCH_SIZE | 50 | Batch size for shipping
LOGTAIL_FLUSH_INTERVAL | 2.0 | Flush interval (seconds)

Advanced Operations

Middleware ordering for body capture

If body capture is enabled, install observability before other middleware:

from fastapi.middleware.cors import CORSMiddleware
from fastapiobserver import SecurityPolicy, install_observability

install_observability(app, settings, security_policy=SecurityPolicy(log_request_body=True))
app.add_middleware(CORSMiddleware, allow_origins=["*"])

Multi-worker Gunicorn with Prometheus

export PROMETHEUS_MULTIPROC_DIR=/tmp/prometheus-metrics
rm -rf "$PROMETHEUS_MULTIPROC_DIR"
mkdir -p "$PROMETHEUS_MULTIPROC_DIR"

gunicorn.conf.py:

from fastapiobserver import mark_prometheus_process_dead


def child_exit(server, worker):
    mark_prometheus_process_dead(worker.pid)

Bounded queue and overflow policy

Use queue controls to define behavior under sustained log pressure:

settings = ObservabilitySettings(
    app_name="orders-api",
    service="orders",
    environment="production",
    log_queue_max_size=20000,
    log_queue_overflow_policy="drop_oldest",  # or "drop_newest" / "block"
    log_queue_block_timeout_seconds=0.5,
)

Queue pressure metrics exposed on /metrics (Prometheus mode):

  • fastapiobserver_log_queue_size
  • fastapiobserver_log_queue_capacity
  • fastapiobserver_log_queue_enqueued_total
  • fastapiobserver_log_queue_dropped_total{reason="drop_oldest|drop_newest"}
  • fastapiobserver_log_queue_blocked_total
  • fastapiobserver_log_queue_block_timeouts_total
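
The difference between the two drop policies is easiest to see in a toy queue. The sketch below is an illustration only: BoundedLogQueue is a hypothetical class, it is not thread-safe, and it leaves out the block policy and the real handler's background worker.

```python
from collections import deque
from typing import Any


class BoundedLogQueue:
    """Fixed-capacity queue that drops either the oldest or the newest record when full."""

    def __init__(self, max_size: int, policy: str = "drop_oldest") -> None:
        if policy not in ("drop_oldest", "drop_newest"):
            raise ValueError(f"unsupported policy: {policy}")
        self._items: deque[Any] = deque()
        self._max_size = max_size
        self._policy = policy
        self.enqueued_total = 0
        self.dropped_total = {"drop_oldest": 0, "drop_newest": 0}

    def put(self, record: Any) -> bool:
        """Enqueue a record; return False when the incoming record was dropped."""
        if len(self._items) >= self._max_size:
            self.dropped_total[self._policy] += 1
            if self._policy == "drop_newest":
                return False
            self._items.popleft()  # make room by discarding the oldest record
        self._items.append(record)
        self.enqueued_total += 1
        return True

    def snapshot(self) -> list[Any]:
        return list(self._items)
```

drop_oldest favors recent records, which usually matter most during an incident, while drop_newest preserves the records that led up to the overload.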

Plugin Hooks

Extend behavior without editing package internals:

from fastapiobserver import register_log_enricher, register_metric_hook


def add_git_sha(payload: dict) -> dict:
    payload["git_sha"] = "abc123"
    return payload


def track_slow_requests(request, response, duration):
    if duration > 1.0:
        print(f"slow request: {request.url.path} {duration:.2f}s")


register_log_enricher("git_sha", add_git_sha)
register_metric_hook("slow_requests", track_slow_requests)

Plugin failures are isolated and do not crash request handling.
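
Failure isolation of this kind usually comes down to wrapping each plugin call in its own try/except. The sketch below shows that pattern with a hypothetical registry (register and run_enrichers are illustrative names, not the library's API).

```python
import logging
from typing import Callable

_log = logging.getLogger("enricher_sketch")
_enrichers: dict[str, Callable[[dict], dict]] = {}


def register(name: str, func: Callable[[dict], dict]) -> None:
    """Register an enricher under a unique name."""
    _enrichers[name] = func


def run_enrichers(payload: dict) -> dict:
    """Apply every enricher in order; a failing enricher is logged and skipped."""
    for name, func in _enrichers.items():
        try:
            # Pass a copy so a plugin that fails midway cannot corrupt the payload.
            result = func(dict(payload))
        except Exception:
            _log.exception("enricher %r failed; skipping", name)
            continue
        if isinstance(result, dict):
            payload = result
    return payload
```

Each plugin sees the output of the previous successful one, and a plugin that raises leaves the payload exactly as it was.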


OTel Test Coverage

Repository integration tests include:

  • tests/test_otel_log_correlation.py: verifies trace/span IDs in logs map to real spans.
  • tests/test_otlp_export_integration.py: validates OTLP HTTP export with local collector fixtures.

Release Tracks

  • 0.1.x: secure-by-default core
  • 0.2.x: OTel interoperability, security presets, allowlists
  • 1.0.0: dynamic runtime controls and plugin stability

Current release version: 0.1.2

Changelog Policy

Breaking changes must be listed under a Breaking Changes section in CHANGELOG.md.


Packaging and Publishing (Maintainers)

1) Build distributions

python -m pip install --upgrade pip build
python -m build

2) Upload to TestPyPI

python -m pip install --upgrade twine
python -m twine upload --repository testpypi dist/*

3) Validate install from TestPyPI

python -m pip install \
  --index-url https://test.pypi.org/simple/ \
  --extra-index-url https://pypi.org/simple/ \
  fastapi-observer

4) Upload to production PyPI

python -m twine upload dist/*

Local Git Hook (Recommended)

git config core.hooksPath .githooks

The pre-push hook runs:

  • uv run ruff check
  • uv run mypy src
  • uv run pytest -q

Roadmap Tracking

See NEXT_STEPS.md for the active 0.2.0 roadmap and release checklist.
