SLO + error-budget tracker for Python services (FastAPI middleware + Prometheus exporter)
slo-budget-tracker
SLO + error-budget tracker for Python services — drop-in FastAPI middleware, Prometheus exporter, and a small standalone library you can wire into any ASGI app or background worker.
Built around the math in the Google SRE Workbook: one rolling window, multi-window burn-rate alerts (defaults to 1h + 6h at burn rate ≥ 14.4), and an explicit error-budget remaining gauge so dashboards stop lying about reliability.
Why
Most "SLO dashboards" you find in the wild conflate availability with uptime and surface neither error budget nor burn rate. You can't tell, at a glance, whether the freshly deployed service is burning the next 30 days of error budget in the next 30 minutes. This library makes that visible by default.
Two things matter:
- Error budget remaining: a [1.0 → ≤0] ratio on every dashboard.
- Burn rate: (1 − actual_success_ratio) / (1 − target), sampled at short windows so fast-burn incidents page before the budget is spent.
Install
pip install slo-budget-tracker
# or, with the FastAPI extras:
pip install "slo-budget-tracker[fastapi]"
Python 3.11+. Single runtime dep: prometheus-client.
Quick start — standalone library
from slo_budget_tracker import SLODefinition, SLOTracker

tracker = SLOTracker(
    SLODefinition(
        name="availability",
        target=0.999,  # three nines
        window_seconds=30 * 24 * 3600,  # 30-day rolling window
        burn_rate_windows=(3600, 21600),  # alert on 1h and 6h
        burn_rate_threshold=14.4,  # SRE workbook fast-burn page
    )
)

# Hot path: O(1)
tracker.record_success()
tracker.record_failure()

snap = tracker.snapshot()
print(f"success ratio: {snap.success_ratio:.4f}")
print(f"budget left: {snap.error_budget_remaining:.2%}")
print(f"burn rate: {snap.burn_rate:.2f}")

if snap.is_budget_exhausted:
    print("Freeze deploys.")

for alert in tracker.check_burn_rate():
    print(f"FAST BURN over {alert.window_seconds}s: {alert.burn_rate:.1f}x budget")
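For background workers, one way to record outcomes is to wrap the job function in a decorator. The following is a minimal sketch, not part of the library: it relies only on the record_success() and record_failure() methods shown in the quick start, and the names track_outcome, OutcomeRecorder, and CountingRecorder are hypothetical, introduced here for illustration.

```python
import functools
from typing import Any, Callable, Protocol


class OutcomeRecorder(Protocol):
    """Anything with the two hot-path methods from the quick start."""

    def record_success(self) -> None: ...
    def record_failure(self) -> None: ...


def track_outcome(tracker: OutcomeRecorder) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical helper: count a raised exception as a failure, a normal return as a success."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = func(*args, **kwargs)
            except Exception:
                tracker.record_failure()
                raise  # re-raise so the worker's own error handling still runs
            tracker.record_success()
            return result

        return wrapper

    return decorator


class CountingRecorder:
    """Stand-in recorder for trying the decorator without the library installed."""

    def __init__(self) -> None:
        self.successes = 0
        self.failures = 0

    def record_success(self) -> None:
        self.successes += 1

    def record_failure(self) -> None:
        self.failures += 1
```

Under these assumptions, an SLOTracker instance could be passed in place of CountingRecorder, since the decorator only calls the two recording methods.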
FastAPI middleware
SLOMiddleware auto-classifies every HTTP response — by default 5xx and unhandled exceptions are failures, everything else is a success. Override with your own classifier when 4xx (or specific routes) should burn budget.
from fastapi import FastAPI
from fastapi.responses import Response
from slo_budget_tracker import (
    PrometheusExporter,
    SLODefinition,
    SLOMiddleware,
    SLORegistry,
)

registry = SLORegistry()
registry.define(SLODefinition(name="availability", target=0.999))
registry.define(SLODefinition(name="freshness", target=0.99))

app = FastAPI()
app.add_middleware(SLOMiddleware, registry=registry, slo_name="availability")

exporter = PrometheusExporter(registry)

@app.get("/metrics")
async def metrics() -> Response:
    body, content_type = exporter.render()
    return Response(content=body, media_type=content_type)

@app.get("/slo")
async def slo_snapshot() -> dict[str, object]:
    return {"slos": [s.__dict__ for s in registry.snapshot_all()]}
Point your Prometheus scrape at /metrics and you get:
slo_target{slo="availability"} 0.999
slo_success_ratio{slo="availability"} 0.9991
slo_error_budget_remaining{slo="availability"} 0.42
slo_burn_rate{slo="availability",window_seconds="3600"} 2.1
slo_burn_rate{slo="availability",window_seconds="21600"} 0.8
slo_breached{slo="availability"} 0.0
Custom classification
Default: anything < 500 and no exception is a success. Want 4xx to burn budget? Pass classify=:
app.add_middleware(
    SLOMiddleware,
    registry=registry,
    slo_name="availability",
    classify=lambda status, exc: exc is None and status < 400,
)
The classifier receives (status_code, exception_or_None) and returns True for success.
API surface
| Object | Purpose |
|---|---|
| SLODefinition | Frozen dataclass: name, target, window, burn-rate windows + threshold. Validates at construction. |
| SLOTracker | Records observations, computes snapshots and burn-rate alerts. |
| SLORegistry | Holds many named trackers; supports snapshot_all() and check_burn_rates(). |
| SLOMiddleware | ASGI middleware that auto-records HTTP outcomes against a tracker. |
| PrometheusExporter | Renders the registry as Prometheus text format on demand. |
| Observation | (timestamp, success) event. |
| SLOSnapshot | Point-in-time view: ratios, failures, budget remaining, burn rate. |
| BurnRateAlert | One short window has crossed the configured threshold. |
| BurnRateSample | One short-window measurement attached to a snapshot. |
Burn-rate math
error_budget = (1 - target) * total_requests_in_window
budget_used = failures_in_window
remaining_pct = (error_budget - budget_used) / error_budget
burn_rate(short_window) = (1 - success_ratio(short_window)) / (1 - target)
A burn_rate of 1.0 means the service is failing at exactly the rate the SLO allows, so the budget lasts the full window. A burn_rate of 14.4 spends a full 30-day budget in about 2 days (30 / 14.4 ≈ 2.08). The default threshold of 14.4 follows the SRE Workbook fast-burn page.
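The formulas above can be checked by hand with a few lines of plain Python. This is a standalone sketch of the arithmetic, not the library's implementation; the function names are made up for the example, and returning 1.0 for an empty window is an assumption about how "no traffic" should be treated.

```python
import math


def error_budget_remaining(target: float, total: int, failures: int) -> float:
    """Fraction of the error budget left; goes negative once the budget is overspent."""
    allowed_failures = (1.0 - target) * total
    if allowed_failures == 0:
        return 1.0  # assumption: with no traffic, nothing has been spent
    return (allowed_failures - failures) / allowed_failures


def burn_rate(success_ratio: float, target: float) -> float:
    """Observed failure ratio divided by the failure ratio the SLO allows."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return (1.0 - success_ratio) / (1.0 - target)


def days_to_exhaustion(rate: float, window_days: float = 30.0) -> float:
    """How long a full budget lasts if the given burn rate is sustained."""
    if rate <= 0:
        return math.inf
    return window_days / rate


# 1,000,000 requests at a 99.9% target allow about 1,000 failures.
remaining = error_budget_remaining(target=0.999, total=1_000_000, failures=580)
# A 1.44% failure ratio in a short window against a 0.1% allowance.
rate = burn_rate(success_ratio=0.9856, target=0.999)
print(f"remaining={remaining:.2f} burn_rate={rate:.1f} days_left={days_to_exhaustion(rate):.2f}")
```

With 580 failures against roughly 1,000 allowed, about 42% of the budget remains; a short-window success ratio of 0.9856 gives a burn rate of about 14.4, the default paging threshold.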
Storage backends
The default InMemoryStore keeps a thread-safe deque trimmed to the window. For services pushing > ~100 rps you'll want a sampling or bucketed backend — wire one in by passing store= to SLOTracker. The protocol is small:
from typing import Protocol

class ObservationStore(Protocol):
    def record(self, observation: Observation) -> None: ...
    def window(self, now: float, seconds: int) -> list[Observation]: ...
    def trim(self, before: float) -> None: ...
    def __len__(self) -> int: ...
A Redis sorted-set backend is on the roadmap (ZADD/ZREMRANGEBYSCORE); contributions welcome.
Tests
pip install -e ".[dev]"
ruff check src tests && ruff format --check src tests
mypy src
pytest -v
The CI matrix runs Python 3.11 / 3.12 / 3.13.
Related work in this ecosystem
This is part of the Platform Reliability Stack — small, focused libraries that compose into a production reliability story:
- procurement-decision-api — drafts AI Procurement Decision Cards from vendor Suite documents.
- reliability-toolkit-rs — async rate-limit + circuit-breaker + retry + bulkhead in Rust (coming next).
- More at kineticgain.com.
License
MIT. See LICENSE.