Runtime audit CLI for detecting asyncio degradation and threadpool saturation.
Async Runtime Auditor
Runtime degradation detection and CI/CD gating for Python asyncio services.
Async systems can enter severe internal degradation states long before traditional application telemetry reflects failure. Event-loop starvation, blocking execution, and executor saturation frequently remain invisible behind healthy HTTP latency metrics.
Async Runtime Auditor is a lightweight operational CLI that evaluates async runtime health directly from telemetry signals and enforces deployment gating before degraded runtime behavior reaches production.
The system is designed for:
- staging validation
- runtime regression detection
- deployment pipeline enforcement
- operational diagnostics for async infrastructure
Core Idea
Most observability systems focus primarily on external request behavior:
- latency
- throughput
- error rate
But async runtimes fail internally first.
To detect degradation before application instability becomes externally visible, this project focuses on runtime-native signals such as:
- event-loop lag
- executor queue amplification
- blocking duration
- concurrent pressure
- threadpool backlog growth
Runtime Failure Pattern
In many production systems:
healthy HTTP latency != healthy runtime state
A runtime may still appear externally healthy while internally experiencing:
- scheduler starvation
- blocking execution
- executor worker exhaustion
- queue amplification
- degraded task scheduling fairness
The auditor is designed specifically to detect these hidden operational states.
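As a rough illustration of the blocking-execution case, the sketch below stalls an event loop with a synchronous `time.sleep()` and measures the resulting lag with a probe coroutine. It uses only the standard library; the names `measure_max_lag` and `blocking_handler` are hypothetical and are not part of the auditor.

```python
import asyncio
import time


async def measure_max_lag(interval: float, duration: float) -> float:
    """Return the worst overshoot of asyncio.sleep(interval) seen within duration."""
    worst = 0.0
    loop = asyncio.get_running_loop()
    deadline = loop.time() + duration
    while loop.time() < deadline:
        start = time.perf_counter()
        await asyncio.sleep(interval)
        # Any time beyond the requested interval is time the loop was busy elsewhere.
        worst = max(worst, time.perf_counter() - start - interval)
    return worst


async def blocking_handler() -> None:
    # Deliberate mistake: time.sleep() blocks the whole event loop,
    # unlike asyncio.sleep(), which yields control.
    time.sleep(0.2)


async def demo() -> float:
    probe = asyncio.create_task(measure_max_lag(interval=0.01, duration=0.5))
    await asyncio.sleep(0.05)  # let the probe start sampling
    await blocking_handler()
    return await probe


max_lag = asyncio.run(demo())
print(f"max event-loop lag: {max_lag:.3f}s")
```

The handler itself returns normally, so a request-level metric would only record one slow call, while every other task scheduled on the loop was delayed by roughly the same amount.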
Installation
Requires Python 3.9+ and a Prometheus-compatible telemetry backend.
PyPI Installation
pip install async-runtime-auditor
Local Development
From the repository root, run:
pip install -e .
Runtime Metric Requirements
Your async application must expose runtime telemetry before the auditor can evaluate health conditions.
Install Prometheus client support:
pip install prometheus-client
Example FastAPI instrumentation:
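import asyncio
import time

from fastapi import FastAPI
from prometheus_client import Gauge, make_asgi_app

app = FastAPI()

EVENT_LOOP_LAG = Gauge(
    "asyncio_event_loop_lag_seconds",
    "Measured asyncio event loop lag",
)
ACTIVE_REQUESTS = Gauge(
    "active_requests",
    "Current active requests",
)

# Expose the Prometheus scrape endpoint at /metrics.
metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)

SAMPLE_INTERVAL_S = 0.1


@app.middleware("http")
async def track_active_requests(request, call_next):
    # Keep ACTIVE_REQUESTS current; without this the gauge would stay at 0.
    with ACTIVE_REQUESTS.track_inprogress():
        return await call_next(request)


async def monitor_loop_lag():
    # Lag is how much later than requested the loop resumes this coroutine.
    while True:
        start = time.perf_counter()
        await asyncio.sleep(SAMPLE_INTERVAL_S)
        lag = time.perf_counter() - start - SAMPLE_INTERVAL_S
        EVENT_LOOP_LAG.set(max(0.0, lag))


@app.on_event("startup")
async def startup_event():
    # Hold a reference so the background task is not garbage-collected.
    app.state.loop_lag_task = asyncio.create_task(monitor_loop_lag())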
Once Prometheus scrapes these metrics, the auditor can evaluate runtime degradation thresholds.
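The instrumentation above covers event-loop lag and active requests only. The executor-related signals (queue wait, tasks started, tasks completed) need their own instrumentation, which the README does not show. A minimal sketch, assuming a hypothetical `InstrumentedExecutor` wrapper around `ThreadPoolExecutor` with plain counters, might look like this. In a real service the counters would feed Prometheus metrics instead, and counting a task as "started" when a worker picks it up is an assumption about the intended semantics.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class InstrumentedExecutor:
    """Hypothetical wrapper recording queue wait and start/completion counts."""

    def __init__(self, max_workers: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self.tasks_started = 0
        self.tasks_completed = 0
        self.queue_wait_sum = 0.0  # total seconds tasks spent waiting for a worker

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        enqueued = time.perf_counter()

        def run() -> Any:
            # Runs on a worker thread: the gap since submit() is the queue wait.
            waited = time.perf_counter() - enqueued
            with self._lock:
                self.tasks_started += 1
                self.queue_wait_sum += waited
            try:
                return fn(*args)
            finally:
                with self._lock:
                    self.tasks_completed += 1

        return self._pool.submit(run)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


# One worker and three 50 ms tasks: the later tasks queue behind the first.
executor = InstrumentedExecutor(max_workers=1)
futures = [executor.submit(time.sleep, 0.05) for _ in range(3)]
for future in futures:
    future.result()
executor.shutdown()
print(executor.tasks_started, executor.tasks_completed, round(executor.queue_wait_sum, 2))
```

With a single worker, the accumulated queue wait grows with every task submitted behind a busy worker, which is the kind of amplification the `threadpool_queue_wait` query is meant to surface.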
Quick Start
1. Initialize Local Configuration
Generate a default metrics.yaml configuration file:
async-auditor --init
2. Run a Runtime Audit
Execute against a local or staging Prometheus target:
async-auditor \
  --config metrics.yaml \
  --target http://localhost:9090
3. Enable CI/CD Deployment Gating
Fail deployment pipelines automatically when critical runtime degradation is detected:
async-auditor \
  --config metrics.yaml \
  --target http://localhost:9090 \
  --fail-on-critical
Exit code behavior:
| Runtime Status | Exit Code |
|---|---|
| HEALTHY | 0 |
| DEGRADED | 0 |
| CRITICAL | 1 |
| COLLAPSE_RISK | 1 |
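The table can be read as a simple lookup from status to process exit code. The sketch below is an illustration of that mapping, not the auditor's actual implementation. The function name `exit_code_for` is hypothetical, and returning 0 for every status when `--fail-on-critical` is absent is an assumption, since the README only documents the gated case.

```python
# Status-to-exit-code mapping taken from the table above.
EXIT_CODES = {
    "HEALTHY": 0,
    "DEGRADED": 0,
    "CRITICAL": 1,
    "COLLAPSE_RISK": 1,
}


def exit_code_for(status: str, fail_on_critical: bool = True) -> int:
    """Return the exit code a CI step would see for a runtime status."""
    if status not in EXIT_CODES:
        raise ValueError(f"unknown runtime status: {status!r}")
    if not fail_on_critical:
        # Assumption: without the gating flag, the audit only reports.
        return 0
    return EXIT_CODES[status]


for status in EXIT_CODES:
    print(f"{status}: {exit_code_for(status)}")
```

Because DEGRADED maps to 0, a pipeline gated this way blocks only on CRITICAL and COLLAPSE_RISK. Teams that want to block earlier would need to lower the score thresholds instead.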
Configuration (metrics.yaml)
The auditor is intentionally decoupled from metric naming conventions.
Prometheus queries are mapped into the runtime scoring engine through metrics.yaml.
target: "http://localhost:9090"
queries:
  p99_latency: "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))"
  event_loop_lag: "max_over_time(asyncio_event_loop_lag_seconds[15m])"
  blocking_events_total: "blocking_events_total"
  blocking_duration_avg: "rate(blocking_duration_seconds_sum[15m]) / rate(blocking_duration_seconds_count[15m])"
  peak_active_requests: "max_over_time(active_requests[15m])"
  threadpool_queue_wait: "rate(threadpool_queue_wait_seconds_sum[15m]) / rate(threadpool_queue_wait_seconds_count[15m])"
  threadpool_tasks_started: "threadpool_tasks_started_total"
  threadpool_tasks_completed: "threadpool_tasks_completed_total"
thresholds:
  health_score_collapse: 2500
  health_score_critical: 1200
  health_score_degraded: 400
  max_event_loop_lag_s: 0.5
  max_blocking_duration_s: 1.0
  max_queue_wait_s: 1.0
  max_threadpool_backlog: 3
  max_active_requests: 150
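To make the query mapping concrete, the sketch below shows how one configured query could be sent to the Prometheus HTTP API as an instant query (`/api/v1/query`) and how the response could be reduced to a single number. The helper names `instant_query_url` and `max_sample_value` are hypothetical, taking the maximum across returned series is an assumption about how the auditor aggregates, and the sample payload is invented for illustration.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def instant_query_url(target: str, promql: str) -> str:
    """Build a Prometheus instant-query URL for a PromQL expression."""
    return f"{target.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def max_sample_value(payload: dict) -> Optional[float]:
    """Return the largest sample in an instant-vector response, or None if empty."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    # Each sample is [unix_timestamp, "value-as-string"].
    values = [float(series["value"][1]) for series in data["result"]]
    return max(values) if values else None


# Invented response for illustration; a real one would come from an HTTP GET.
sample = json.loads("""
{"status": "success",
 "data": {"resultType": "vector",
          "result": [
            {"metric": {"instance": "app-1"}, "value": [1700000000.0, "0.12"]},
            {"metric": {"instance": "app-2"}, "value": [1700000000.0, "0.61"]}
          ]}}
""")

url = instant_query_url(
    "http://localhost:9090",
    "max_over_time(asyncio_event_loop_lag_seconds[15m])",
)
worst_lag = max_sample_value(sample)
print(url)
print(worst_lag, worst_lag > 0.5)  # compared against max_event_loop_lag_s
```

Since the queries live in configuration, a service that exposes the same signals under different metric names needs changes only to `metrics.yaml`.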
Runtime Health Model
The auditor computes a deterministic runtime health score using:
- event-loop lag
- blocking duration
- executor queue wait
- concurrent request pressure
- threadpool backlog behavior
The scoring model is:
- deterministic
- threshold-driven
- operationally explainable
This project intentionally avoids:
- anomaly detection systems
- machine learning classifiers
- probabilistic runtime forecasting
- opaque scoring heuristics
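The README does not give the scoring formula, so the sketch below covers only the threshold-driven classification step, using the default values from `metrics.yaml`. It assumes that a higher score means worse health and that each threshold is inclusive. The function names `classify` and `breached_limits` are hypothetical, as is keying the observed values by limit name.

```python
from typing import Dict, List

# Defaults copied from the thresholds section of metrics.yaml.
SCORE_THRESHOLDS = {
    "health_score_collapse": 2500,
    "health_score_critical": 1200,
    "health_score_degraded": 400,
}
LIMITS = {
    "max_event_loop_lag_s": 0.5,
    "max_blocking_duration_s": 1.0,
    "max_queue_wait_s": 1.0,
    "max_threadpool_backlog": 3,
    "max_active_requests": 150,
}


def classify(score: float, thresholds: Dict[str, float] = SCORE_THRESHOLDS) -> str:
    """Map a health score to a status (assumes higher score = worse health)."""
    if score >= thresholds["health_score_collapse"]:
        return "COLLAPSE_RISK"
    if score >= thresholds["health_score_critical"]:
        return "CRITICAL"
    if score >= thresholds["health_score_degraded"]:
        return "DEGRADED"
    return "HEALTHY"


def breached_limits(observed: Dict[str, float],
                    limits: Dict[str, float] = LIMITS) -> List[str]:
    """List the configured limits exceeded by the observed values."""
    return [name for name, limit in limits.items()
            if observed.get(name, 0.0) > limit]


print(classify(842))
print(breached_limits({
    "max_event_loop_lag_s": 0.61,
    "max_queue_wait_s": 1.4,
    "max_active_requests": 120,
}))
```

Under these assumptions a score of 842 lands in the DEGRADED band, since it falls between 400 and 1200. The same input always produces the same status, and every status can be traced back to a named threshold.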
Output Formats
The CLI supports both:
- terminal-readable operational output
- structured JSON output for CI systems
Standard Text Output
async-auditor --output-format text
JSON Output
async-auditor --output-format json
A structured report is automatically written to:
audit_results.json
inside the current working directory.
Example Runtime Audit Output
============================================================
ASYNC RUNTIME AUDITOR
============================================================
Target: http://localhost:9090
Runtime Status: DEGRADED
Runtime Health Score: 842
Findings:
- Event-loop starvation detected.
- Executor queue amplification detected.
- Concurrent load saturation detected.
JSON audit report written to:
audit_results.json
CI/CD Integration
A reference GitHub Actions workflow is included in:
examples/github-actions-gating.yml
Example pipeline step:
- name: Runtime Audit
  run: |
    async-auditor \
      --config metrics.yaml \
      --target http://localhost:9090 \
      --fail-on-critical
This allows runtime degradation checks to execute automatically during pull request validation and deployment workflows.
Intended Usage
This tool is designed primarily for:
- deployment pipeline gating
- async runtime regression testing
- staging environment diagnostics
- operational async runtime analysis
It is not intended to replace:
- full observability platforms
- distributed tracing systems
- telemetry storage infrastructure
Scope
This project is intentionally narrow in scope.
Included
- threshold-based runtime degradation detection
- CI/CD exit-code semantics
- configurable operational thresholds
- Prometheus-compatible telemetry querying
- async runtime diagnostics
Not Included
- distributed tracing infrastructure
- Kubernetes orchestration
- OpenTelemetry collectors
- AI-driven remediation systems
- autonomous operational control loops
- observability storage backends
Design Principles
The project prioritizes:
- deterministic behavior
- explainable operational output
- low dependency overhead
- operational clarity
- lightweight deployment integration
over:
- infrastructure breadth
- platform extensibility
- autonomous remediation
- generalized orchestration
Project Status
Current version:
v0.1.0
This project is currently in early-stage OSS development and intended for:
- engineering validation
- staging infrastructure testing
- deployment pipeline experimentation
Operational thresholds may require tuning depending on workload characteristics and runtime architecture.
License
MIT License
File details
Details for the file async_runtime_auditor-0.1.1.tar.gz.
File metadata
- Download URL: async_runtime_auditor-0.1.1.tar.gz
- Upload date:
- Size: 12.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a9c05046f01a6bb7781c43167484151477421425d25079b1ef13fbe27e66fce4 |
| MD5 | 44c583d5e52ab1e47a506b7f346e6474 |
| BLAKE2b-256 | 6fc154e750c2d1468c283d0293ce8c27dc87003f9ebe6f474d1f5cffa5bc0646 |
File details
Details for the file async_runtime_auditor-0.1.1-py3-none-any.whl.
File metadata
- Download URL: async_runtime_auditor-0.1.1-py3-none-any.whl
- Upload date:
- Size: 10.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9fd9d0528c8101ffb22560bc9576ce7a61c6f5d124afc9a4d63edc7d8c7c154f |
| MD5 | 4898f65d5304a46ddcbf746d3c0dfb16 |
| BLAKE2b-256 | e0587835e82bb23d8a4ad418d6ebad6df2b0e27ae8af64b7836edd5ebb6625fe |