Runtime audit CLI for detecting asyncio degradation and threadpool saturation.

Async Runtime Auditor

Runtime degradation detection and CI/CD gating for Python asyncio services.

Async systems can enter severe internal degradation states long before traditional application telemetry reflects failure. Event-loop starvation, blocking execution, and executor saturation frequently remain invisible behind healthy HTTP latency metrics.

Async Runtime Auditor is a lightweight operational CLI that evaluates async runtime health directly from telemetry signals and enforces deployment gating before degraded runtime behavior reaches production.

The system is designed for:

  • staging validation
  • runtime regression detection
  • deployment pipeline enforcement
  • operational diagnostics for async infrastructure

Core Idea

Most observability systems focus primarily on external request behavior:

  • latency
  • throughput
  • error rate

Async runtimes, however, tend to degrade internally first.

To detect runtime degradation before application instability becomes externally visible, this project focuses on runtime-native signals such as:

  • event-loop lag
  • executor queue amplification
  • blocking duration
  • concurrent pressure
  • threadpool backlog growth


Runtime Failure Pattern

In many production systems:

healthy HTTP latency != healthy runtime state

A runtime may still appear externally healthy while internally experiencing:

  • scheduler starvation
  • blocking execution
  • executor worker exhaustion
  • queue amplification
  • degraded task scheduling fairness

The auditor is designed specifically to detect these hidden operational states.
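
The effect of blocking execution on the scheduler can be reproduced in a few lines. The sketch below is illustrative only and is not part of the auditor: the function names (measure_lag, blocking_handler, run_demo) are hypothetical, and it uses the same sleep-based lag measurement idea shown later under Runtime Metric Requirements. One coroutine wrongly calls time.sleep on the loop thread, and a probe coroutine records how late it was resumed.

```python
import asyncio
import time


async def measure_lag(interval: float, samples: int) -> float:
    """Return the worst scheduling delay observed over `samples` sleeps."""
    worst = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        # Time beyond the requested interval is time the loop could not
        # hand back to this coroutine.
        worst = max(worst, time.perf_counter() - start - interval)
    return worst


async def blocking_handler(seconds: float) -> None:
    """A coroutine that wrongly runs a blocking call on the loop thread."""
    await asyncio.sleep(0.01)  # yield once so the probe starts its first sleep
    time.sleep(seconds)        # blocks every other task on the loop


async def run_demo(block_for: float = 0.3) -> float:
    probe = asyncio.create_task(measure_lag(interval=0.05, samples=4))
    await blocking_handler(block_for)
    return await probe


if __name__ == "__main__":
    print(f"worst loop lag: {asyncio.run(run_demo()):.3f}s")
```

The probe asks for a 0.05 s sleep but is resumed only after the blocking call returns, so the reported lag is close to the blocking duration even though no request has failed.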


Installation

Requires Python 3.9+

A Prometheus-compatible telemetry backend is required.

PyPI Installation

pip install async-runtime-auditor

Local Development

From the repository root:

pip install -e .


Runtime Metric Requirements

Your async application must expose runtime telemetry before the auditor can evaluate health conditions.

Install Prometheus client support:

pip install prometheus-client

Example FastAPI instrumentation:

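import asyncio
import time

from fastapi import FastAPI, Request
from prometheus_client import Gauge, make_asgi_app

app = FastAPI()

EVENT_LOOP_LAG = Gauge(
    "asyncio_event_loop_lag_seconds",
    "Measured asyncio event loop lag"
)

ACTIVE_REQUESTS = Gauge(
    "active_requests",
    "Current active requests"
)

# Expose the default Prometheus registry at /metrics.
metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)

PROBE_INTERVAL_S = 0.1

async def monitor_loop_lag():
    while True:
        start = time.perf_counter()

        await asyncio.sleep(PROBE_INTERVAL_S)

        # Time beyond the requested sleep is time the loop spent elsewhere.
        lag = time.perf_counter() - start - PROBE_INTERVAL_S

        EVENT_LOOP_LAG.set(max(0, lag))

@app.middleware("http")
async def track_active_requests(request: Request, call_next):
    # Without this the active_requests gauge would never change.
    # track_inprogress() increments on entry and decrements on exit.
    with ACTIVE_REQUESTS.track_inprogress():
        return await call_next(request)

@app.on_event("startup")
async def startup_event():
    # Keep a reference so the task is not garbage-collected while running.
    app.state.loop_lag_task = asyncio.create_task(monitor_loop_lag())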

Once Prometheus scrapes these metrics, the auditor can evaluate runtime degradation thresholds.
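
The example above covers event-loop lag and active requests only. The default metrics.yaml also queries threadpool metrics (threadpool_queue_wait_seconds, threadpool_tasks_started_total, threadpool_tasks_completed_total), and the source does not show how those are produced. One possible approach, sketched here with the standard library only, is to wrap the executor and time how long each task sits in the queue. TimedExecutor and its attribute names are hypothetical; in a real service the recorded values would presumably be exported through prometheus_client metrics rather than kept as plain attributes.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from threading import Lock
from typing import Any, Callable


class TimedExecutor:
    """Hypothetical wrapper that records queue wait and task counts."""

    def __init__(self, max_workers: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = Lock()
        self.tasks_started = 0
        self.tasks_completed = 0
        self.queue_wait_sum = 0.0  # total seconds tasks spent queued

    def submit(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
        submitted_at = time.perf_counter()

        def timed() -> Any:
            # Runs on a worker thread: the gap since submit() is queue wait.
            waited = time.perf_counter() - submitted_at
            with self._lock:
                self.tasks_started += 1
                self.queue_wait_sum += waited
            try:
                return fn(*args, **kwargs)
            finally:
                with self._lock:
                    self.tasks_completed += 1

        return self._pool.submit(timed)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

With a single worker and three 50 ms tasks, the second and third tasks wait behind the first, so queue_wait_sum grows while tasks_started trails the number of submissions. That gap is the kind of backlog signal the threadpool queries are meant to surface.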


Quick Start

1. Initialize Local Configuration

Generate a default metrics.yaml configuration file:

async-auditor --init

2. Run a Runtime Audit

Execute against a local or staging Prometheus target:

async-auditor \
  --config metrics.yaml \
  --target http://localhost:9090

3. Enable CI/CD Deployment Gating

Fail deployment pipelines automatically when critical runtime degradation is detected:

async-auditor \
  --config metrics.yaml \
  --target http://localhost:9090 \
  --fail-on-critical

Exit code behavior:

Runtime Status    Exit Code
HEALTHY           0
DEGRADED          0
CRITICAL          1
COLLAPSE_RISK     1
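
For pipelines scripted in Python rather than shell, the exit codes above can be consumed with subprocess. This is a minimal sketch: deployment_allowed is a hypothetical helper, and it relies only on the documented convention that 0 means HEALTHY or DEGRADED and 1 means CRITICAL or COLLAPSE_RISK.

```python
import subprocess
import sys
from typing import Sequence


def deployment_allowed(command: Sequence[str]) -> bool:
    """Run an audit command and report whether its exit code permits a deploy."""
    completed = subprocess.run(command, check=False)
    # 0 covers HEALTHY and DEGRADED; any other code is treated as a failure.
    return completed.returncode == 0


# The invocation from step 3, expressed as an argument list.
AUDIT_COMMAND = [
    "async-auditor",
    "--config", "metrics.yaml",
    "--target", "http://localhost:9090",
    "--fail-on-critical",
]
```

A deployment script could call deployment_allowed(AUDIT_COMMAND) and abort when it returns False. Note that DEGRADED still returns 0, so a pipeline that should also stop on DEGRADED would need to inspect the report rather than the exit code.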

Configuration (metrics.yaml)

The auditor is intentionally decoupled from metric naming conventions.

Prometheus queries are mapped into the runtime scoring engine through metrics.yaml.

target: "http://localhost:9090"

queries:
  p99_latency: "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))"

  event_loop_lag: "max_over_time(asyncio_event_loop_lag_seconds[15m])"

  blocking_events_total: "blocking_events_total"

  blocking_duration_avg: "rate(blocking_duration_seconds_sum[15m]) / rate(blocking_duration_seconds_count[15m])"

  peak_active_requests: "max_over_time(active_requests[15m])"

  threadpool_queue_wait: "rate(threadpool_queue_wait_seconds_sum[15m]) / rate(threadpool_queue_wait_seconds_count[15m])"

  threadpool_tasks_started: "threadpool_tasks_started_total"

  threadpool_tasks_completed: "threadpool_tasks_completed_total"

thresholds:
  health_score_collapse: 2500
  health_score_critical: 1200
  health_score_degraded: 400

  max_event_loop_lag_s: 0.5
  max_blocking_duration_s: 1.0

  max_queue_wait_s: 1.0
  max_threadpool_backlog: 3

  max_active_requests: 150
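
To check a query from metrics.yaml before running an audit, it can help to see what the expression returns from Prometheus directly. The sketch below uses the standard Prometheus HTTP API instant-query endpoint (/api/v1/query). The helper names are hypothetical, and this is not necessarily how the auditor issues its queries internally.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def instant_query_url(target: str, promql: str) -> str:
    """Build a Prometheus instant-query URL for one PromQL expression."""
    return f"{target.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def first_sample_value(response_body: str) -> Optional[float]:
    """Extract the first sample of an instant-vector response, if any."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return None
    data = payload.get("data", {})
    if data.get("resultType") != "vector" or not data.get("result"):
        return None
    # Each vector entry carries "value": [unix_timestamp, "<number as string>"].
    return float(data["result"][0]["value"][1])
```

The URL can be fetched with any HTTP client, for example urllib.request.urlopen, and the response body passed to first_sample_value. An empty result usually means the metric name in the query does not match what the application exports.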

Runtime Health Model

The auditor computes a deterministic runtime health score using:

  • event-loop lag
  • blocking duration
  • executor queue wait
  • concurrent request pressure
  • threadpool backlog behavior

The scoring model is:

  • deterministic
  • threshold-driven
  • operationally explainable

This project intentionally avoids:

  • anomaly detection systems
  • machine learning classifiers
  • probabilistic runtime forecasting
  • opaque scoring heuristics
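
The threshold-driven part of the model can be illustrated with the health_score_* values from metrics.yaml. The scoring formula itself is not documented here, so the sketch below covers only the final step, mapping a score to a status. It assumes that a higher score means worse health and that each threshold is an inclusive lower bound; both are assumptions, though they agree with the sample output in which a score of 842 is reported as DEGRADED. The classify function is hypothetical.

```python
from typing import Mapping

# Default values copied from the thresholds section of metrics.yaml.
DEFAULT_THRESHOLDS = {
    "health_score_collapse": 2500,
    "health_score_critical": 1200,
    "health_score_degraded": 400,
}


def classify(score: float, thresholds: Mapping[str, float] = DEFAULT_THRESHOLDS) -> str:
    """Map a health score to a status, checking the most severe band first."""
    # Assumption: higher scores are worse and bounds are inclusive.
    if score >= thresholds["health_score_collapse"]:
        return "COLLAPSE_RISK"
    if score >= thresholds["health_score_critical"]:
        return "CRITICAL"
    if score >= thresholds["health_score_degraded"]:
        return "DEGRADED"
    return "HEALTHY"
```

Because the mapping is a fixed chain of comparisons, the same score and thresholds always produce the same status, which is what makes the result explainable in a CI log.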

Output Formats

The CLI supports both:

  • terminal-readable operational output
  • structured JSON output for CI systems

Standard Text Output

async-auditor --output-format text

JSON Output

async-auditor --output-format json

A structured report is automatically written to audit_results.json in the current working directory.


Example Runtime Audit Output

============================================================
ASYNC RUNTIME AUDITOR
============================================================

Target: http://localhost:9090

Runtime Status: DEGRADED
Runtime Health Score: 842

Findings:
  - Event-loop starvation detected.
  - Executor queue amplification detected.
  - Concurrent load saturation detected.

JSON audit report written to:
audit_results.json

CI/CD Integration

A reference GitHub Actions workflow is included in:

examples/github-actions-gating.yml

Example pipeline step:

- name: Runtime Audit
  run: |
    async-auditor \
      --config metrics.yaml \
      --target http://localhost:9090 \
      --fail-on-critical

This allows runtime degradation checks to execute automatically during pull request validation and deployment workflows.


Intended Usage

This tool is designed primarily for:

  • deployment pipeline gating
  • async runtime regression testing
  • staging environment diagnostics
  • operational async runtime analysis

It is not intended to replace:

  • full observability platforms
  • distributed tracing systems
  • telemetry storage infrastructure

Scope

This project is intentionally narrow in scope.

Included

  • heuristic runtime degradation detection
  • CI/CD exit-code semantics
  • configurable operational thresholds
  • Prometheus-compatible telemetry querying
  • async runtime diagnostics

Not Included

  • distributed tracing infrastructure
  • Kubernetes orchestration
  • OpenTelemetry collectors
  • AI-driven remediation systems
  • autonomous operational control loops
  • observability storage backends

Design Principles

The project prioritizes:

  • deterministic behavior
  • explainable operational output
  • low dependency overhead
  • operational clarity
  • lightweight deployment integration

over:

  • infrastructure breadth
  • platform extensibility
  • autonomous remediation
  • generalized orchestration

Project Status

Current version:

v0.1.0

This project is in early-stage open-source development and is intended for:

  • engineering validation
  • staging infrastructure testing
  • deployment pipeline experimentation

Operational thresholds may require tuning depending on workload characteristics and runtime architecture.


License

MIT License
