
Runtime intelligence layer for AI-era Python software: observe, verify, explain, and gate releases on evidence.


Barx — Runtime Intelligence Layer for AI-Era Python Software

Everything your code did, explained through evidence.

Barx observes Python code while it runs, verifies behavior, explains every runtime decision with evidence, audits what AI coding agents change, and turns a run into a GREEN/AMBER/RED release verdict you can defend — locally, with zero telemetry.

AI is changing how code gets written. Barx focuses on what comes next: runtime trust — can this run ship, and what proves it?

Barx 1.0.0 is the first stable release of the Barx Runtime Intelligence Layer. Barx has been rebuilt and repositioned: the 2025 PyPI release (0.1.0, "Fast, CPU-only AI framework") was a different product and is fully retired — see the changelog. Everything in the claims registry is implemented and tested; the limits are documented in What Barx is not and docs/CLAIMS.md.

Barx Studio — the local evidence workspace showing a GREEN release verdict

Barx Studio: the local evidence workspace (barx studio). A real recorded run — GREEN verdict, score and confidence, evidence categories, and the Evidence Spine. Graphite Dark ships too.

What Barx helps answer

  • Can this run ship? → barx release-check (GREEN / AMBER / RED with evidence)
  • What changed between two runs? → barx drift
  • What failed, and where? → the Risks view in Studio, the report's exceptions section
  • What did the coding agent do to my repo? → barx.AgentAudit
  • What evidence supports this release? → the HTML report and Barx Studio

Quickstart

pip install barx            # core; add barx[api] for API testing

Until the 1.0.0 wheel lands on PyPI (publishing is a deliberate manual step — docs/publishing.md), install from source: pip install git+https://github.com/TheBarmaEffect/barx. The 0.1.0 currently on PyPI is the retired 2025 package, not this product.

import barx

seen = barx.Collection()                      # starts as a list
seen.extend(f"url-{i}" for i in range(250))

for i in range(2000):                         # workload turns lookup-heavy...
    seen.contains(f"url-{i % 250}")

print(seen.backend())              # -> "set" (switched, with evidence)
print(seen.explain())              # why, evidence, alternatives, confidence, rollback

Every decision is a structured event in .barx/runs/<run_id>/events.jsonl. Nothing leaves your machine — no telemetry, no network calls.
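
Because the store is plain JSONL, a run can be inspected with the standard library alone. The sketch below is not part of Barx: `read_events` is a hypothetical helper, and it assumes only that each line holds one JSON object (the actual event schema is described in docs/architecture.md). Lines that fail to parse are counted and skipped rather than aborting the read.

```python
import json
from pathlib import Path


def read_events(path):
    """Return (events, skipped) from a JSONL file, one JSON object per line."""
    events, skipped = [], 0
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # blank lines carry no event
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # corrupt line: count it and keep reading
            continue
        if isinstance(record, dict):
            events.append(record)
        else:
            skipped += 1  # valid JSON, but not an event object
    return events, skipped
```

Point it at `.barx/runs/<run_id>/events.jsonl` for a recorded run; a non-zero `skipped` count is worth surfacing rather than hiding.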

barx trace your_script.py          # record runtime spans
barx verify .                      # behavioral checks + static risk scan
barx release-check                 # GREEN/AMBER/RED verdict (RED exits 1)
barx report --html report.html     # one self-contained evidence artifact
barx studio                        # local-only visual workspace at 127.0.0.1
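
For scripted gating, the one exit-code fact stated above is that a RED verdict exits 1. Here is a minimal sketch of a wrapper that relies on that fact alone; `release_check_blocked` and its injectable `runner` parameter are hypothetical names, and no other exit codes or flags are assumed.

```python
import subprocess


def release_check_blocked(runner=subprocess.run):
    """Run `barx release-check`; True only when it exits with status 1 (RED)."""
    # check=False: a non-zero exit is an answer here, not an error to raise.
    result = runner(["barx", "release-check"], check=False)
    return result.returncode == 1
```

With the default runner, a missing `barx` executable raises `FileNotFoundError`, which a CI script should treat as a setup failure rather than a verdict.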

See it

[Screenshot: RED verdict with Fix First] A RED run: the verdict, then Fix first — the exact blockers with recommendations.
[Screenshot: Evidence Sheet in Graphite Dark] The Evidence Sheet: explanation, limitations, raw JSON collapsed by default.
[Screenshot: Release view with score dimensions] The Release view: score dimensions with stated weights and formulas.
[Screenshot: Self-contained HTML report] The portable one-file HTML report — offline, no CDN, secret-redacted.

Every image is a real screenshot of real recorded runs (scripts/make_showcase_runs.py + scripts/capture_screenshots.mjs), not a mockup. The "viewer python" shown in the capsule is the Studio process's interpreter (3.14 on the capture machine) — a labeled viewer fact; the library itself is tested on 3.10–3.12.

Architecture

Barx is strictly layered around one rule — no event, no product. Every feature writes structured events to an append-only JSONL store; explanations, reports, scores, verdicts, and Studio are renderings of those events, never recomputations.

flowchart LR
    M[Instrumentation:<br/>Trace · Verify · API · Guard ·<br/>Adaptive · AI Runtime · AgentAudit] -->|events| S[(events.jsonl<br/>per run)]
    S --> R[build_report] --> H[HTML / JSON report]
    R --> U[Barx Studio]
    S --> G[Score → ReleaseGate] -->|GREEN / AMBER / RED| C[CI · PR comment · exit code]

The full map — layer contracts, the event schema, Guard's single-patch seam model, the privacy table, and extension points — is in docs/architecture.md.
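
The "renderings, never recomputations" rule can be illustrated with a toy store. This is a sketch under stated assumptions, not Barx's implementation: `append_event`, `render_summary`, and the `id` and `kind` fields are hypothetical stand-ins for the real store and schema.

```python
import json
from collections import Counter
from pathlib import Path


def append_event(store, event):
    """Append one event as a JSON line; existing lines are never rewritten."""
    with Path(store).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")


def render_summary(store):
    """Build a summary purely from stored events, citing their ids as evidence."""
    lines = Path(store).read_text(encoding="utf-8").splitlines()
    events = [json.loads(line) for line in lines if line]
    return {
        "total": len(events),
        "by_kind": dict(Counter(e["kind"] for e in events)),
        "evidence_ids": [e["id"] for e in events],
    }
```

Nothing in the summary exists without a stored event behind it, so an empty store renders as zero counts and no evidence ids rather than a guess.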

Core concepts

  • Run — one instrumented execution, stored under .barx/runs/<id>/.
  • Event — a structured record with evidence; everything Barx shows is rendered from events, never invented.
  • Evidence — the event ids behind every claim, score, and verdict.
  • Report — build_report output, served as JSON or one self-contained HTML file. The portable artifact.
  • Studio — a local-only viewer over that same report data.
  • ReleaseGate — documented GREEN/AMBER/RED rules over the evidence.
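
To make the gate concept concrete, here is a toy three-rule verdict. Only the "insufficient evidence is AMBER, never GREEN" rule comes from the claims table; the other two rules, the function name, and the argument shapes are illustrative assumptions, and the real rule set is the documented v1.0 gate.

```python
def release_verdict(blockers, warnings, evidence_ids):
    """Return a verdict dict from illustrative rules over recorded evidence."""
    if blockers:
        verdict = "RED"  # assumption: any blocker fails the gate
    elif warnings or not evidence_ids:
        verdict = "AMBER"  # insufficient evidence never yields GREEN
    else:
        verdict = "GREEN"
    return {
        "verdict": verdict,
        "blockers": list(blockers),
        "warnings": list(warnings),
        "evidence_ids": list(evidence_ids),
    }
```

Returning the evidence ids alongside the verdict keeps the result traceable to the events that produced it.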

Main capabilities

  • Trace — function spans, nesting, boundary exception capture; no argument values captured. (docs)
  • Verify — behavioral verification over your cases + a 20-rule AST risk scan (no code execution). (docs)
  • API — API testing with runtime evidence; auth/tokens redacted. Optional barx[api] extra. (docs)
  • Policy / Guard — runtime guardrails (observe/warn/strict) via reversible patch seams. Not a sandbox. (guard)
  • Drift / Replay — compare two runs (comparative, not causal); replay GET-only and dry-run by default. (drift, replay)
  • Score / ReleaseGate — evidence-backed score and verdict with stated formulas. (score, gate)
  • Adaptive runtime — Collection, Cache, Router, Pipeline: evidence-backed, explainable, overridable switching. (collection)
  • AI runtime — LLMTrace (prompts/responses hashed by default), PromptGuard (heuristic), Cost (estimates from your price table). (llm)
  • AgentAudit — observable evidence of what an AI coding agent did to a repo. (docs)
  • Evidence Testing — Mock (recorded replay), Contract (schema-lite), AutoTest (generated skeletons). (mock)
  • Studio — local visual workspace. (docs)
  • GitHub Action / VS Code MVP — Barx in PRs, CI, and the editor. (action, vscode)

Full module index: docs/README.md.

What Barx is not

  • Not a sandbox. Guard patches documented seams; it is not isolation.
  • Not formal verification. Verify runs real cases and flags risks with evidence; it does not prove the absence of bugs.
  • Not a cloud observability platform. Barx is local-first; nothing is uploaded.
  • Not a Postman replacement. API testing brings runtime evidence to Python; it is not a full API client.
  • Not an LLM provider or client. LLMTrace wraps your callables; Barx makes no provider calls and ships no provider SDKs.
  • Not a guarantee of safety, correct billing, or coverage. PromptGuard is heuristic, Cost is an estimate, AutoTest generates starting points, and a GREEN gate means "no configured blockers in the available evidence" — not proof.

Privacy & security

  • Local-first. No telemetry, no hidden network calls, local storage only.
  • Prompts/responses hashed by default (SHA-256); raw capture is an explicit opt-in that still redacts secrets.
  • Secrets redacted across events, reports, and fixtures (auth headers, tokens, cookies, api keys, passwords).
  • Studio binds 127.0.0.1 by default with no telemetry and no external assets.
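
The hashing and redaction defaults above can be pictured with the standard library. This is an illustrative sketch, not Barx's code: the function names, the `[REDACTED]` marker, and the key list are assumptions modeled on the categories named in the list.

```python
import hashlib

# Hypothetical key names, modeled on the redaction categories listed above.
SENSITIVE_KEYS = {"authorization", "cookie", "token", "api_key", "password"}


def hash_text(text):
    """Return the SHA-256 hex digest of text, so the raw text is never stored."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def redact(mapping):
    """Replace the values of sensitive keys (case-insensitive) with a marker."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in mapping.items()
    }


def llm_event(prompt, response, headers):
    """Build a storable record: hashes instead of text, redacted headers."""
    return {
        "prompt_sha256": hash_text(prompt),
        "response_sha256": hash_text(response),
        "headers": redact(headers),
    }
```

Hashes still let two runs be compared for identical prompts without either run storing the prompt itself.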

What works today

Every row below is implemented and tested. This table is the claims registry — Barx advertises nothing before it works. The detailed, categorized version with allowed/forbidden wording lives in docs/CLAIMS.md.

| Area | Status |
| --- | --- |
| Structured event system (stable schema, JSONL store, corrupt-line recovery) | ✅ tested |
| Runtime manager (fail-soft by default, strict mode opt-in, BARX_DISABLED) | ✅ tested |
| barx.Collection — adaptive backends: list, set, deque, heap, sorted (value mode) and dict (key-value mode) | ✅ tested |
| Collection strategy engine — thresholds + hysteresis + cooldown, confidence with stated formula, alternatives with rejection reasons, conversion-cost estimates | ✅ tested |
| Collection safety — lock_backend, data-preserving rollback (refused with a recorded warning if it would lose data), duplicate/unhashable/uncomparable fallbacks | ✅ tested |
| pop_min() / iter_sorted() on every value backend (cost varies by backend) | ✅ tested |
| Explain engine — evidence-backed answers to what/why/evidence/alternatives/confidence/rollback/override, per collection instance | ✅ tested |
| barx.Trace — function spans (no args/values captured), nesting, exception capture at the trace boundary, include/exclude filters, sampling, max_depth/max_events, fail-soft | ✅ tested |
| Trace ↔ Collection linkage — adaptive decisions during a trace are counted, listed, and linked via related_event_ids | ✅ tested |
| JSON reports (with a trace section: summaries, slowest spans, exceptions) | ✅ tested |
| HTML report — one self-contained file (inline CSS, no JS frameworks, no CDN, offline); explain-style decision cards, trace summary, CSS span timeline, event feed, raw evidence anchors, honest empty states and caps; escaped + secret-redacted; JSON fallback on failure | ✅ tested |
| barx.verify — behavioral verification: real cases, expected/contract checks, exception capture, latency + stability checks, type-hint warnings, redacted evidence | ✅ tested |
| barx.verify_file / verify_project — AST risk scan (20 rules, critical→low), file:line evidence, no code execution | ✅ tested |
| Verification events + explain support + report sections (JSON and HTML) + stated confidence heuristic | ✅ tested |
| barx.API / barx.APISuite — API testing with runtime evidence (optional barx[api] extra): status/latency/header/JSON-path/schema-lite assertions, token capture + chaining, fail-fast or continue, declarative JSON specs | ✅ tested |
| API privacy — auth headers, cookies, and token-like values redacted in all stored evidence; no raw-secret flag exists | ✅ tested |
| barx.Policy / barx.Guard — runtime guardrails (not a sandbox): 10 active rules, observe/warn/strict modes, reversible patch seams always restored (incl. before strict violations propagate), allow_network/allow_file_delete approval contexts, latency budget | ✅ tested |
| Policy events + explain + report sections; evidence redacted; stdlib-internal eval/exec exempted (documented); barx.API runner never falsely flagged | ✅ tested |
| barx.Graph — project graph (best-effort AST structure: imports, classes, inheritance, local calls), runtime graph (evidence-backed from events; no invented links), failure graph (event-supported chains); JSON + Mermaid-text exports, caps with disclosure | ✅ tested |
| barx.Drift — compare two runs across 7 categories with stated thresholds, evidence event ids, improvement findings, and zero causal language (test-enforced) | ✅ tested |
| barx.Replay — dry-run by default, GET-only by default, status-parity assertions, disclosed skips, shell/eval/file/pickle/policy actions never replayed; evidence-based path reconstruction | ✅ tested |
| barx.Score — evidence-backed trust score (formula v1.0 stated in every result: weights, penalty table, evidence ids, limitations; no score without evidence) | ✅ tested |
| barx.ReleaseGate — documented GREEN/AMBER/RED rules (v1.0), release confidence with stated formula, blockers/warnings with evidence, insufficient evidence → AMBER never GREEN, PR-comment markdown | ✅ tested |
| barx.Cache — adaptive caching (lru/lfu/ttl/fifo/no_cache/auto) with evidence-backed strategy switches, decorator, bypass disclosure, injectable clock, RLock | ✅ tested |
| barx.Router — measured-evidence routing (fixed/round_robin/fastest/lowest_error/auto), fair warmup, disclosed fallback, exceptions never swallowed | ✅ tested |
| barx.Pipeline — environment detection via find_spec only (heavy frameworks never imported) + honest workflow recommendations with limitations | ✅ tested |
| barx.LLMTrace — callable wrapper (no provider SDKs, no provider calls): prompts/responses as SHA-256 hashes by default, tokens only when supplied, redacted opt-in capture | ✅ tested |
| barx.PromptGuard — heuristic output validation (JSON, schema-lite, unsafe commands, secret leakage, undeclared tools, injection markers); observe/warn/strict | ✅ tested |
| barx.Cost — estimates from user-supplied price tables only; missing prices/tokens disclosed, never assumed | ✅ tested |
| AI Runtime score dimension (only when LLM events exist) + gate rules + restrained report section with privacy note | ✅ tested |
| barx.AgentAudit — agent-session evidence: before/after snapshots (hash/metadata, contents never stored), dependency diffs, commands/network via Guard's seams, policy links, timeline | ✅ tested |
| barx.Mock — redacted replay fixtures from recorded evidence (X-Barx-Mock: recorded; misses disclosed, never invented; refuses without evidence) | ✅ tested |
| barx.Contract — schema-lite contracts from observed responses; drift = review finding, never a breakage claim | ✅ tested |
| barx.AutoTest — pytest skeletons generated from evidence (review-required banner, deterministic, nothing invented) | ✅ tested |
| Barx Studio — local-only run viewer (barx studio): 127.0.0.1, zero telemetry, no external assets, viewer-not-source-of-truth | ✅ tested |
| Local benchmarks (benchmarks/) incl. honest overhead numbers for Collection and Trace | ✅ tested |
| GitHub Action (barx-release-check) — composite action: verdict, report, fail-on, PR comment via token; shells out to the CLI, no duplicated gate logic | ✅ tested |
| CI workflow (ci.yml) — Python 3.10/3.11/3.12 matrix, ruff + format gates, coverage ≥90% gate | ✅ tested |
| VS Code MVP (vscode/barx) — status bar + commands, shells out to CLI only, no telemetry/cloud/chat | ✅ tested |
| CLI: version, runs, latest, explain, report, trace, verify, api test, policy, guard, graph, drift, replay, score, release-check, pipeline, llm, cost, prompt-guard, agent-audit, mock, contract, autotest, studio, ci comment; --json where applicable | ✅ tested |
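
The strategy-engine row names thresholds, hysteresis, and cooldown without showing how they interact. The sketch below is one generic way to combine them, not Barx's engine: the class name, the two-backend choice, and the default numbers are all hypothetical.

```python
class BackendSwitch:
    """Toggle between 'list' and 'set' on lookup ratio, with hysteresis and cooldown."""

    def __init__(self, up=0.7, down=0.3, cooldown=100):
        self.up = up              # switch list -> set at or above this ratio
        self.down = down          # switch set -> list at or below this ratio
        self.cooldown = cooldown  # observations to wait after any switch
        self.backend = "list"
        self.since_switch = cooldown  # start outside the cooldown window
        self.decisions = []           # one record per switch, for explanation

    def observe(self, lookup_ratio):
        """Feed one observed lookup ratio; return the backend now in effect."""
        self.since_switch += 1
        if self.since_switch <= self.cooldown:
            return self.backend  # still cooling down: never switch
        target = self.backend
        if self.backend == "list" and lookup_ratio >= self.up:
            target = "set"
        elif self.backend == "set" and lookup_ratio <= self.down:
            target = "list"
        if target != self.backend:
            self.decisions.append(
                {"from": self.backend, "to": target, "lookup_ratio": lookup_ratio}
            )
            self.backend = target
            self.since_switch = 0
        return self.backend
```

The gap between `up` and `down` is the hysteresis band: a ratio hovering around a single threshold cannot cause repeated flips, and the cooldown bounds how often conversion cost can be paid.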

Limitations

Barx shows what the evidence holds; absent evidence renders as an empty state, never a guess. Guard is not isolation. Drift is comparative, not causal. PromptGuard, Score, and the gate are documented heuristics, not proofs. Cost is an estimate from your price table. AgentAudit cannot see inside child processes. AutoTest output requires human review. Supported on Python 3.10–3.12 (3.13/3.14 are unverified). The full list lives in docs/AUDIT.md and each module's doc.

Principles

  • No event, no product. Explanations and reports are rendered from recorded events, never invented.
  • Fail soft. Instrumentation failures never break your program unless you opt into strict mode.
  • No magic. Every adaptive switch is loggable, explainable, rollback-able, and overridable.
  • Private by default. No telemetry, no hidden network calls, local storage only.
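
The fail-soft principle can be sketched as a decorator around an instrumentation hook. This is an assumption-laden illustration: `fail_soft`, its `strict` flag, and the `failures` list are hypothetical names, not Barx's runtime manager.

```python
import functools


def fail_soft(strict=False, failures=None):
    """Wrap an instrumentation hook so its errors are recorded, not raised."""
    log = failures if failures is not None else []

    def decorate(hook):
        @functools.wraps(hook)
        def wrapper(*args, **kwargs):
            try:
                return hook(*args, **kwargs)
            except Exception as exc:
                if strict:
                    raise  # strict mode: surface the instrumentation failure
                log.append(f"{hook.__name__}: {type(exc).__name__}")
                return None  # the host program continues undisturbed

        return wrapper

    return decorate
```

Recording the failure rather than silently dropping it keeps the "absent evidence is shown as absent" behavior honest: a hook that broke leaves a trace of having broken.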

Development

python -m venv .venv && .venv/bin/pip install -e ".[dev]"
.venv/bin/pytest --cov=barx        # full suite, coverage ≥ 90%
.venv/bin/ruff check barx tests && .venv/bin/ruff format --check barx tests
python scripts/launch_smoke.py     # end-to-end launch smoke
python scripts/run_examples.py     # run all safe examples

Docs index: docs/README.md · Architecture: docs/architecture.md · Website: docs/website.md · Claims: docs/CLAIMS.md · Changelog: CHANGELOG.md · Roadmap: docs/ROADMAP.md

License

MIT © 2026 Karthik Barma. See LICENSE. Built under the Aura banner.
