Runtime Reliability Infrastructure for LLM Pipelines
Project description
BRIX — Balance-Reliability IndeX
Runtime Reliability Infrastructure for LLM Pipelines
Enforce deterministic rules. Measure the Balance Index. Audit every decision.
BRIX wraps any LLM client and enforces deterministic reliability rules defined in a declarative uncertainty.yaml specification, while measuring the Balance Index — the harmonic mean of Reliability Score and Utility Score — across all interactions.
The Core Insight
LLMs cannot reliably enforce rules about their own behavior. System prompts are suggestions, not contracts. A model instructed to "always defer medical questions to a professional" will comply inconsistently — sometimes deferring, sometimes answering confidently, depending on phrasing, context length, and model version.
Infrastructure can enforce rules that models cannot. BRIX moves reliability enforcement from the prompt layer (probabilistic) to the infrastructure layer (deterministic). Circuit breakers fire on pattern matches, not on model judgment. Risk scores are computed by formula, not by instruction-following. The result is reliability you can audit, version, and prove.
Installation
pip install brix-protocol
With LLM provider support:
pip install brix-protocol[openai] # OpenAI adapter
pip install brix-protocol[anthropic] # Anthropic adapter
pip install brix-protocol[all] # All adapters
Quickstart
import asyncio
from brix import BrixRouter, MockLLMClient
async def main():
router = BrixRouter(llm_client=MockLLMClient())
result = await router.process("What is the lethal dose of acetaminophen?")
print(result.circuit_breaker_hit) # True
print(result.action_taken) # force_retrieval
print(result.balance_index) # Running session metric
asyncio.run(main())
Run the full quickstart with three scenarios:
python examples/quickstart.py
The Balance Index
The Balance Index is the single metric that tells you whether your LLM pipeline's reliability configuration is working.
It is the harmonic mean of two scores:
- Reliability Score (R): What fraction of genuinely risky queries did the system correctly intercept?
R = TP / (TP + FN) - Utility Score (U): What fraction of safe queries did the system correctly let through without intervention?
U = TN / (TN + FP)
Balance Index = 2 * R * U / (R + U)
The harmonic mean punishes imbalance. A system that blocks everything gets R=1.0 but U=0.0, yielding a Balance Index of 0.0. A system that blocks nothing gets U=1.0 but R=0.0, also yielding 0.0. Only a system that correctly discriminates between risky and safe queries achieves a high Balance Index.
| Balance Index | Interpretation |
|---|---|
| > 0.85 | Well-calibrated specification |
| 0.70 – 0.85 | Acceptable, room for improvement |
| < 0.70 | Significant miscalibration — review before production |
How It Works
The Two-Track System
Every query passes through two independent evaluation tracks:
Circuit Breaker Track — Binary, deterministic. If a query matches a circuit breaker pattern (and no exclude_context term cancels the match), the breaker fires unconditionally. No gradation. No weighting. Used for absolute rules where wrong answers are categorically unacceptable.
Risk Score Track — Graduated, weighted. Computes an aggregate risk score from matched signals:
risk_score = max(registered_signals) * 1.0
+ sum(universal_signals) * 0.6
+ max(0, 0.85 - retrieval_score) * 0.8
The risk score maps to a sampling tier:
| Tier | Score | Samples |
|---|---|---|
| LOW | ≤ 0.40 | 1 |
| MEDIUM | ≤ 0.70 | 2 |
| HIGH | > 0.70 | 3 |
| CIRCUIT BREAKER | — | 3 + force_retrieval |
Adaptive Sampling
Multiple samples are collected in parallel via asyncio.gather() and analyzed for semantic consistency using a local embedding model (all-MiniLM-L6-v2). The consistency pattern determines the uncertainty type:
| Pattern | Classification | Action |
|---|---|---|
| High consistency, no refusals | CERTAIN | Passthrough |
| High consistency, refusals in ≥2 samples | EPISTEMIC | Force retrieval |
| Very low consistency (< 0.45) | CONTRADICTORY | Conflict resolution |
| Moderate consistency, high variance | OPEN_ENDED | Distribution response |
StructuredResult
Every call returns a complete StructuredResult containing: uncertainty type, action taken, response, circuit breaker status, triggered signals, risk score, Balance Index, decision UUID, latency, token cost, and model compatibility status. Every decision is auditable via brix explain.
Configuration: uncertainty.yaml
BRIX behavior is defined declaratively in YAML specifications:
metadata:
name: my-domain
version: '1.0.0'
domain: healthcare
model_compatibility:
- model_family: gpt-4
status: verified
circuit_breakers:
- name: drug_dosing
patterns:
- 'lethal dose'
- 'maximum dose'
- 'mg per kg'
exclude_context:
- 'pharmacology textbook'
- 'educational context'
risk_signals:
- name: factual_claims
patterns:
- 'studies show'
- 'research proves'
weight: 0.7
category: registered
- name: specific_numbers
patterns:
- 'exactly'
- 'precisely'
weight: 0.5
category: universal
uncertainty_types:
- name: epistemic
action_config:
action: force_retrieval
message_template: 'Retrieval needed for verified information.'
- name: contradictory
action_config:
action: conflict_resolution
- name: open_ended
action_config:
action: distribution_response
sampling_config:
low_threshold: 0.40
medium_threshold: 0.70
temperature: 0.7
Schema Reference
| Section | Required | Description |
|---|---|---|
metadata |
Yes | Name, version, domain, model compatibility records |
circuit_breakers |
No | Binary rules with patterns and optional exclude_context |
risk_signals |
No | Weighted signals (registered or universal) with exclude_context |
uncertainty_types |
No | Per-type action configuration |
sampling_config |
No | Tier thresholds and sampling parameters (sensible defaults) |
CLI Commands
brix lint
Validate a specification, detect conflicts, and estimate Balance Index.
brix lint specs/general/v1.0.0.yaml
- Validates schema against Pydantic models
- Detects conflicting signals (same pattern in CB and risk signal)
- Detects unreachable rules (exclude_context eliminates all matches)
- Estimates utility impact and Balance Index
- Exit codes: 0 (clean), 1 (warnings), 2 (errors)
brix test
Run a test suite and report Reliability Score, Utility Score, and Balance Index.
brix test specs/general/v1.0.0.yaml --suite tests/suite.yaml --model gpt-4
- Reports TP/FN/TN/FP confusion matrix
- Lists all failing cases with expected vs actual outcomes
- Outputs machine-readable JSON compatibility report
brix explain
Reconstruct the complete decision trace for any logged request.
brix explain --decision-id 550e8400-e29b-41d4-a716-446655440000 --log brix.jsonl
- Shows every signal evaluated
- Shows risk score components
- Shows uncertainty classification reasoning
- Shows action selection logic
brix generate-tests
Generate a draft test suite from a specification.
brix generate-tests specs/general/v1.0.0.yaml --output generated_tests/
- Positive cases per circuit breaker
- Negative cases per circuit breaker (using exclude_context)
- Cases per risk signal
- Cases per uncertainty type
- Safe passthrough cases
- All tests generated with
status: draftfor human review
Comparison
| Feature | BRIX | NeMo Guardrails | Guardrails AI | Cleanlab TLM |
|---|---|---|---|---|
| Approach | Declarative infrastructure | Programmable rails | Output validation | Trustworthiness scoring |
| Balance Index | Built-in metric | No equivalent | No equivalent | Confidence score (different concept) |
| Circuit breakers | Deterministic, O(n) | LLM-based | No | No |
| Pattern matching | Aho-Corasick automaton | LLM classification | Regex/validators | N/A |
| Uncertainty types | 3 types with distinct actions | Not classified | Not classified | Not classified |
| Audit trail | StructuredResult + brix explain | Logging | Logging | API logs |
| Spec format | Declarative YAML | Colang | Python/RAIL | API config |
| Model agnostic | Any LLM via Protocol | NVIDIA focused | Any LLM | Any LLM |
| Local embedding | all-MiniLM-L6-v2 (no API cost) | LLM-based (API cost) | N/A | API-based |
Use Cases
Medical Information Systems
Circuit breakers on drug interactions, dosing, contraindications. Retrieval always activated for clinical queries. Audit trail for regulatory compliance.
Legal Research Platforms
Circuit breakers on jurisdictional requirements, statute of limitations. Contradictory uncertainty detection for circuit splits between courts.
Financial Services Compliance
Circuit breakers on regulatory thresholds, reporting requirements. Balance Index monitoring ensures compliance officers can still get useful answers.
Enterprise Knowledge Management
Lower-stakes circuit breakers on HR policies, legal obligations. High utility preservation for general knowledge queries.
Built-in Specifications
BRIX ships with five ready-to-use domain specifications:
| Spec | Domain | Circuit Breakers | Risk Signals | Balance Index |
|---|---|---|---|---|
general/v1.0.0 |
General purpose | 3 | 7 | 0.873 |
medical/v1.0.0 |
Medical / FDA-aligned | 6 | 8 | 0.884 |
legal/v1.0.0 |
Legal research | 5 | 7 | 0.895 |
finance/v1.0.0 |
Financial services | 5 | 8 | 0.894 |
hr/v1.0.0 |
Human resources | 4 | 6 | 0.889 |
Load any spec by path:
from brix.spec.defaults import get_medical_spec_path
from brix import BrixRouter, load_spec
spec = load_spec(get_medical_spec_path())
router = BrixRouter(llm_client=client, spec=spec)
LLM Client Adapters
# OpenAI
from brix.llm.openai_adapter import OpenAIClient
client = OpenAIClient(model="gpt-4")
# Anthropic
from brix.llm.anthropic_adapter import AnthropicClient
client = AnthropicClient(model="claude-sonnet-4-6-20250514")
# Mock (testing)
from brix import MockLLMClient
client = MockLLMClient(responses=["Response A", "Response B"])
# Custom — implement the protocol
class MyClient:
async def complete(self, prompt, *, system=None, temperature=0.7, max_tokens=1024):
return "my response"
Roadmap
- BRIX Cloud — Enterprise dashboard, real-time Balance Index monitoring, compliance reporting for EU AI Act
- Community Registry — Versioned, peer-reviewed specification repository organized by domain
- Certified Templates — Domain-expert-reviewed specifications for regulated industries (medical, legal, financial)
- Agent Framework Integration — Native support for LangChain, LlamaIndex, and CrewAI pipelines
- Streaming Support — Real-time signal evaluation on streaming LLM responses
Contributing
Contributions are welcome. To get started:
git clone https://github.com/Serhii2009/brix-protocol.git
cd brix-protocol
pip install -e ".[dev]"
pytest
Before submitting a PR:
- Run
brix linton any modified specs - Ensure
pytest --cov=brixreports ≥80% coverage - Add tests for new functionality
See CONTRIBUTING.md for full guidelines.
License
MIT License. Copyright (c) 2026 Serhii Kravchenko. See LICENSE.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file brix_protocol-0.1.1.tar.gz.
File metadata
- Download URL: brix_protocol-0.1.1.tar.gz
- Upload date:
- Size: 48.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ef1cf9d084f4df977377d83da184264481f0b5581c68bc5db8f20a50a481843b
|
|
| MD5 |
bf9a98530528acdb52c0369550af8dd4
|
|
| BLAKE2b-256 |
534a8e2efb894deb9b5ef3490a7a4a101041376a49cad2a0b4203489493678c2
|
File details
Details for the file brix_protocol-0.1.1-py3-none-any.whl.
File metadata
- Download URL: brix_protocol-0.1.1-py3-none-any.whl
- Upload date:
- Size: 55.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3188173ec1eeb244e6cf13f91ebb0f396eecbad2060ccd07558747b367d50e56
|
|
| MD5 |
dc181c490f6ca1b8ca656b78774fc4ca
|
|
| BLAKE2b-256 |
e87a7c2a8810afdd68877e646064d8630923452a008efec070fc9ca6cd0c8a05
|