Forensic-grade tamper-evident audit chain for LLM applications. HMAC-SHA256 chain, content-addressable storage, pre-call policy gate, regression detection.
BIJOTEL
Forensic-grade tamper-evident audit chain for LLM applications.
BIJOTEL adds tamper-evidence (HMAC-SHA256 chain), content-addressable storage, and pre-call policy gating to existing OpenTelemetry GenAI pipelines (OpenLLMetry, custom instrumentations, etc.). It does NOT replace your tracer — it extends it.
Status: v1.0.0 — production-ready core (chain + CAS + policy + regression). Layers (fingerprint, AST safety, routing, misalignment probes, Combo D containment) are stable. API surface frozen for v1.x.
Install
pip install bijotel
Optional extras:
pip install bijotel[anthropic] # Anthropic SDK + instrumentation
pip install bijotel[openai] # OpenAI SDK
pip install bijotel[api] # FastAPI + uvicorn (for `bijotel serve`)
pip install bijotel[fingerprint] # sentence-transformers (semantic dedup)
pip install bijotel[ast] # tree-sitter (bash AST safety)
pip install bijotel[all] # everything above
Quickstart
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from bijotel.processors import HmacChainSpanProcessor, CasSpanProcessor
provider = TracerProvider()
provider.add_span_processor(
HmacChainSpanProcessor(
secret_key=bytes.fromhex(os.environ["BIJOTEL_HMAC_SECRET"]),
db_path="chain.db",
)
)
provider.add_span_processor(CasSpanProcessor(db_path="chain.db"))
trace.set_tracer_provider(provider)
# Now any OTel-instrumented LLM call is sealed in the chain.
Verify integrity later:
bijotel verify --db chain.db
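The tamper-evidence that `bijotel verify` checks comes from chaining HMACs: each entry's MAC also covers the previous entry's MAC. Below is a minimal, self-contained sketch of that idea using only the standard library. It is not BIJOTEL's record layout. GENESIS, canonical(), seal(), build_chain() and verify_chain() are hypothetical names, and canonical() only approximates JCS (RFC 8785).

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical value that the first entry links to


def canonical(record: dict) -> bytes:
    # Rough stand-in for JCS (RFC 8785): sorted keys, no whitespace, UTF-8.
    # Real JCS also pins number formatting; this sketch does not.
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def seal(secret: bytes, prev_mac: str, record: dict) -> str:
    # The MAC covers the previous MAC plus the record's SHA-256, so editing,
    # deleting, or reordering any entry breaks every later link.
    content_hash = hashlib.sha256(canonical(record)).hexdigest()
    return hmac.new(secret, (prev_mac + content_hash).encode("ascii"), hashlib.sha256).hexdigest()


def build_chain(secret: bytes, records: list) -> list:
    chain, prev = [], GENESIS
    for record in records:
        mac = seal(secret, prev, record)
        chain.append({"record": record, "mac": mac})
        prev = mac
    return chain


def verify_chain(secret: bytes, chain: list) -> tuple:
    prev = GENESIS
    for seq, entry in enumerate(chain):
        expected = seal(secret, prev, entry["record"])
        if not hmac.compare_digest(expected, entry["mac"]):  # constant-time compare
            return False, f"mismatch at seq={seq}"
        prev = entry["mac"]
    return True, "ok"


secret = bytes.fromhex("11" * 32)
chain = build_chain(secret, [{"model": "m", "input_tokens": 10}, {"model": "m", "input_tokens": 12}])
print(verify_chain(secret, chain))  # (True, 'ok')

chain[0]["record"]["input_tokens"] = 999  # tamper with the first entry
print(verify_chain(secret, chain))  # (False, 'mismatch at seq=0')
```

Because verification needs the same secret, anyone holding only the database cannot forge a consistent chain after editing it.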
Features (13/20 bijuterii catalog patterns covered)
- #1 Permitted/Safe/Sealed — three-question safety frame (Combo D)
- #2 Content-Addressable Storage + Merkle DAG — dedup + reference graph
- #5 AST-First Code Safety — tree-sitter bash + stdlib Python AST scan
- #7 Deterministic + Semantic Fingerprinting — SHA-256 + embeddings
- #10 Compliance-as-Code — PII / output-length / model-pin / cost rules
- #11 Forensic-First (HMAC chain) — JCS + SHA-256 + HMAC tamper-evidence
- #15 Inference Routing — Pareto cost/quality/latency selector + budget
- #16 Regression Detection — z-score + IQR drift detection on tokens/cost
- #18 Misalignment Probes — 29 builtin probes across 8 attack categories
- Plus: provider adapters (Anthropic, OpenAI), the @trace_genai decorator, and portable signed JSON chain export.
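Pattern #2 (content-addressable storage + Merkle DAG) can be illustrated with a small in-memory store: identical payloads hash to the same key and are stored once, and a parent node that stores child digests gets a digest that changes whenever a child changes. This is a hypothetical sketch (the ContentStore class and its methods are not BIJOTEL's CasSpanProcessor API), assuming SHA-256 over canonical JSON.

```python
import hashlib
import json


class ContentStore:
    """Hypothetical in-memory CAS: each blob is keyed by the SHA-256 of its bytes."""

    def __init__(self):
        self.blobs = {}      # digest -> canonical bytes
        self.ref_count = {}  # digest -> number of times put() saw this content

    def put(self, content) -> str:
        data = json.dumps(content, sort_keys=True, separators=(",", ":")).encode("utf-8")
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical content is stored once
        self.ref_count[digest] = self.ref_count.get(digest, 0) + 1
        return digest

    def get(self, digest: str):
        data = self.blobs[digest]
        if hashlib.sha256(data).hexdigest() != digest:  # re-hash on read
            raise ValueError(f"blob {digest} is corrupted")
        return json.loads(data)


store = ContentStore()
d1 = store.put([{"role": "user", "content": "hi"}])
d2 = store.put([{"role": "user", "content": "hi"}])  # same prompt, new call
print(d1 == d2, store.ref_count[d1], len(store.blobs))  # True 2 1

# Merkle-style node: a span record that points at child digests. Its own digest
# changes if any referenced child changes.
reply = store.put({"text": "hello"})
span_node = store.put({"request": d1, "response": reply})
print(store.get(span_node))
```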
Docker
export BIJOTEL_HMAC_SECRET=$(openssl rand -hex 32)   # keep this value: verifying the chain later requires it
docker run -p 8080:8080 \
-v $(pwd)/data:/data \
-e BIJOTEL_HMAC_SECRET \
bijotel/bijotel:1.0.0
See docker-compose.yml in the repo for the full reference deploy.
Architecture
BIJOTEL is a plug-in. You keep your existing OpenTelemetry tracer (e.g., opentelemetry-instrumentation-anthropic). BIJOTEL adds three reusable SpanProcessors:
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.instrumentation.anthropic import AnthropicInstrumentor
from bijotel import cost_per_call_max
from bijotel.processors import (
    HmacChainSpanProcessor,   # F2: tamper-evident audit chain
    CasSpanProcessor,         # F3: content-addressable storage
    PolicyGateSpanProcessor,  # F4: in-process policy gate
)
provider = TracerProvider()
provider.add_span_processor(
    HmacChainSpanProcessor(
        secret_key=bytes.fromhex(os.environ["BIJOTEL_HMAC_SECRET"]),
        db_path="chain.db",
    )
)
provider.add_span_processor(CasSpanProcessor(db_path="chain.db"))
provider.add_span_processor(PolicyGateSpanProcessor(rules=[cost_per_call_max(usd=0.50)]))
trace.set_tracer_provider(provider)
AnthropicInstrumentor().instrument()  # your tracer stays upstream; BIJOTEL only adds processors
Custom Code Tracing (@trace_genai)
For LLM calls outside instrumentation-anthropic coverage (custom wrappers,
non-Anthropic providers, multi-provider clients), use the @trace_genai
decorator or bijotel.wrap() runtime equivalent:
import anthropic
from bijotel import trace_genai

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Anthropic-style API: defaults work
@trace_genai(provider="anthropic")
def call_claude(*, model, messages, max_tokens):
    return client.messages.create(model=model, messages=messages, max_tokens=max_tokens)
# Custom API: provide extractors (e.g. for multi-provider wrappers)
@trace_genai(
name="ara.llm.call",
provider="ara",
request_extractor=lambda kw: {
"model": kw["cfg"].model_id,
"messages": kw["messages"],
"max_tokens": kw["cfg"].max_tokens,
},
response_extractor=lambda resp: {
"input_tokens": resp.input_tokens,
"output_tokens": resp.output_tokens,
},
extra_attrs={"ara.deployment": "prod"}, # constants only
)
async def complete(self, *, agent_id, messages, cfg, **kwargs):
    return await self._dispatch(...)
Sync vs. async is auto-detected via asyncio.iscoroutinefunction. All emitted
spans pass through the HmacChain/CAS/Policy processors as usual. If the
wrapped function raises, the span status is set to ERROR and the exception is
re-raised. Extractor failures are recorded in the bijotel.extractor_error span
attribute instead of crashing the call.
bijotel.wrap(fn, ...) is the runtime alternative — same behavior, no
source modification needed (third-party libs, dynamic dispatch).
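The wrapping behavior described above (pick a sync or async wrapper at decoration time, mark the span ERROR and re-raise on failure, record extractor failures instead of raising them) can be sketched without OpenTelemetry. This is a hypothetical illustration: traced() and RECORDED are invented names, spans are plain dicts, and it uses inspect.iscoroutinefunction, the non-deprecated equivalent of the asyncio check the text mentions.

```python
import asyncio
import functools
import inspect

RECORDED = []  # stands in for spans handed to the processors in this sketch


def traced(name, request_extractor=None):
    """Hypothetical decorator: picks a sync or async wrapper at decoration time."""

    def decorate(fn):
        def start(kwargs):
            span = {"name": name, "status": "OK", "attrs": {}}
            if request_extractor is not None:
                try:
                    span["attrs"].update(request_extractor(kwargs))
                except Exception as exc:
                    # Extractor bugs are recorded on the span, never raised to the caller.
                    span["attrs"]["extractor_error"] = repr(exc)
            return span

        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                span = start(kwargs)
                try:
                    return await fn(*args, **kwargs)
                except Exception:
                    span["status"] = "ERROR"
                    raise  # the caller still sees the original exception
                finally:
                    RECORDED.append(span)

            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            span = start(kwargs)
            try:
                return fn(*args, **kwargs)
            except Exception:
                span["status"] = "ERROR"
                raise
            finally:
                RECORDED.append(span)

        return sync_wrapper

    return decorate


@traced("demo.sync", request_extractor=lambda kw: {"model": kw["model"]})
def call_sync(*, model):
    return f"reply from {model}"


@traced("demo.async", request_extractor=lambda kw: {"model": kw["cfg"]})  # wrong key on purpose
async def call_async(*, model):
    return model.upper()


@traced("demo.fail")
def call_fail():
    raise RuntimeError("provider timeout")


print(call_sync(model="m1"))                # reply from m1
print(asyncio.run(call_async(model="m2")))  # M2
try:
    call_fail()
except RuntimeError:
    pass
print([(s["name"], s["status"]) for s in RECORDED])
```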
Note: dual audit when combining @trace_genai with AnthropicInstrumentor
If you decorate a function that internally calls client.messages.create()
while AnthropicInstrumentor().instrument() is active, two spans are
emitted per call:
- Outer span: from @trace_genai (your wrapper boundary)
- Inner span: from AnthropicInstrumentor (the SDK call itself)
Both are sealed in the chain. This is intentional — the outer span captures
your application context (e.g. ara.agent_id, ara.org_id), the inner span
captures the raw SDK request/response. Together they give you full audit
coverage at two granularities.
If you want only one audit layer, choose one approach:
- Decorator only (single span per logical call): don't call AnthropicInstrumentor().instrument()
- Instrumentation only (single span per SDK call): don't decorate your wrapper
Storage cost of dual audit: ~2× span count. For most workloads this is trivial; for high-volume production, pick one layer.
Provider Adapters (F7)
The Provider protocol unifies LLM provider integration. Adapters implement its
contract methods (name, extract_request_attrs, extract_response_attrs,
complete), which lets @trace_genai accept an adapter directly via the
provider=adapter shorthand:
from bijotel import trace_genai
from bijotel.adapters import AnthropicAdapter
adapter = AnthropicAdapter()
@trace_genai(provider=adapter)
async def my_call(*, model, messages, max_tokens):
    return await adapter.complete(
        messages=messages, model=model, max_tokens=max_tokens
    )
The decorator auto-extracts:
- gen_ai.provider.name from adapter.name
- Request attrs from adapter.extract_request_attrs()
- Response attrs from adapter.extract_response_attrs()
Explicit request_extractor= / response_extractor= always override
adapter-supplied methods (escape hatch preserved).
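The precedence rule above (explicit extractors win, adapter methods are the fallback, a plain string provider contributes no methods) fits in one small function. A hypothetical sketch: AdapterLike, resolve_extractors and FakeAdapter are invented for illustration and are not part of bijotel; the gen_ai.* keys are OpenTelemetry GenAI attribute names.

```python
from typing import Any, Protocol


class AdapterLike(Protocol):
    """Hypothetical subset of the Provider contract used for span attributes."""

    name: str

    def extract_request_attrs(self, kwargs: dict) -> dict: ...
    def extract_response_attrs(self, response: Any) -> dict: ...


def resolve_extractors(provider: "str | AdapterLike", request_extractor=None, response_extractor=None):
    """Explicit extractors win; otherwise fall back to the adapter's methods."""
    if isinstance(provider, str):  # plain string provider: no adapter methods to use
        return provider, request_extractor, response_extractor
    req = request_extractor if request_extractor is not None else provider.extract_request_attrs
    resp = response_extractor if response_extractor is not None else provider.extract_response_attrs
    return provider.name, req, resp


class FakeAdapter:
    name = "fake"

    def extract_request_attrs(self, kwargs):
        return {"gen_ai.request.model": kwargs["model"]}

    def extract_response_attrs(self, response):
        return {"gen_ai.usage.output_tokens": response["output_tokens"]}


name, req, resp = resolve_extractors(FakeAdapter())
print(name, req({"model": "m"}))  # fake {'gen_ai.request.model': 'm'}

name, req, resp = resolve_extractors(FakeAdapter(), request_extractor=lambda kw: {"custom": True})
print(req({"model": "m"}), resp({"output_tokens": 7}))  # {'custom': True} {'gen_ai.usage.output_tokens': 7}
```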
Calling the adapter directly returns a normalized ProviderResponse:
import asyncio
async def main():
    response = await adapter.complete(
        messages=[{"role": "user", "content": "hi"}],
        model="claude-haiku-4-5-20251001",
        max_tokens=20,
    )
    print(response.text, response.input_tokens, response.output_tokens)
asyncio.run(main())
Available adapters:
- AnthropicAdapter: Anthropic Claude (uses anthropic.AsyncAnthropic). Install: pip install bijotel[anthropic].
- OpenAIAdapter: OpenAI GPT (uses openai.AsyncOpenAI). Install: pip install bijotel[openai].
from bijotel import trace_genai
from bijotel.adapters import OpenAIAdapter
adapter = OpenAIAdapter()
@trace_genai(provider=adapter)
async def call_gpt(*, model, messages, max_tokens):
    return await adapter.complete(
        messages=messages, model=model, max_tokens=max_tokens
    )
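# Direct call (complete() is async, so run it inside an event loop):
import asyncio
async def main():
    response = await adapter.complete(
        messages=[{"role": "user", "content": "hi"}],
        model="gpt-4o-mini",
        max_tokens=20,
    )
    print(response.text)
asyncio.run(main())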
Same Provider protocol, same ProviderResponse shape; only the underlying SDK differs. F7 was validated empirically with two consumers (Anthropic and OpenAI).
Adding new providers — subclass Provider:
from openai import AsyncOpenAI
from bijotel.adapters import Provider, ProviderResponse

class OpenAIAdapter(Provider):
    def __init__(self, client=None):
        # Illustrative constructor: accept an injected client or build a default one.
        self.client = client or AsyncOpenAI()

    @property
    def name(self) -> str:
        return "openai"

    def extract_request_attrs(self, kwargs): ...
    def extract_response_attrs(self, response): ...

    async def complete(self, *, messages, model, max_tokens, **kwargs):
        raw = await self.client.chat.completions.create(
            model=model, messages=messages, max_tokens=max_tokens, **kwargs
        )
        return ProviderResponse(
            text=raw.choices[0].message.content,
            model=raw.model,
            input_tokens=raw.usage.prompt_tokens,
            output_tokens=raw.usage.completion_tokens,
            response_id=raw.id,
            finish_reason=raw.choices[0].finish_reason,
            raw_response=raw,
        )
Backward-compatible: passing provider="anthropic" (string) still works
exactly as in F5 — Provider object is opt-in.
Policy Gate
The PolicyEngine evaluates pre-call rules against the request payload (model, messages, max_tokens, …) and returns a Decision (allow / warn / deny). Use the guard decorator for the typical "wrap an LLM call" pattern, or call PolicyEngine directly for custom integrations.
PolicyEngine direct usage
from bijotel import PolicyEngine, cost_per_call_max, model_allowlist
engine = PolicyEngine(rules=[
cost_per_call_max(usd=0.50),
model_allowlist("claude-haiku-4-5", "claude-sonnet-4-20250514"),
])
request = {"model": "claude-haiku-4-5", "messages": [...], "max_tokens": 100}
decision = engine.evaluate(request)
if decision.is_deny:
    print(f"Blocked by {decision.rule}: {decision.reason}")
elif decision.is_warn:
    print(f"Warning from {decision.rule}: {decision.reason}")  # call still proceeds
else:
    print("Allowed")
engine.evaluate() short-circuits on first deny. Warnings are collected and attached as bijotel.policy.warning attributes on emitted spans. See Decision and State classes in bijotel.policy.decision.
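The evaluation order described above (stop at the first deny, collect warnings along the way) is easy to get subtly wrong, so here is a hypothetical sketch of it. Verdict, allowlist, max_tokens_cap and evaluate are invented stand-ins, not BIJOTEL's Decision class or its built-in rules.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Hypothetical stand-in for BIJOTEL's Decision."""
    action: str                      # "allow" | "warn" | "deny"
    rule: Optional[str] = None
    reason: str = ""
    warnings: List["Verdict"] = field(default_factory=list)


Rule = Callable[[dict], Verdict]


def allowlist(*models: str, mode: str = "deny") -> Rule:
    def rule(request: dict) -> Verdict:
        if request.get("model") in models:
            return Verdict("allow")
        return Verdict(mode, "allowlist", f"model {request.get('model')!r} not allowed")
    return rule


def max_tokens_cap(limit: int, mode: str = "warn") -> Rule:
    def rule(request: dict) -> Verdict:
        if request.get("max_tokens", 0) <= limit:
            return Verdict("allow")
        return Verdict(mode, "max_tokens_cap", f"max_tokens above {limit}")
    return rule


def evaluate(rules: List[Rule], request: dict) -> Verdict:
    warnings: List[Verdict] = []
    for rule in rules:
        verdict = rule(request)
        if verdict.action == "deny":
            verdict.warnings = warnings  # keep warnings collected before the deny
            return verdict               # short-circuit: later rules never run
        if verdict.action == "warn":
            warnings.append(verdict)
    if warnings:
        return Verdict("warn", warnings[0].rule, warnings[0].reason, warnings)
    return Verdict("allow")


rules = [max_tokens_cap(1000), allowlist("claude-haiku-4-5")]
print(evaluate(rules, {"model": "claude-haiku-4-5", "max_tokens": 100}).action)   # allow
print(evaluate(rules, {"model": "claude-haiku-4-5", "max_tokens": 4000}).action)  # warn
denied = evaluate(rules, {"model": "gpt-x", "max_tokens": 4000})
print(denied.action, denied.rule, len(denied.warnings))  # deny allowlist 1
```

Ordering matters: with the allowlist listed first, the third request would be denied before the token-cap warning is ever collected.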
model_allowlist
Restrict which models can be called via your wrapper. Useful for cost control + audit.
from bijotel import model_allowlist
# Deny if model not in list
rule = model_allowlist("claude-haiku-4-5", "claude-sonnet-4-20250514", mode="deny")
# Warn-only mode (audit + proceed)
rule_audit = model_allowlist("claude-haiku-4-5", mode="warn")
prompt_pattern_deny (F11)
Block prompts matching jailbreak / prompt-injection regex patterns before the SDK call is made. Five attack categories covered out of the box: instruction override ("ignore previous instructions"), system prompt extraction ("reveal your system prompt"), role override ("you are now a different AI"), jailbreak framing ("DAN mode", "developer mode"), encoding bypass (base64:, rot13). Defaults are case-insensitive and applied via re.search.
from bijotel import prompt_pattern_deny
# Defaults only (DEFAULT_JAILBREAK_PATTERNS, ~15 patterns, 5 categories)
rule = prompt_pattern_deny()
# Custom patterns appended to defaults (defaults checked first)
rule = prompt_pattern_deny(
patterns=[r"my_company_secret", r"\bAPI[_-]KEY\b"],
)
# Custom patterns only — defaults disabled
rule = prompt_pattern_deny(
patterns=[r"sensitive_term"], use_defaults=False
)
# Warn mode — audit but allow (recommended for first deployment)
rule_audit = prompt_pattern_deny(mode="warn")
Handles plain-string message content (messages=[{"role": "user", "content": "..."}]), Anthropic multipart content blocks (content=[{"type": "text", "text": "..."}]), and OpenAI-style messages; text content from all roles is extracted and concatenated before matching.
Suggested rollout: deploy in mode="warn" first to surface false positives via bijotel.policy.warning span attributes, review for ~1 week, then flip to mode="deny". False positives are easier to diagnose than false negatives in this domain.
Pattern catalog adapted from substrate-guard's agent_safety.rego dangerous_patterns concept (separate project, read-only access). The substrate-guard version targets filesystem / network / shell actions; this BIJOTEL adaptation targets LLM prompts (instruction overrides, system-prompt extraction, role overrides, jailbreak framings, encoding bypass).
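The text-extraction and matching behavior described above (flatten plain-string and multipart content from every role, then run case-insensitive re.search per pattern) can be sketched as follows. EXAMPLE_PATTERNS is a hypothetical three-item list, not BIJOTEL's DEFAULT_JAILBREAK_PATTERNS, and message_text/first_match are invented names.

```python
import re

# Illustrative patterns only; NOT BIJOTEL's DEFAULT_JAILBREAK_PATTERNS.
EXAMPLE_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal (your )?system prompt",
    r"\bDAN mode\b",
]


def message_text(messages):
    """Concatenate text from plain-string and multipart content, all roles."""
    parts = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            parts.append(content)
        elif isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and block.get("type") == "text":
                    parts.append(block.get("text", ""))
    return "\n".join(parts)


def first_match(messages, patterns=EXAMPLE_PATTERNS):
    """Return the first pattern found anywhere in the text, or None."""
    text = message_text(messages)
    for pattern in patterns:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return pattern
    return None


plain = [{"role": "user", "content": "Please IGNORE previous instructions."}]
multipart = [{"role": "user", "content": [{"type": "text", "text": "Enable DAN mode"}]}]
benign = [{"role": "system", "content": "Be brief."}, {"role": "user", "content": "Summarize this article."}]

print(first_match(plain))      # ignore (all )?previous instructions
print(first_match(multipart))  # \bDAN mode\b
print(first_match(benign))     # None
```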
PolicyDeniedError
Raised by the guard() decorator when a rule returns Decision.deny. Catch it in your application code to surface a useful message:
import anthropic
from bijotel import guard, PolicyDeniedError, cost_per_call_max

client = anthropic.Anthropic()

@guard(rules=[cost_per_call_max(usd=0.10)])
def call_llm(*, model, messages, max_tokens):
    return client.messages.create(model=model, messages=messages, max_tokens=max_tokens)

try:
    response = call_llm(model="claude-opus-4-7", messages=[...], max_tokens=4000)
except PolicyDeniedError as e:
    print(f"Policy denied: rule={e.rule!r}, reason={e.reason!r}")
    # the denial reaches the user instead of an expensive call being made
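The key property of a pre-call gate is that a denied request never reaches the wrapped function. A hypothetical sketch of that flow: DeniedError, gate and max_tokens_at_most are invented names standing in for PolicyDeniedError, guard() and BIJOTEL's rules, and the wrapper assumes keyword-only calls like the example above.

```python
import functools


class DeniedError(Exception):
    """Hypothetical stand-in for bijotel's PolicyDeniedError."""

    def __init__(self, rule, reason):
        super().__init__(f"{rule}: {reason}")
        self.rule, self.reason = rule, reason


def gate(rules):
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            for rule in rules:
                outcome = rule(kwargs)  # None means allow; otherwise (rule_name, reason)
                if outcome is not None:
                    raise DeniedError(*outcome)  # fn is never invoked
            return fn(**kwargs)
        return wrapper
    return decorate


def max_tokens_at_most(limit):
    def rule(request):
        if request.get("max_tokens", 0) > limit:
            return ("max_tokens_at_most", f"max_tokens={request['max_tokens']} exceeds {limit}")
        return None
    return rule


calls = []


@gate([max_tokens_at_most(1000)])
def call_llm(*, model, messages, max_tokens):
    calls.append(model)  # stands in for the real SDK request
    return "ok"


print(call_llm(model="m", messages=[], max_tokens=100))  # ok
denied_reason = None
try:
    call_llm(model="m", messages=[], max_tokens=4000)
except DeniedError as e:
    denied_reason = e.reason
    print(f"Policy denied: rule={e.rule!r}, reason={e.reason!r}")
print(len(calls))  # 1
```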
Chain export — programmatic API
The CLI is the typical entry point, but export_chain and verify_export are also exposed as public functions for programmatic integration (e.g. scheduled audit-trail uploads, CI verification jobs):
from pathlib import Path
from bijotel import export_chain, verify_export
secret = bytes.fromhex("<your hex secret>") # min 16 bytes
# Export
out = export_chain(
db_path=Path("/data/bijotel_chain.db"),
output_path=Path("/var/audit/audit_2026-05-10.json"),
secret_key=secret,
)
# → "/var/audit/audit_2026-05-10.json"
# Verify (auditor side, only needs secret + JSON file)
valid, reason = verify_export(out, secret)
if not valid:
    raise RuntimeError(f"Audit trail tampered: {reason}")
Schema: bijotel-chain-v1. Per-entry HMAC + file-level chain_signature. Integrity verifiable with shared secret only — no SQLite access required.
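Per-entry HMACs catch edits to individual entries, while a file-level signature over the ordered list of entry MACs catches entries that were dropped or reordered. The sketch below shows that two-level check with the standard library. It does not reproduce the real bijotel-chain-v1 schema: the "example-chain-v1" name, field names and functions are hypothetical.

```python
import hashlib
import hmac
import json


def _mac(secret: bytes, data: bytes) -> str:
    return hmac.new(secret, data, hashlib.sha256).hexdigest()


def _canon(obj) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def export_entries(secret: bytes, records: list) -> str:
    entries = [{"seq": i, "record": r, "hmac": _mac(secret, _canon(r))} for i, r in enumerate(records)]
    # File-level signature over the ordered entry MACs: detects dropped or reordered entries.
    chain_signature = _mac(secret, "".join(e["hmac"] for e in entries).encode("ascii"))
    return json.dumps({"schema": "example-chain-v1", "entries": entries, "chain_signature": chain_signature})


def verify_exported(text: str, secret: bytes) -> tuple:
    doc = json.loads(text)
    for e in doc["entries"]:
        if not hmac.compare_digest(_mac(secret, _canon(e["record"])), e["hmac"]):
            return False, f"entry seq={e['seq']} modified"
    expected = _mac(secret, "".join(e["hmac"] for e in doc["entries"]).encode("ascii"))
    if not hmac.compare_digest(expected, doc["chain_signature"]):
        return False, "chain_signature mismatch (entries dropped or reordered)"
    return True, "ok"


secret = b"\x02" * 32
text = export_entries(secret, [{"tokens": 10}, {"tokens": 12}])
print(verify_exported(text, secret))  # (True, 'ok')

edited = json.loads(text)
edited["entries"][1]["record"]["tokens"] = 999  # change one entry's content
print(verify_exported(json.dumps(edited), secret))  # (False, 'entry seq=1 modified')

dropped = json.loads(text)
dropped["entries"].pop(0)  # silently remove the first entry
print(verify_exported(json.dumps(dropped), secret))  # (False, 'chain_signature mismatch (entries dropped or reordered)')
```

As in the text, the verifier needs only the JSON and the shared secret; no database access is involved.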
Regression Detection (F12, Bijuteria #16)
Detect drift in token usage / cost over time using z-score + IQR methods on the BIJOTEL chain.db. Empirically motivated by patterns observed during GENA deployment (T+2h checkpoint revealed bimodal quality distributions and dimension-specific bottlenecks worth monitoring temporally).
Programmatic API
from bijotel import RegressionDetector, AnomalyMethod
detector = RegressionDetector(
db_path="chain.db",
baseline_window=100, # Use last 100 spans as baseline
z_threshold=3.0, # Flag values > 3σ from mean
iqr_multiplier=1.5, # Tukey-style IQR outlier
method=AnomalyMethod.BOTH, # Require BOTH methods to flag (low FP)
)
# Single dimension
anomalies = detector.detect("input_tokens")
for a in anomalies:
    print(f"  seq={a.seq} value={a.value} z={a.z_score:.2f} severity={a.severity}")
# All 3 dimensions (input_tokens, output_tokens, cost)
results = detector.detect_all_dimensions(filter_model="claude-haiku-4-5-20251001")
CLI usage
# Scan all 3 dimensions on entire chain (default: last 50 spans vs prior 100)
bijotel regression --db chain.db
# Single dimension, specific model
bijotel regression --db chain.db --dimension cost --model claude-sonnet-4-20250514
# Custom baseline window + sensitivity
bijotel regression --db chain.db --window 200 --z-threshold 2.5
Exit codes: 0 no anomalies, 1 anomalies detected, 2 invalid args.
Detection methods
- z-score (parametric): z = (value - baseline.mean) / baseline.stdev. Fast for Gaussian-like signals (most token counts when calls are similar).
- IQR (non-parametric, Tukey): flag if value < p25 - k·iqr OR value > p75 + k·iqr. Robust to heavy-tailed distributions (cost can spike).
- AnomalyMethod.BOTH (default): flags only when BOTH methods agree, which minimizes false positives. Use Z_SCORE or IQR alone for broader detection.
Severity levels
- anomaly: both z-score AND IQR triggered (high-confidence drift).
- warning: only one method triggered (worth review, lower confidence).
Limitations
- Requires ≥5 baseline samples (MIN_SAMPLES). With insufficient data the detector returns an empty list, so an empty result does not by itself mean there is no drift.
- The cost dimension requires the model to be in the DEFAULT_PRICES price table (see policy/prices.py); spans with unknown models contribute no cost datapoint.
- One chain.db per RegressionDetector instance; no cross-chain analysis in v0.3.0.
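The detection methods, severity levels and minimum-sample rule described above combine into a short classifier. This sketch assumes the sample standard deviation and inclusive quartiles; BIJOTEL's exact estimator choices are not documented here, and classify() is a hypothetical name rather than part of RegressionDetector.

```python
import statistics

MIN_SAMPLES = 5  # mirrors the documented minimum baseline size


def classify(baseline, value, z_threshold=3.0, k=1.5):
    """Return 'anomaly', 'warning', or None for one value against a baseline."""
    if len(baseline) < MIN_SAMPLES:
        return None  # not enough data to judge (not the same as "no drift")
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)
    # z-score test; a constant baseline (stdev == 0) never triggers it.
    z_flag = stdev > 0 and abs(value - mean) / stdev > z_threshold
    # Tukey fences from the inclusive quartiles.
    p25, _, p75 = statistics.quantiles(baseline, n=4, method="inclusive")
    iqr = p75 - p25
    iqr_flag = value < p25 - k * iqr or value > p75 + k * iqr
    if z_flag and iqr_flag:
        return "anomaly"   # both methods agree
    if z_flag or iqr_flag:
        return "warning"   # only one method fired
    return None


baseline = [100, 102, 98, 101, 99, 100, 103, 97]  # e.g. input_tokens per call
for value in (101, 106, 150):
    print(value, classify(baseline, value))
# 101 None      (z = 0.5, inside the fences 95..105)
# 106 warning   (z = 3.0 is not > 3.0, but 106 > 105)
# 150 anomaly   (z = 25, far outside the fences)
print(classify([1, 2, 3], 100))  # None: fewer than MIN_SAMPLES
```

Requiring both flags (the AnomalyMethod.BOTH idea) filters out values like 106 that only one method considers unusual.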
Shutting down BIJOTEL
shutdown() flushes any pending spans and tears down the global TracerProvider. This matters for scripts that exit immediately: without a flush, the last spans may be lost.
from bijotel import init, shutdown
init(...)
# ... do work, emit spans ...
shutdown() # flushes processors, releases resources
shutdown() is idempotent — safe to call multiple times.
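Idempotency here means the flush runs once no matter how many times shutdown() is called. A hypothetical sketch of that guard (Lifecycle is an invented class, not BIJOTEL's implementation), including an atexit safety net for scripts that forget the explicit call:

```python
import atexit
import threading


class Lifecycle:
    """Hypothetical wrapper showing an idempotent, thread-safe shutdown()."""

    def __init__(self, flush):
        self._flush = flush
        self._lock = threading.Lock()
        self._closed = False

    def shutdown(self) -> bool:
        with self._lock:
            if self._closed:
                return False  # already shut down: later calls are no-ops
            self._closed = True
        self._flush()  # runs exactly once
        return True


flushed = []
lifecycle = Lifecycle(lambda: flushed.append("flushed"))
atexit.register(lifecycle.shutdown)  # safety net for scripts that forget to call it

first, second = lifecycle.shutdown(), lifecycle.shutdown()
print(first, second, flushed)  # True False ['flushed']
```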
Development install
git clone <repo>
cd BIJOTEL
pip install -e ".[anthropic,api,fingerprint,ast,dev]"
pytest
CLI
After install, the bijotel command is available:
# Verify chain integrity (requires HMAC secret)
export BIJOTEL_HMAC_SECRET=<hex>
bijotel verify --db chain.db
# Inspect a span (by hex span_id or integer seq)
bijotel inspect --db chain.db 1
bijotel inspect --db chain.db abc123def456
# Summary stats (chain + CAS + policy daily state)
bijotel stats --db chain.db
# List spans with filters
bijotel list --db chain.db
bijotel list --db chain.db --blocked
bijotel list --db chain.db --rule cost_per_call_max
bijotel list --db chain.db --model claude-haiku-4-5-20251001
bijotel list --db chain.db --since 2026-05-07 --limit 100
# Export chain to portable signed JSON (verifiable by external auditors)
bijotel export --db chain.db --output audit_trail.json
# Verify integrity of an exported JSON (no DB needed, just secret)
bijotel verify-export audit_trail.json
# Run the HTTP API server (requires `pip install bijotel[api]`)
bijotel serve --port 8080 --db chain.db
# GET /health, /version, /docs (OpenAPI / Swagger UI)
--since takes a UTC calendar date (YYYY-MM-DD; the lower bound is 00:00:00Z on that day), consistent with the daily_token_budget rule.
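Interpreting the date in UTC rather than local time changes which spans are included near midnight. A small sketch of that boundary, where since_lower_bound() is a hypothetical helper rather than the CLI's internal code:

```python
from datetime import datetime, timedelta, timezone


def since_lower_bound(day: str) -> datetime:
    """Interpret 'YYYY-MM-DD' as 00:00:00 UTC on that calendar day."""
    return datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)


bound = since_lower_bound("2026-05-07")
print(bound.isoformat())  # 2026-05-07T00:00:00+00:00

# 01:00 on May 7 in UTC+3 is 22:00 on May 6 in UTC, so it falls before the bound.
local_span = datetime(2026, 5, 7, 1, 0, tzinfo=timezone(timedelta(hours=3)))
print(local_span >= bound)  # False
print(datetime(2026, 5, 7, 0, 0, 1, tzinfo=timezone.utc) >= bound)  # True
```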
Validation
End-to-end smoke test against the real Anthropic API, exercising the full BIJOTEL stack
(HmacChain + CAS + PolicyGate + AnthropicInstrumentor + @trace_genai
decorator + all 6 CLI commands):
export ANTHROPIC_API_KEY=sk-ant-...
export BIJOTEL_HMAC_SECRET=$(python -c "import secrets; print(secrets.token_hex(32))")
python scripts/e2e_smoke.py
Cost: ~$0.001 per run (3-4 real Haiku calls; denied calls never reach the network).
The script validates:
- Chain integrity end-to-end (bijotel verify returns VALID)
- CAS dedup on identical input (ref_count > 1 for repeated calls)
- Policy gate enforcement (denied calls produce synthetic spans, no SDK call)
- All 6 CLI subcommands return exit 0
- Custom @trace_genai decorator works alongside AnthropicInstrumentor
Roadmap
Shipped in v1.0.0:
- F0–F6: Core (skeleton → init → HMAC chain → CAS → policy gate → decorator → CLI)
- F7: Provider protocol + AnthropicAdapter + OpenAIAdapter
- F8: Portable signed JSON chain export
- F11: prompt_pattern_deny (regex jailbreak/injection detection)
- F12: Regression detection (z-score + IQR over tokens/cost)
- F13: Deterministic + semantic fingerprinting layer
- F14: AST safety layer (tree-sitter bash + stdlib Python ast)
- F15: Inference routing (Pareto cost/quality/latency + budget)
- F16: CAS Merkle DAG (content-addressable + reference graph)
- F17: Misalignment probe library (29 probes × 8 attack categories)
- F18: Combo D containment guard (Policy + AST + chain seal)
- Compliance rules: PII / output-length / model-pin
- CLI: verify + inspect + stats + list + export + verify-export + regression + serve
- Hardening: WAL + busy_timeout + BEGIN IMMEDIATE, crash isolation, perms, lockfile
- FastAPI bijotel serve (health + version; full chain/policy/regression in v1.1.0)
- Docker image + docker-compose example
Planned:
- v1.1.0 — FastAPI chain/policy/regression endpoints
- v1.2.0 — Dashboard (chain explorer + policy + regression)
- v1.3.0 — Consensus voting (Bijuteria #9) + energy accounting (#3)
License
MIT
Download files
File details
Details for the file bijotel-1.0.0.tar.gz.
File metadata
- Download URL: bijotel-1.0.0.tar.gz
- Upload date:
- Size: 176.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f804390c937fd0a960e5bad02c16574e0d7d2d0b053f6dd2c71c7fe075650336 |
| MD5 | 1b41c57f5a0a1d02c569f8488a086e5f |
| BLAKE2b-256 | 01c7fdcfa08b69e6da86189af6736a4cf96c80b575a600e4a07ca43654272391 |
File details
Details for the file bijotel-1.0.0-py3-none-any.whl.
File metadata
- Download URL: bijotel-1.0.0-py3-none-any.whl
- Upload date:
- Size: 97.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ffd25e14f4aa5e46e4b26f252263555f43e3d27eee26d8353ccc7a24442ac42 |
| MD5 | edd0600137d1589af6b0c8f618afa341 |
| BLAKE2b-256 | df2e9796a582eae0887b2a69540fdc1d0ebaa36340ab032f9c2e1e89cdc780d3 |