Skip to main content

Forensic-grade tamper-evident audit chain for LLM applications. HMAC-SHA256 chain, content-addressable storage, pre-call policy gate, regression detection.

Project description

BIJOTEL

Forensic-grade tamper-evident audit chain for LLM applications.

BIJOTEL adds tamper-evidence (HMAC-SHA256 chain), content-addressable storage, and pre-call policy gating to existing OpenTelemetry GenAI pipelines (OpenLLMetry, custom instrumentations, etc.). It does NOT replace your tracer — it extends it.

Status: v1.0.0 — production-ready core (chain + CAS + policy + regression). Layers (fingerprint, AST safety, routing, misalignment probes, Combo D containment) are stable. API surface frozen for v1.x.

Install

pip install bijotel

Optional extras:

pip install bijotel[anthropic]     # Anthropic SDK + instrumentation
pip install bijotel[openai]        # OpenAI SDK
pip install bijotel[api]           # FastAPI + uvicorn (for `bijotel serve`)
pip install bijotel[fingerprint]   # sentence-transformers (semantic dedup)
pip install bijotel[ast]           # tree-sitter (bash AST safety)
pip install bijotel[all]           # everything above

Quickstart

import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

from bijotel.processors import HmacChainSpanProcessor, CasSpanProcessor

provider = TracerProvider()
provider.add_span_processor(
    HmacChainSpanProcessor(
        secret_key=bytes.fromhex(os.environ["BIJOTEL_HMAC_SECRET"]),
        db_path="chain.db",
    )
)
provider.add_span_processor(CasSpanProcessor(db_path="chain.db"))
trace.set_tracer_provider(provider)

# Now any OTel-instrumented LLM call is sealed in the chain.

Verify integrity later:

bijotel verify --db chain.db

Features (13/20 bijuterii catalog patterns covered)

  • #1 Permitted/Safe/Sealed — three-question safety frame (Combo D)
  • #2 Content-Addressable Storage + Merkle DAG — dedup + reference graph
  • #5 AST-First Code Safety — tree-sitter bash + stdlib Python AST scan
  • #7 Deterministic + Semantic Fingerprinting — SHA-256 + embeddings
  • #10 Compliance-as-Code — PII / output-length / model-pin / cost rules
  • #11 Forensic-First (HMAC chain) — JCS + SHA-256 + HMAC tamper-evidence
  • #15 Inference Routing — Pareto cost/quality/latency selector + budget
  • #16 Regression Detection — z-score + IQR drift detection on tokens/cost
  • #18 Misalignment Probes — 29 builtin probes across 8 attack categories
  • Plus: provider adapters (Anthropic, OpenAI), @trace_genai decorator, portable signed JSON chain export.

Docker

docker run -p 8080:8080 \
    -v $(pwd)/data:/data \
    -e BIJOTEL_HMAC_SECRET=$(openssl rand -hex 32) \
    bijotel/bijotel:1.0.0

See docker-compose.yml in the repo for the full reference deploy.


Architecture

BIJOTEL is a plug-in. You keep your existing OpenTelemetry tracer (e.g., opentelemetry-instrumentation-anthropic). BIJOTEL adds three reusable SpanProcessors:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.instrumentation.anthropic import AnthropicInstrumentor

from bijotel.processors import (
    HmacChainSpanProcessor,    # F2: tamper-evident audit chain
    CasSpanProcessor,          # F3: content-addressable storage
    PolicyGateSpanProcessor,   # F4: in-process policy gate
)

provider = TracerProvider()
provider.add_span_processor(HmacChainSpanProcessor(secret_key="..."))
provider.add_span_processor(CasSpanProcessor(store_path="./cas.db"))
provider.add_span_processor(PolicyGateSpanProcessor(rules=[...]))
trace.set_tracer_provider(provider)

AnthropicInstrumentor().instrument()  # tracer rămâne upstream

Custom Code Tracing (@trace_genai)

For LLM calls outside instrumentation-anthropic coverage (custom wrappers, non-Anthropic providers, multi-provider clients), use the @trace_genai decorator or bijotel.wrap() runtime equivalent:

from bijotel import trace_genai

# Anthropic-style API: defaults work
@trace_genai(provider="anthropic")
def call_claude(*, model, messages, max_tokens):
    return client.messages.create(model=model, messages=messages, max_tokens=max_tokens)

# Custom API: provide extractors (e.g. for multi-provider wrappers)
@trace_genai(
    name="ara.llm.call",
    provider="ara",
    request_extractor=lambda kw: {
        "model": kw["cfg"].model_id,
        "messages": kw["messages"],
        "max_tokens": kw["cfg"].max_tokens,
    },
    response_extractor=lambda resp: {
        "input_tokens": resp.input_tokens,
        "output_tokens": resp.output_tokens,
    },
    extra_attrs={"ara.deployment": "prod"},  # constants only
)
async def complete(self, *, agent_id, messages, cfg, ...):
    return await self._dispatch(...)

Auto-detects sync/async via asyncio.iscoroutinefunction. All emitted spans pass through HmacChain/CAS/Policy processors normally. Exceptions in the wrapped function set span status to ERROR and re-raise. Extractor failures log to bijotel.extractor_error attribute without crashing the call.

bijotel.wrap(fn, ...) is the runtime alternative — same behavior, no source modification needed (third-party libs, dynamic dispatch).

Note: dual audit when combining @trace_genai with AnthropicInstrumentor

If you decorate a function that internally calls client.messages.create() while AnthropicInstrumentor().instrument() is active, two spans are emitted per call:

  • Outer span: from @trace_genai (your wrapper boundary)
  • Inner span: from AnthropicInstrumentor (the SDK call itself)

Both are sealed in the chain. This is intentional — the outer span captures your application context (e.g. ara.agent_id, ara.org_id), the inner span captures the raw SDK request/response. Together they give you full audit coverage at two granularities.

If you want only one audit layer, choose one approach:

  • Decorator only (single span per logical call): don't call AnthropicInstrumentor().instrument()
  • Instrumentation only (single span per SDK call): don't decorate your wrapper

Storage cost of dual audit: ~2× span count. For most workloads this is trivial; for high-volume production, pick one layer.

Provider Adapters (F7)

Provider Protocol unifies LLM provider integration. Adapters implement contract methods, enabling clean @trace_genai integration via provider=adapter shorthand:

from bijotel import trace_genai
from bijotel.adapters import AnthropicAdapter

adapter = AnthropicAdapter()

@trace_genai(provider=adapter)
async def my_call(*, model, messages, max_tokens):
    return await adapter.complete(
        messages=messages, model=model, max_tokens=max_tokens
    )

The decorator auto-extracts:

  • gen_ai.provider.name from adapter.name
  • Request attrs from adapter.extract_request_attrs()
  • Response attrs from adapter.extract_response_attrs()

Explicit request_extractor= / response_extractor= always override adapter-supplied methods (escape hatch preserved).

Calling the adapter directly returns a normalized ProviderResponse:

response = await adapter.complete(
    messages=[{"role": "user", "content": "hi"}],
    model="claude-haiku-4-5-20251001",
    max_tokens=20,
)
print(response.text, response.input_tokens, response.output_tokens)

Available adapters:

  • AnthropicAdapter — Anthropic Claude (uses anthropic.AsyncAnthropic). Install: pip install bijotel[anthropic].
  • OpenAIAdapter — OpenAI GPT (uses openai.AsyncOpenAI). Install: pip install bijotel[openai].
from bijotel import trace_genai
from bijotel.adapters import OpenAIAdapter

adapter = OpenAIAdapter()

@trace_genai(provider=adapter)
async def call_gpt(*, model, messages, max_tokens):
    return await adapter.complete(
        messages=messages, model=model, max_tokens=max_tokens
    )

# Direct call:
response = await adapter.complete(
    messages=[{"role": "user", "content": "hi"}],
    model="gpt-4o-mini",
    max_tokens=20,
)

Same Provider Protocol, same ProviderResponse shape — only the SDK underneath differs. F7 validated empirical with two consumers (Anthropic + OpenAI).

Adding new providers — subclass Provider:

from bijotel.adapters import Provider, ProviderResponse

class OpenAIAdapter(Provider):
    @property
    def name(self) -> str:
        return "openai"

    def extract_request_attrs(self, kwargs): ...
    def extract_response_attrs(self, response): ...

    async def complete(self, *, messages, model, max_tokens, **kwargs):
        raw = await self.client.chat.completions.create(...)
        return ProviderResponse(
            text=raw.choices[0].message.content,
            model=raw.model,
            input_tokens=raw.usage.prompt_tokens,
            output_tokens=raw.usage.completion_tokens,
            response_id=raw.id,
            finish_reason=raw.choices[0].finish_reason,
            raw_response=raw,
        )

Backward-compatible: passing provider="anthropic" (string) still works exactly as in F5 — Provider object is opt-in.

Policy Gate

The PolicyEngine evaluates pre-call rules against request payload (model, messages, max_tokens, …) and returns a Decision (allow / warn / deny). Use the guard decorator for the typical "wrap an LLM call" pattern, or call PolicyEngine directly for custom integration.

PolicyEngine direct usage

from bijotel import PolicyEngine, cost_per_call_max, model_allowlist

engine = PolicyEngine(rules=[
    cost_per_call_max(usd=0.50),
    model_allowlist("claude-haiku-4-5", "claude-sonnet-4-20250514"),
])

request = {"model": "claude-haiku-4-5", "messages": [...], "max_tokens": 100}
decision = engine.evaluate(request)

if decision.is_deny:
    print(f"Blocked by {decision.rule}: {decision.reason}")
elif decision.is_warn:
    print(f"Warning from {decision.rule}: {decision.reason}")  # call still proceeds
else:
    print("Allowed")

engine.evaluate() short-circuits on first deny. Warnings are collected and attached as bijotel.policy.warning attributes on emitted spans. See Decision and State classes in bijotel.policy.decision.

model_allowlist

Restrict which models can be called via your wrapper. Useful for cost control + audit.

from bijotel import model_allowlist

# Deny if model not in list
rule = model_allowlist("claude-haiku-4-5", "claude-sonnet-4-20250514", mode="deny")

# Warn-only mode (audit + proceed)
rule_audit = model_allowlist("claude-haiku-4-5", mode="warn")

prompt_pattern_deny (F11)

Block prompts matching jailbreak / prompt-injection regex patterns before the SDK call is made. Five attack categories covered out of the box: instruction override ("ignore previous instructions"), system prompt extraction ("reveal your system prompt"), role override ("you are now a different AI"), jailbreak framing ("DAN mode", "developer mode"), encoding bypass (base64:, rot13). Defaults are case-insensitive and applied via re.search.

from bijotel import prompt_pattern_deny

# Defaults only (DEFAULT_JAILBREAK_PATTERNS, ~15 patterns, 5 categories)
rule = prompt_pattern_deny()

# Custom patterns appended to defaults (defaults checked first)
rule = prompt_pattern_deny(
    patterns=[r"my_company_secret", r"\bAPI[_-]KEY\b"],
)

# Custom patterns only — defaults disabled
rule = prompt_pattern_deny(
    patterns=[r"sensitive_term"], use_defaults=False
)

# Warn mode — audit but allow (recommended for first deployment)
rule_audit = prompt_pattern_deny(mode="warn")

Handles both Anthropic SDK (messages=[{"role": "user", "content": "..."}]) and Anthropic multipart format (content=[{"type": "text", "text": "..."}]), plus OpenAI-style messages — extracts and concatenates text content from all roles before matching.

Suggested rollout: deploy in mode="warn" first to surface false positives via bijotel.policy.warning span attributes, review for ~1 week, then flip to mode="deny". False positives are easier to diagnose than false negatives in this domain.

Pattern catalog adapted from substrate-guard's agent_safety.rego dangerous_patterns concept (separate project, read-only access). The substrate-guard version targets filesystem / network / shell actions; this BIJOTEL adaptation targets LLM prompts (instruction overrides, system-prompt extraction, role overrides, jailbreak framings, encoding bypass).

PolicyDeniedError

Raised by guard() decorator when a rule returns Decision.deny. Catch it in your application code to surface a useful message:

from bijotel import guard, PolicyDeniedError, cost_per_call_max

@guard(rules=[cost_per_call_max(usd=0.10)])
def call_llm(*, model, messages, max_tokens):
    return client.messages.create(model=model, messages=messages, max_tokens=max_tokens)

try:
    response = call_llm(model="claude-opus-4-7", messages=[...], max_tokens=4000)
except PolicyDeniedError as e:
    print(f"Policy denied: rule={e.rule!r}, reason={e.reason!r}")
    # → returns to user instead of leaking expensive call

Chain export — programmatic API

CLI is the typical use, but export_chain and verify_export are exposed as public functions for programmatic integration (e.g. scheduled audit-trail uploads, CI verification jobs):

from pathlib import Path
from bijotel import export_chain, verify_export

secret = bytes.fromhex("<your hex secret>")  # min 16 bytes

# Export
out = export_chain(
    db_path=Path("/data/bijotel_chain.db"),
    output_path=Path("/var/audit/audit_2026-05-10.json"),
    secret_key=secret,
)
# → "/var/audit/audit_2026-05-10.json"

# Verify (auditor side, only needs secret + JSON file)
valid, reason = verify_export(out, secret)
if not valid:
    raise RuntimeError(f"Audit trail tampered: {reason}")

Schema: bijotel-chain-v1. Per-entry HMAC + file-level chain_signature. Integrity verifiable with shared secret only — no SQLite access required.

Regression Detection (F12, Bijuteria #16)

Detect drift in token usage / cost over time using z-score + IQR methods on the BIJOTEL chain.db. Empirically motivated by patterns observed during GENA deployment (T+2h checkpoint revealed bimodal quality distributions and dimension-specific bottlenecks worth monitoring temporally).

Programmatic API

from bijotel import RegressionDetector, AnomalyMethod

detector = RegressionDetector(
    db_path="chain.db",
    baseline_window=100,        # Use last 100 spans as baseline
    z_threshold=3.0,            # Flag values > 3σ from mean
    iqr_multiplier=1.5,         # Tukey-style IQR outlier
    method=AnomalyMethod.BOTH,  # Require BOTH methods to flag (low FP)
)

# Single dimension
anomalies = detector.detect("input_tokens")
for a in anomalies:
    print(f"  seq={a.seq} value={a.value} z={a.z_score:.2f} severity={a.severity}")

# All 3 dimensions (input_tokens, output_tokens, cost)
results = detector.detect_all_dimensions(filter_model="claude-haiku-4-5-20251001")

CLI usage

# Scan all 3 dimensions on entire chain (default: last 50 spans vs prior 100)
bijotel regression --db chain.db

# Single dimension, specific model
bijotel regression --db chain.db --dimension cost --model claude-sonnet-4-20250514

# Custom baseline window + sensitivity
bijotel regression --db chain.db --window 200 --z-threshold 2.5

Exit codes: 0 no anomalies, 1 anomalies detected, 2 invalid args.

Detection methods

  • z-score (parametric): z = (value - baseline.mean) / baseline.stdev. Fast for Gaussian-like signals (most token counts when calls are similar).
  • IQR (non-parametric, Tukey): flag if value < p25 - k·iqr OR value > p75 + k·iqr. Robust to heavy-tailed distributions (cost can spike).
  • AnomalyMethod.BOTH (default): flags only when BOTH agree → minimizes false positives. Use Z_SCORE or IQR alone for broader detection.

Severity levels

  • anomaly — both z-score AND IQR triggered (high confidence drift).
  • warning — only one method triggered (worth review, lower confidence).

Limitations

  • Requires ≥5 baseline samples (MIN_SAMPLES); insufficient data returns empty list (no anomalies, but no false negatives surfaced either).
  • Cost dimension requires model in DEFAULT_PRICES price table (see policy/prices.py); spans with unknown models contribute no cost datapoint.
  • Single chain.db per RegressionDetector instance — no cross-chain analysis in v0.3.0.

Shutting down BIJOTEL

shutdown() flushes any pending spans and tears down the global TracerProvider. Important when running scripts that exit immediately (without flush, last spans may be lost).

from bijotel import init, shutdown

init(...)
# ... do work, emit spans ...
shutdown()  # flushes processors, releases resources

shutdown() is idempotent — safe to call multiple times.

Development install

git clone <repo>
cd BIJOTEL
pip install -e ".[anthropic,api,fingerprint,ast,dev]"
pytest

CLI

After install, the bijotel command is available:

# Verify chain integrity (requires HMAC secret)
export BIJOTEL_HMAC_SECRET=<hex>
bijotel verify --db chain.db

# Inspect a span (by hex span_id or integer seq)
bijotel inspect --db chain.db 1
bijotel inspect --db chain.db abc123def456

# Summary stats (chain + CAS + policy daily state)
bijotel stats --db chain.db

# List spans with filters
bijotel list --db chain.db
bijotel list --db chain.db --blocked
bijotel list --db chain.db --rule cost_per_call_max
bijotel list --db chain.db --model claude-haiku-4-5-20251001
bijotel list --db chain.db --since 2026-05-07 --limit 100

# Export chain to portable signed JSON (verifiable by external auditors)
bijotel export --db chain.db --output audit_trail.json

# Verify integrity of an exported JSON (no DB needed, just secret)
bijotel verify-export audit_trail.json

# Run the HTTP API server (requires `pip install bijotel[api]`)
bijotel serve --port 8080 --db chain.db
# GET /health, /version, /docs (OpenAPI / Swagger UI)

--since uses calendar date UTC (YYYY-MM-DD, lower bound 00:00:00Z), consistent with daily_token_budget rule.

Validation

End-to-end smoke test on real Anthropic API exercising the full BIJOTEL stack (HmacChain + CAS + PolicyGate + AnthropicInstrumentor + @trace_genai decorator + all 6 CLI commands):

export ANTHROPIC_API_KEY=sk-ant-...
export BIJOTEL_HMAC_SECRET=$(python -c "import secrets; print(secrets.token_hex(32))")
python scripts/e2e_smoke.py

Cost: ~$0.001 per run (3-4 real Haiku calls; denied calls don't hit network).

The script validates:

  • Chain integrity end-to-end (bijotel verify returns VALID)
  • CAS dedup on identical input (ref_count > 1 for repeated calls)
  • Policy gate enforcement (denied calls produce synthetic spans, no SDK call)
  • All 6 CLI subcommands return exit 0
  • Custom @trace_genai decorator works alongside AnthropicInstrumentor

Roadmap

Shipped in v1.0.0:

  • F0–F6: Core (skeleton → init → HMAC chain → CAS → policy gate → decorator → CLI)
  • F7: Provider protocol + AnthropicAdapter + OpenAIAdapter
  • F8: Portable signed JSON chain export
  • F11: prompt_pattern_deny (regex jailbreak/injection detection)
  • F12: Regression detection (z-score + IQR over tokens/cost)
  • F13: Deterministic + semantic fingerprinting layer
  • F14: AST safety layer (tree-sitter bash + stdlib Python ast)
  • F15: Inference routing (Pareto cost/quality/latency + budget)
  • F16: CAS Merkle DAG (content-addressable + reference graph)
  • F17: Misalignment probe library (29 probes × 8 attack categories)
  • F18: Combo D containment guard (Policy + AST + chain seal)
  • Compliance rules: PII / output-length / model-pin
  • CLI: verify + inspect + stats + list + export + verify-export + regression + serve
  • Hardening: WAL + busy_timeout + BEGIN IMMEDIATE, crash isolation, perms, lockfile
  • FastAPI bijotel serve (health + version, full chain/policy/regression in v1.1.0)
  • Docker image + docker-compose example

Planned:

  • v1.1.0 — FastAPI chain/policy/regression endpoints
  • v1.2.0 — Dashboard (chain explorer + policy + regression)
  • v1.3.0 — Consensus voting (Bijuteria #9) + energy accounting (#3)

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bijotel-1.0.0.tar.gz (176.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

bijotel-1.0.0-py3-none-any.whl (97.1 kB view details)

Uploaded Python 3

File details

Details for the file bijotel-1.0.0.tar.gz.

File metadata

  • Download URL: bijotel-1.0.0.tar.gz
  • Upload date:
  • Size: 176.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for bijotel-1.0.0.tar.gz
Algorithm Hash digest
SHA256 f804390c937fd0a960e5bad02c16574e0d7d2d0b053f6dd2c71c7fe075650336
MD5 1b41c57f5a0a1d02c569f8488a086e5f
BLAKE2b-256 01c7fdcfa08b69e6da86189af6736a4cf96c80b575a600e4a07ca43654272391

See more details on using hashes here.

File details

Details for the file bijotel-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: bijotel-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 97.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for bijotel-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7ffd25e14f4aa5e46e4b26f252263555f43e3d27eee26d8353ccc7a24442ac42
MD5 edd0600137d1589af6b0c8f618afa341
BLAKE2b-256 df2e9796a582eae0887b2a69540fdc1d0ebaa36340ab032f9c2e1e89cdc780d3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page