Tamper-evident evidence bundles for AI outputs


AELITIUM

Detect when LLM behavior silently changes — verifiable, offline, no server.



The problem

You run the same prompt in production. One week later, the output is different.

The model changed — but your logs just show two JSON blobs. There's no proof of when it changed, or which call started returning different results.

AELITIUM gives you cryptographic evidence for every LLM call — request hash, response hash, tamper-evident bundle — so you can prove exactly when behavior changed, and that your records haven't been altered.


30-second demo

pip install aelitium
# After capturing two runs of the same request, compare them:
aelitium compare ./bundle_last_week ./bundle_today
# STATUS=CHANGED rc=2
# REQUEST_HASH=SAME
# RESPONSE_HASH=DIFFERENT
# INTERPRETATION=Same request produced a different response

Same request. Different output. That means the change came from the model — not your code.

# Scan your codebase for unprotected LLM calls:
aelitium scan ./src
# LLM call sites detected: 4
# Missing evidence capture:
#   ⚠ openai — worker.py:42
#   ⚠ anthropic — agent.py:17
# Coverage: 2/4 (50%)
# STATUS=INCOMPLETE rc=2

All commands accept --json for structured output.


How it works

API call (OpenAI / Anthropic)
      ↓
capture adapter   ← records request_hash + response_hash at call time
      ↓
evidence bundle   ← canonical JSON + ai_manifest.json + binding_hash
      ↓
aelitium verify-bundle   ← STATUS=VALID / STATUS=INVALID
aelitium compare         ← UNCHANGED / CHANGED / NOT_COMPARABLE

Each bundle contains a deterministic SHA-256 hash of the payload, a manifest with timestamp and schema, and a cryptographic binding_hash linking the exact request to the exact response. Anyone with the bundle can verify it — no network required.
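Conceptually, the deterministic hash works like the sketch below: serialize the payload canonically (sorted keys, fixed separators), then SHA-256 the bytes. This is an illustration of the idea using only the standard library, not AELITIUM's actual canonicalization code.

```python
import hashlib
import json

def canonical_hash(payload: dict) -> str:
    """Hash a payload deterministically: sorted keys, fixed separators,
    and UTF-8 encoding, so any machine produces the same digest."""
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Key order does not affect the digest:
a = canonical_hash({"model": "gpt-4o", "prompt": "capital of France?"})
b = canonical_hash({"prompt": "capital of France?", "model": "gpt-4o"})
assert a == b
```

Because the serialization is fixed, re-hashing the same bundle years later, on a different machine, yields the same digest, which is what makes offline verification possible.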


Capture adapter (OpenAI / Anthropic)

No manual JSON. The capture adapter intercepts the API call and writes the bundle automatically.

from openai import OpenAI
from aelitium import capture_openai

client = OpenAI()
result = capture_openai(
    client, "gpt-4o",
    [{"role": "user", "content": "What is the capital of France?"}],
    out_dir="./evidence",
)
print(result.ai_hash_sha256)  # deterministic proof of this exact call

Then verify from the CLI:

aelitium verify-bundle ./evidence
# STATUS=VALID rc=0
# AI_HASH_SHA256=...
# BINDING_HASH=...   ← cryptographic link between request and response

See Capture layer for Anthropic, streaming, and signing.


Detect when the model changed

aelitium compare ./bundle_last_week ./bundle_today
# STATUS=CHANGED rc=2
# REQUEST_HASH=SAME    a=3f4a8c1d... b=3f4a8c1d...
# RESPONSE_HASH=DIFFERENT  a=9b2e7f1a... b=c41d8e3b...
# INTERPRETATION=Same request produced a different response

If REQUEST_HASH=SAME and RESPONSE_HASH=DIFFERENT, the change came from the model — not your code.
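The decision table behind compare can be sketched in a few lines (illustrative logic only, not the CLI's source code):

```python
def interpret(req_a: str, resp_a: str, req_b: str, resp_b: str) -> str:
    """Classify two bundles by their request/response hashes."""
    if req_a != req_b:
        return "NOT_COMPARABLE"  # different requests: nothing to conclude
    if resp_a == resp_b:
        return "UNCHANGED"       # same request, same response
    return "CHANGED"             # same request, different response: drift

assert interpret("r1", "x", "r1", "y") == "CHANGED"
```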

Run the full example (requires OpenAI API key):

python examples/model_drift_detector.py

Scan for unprotected LLM calls

Find every LLM call in your codebase that isn't wrapped in a capture adapter:

aelitium scan ./src

# LLM call sites detected: 12
# Instrumented with capture adapter: 9
#   ✓ openai — api/worker.py:14
#   ✓ openai — api/worker.py:38
# Missing evidence capture: 3
#   ⚠ openai — jobs/batch.py:22
#   ⚠ anthropic — agents/classifier.py:11
#   ⚠ litellm — utils/fallback.py:7
# Coverage: 9/12 (75%)
# STATUS=INCOMPLETE rc=2

Add to CI/CD to enforce evidence coverage:

- name: Check LLM evidence coverage
  run: aelitium scan ./src

For CI-friendly key=value output:

aelitium scan ./src --ci
# AELITIUM_SCAN_STATUS=INCOMPLETE
# AELITIUM_SCAN_TOTAL=12
# AELITIUM_SCAN_INSTRUMENTED=9
# AELITIUM_SCAN_MISSING=3
# AELITIUM_SCAN_COVERAGE=75
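A CI gate can consume this key=value output directly. The sketch below parses the AELITIUM_SCAN_* fields shown above and enforces a minimum coverage threshold; the threshold logic and function names are ours, not part of the tool:

```python
def parse_kv(text: str) -> dict:
    """Parse KEY=VALUE lines into a dict, ignoring anything else."""
    pairs = (line.split("=", 1) for line in text.splitlines() if "=" in line)
    return {k: v for k, v in pairs}

def gate(scan_output: str, min_coverage: int = 100) -> int:
    """Return a CI exit code: 0 if coverage meets the threshold, else 2."""
    report = parse_kv(scan_output)
    coverage = int(report.get("AELITIUM_SCAN_COVERAGE", 0))
    return 0 if coverage >= min_coverage else 2

sample = "AELITIUM_SCAN_STATUS=INCOMPLETE\nAELITIUM_SCAN_COVERAGE=75"
assert gate(sample, min_coverage=80) == 2
```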

Reproducibility

The same AI output always produces the same hash, on any machine:

bash scripts/verify_repro.sh
# === RESULT: PASS ===
# AI_HASH_SHA256=8b647717...

Validated on two independent machines (A + B) with identical hashes.


CLI reference

aelitium

Command                                          Description
scan <path>                                      Scan Python files for uninstrumented LLM call sites
compare <bundle_a> <bundle_b>                    Compare two bundles — detect model behavior change
verify-bundle <dir>                              Verify bundle: hash + signature + binding hash
pack --input <file> --out <dir>                  Generate canonical JSON + manifest
verify --out <dir>                               Verify integrity of a pack output dir
validate --input <file>                          Validate against ai_output_v1 schema
canonicalize --input <file>                      Print deterministic hash
verify-receipt --receipt <file> --pubkey <file>  Verify Ed25519 authority receipt offline
export --bundle <dir>                            Export bundle in compliance format (EU AI Act Art.12)

Exit codes: 0 = success, 2 = failure. Designed for CI/CD pipelines.


Documentation


Design principles

  • Deterministic — same input always produces the same hash, on any machine
  • Offline-first — verification never requires network access
  • Fail-closed — any verification error returns rc=2; no silent failures
  • Auditable — every pack includes a manifest with schema, timestamp, and hash
  • Pipeline-friendly — all output parseable (STATUS=, AI_HASH_SHA256=, --json)
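A downstream script can adopt the same fail-closed stance when reading the parseable output. This sketch is our code, not the CLI's: anything other than an explicit passing status, including a missing STATUS line, is treated as failure:

```python
def status_of(output: str) -> tuple:
    """Fail-closed read of a STATUS= line: only an explicit passing
    status maps to rc=0; everything else, including no STATUS line
    at all, maps to rc=2."""
    for line in output.splitlines():
        if line.startswith("STATUS="):
            status = line.split("=", 1)[1].split()[0]
            return status, (0 if status in ("VALID", "UNCHANGED") else 2)
    return "UNKNOWN", 2  # no STATUS line: fail closed

assert status_of("STATUS=VALID rc=0") == ("VALID", 0)
assert status_of("garbage") == ("UNKNOWN", 2)
```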

Trust boundary

AELITIUM provides tamper-evidence, not truth guarantees.

What AELITIUM proves:

  • the bundle contents have not changed since packing
  • the canonicalized payload matches the recorded hash
  • (with capture adapter) the request hash matches the exact API payload sent

What AELITIUM does not prove:

  • that the model output was correct or safe
  • that the system that packed the bundle was trustworthy
  • that the model actually produced the output (without capture adapter)

Integrity ≠ completeness. AELITIUM proves that captured events were not altered. It does not guarantee that all events were captured. Capture completeness depends on the integration layer — SDK wrapper, proxy, or observer. If the agent controls its own logging, an observer-based capture pattern provides stronger guarantees. See TRUST_BOUNDARY.md for the full analysis.

Stronger provenance — signing authorities, hardware-backed keys — is the direction of P3.


Compliance alignment

AELITIUM provides tamper-evident evidence bundles that support the following regulatory and audit requirements:

  • EU AI Act — Article 12 (logging and traceability of high-risk AI system outputs): evidence bundles provide immutable, verifiable records of AI outputs with deterministic hashes
  • SOC 2 — CC7 (system monitoring and integrity controls): independent offline verification confirms records have not been altered after creation
  • ISO 42001 (AI management system auditability): canonical bundles with schema versioning support third-party audits without infrastructure access
  • NIST AI RMF — MG 2.2 (traceability of AI decisions and outputs): each bundle contains a complete, reproducible record: payload, hash, timestamp, and optional signature

AELITIUM does not replace logging infrastructure. It adds cryptographic integrity on top of any existing pipeline — offline, without a server, without a blockchain.


License

Apache-2.0. See LICENSE.
