Emit and verify portable cryptographic evidence bundles, offline: Ed25519 + RFC 6962 Merkle + optional SD-JWT.

b7n0de, Verified AI Work

proofbundle

Turn an AI eval result into one portable, offline-verifiable receipt. It proves who signed these exact bytes and that nothing changed since — not that the number is true. Ed25519 + RFC 6962 Merkle, one file, no server, no network.

The problem

Every AI eval number you read — a safety benchmark, a capability score, a leaderboard entry — is an unverifiable claim. You trust the lab. There's no portable way to check, offline, that a result was signed by a stated party, hasn't been altered, and covers the samples it claims.

proofbundle is that check. It's a small MIT-licensed Python tool (a compact, auditable trusted core, depends only on cryptography) that turns a result into a signed receipt anyone can verify from a single file — and it's honest about the line it does not cross.

60-second try (offline, no setup)

pip install "proofbundle[eval]"
proofbundle demo

You'll see an honest receipt verify => OK, then six independent tampers each verify FAILED, then a swapped sample get caught — all in memory. The command exits non-zero if any tamper slips through, so it's also a self-test. Full walkthrough: docs/DEMO.md.

# your own receipt, from a signed payload:
proofbundle emit --payload-file result.json --new-key signer.key --out receipt.json
proofbundle verify receipt.json        # exit 0 = OK, 1 = failed, 2 = malformed
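
The three exit codes noted above make the verifier straightforward to gate on in CI. A minimal sketch, assuming only the documented codes (0, 1, 2) and the `proofbundle verify <file>` invocation shown above; the helper names `describe_exit` and `verify_receipt` are hypothetical and not part of proofbundle:

```python
import subprocess

# Exit codes as documented for `proofbundle verify` in this README.
EXIT_MEANINGS = {0: "OK", 1: "FAILED", 2: "MALFORMED"}


def describe_exit(code: int) -> str:
    """Map a verifier exit code to a label; undocumented codes become UNKNOWN."""
    return EXIT_MEANINGS.get(code, "UNKNOWN")


def verify_receipt(path: str) -> str:
    """Run the CLI on one receipt file and return the label (hypothetical helper)."""
    result = subprocess.run(["proofbundle", "verify", path], check=False)
    return describe_exit(result.returncode)
```

Treating any unrecognised code as `UNKNOWN` rather than as success keeps a pipeline fail-closed if the CLI ever exits in a way this README does not describe.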

What a receipt proves — and what it doesn't

| ✅ It proves | ❌ It does not prove |
| --- | --- |
| These exact bytes were signed by this key (authorship) | That the number is true |
| Nothing changed since signing (integrity, Ed25519 + RFC 6962) | That the issuer is honest |
| The result is attributable to a stated issuer | That the eval was well-designed |
| A threshold was met while hiding the model/dataset (salted commitments) | That there was no cherry-picking — unless pre-registered |
| Optionally: individual samples, offline-auditable (per-sample Merkle) | That the computation was correct — that needs a TEE or independent reproduction |

This boundary is the point, not a weakness. A receipt makes a claim attributable, tamper-evident, and — with pre-registration and per-sample auditing — bounded and spot-checkable. Full detail: THREAT_MODEL.md.

How it fits together

flowchart LR
    H["eval harness<br/>inspect_ai · lm-eval · promptfoo · pytest"] --> A["adapter → signed claim<br/>salted commitments · provenance · samples root"]
    A --> R["receipt<br/>one portable file"]
    R --> V{{"proofbundle verify — offline"}}
    V --> C["signature · Merkle inclusion · SD-JWT/KB ·<br/>witness quorum · status list · sample openings"]
    C --> OK(["=> OK / FAILED"])
    style V fill:#D6248A,stroke:#D6248A,color:#fff
    style OK fill:#D6248A,stroke:#D6248A,color:#fff

What's in the box

  • Core — Ed25519 signature + RFC 6962 / 9162 Merkle inclusion, verified fully offline. Checks a real Sigstore Rekor proof, so correctness isn't self-referential.
  • Eval receipts — a signed claim (metric ⋈ threshold, n, salted model/dataset commitments, assurance level, provenance) from your run. See EVAL_CLAIM.md.
  • Selective disclosure — SD-JWT (RFC 9901) with Key Binding: prove a threshold while withholding the exact score.
  • Transparency-log interop — C2SP tlog-checkpoint / cosignature / .tlog-proof, with post-quantum ML-DSA-44 witness cosignatures. Optional Token-Status-List revocation snapshots.
  • Per-sample audit — commit to every sample; an auditor challenges random indices (with a fresh nonce or a public randomness beacon, v1.9) and openings must bind to the signed root. Catches 1% sample-doctoring with 95% confidence at 300 samples, regardless of run size.
  • Pre-registration — proofbundle prereg <plan> commits to the protocol before the run, so best-of-many publishing becomes visible.
  • Integrations — opt-in inspect_ai end-of-task hook and pytest plugin (emit only when PROOFBUNDLE_EMIT=1 / --proofbundle), plus a Hugging Face Community Evals bridge. See INTEGRATIONS.md.
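
The per-sample audit figure above (1% doctoring, 95% confidence, 300 samples) follows from a short calculation. As a sketch, assume the auditor draws challenge indices independently and uniformly at random, and that a fraction f of the samples was doctored; the chance that k draws hit at least one doctored sample is then 1 - (1 - f)^k, which does not depend on the run size. The function names are illustrative only:

```python
import math


def detection_probability(doctored_fraction: float, challenges: int) -> float:
    """P(at least one doctored sample is challenged) for independent uniform draws."""
    return 1.0 - (1.0 - doctored_fraction) ** challenges


def challenges_needed(doctored_fraction: float, confidence: float) -> int:
    """Smallest number of draws whose detection probability reaches `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - doctored_fraction))


if __name__ == "__main__":
    print(f"{detection_probability(0.01, 300):.3f}")
    print(challenges_needed(0.01, 0.95))
```

Under this model, 300 challenges against 1% doctoring give a detection probability just above 0.95. Drawing without replacement can only raise that probability, so the with-replacement figure is a conservative bound.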

Docs

| For… | Read |
| --- | --- |
| Skeptics (why not SHA-256 / Sigstore / trust the issuer) | docs/FAQ.md |
| New to this? Plain-terms glossary | docs/GLOSSARY.md |
| Reviewers (30-minute adversarial audit path) | docs/REVIEWERS.md |
| Where every trust anchor comes from | docs/TRUST_ANCHORS.md |
| The demos, tier by tier | docs/DEMO.md |
| The normative format + verification order | SPEC.md |
| Honest comparison to Rekor / in-toto / OMS / ValiChord | INTEROP.md |
| Regulatory mapping (and what to never claim) | COMPLIANCE.md |
| Funders / role fit | docs/PROJECT_BRIEF.md |
| Preview: TEE-attestation bridge (v2.0 beta) | docs/EXPERIMENTAL_ENCLAVE.md |

Install

pip install proofbundle                 # core: offline verify + plain emit (only dependency: cryptography)
pip install "proofbundle[eval]"          # + eval receipts, prereg, and the demo (adds an RFC 8785 JCS canonicalizer)
pip install "proofbundle[inspect]"      # inspect_ai adapter + hook
pip install "proofbundle[pq]"           # verify ML-DSA-44 (post-quantum) witness cosignatures

Requires Python 3.10+. The verify path never rolls its own crypto — Ed25519 comes from cryptography; Merkle hashing is RFC 6962.
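
For readers unfamiliar with RFC 6962 Merkle hashing, the sketch below shows the tree hash and an inclusion-proof check using only the standard library. It follows the hashing rules of RFC 6962 (0x00 prefix for leaves, 0x01 prefix for interior nodes, split at the largest power of two below the tree size) and the verification loop described in RFC 9162. It is an independent illustration, not proofbundle's code, and all function names are hypothetical:

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    """Leaf hash: SHA-256(0x00 || data)."""
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Interior node hash: SHA-256(0x01 || left || right)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    """Largest power of two strictly less than n (requires n >= 2)."""
    return 1 << ((n - 1).bit_length() - 1)


def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle tree hash over raw leaf inputs; the empty tree hashes to SHA-256('')."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_path(index: int, leaves: list[bytes]) -> list[bytes]:
    """Audit path for leaves[index], ordered from the leaf level up to the root."""
    if len(leaves) <= 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return inclusion_path(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return inclusion_path(index - k, leaves[k:]) + [merkle_root(leaves[:k])]


def verify_inclusion(leaf: bytes, index: int, size: int,
                     path: list[bytes], root: bytes) -> bool:
    """Recompute the root from a leaf and its audit path, then compare."""
    if index >= size:
        return False
    fn, sn, r = index, size - 1, leaf_hash(leaf)
    for p in path:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            # For a right-edge node with an even index, skip the levels
            # where it has no sibling.
            while fn and not fn & 1:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

The distinct leaf and node prefixes are what prevent an interior node from being passed off as a leaf, so a verifier needs to apply them exactly as specified.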

Status & scope

Beta and SemVer-committed, with 303 tests, a CI mutation gate, and property-based parser fuzzing. Correctness is anchored to external RFC 6962 vectors and a real Rekor proof, not just its own bundles. It is not a log service, a full in-toto client, a TEE, a consensus network, or a compliance product by itself; it is the small, offline, standards-native receipt layer between them. Security policy: SECURITY.md.

Contributing

See CONTRIBUTING.md and the Code of Conduct. Good first issues are labeled good-first-issue; security findings go through SECURITY.md. The verifier core aims to stay small, dependency-light, and correct.

License

MIT — see LICENSE.


proofbundle is part of b7n0de, Verified AI Work · b7n0de.com

Download files

Source Distribution

proofbundle-2.0.0b1.tar.gz (141.8 kB)

Uploaded Source

Built Distribution

proofbundle-2.0.0b1-py3-none-any.whl (110.5 kB)

Uploaded Python 3

File details

Details for the file proofbundle-2.0.0b1.tar.gz.

File metadata

  • Download URL: proofbundle-2.0.0b1.tar.gz
  • Size: 141.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for proofbundle-2.0.0b1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | d32891d52ff58a1f74fb36bba53599e1b919a3835473eedd4c9f26e56e53bc2f |
| MD5 | bfbeab00fb92fb2492dd34745cec0588 |
| BLAKE2b-256 | 3e587504ce712c26b0f44ddbaf5395de223f71966b4f869163995d68374c2a92 |

Provenance

The following attestation bundles were made for proofbundle-2.0.0b1.tar.gz:

Publisher: release.yml on b7n0de/proofbundle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file proofbundle-2.0.0b1-py3-none-any.whl.

File metadata

  • Download URL: proofbundle-2.0.0b1-py3-none-any.whl
  • Size: 110.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for proofbundle-2.0.0b1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | be4f5fce105cc55259166dd885a028dfda71dff4f165d41b303a9f5775c7a7a6 |
| MD5 | b0d058a6523f9c3d32db0bd124298469 |
| BLAKE2b-256 | 2ea98daf029cd16c5ee5aba93b7b08dd2298cc21bf7f2c0fdae58a4982a2e82a |

Provenance

The following attestation bundles were made for proofbundle-2.0.0b1-py3-none-any.whl:

Publisher: release.yml on b7n0de/proofbundle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
