Skip to main content

Emit and verify portable cryptographic evidence bundles, offline: Ed25519 + RFC 6962 Merkle + optional SD-JWT.

Project description

b7n0de, Verified AI Work

proofbundle

Turn an AI eval result into one portable, offline-verifiable receipt. It proves who signed these exact bytes and that nothing changed since — not that the number is true. Ed25519 + RFC 6962 Merkle, one file, no server, no network.

CI License: MIT Ruff Mutation tested

The problem

Every AI eval number you read — a safety benchmark, a capability score, a leaderboard entry — is an unverifiable claim. You trust the lab. There's no portable way to check, offline, that a result was signed by a stated party, hasn't been altered, and covers the samples it claims.

proofbundle is that check. It's a small MIT-licensed Python tool (a compact, auditable trusted core, depends only on cryptography) that turns a result into a signed receipt anyone can verify from a single file — and it's honest about the line it does not cross.

60-second try (offline, no setup)

pip install "proofbundle[eval]"
proofbundle demo

You'll see an honest receipt verify => OK, then six independent tampers each verify FAILED, then a swapped sample get caught — all in memory. The command exits non-zero if any tamper slips through, so it's also a self-test. Full walkthrough: docs/DEMO.md.

# your own receipt, from a signed payload:
proofbundle emit --payload-file result.json --new-key signer.key --out receipt.json
proofbundle verify receipt.json        # exit 0 = OK, 1 = failed, 2 = malformed

What a receipt proves — and what it doesn't

✅ It proves ❌ It does not prove
These exact bytes were signed by this key (authorship) That the number is true
Nothing changed since signing (integrity, Ed25519 + RFC 6962) That the issuer is honest
The result is attributable to a stated issuer That the eval was well-designed
A threshold was met while hiding the model/dataset (salted commitments) That there was no cherry-picking — unless pre-registered
Optionally: individual samples, offline-auditable (per-sample Merkle) That the computation was correct — that needs a TEE or independent reproduction

This boundary is the point, not a weakness. A receipt makes a claim attributable, tamper-evident, and — with pre-registration and per-sample auditing — bounded and spot-checkable. Full detail: THREAT_MODEL.md.

How it fits together

flowchart LR
    H["eval harness<br/>inspect_ai · lm-eval · promptfoo · pytest"] --> A["adapter → signed claim<br/>salted commitments · provenance · samples root"]
    A --> R["receipt<br/>one portable file"]
    R --> V{{"proofbundle verify — offline"}}
    V --> C["signature · Merkle inclusion · SD-JWT/KB ·<br/>witness quorum · status list · sample openings"]
    C --> OK(["=> OK / FAILED"])
    style V fill:#D6248A,stroke:#D6248A,color:#fff
    style OK fill:#D6248A,stroke:#D6248A,color:#fff

What's in the box

  • Core — Ed25519 signature + RFC 6962 / 9162 Merkle inclusion, verified fully offline. Checks a real Sigstore Rekor proof, so correctness isn't self-referential.
  • Eval receipts — a signed claim (metric ⋈ threshold, n, salted model/dataset commitments, assurance level, provenance) from your run. See EVAL_CLAIM.md.
  • Selective disclosure — SD-JWT (RFC 9901) with Key Binding: prove a threshold while withholding the exact score.
  • Transparency-log interop — C2SP tlog-checkpoint / cosignature / .tlog-proof, with post-quantum ML-DSA-44 witness cosignatures. Optional Token-Status-List revocation snapshots.
  • Per-sample audit — commit to every sample; an auditor challenges random indices (with a fresh nonce or a public randomness beacon, v1.9) and openings must bind to the signed root. Catches 1% sample-doctoring with 95% confidence at 300 samples, regardless of run size.
  • Pre-registrationproofbundle prereg <plan> commits to the protocol before the run, so best-of-many publishing becomes visible.
  • Integrations — opt-in inspect_ai end-of-task hook and pytest plugin (emit only when PROOFBUNDLE_EMIT=1 / --proofbundle), plus a Hugging Face Community Evals bridge. See INTEGRATIONS.md.

Docs

For… Read
Skeptics (why not SHA-256 / Sigstore / trust the issuer) docs/FAQ.md
New to this? plain-terms glossary docs/GLOSSARY.md
Reviewers (30-minute adversarial audit path) docs/REVIEWERS.md
Where every trust anchor comes from docs/TRUST_ANCHORS.md
The demos, tier by tier docs/DEMO.md
The normative format + verification order SPEC.md
Honest comparison to Rekor / in-toto / OMS / ValiChord INTEROP.md
Regulatory mapping (and what to never claim) COMPLIANCE.md
Funders / role fit docs/PROJECT_BRIEF.md

Install

pip install proofbundle                 # core: offline verify + plain emit (dependency-free)
pip install "proofbundle[eval]"          # + eval receipts, prereg, and the demo (adds an RFC 8785 JCS canonicalizer)
pip install "proofbundle[inspect]"      # inspect_ai adapter + hook
pip install "proofbundle[pq]"           # verify ML-DSA-44 (post-quantum) witness cosignatures

Requires Python 3.10+. The verify path never rolls its own crypto — Ed25519 comes from cryptography; Merkle hashing is RFC 6962.

Status & scope

Beta, SemVer-committed, 303 tests + a CI mutation gate + property-based parser fuzzing. Correctness is anchored to external RFC 6962 vectors and a real Rekor proof, not just its own bundles. It is not a log service, a full in-toto client, a TEE, a consensus network, or a compliance product by itself — it is the small, offline, standards-native receipt layer between them. Security policy: SECURITY.md.

Contributing

See CONTRIBUTING.md and the Code of Conduct. Good first issues are labeled good-first-issue; security findings go through SECURITY.md. The verifier core aims to stay small, dependency-light, and correct.

License

MIT — see LICENSE.


proofbundle is part of b7n0de, Verified AI Work · b7n0de.com

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

proofbundle-1.9.1.tar.gz (134.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

proofbundle-1.9.1-py3-none-any.whl (104.9 kB view details)

Uploaded Python 3

File details

Details for the file proofbundle-1.9.1.tar.gz.

File metadata

  • Download URL: proofbundle-1.9.1.tar.gz
  • Upload date:
  • Size: 134.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for proofbundle-1.9.1.tar.gz
Algorithm Hash digest
SHA256 0fdfe1c8f6b6468c32aa6a870bb79939e6453d5644d727cd51d370125682af01
MD5 b410aaa8e5d8072d1d02faeb108a8c1d
BLAKE2b-256 d1f52b952d9d4f2d5d3e25ba6d2b2920431ae58e2eee8d0118349d27c5f13400

See more details on using hashes here.

Provenance

The following attestation bundles were made for proofbundle-1.9.1.tar.gz:

Publisher: release.yml on b7n0de/proofbundle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file proofbundle-1.9.1-py3-none-any.whl.

File metadata

  • Download URL: proofbundle-1.9.1-py3-none-any.whl
  • Upload date:
  • Size: 104.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for proofbundle-1.9.1-py3-none-any.whl
Algorithm Hash digest
SHA256 444001f88a9cbaf8326d4e55d3ad0a0bf8b112a807504070c7cb88582425615a
MD5 89830cb3f1ac514a3bd5be6c619f3e42
BLAKE2b-256 83680ff12276a9eba2d7c2c40080765020e82be92739895c05a11a4ece9a4101

See more details on using hashes here.

Provenance

The following attestation bundles were made for proofbundle-1.9.1-py3-none-any.whl:

Publisher: release.yml on b7n0de/proofbundle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page