Native Python SDK for Provedex: Ed25519-signed, hash-chained agent evidence, byte-identical to the Rust reference.
provedex (native Python SDK)
Native, in-process Ed25519 signing and hash-chaining for AI-agent evidence. Byte-identical to the Provedex Rust reference: a ledger signed here verifies with provedex verify, and vice versa.
This is the opt-in fast-path. The default integration for non-Rust apps is the localhost sidecar (provedex-agent); see ADR 0004. Use this binding when you want sub-millisecond, in-process signing with no extra process to run.
Install
    pip install provedex

Pre-built wheels ship for CPython 3.11+ on Linux x86_64, Linux aarch64, and macOS arm64. No Rust toolchain is required to install. Add provedex to the requirements of the backend service that runs your AI agents.
How it fits your backend
provedex is a library you embed in the backend that runs your agents and automations, not a separate service. The model:
- Sign in-process. Wherever your agent does something worth proving (an LLM call, a tool call, an utterance), you call `session.record(...)`. The event is signed and appended to a local ledger as it happens. No network hop, no sidecar.
- The key and the ledger live on the backend host. The signing key is read once at startup from a path you control. The ledger is an append-only NDJSON file on that host.
- Verify anywhere, later, by anyone. A regulator, an auditor, or you on a laptop can run `provedex verify` (or `provedex.verify_file`) against the ledger with only the public key, offline, with no involvement from the backend that produced it. That separation is the point: the operator never has to be trusted for the integrity of the log.

    your backend (agents + automations)                 an auditor, months later
      pip install provedex                              (only needs the public key)
      session.record(event)  --->  ledger.ndjson  --->  provedex verify -> VALID / BROKEN
      (signing key stays here)     (the evidence)       (offline, no trust in you)

Quickstart
    import hashlib
    import os

    import provedex


    def sha256_hex(data: str | bytes) -> str:
        """Event payloads carry SHA-256 hex digests, not raw content. Hash with
        your own hashlib; what you hash vs. keep in clear is your decision."""
        if isinstance(data, str):
            data = data.encode("utf-8")
        return hashlib.sha256(data).hexdigest()


    # Once at startup. The key is created on first run, then reused (0600 on unix).
    keypair = provedex.SigningKeypair.load_or_create(
        os.path.expanduser("~/.provedex/keys/ed25519.key")
    )

    # Open one session per conversation / agent run. Resumes if the ledger exists.
    session = provedex.Session.open(
        keypair=keypair,
        ledger_path=os.path.expanduser("~/.provedex/ledger.ndjson"),
        session_id="conversation-42",
    )

    session.record(
        provedex.events.session_started(
            agent_id="intake-bot", model_id="gpt-4o", session_id="conversation-42"
        )
    )

    prompt = "Summarize the patient's chief complaint."
    response = call_your_model(prompt)  # your code

    signed = session.record(
        provedex.events.model_invoked(
            model_id="gpt-4o",
            prompt_sha256=sha256_hex(prompt),
            response_sha256=sha256_hex(response),
            prompt_tokens=120,
            response_tokens=80,
        )
    )
    print(signed.seq, signed.self_hash)

    session.record(
        provedex.events.session_ended(
            reason="completed", summary_sha256=sha256_hex(response)
        )
    )

    # Anyone with the public key can now verify this ledger, offline.
    report = provedex.verify_file(os.path.expanduser("~/.provedex/ledger.ndjson"))
    assert report.ok

Events
One typed factory per core variant. The variant set is locked to the Rust core; there is no Python-only event. All arguments are keyword-only.
| Factory | Signs |
|---|---|
| `events.session_started(agent_id, model_id, session_id)` | session open |
| `events.utterance_captured(audio_sha256, transcript, lang, duration_ms)` | inbound speech |
| `events.tool_called(tool_name, args_sha256, args_redacted)` | tool invocation |
| `events.tool_returned(tool_name, result_sha256, latency_ms, success)` | tool result |
| `events.model_invoked(model_id, prompt_sha256, response_sha256, prompt_tokens, response_tokens)` | LLM call |
| `events.utterance_spoken(text_sha256, text, audio_sha256)` | outbound speech |
| `events.session_ended(reason, summary_sha256)` | session close |

`events.from_dict({"type": ..., "payload": ...})` rebuilds an event from its stored JSON.
What goes in the fields
- `*_sha256` fields take a 64-character SHA-256 hex digest that you compute. The ledger stores digests, not raw prompts, responses, or audio - this keeps sensitive content out of the evidence while still proving exactly what was processed. For raw text (a prompt, a response, a transcript), hash the UTF-8 bytes with `sha256_hex` above. For a structured payload (a dict or list, such as tool arguments), hash its canonical JSON so the digest is reproducible by anyone, in any language:

      def canonical_sha256(payload: object) -> str:
          return hashlib.sha256(provedex.canonical_json(payload)).hexdigest()

  `provedex.canonical_json` is the same deterministic encoder the chain signs with (sorted keys, fixed number formatting), so an auditor re-hashing the original gets the identical digest. Do NOT hash an ad-hoc `str(dict)` or `json.dumps` - those are not stable across runs or languages. Pick one convention (raw bytes for text, canonical JSON for structures), apply it consistently, and document it for whoever verifies.
- `args_redacted` (on `tool_called`) is a dict you store in clear - the non-sensitive subset of the tool arguments (for example an account ID but not an SSN). You decide what is safe to keep readable. It is signed as canonical JSON, so it must be JSON-serializable; non-finite floats (NaN, Infinity) are rejected.
- `transcript` / `text` on the utterance events are stored in clear alongside their hash, because a transcript is usually the thing an auditor wants to read. Omit or redact upstream if your data policy forbids it.

Provedex does not redact for you. What is hashed versus kept in clear is the customer's decision (see "Out of scope" in the project README).
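To make the hashing and redaction conventions concrete, here is a small sketch that uses only the standard library. The `stable_json_bytes` helper is a stand-in with sorted keys and no whitespace; it is not `provedex.canonical_json` and is not guaranteed to produce the same bytes. The tool arguments and the `SAFE_KEYS` allow-list are hypothetical.

```python
import hashlib
import json


def stable_json_bytes(payload: object) -> bytes:
    """Stand-in for a canonical encoder: sorted keys, no whitespace.

    NOT provedex.canonical_json; use that for real ledgers.
    """
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical tool arguments: the same mapping built in two key orders.
args_a = {"account_id": "A-1001", "ssn": "000-00-0000"}
args_b = {"ssn": "000-00-0000", "account_id": "A-1001"}

# A naive encoding follows insertion order, so the two digests disagree.
naive_a = digest(json.dumps(args_a).encode("utf-8"))
naive_b = digest(json.dumps(args_b).encode("utf-8"))

# A sorted-key encoding gives one digest for one logical value.
stable_a = digest(stable_json_bytes(args_a))
stable_b = digest(stable_json_bytes(args_b))

# Keep only an allow-listed subset in clear; the digest still covers everything.
SAFE_KEYS = {"account_id"}  # your data policy decides this set
args_redacted = {k: v for k, v in args_a.items() if k in SAFE_KEYS}

print(naive_a == naive_b, stable_a == stable_b, args_redacted)
```

An allow-list rather than a deny-list is used here so that a newly added sensitive argument stays out of the clear-text subset by default.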
Sessions
Session is the primary path: it allocates the next seq, chains each event to the previous self_hash, appends to the ledger, and fsyncs. On open it reads any existing ledger and resumes from the last event, so a restarted process continues the same chain rather than starting over.
- One session per conversation or run. Use a distinct `session_id` per logical conversation so the boundary is meaningful to an auditor. (A process-wide session works but means "one session = the process lifetime," which usually is not what you want.)
- The ledger file is the chain. Reopening the same `ledger_path` resumes that chain regardless of `session_id`; the `session_id` is recorded inside events as metadata, it does not locate the ledger. If you want separate chains per conversation, give each its own `ledger_path`; if you want one continuous chain, point them all at the same file.
- Concurrency. A single `Session` is safe to call from multiple threads or async tasks; the core serializes each seal-and-append, so the chain stays valid under concurrent writers. There is no `close()` - a `Session` holds an appendable file handle that is released when it is garbage-collected.

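The concurrency guarantee above amounts to making "allocate seq, link to the parent, append" a single atomic step. The sketch below illustrates that idea with a Python lock and an in-memory list; it is an assumption-laden toy (`ToyChain`, its fields, and the hashing scheme are invented here) and says nothing about how the Rust core implements serialization.

```python
import hashlib
import threading


class ToyChain:
    """Illustrative only: a lock makes 'allocate seq + link + append' atomic."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._parent = "0" * 64  # assumed genesis placeholder
        self.entries: list[tuple[int, str, str]] = []  # (seq, parent_hash, self_hash)

    def record(self, payload: str) -> int:
        with self._lock:
            seq = len(self.entries)
            material = f"{seq}|{self._parent}|{payload}".encode("utf-8")
            self_hash = hashlib.sha256(material).hexdigest()
            self.entries.append((seq, self._parent, self_hash))
            self._parent = self_hash
            return seq


chain = ToyChain()


def worker(name: str) -> None:
    for i in range(25):
        chain.record(f"{name}-{i}")


# Four writers share one chain.
threads = [threading.Thread(target=worker, args=(f"t{n}",)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

seqs = [seq for seq, _, _ in chain.entries]
links_ok = all(
    chain.entries[i][1] == chain.entries[i - 1][2] for i in range(1, len(chain.entries))
)
print(len(seqs), seqs == list(range(100)), links_ok)
```

The interleaving of events from different threads is not deterministic, but every entry still gets a unique, gap-free `seq` and a parent hash that matches its predecessor.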
record (and sign_event) return a SignedEvent with .seq, .timestamp_nanos, .event (the tagged {"type", "payload"} dict), .parent_hash, .self_hash, .signature, .signer_pubkey, and .to_json() (the exact NDJSON ledger line).
For full manual control (you own the seq and parent hash), there is a low-level path:
    signed = provedex.sign_event(
        event=e, seq=0, parent_hash=provedex.GENESIS_PARENT_HASH, keypair=keypair
    )

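If you take the low-level path, you also take over the bookkeeping that `Session` does on open: finding the next `seq` and the parent hash from the existing ledger. A minimal sketch of that lookup follows. It assumes each NDJSON line is a JSON object with `seq` and `self_hash` keys (names borrowed from the `SignedEvent` attributes, not confirmed as the on-disk field names), and `next_position` is a hypothetical helper.

```python
import json
import os
import tempfile

GENESIS = "0" * 64  # assumed placeholder; real chains use provedex.GENESIS_PARENT_HASH


def next_position(ledger_path: str) -> tuple[int, str]:
    """Return (next_seq, parent_hash) for the next event on this ledger."""
    last_line = None
    if os.path.exists(ledger_path):
        with open(ledger_path, "r", encoding="utf-8") as ledger:
            for line in ledger:
                if line.strip():
                    last_line = line
    if last_line is None:
        return 0, GENESIS  # empty or missing ledger: start a new chain
    last = json.loads(last_line)
    return last["seq"] + 1, last["self_hash"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ledger.ndjson")
    fresh = next_position(path)

    # Two hypothetical ledger lines, reduced to the fields this sketch reads.
    with open(path, "w", encoding="utf-8") as ledger:
        ledger.write(json.dumps({"seq": 0, "self_hash": "aa" * 32}) + "\n")
        ledger.write(json.dumps({"seq": 1, "self_hash": "bb" * 32}) + "\n")
    resumed = next_position(path)

print(fresh, resumed)
```

Reading the whole file to find the last line is simple but linear in ledger size; it is shown here for clarity, not as a recommendation for large ledgers.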
Native binding vs. the framework adapters
If your agents are built on a framework, you have a choice:
- `provedex` (this package) - in-process, fastest (~11 us/seal), no extra process. You call `session.record(...)` yourself at each event. Most control; you instrument the code.
- `provedex-pipecat` / `provedex-langchain` - auto-capture every frame / LLM / tool call via the framework's hooks, no manual `record` calls, but they route through the `provedex-agent` sidecar (1-2 ms/event) and need that process running.

Same backend, different trade-off: native = manual + in-process, adapters = automatic + via sidecar.
Latency
| Operation | Cost |
|---|---|
| `sign_event` / seal (no I/O), GIL released | 11-15 us |
| `Session.record` (seal + append + fsync) | 3.8 ms, dominated by fsync |

Session.record fsyncs for durability, the same as the sidecar. On an async backend, run it off the event loop so the fsync does not block:

    signed = await asyncio.to_thread(session.record, event)

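The effect of moving an fsync-bound call off the event loop can be seen in a self-contained sketch. `durable_append` below is a blocking stand-in for `Session.record` (append one line, flush, fsync); it is not the Provedex implementation, and the heartbeat task exists only to show that the loop keeps running while the write happens in a worker thread.

```python
import asyncio
import os
import tempfile


def durable_append(path: str, line: str) -> int:
    """Blocking stand-in for Session.record: append one line, then fsync."""
    with open(path, "a", encoding="utf-8") as ledger:
        ledger.write(line + "\n")
        ledger.flush()
        os.fsync(ledger.fileno())
    return len(line)


async def main() -> tuple[int, int]:
    ticks = 0

    async def heartbeat() -> None:
        # Keeps counting only while the event loop is free to schedule it.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0)

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ledger.ndjson")
        beat = asyncio.create_task(heartbeat())
        for i in range(20):
            # The blocking write runs in a worker thread; the loop stays responsive.
            await asyncio.to_thread(durable_append, path, f'{{"seq": {i}}}')
        beat.cancel()
        with open(path, encoding="utf-8") as ledger:
            written = sum(1 for _ in ledger)
    return written, ticks


written, ticks = asyncio.run(main())
print(written, ticks > 0)
```

Awaiting each call in turn, as here, preserves the order in which events are recorded; firing many `to_thread` calls concurrently would leave the order to the thread scheduler.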
Failure modes
All failures raise; nothing returns an error sentinel.
| Exception | When |
|---|---|
| `provedex.KeyLoadError` | bad key file (length, hex, missing on load) |
| `provedex.SigningError` | seal/hash failure, bad event shape in `from_dict`, non-finite float in a payload |
| `provedex.LedgerError` | ledger read/write failure |
| `provedex.ChainError` | malformed (unparseable) ledger input in `verify_file` |

verify_chain / verify_file do NOT raise on a broken chain; they return ChainReport(ok=False, broken_at=<seq>, reason=...). A broken chain is data, not an exception. Out-of-range or negative integers passed to an event field raise the standard OverflowError.
Byte-compat
There is one canonical-JSON encoder in the whole system: the Rust one. This binding calls it directly, so the bytes it signs are identical to the sidecar and the CLI. The repo's tests/compat/vectors/ golden suite and the cross-verify tests assert it.
JSON numbers follow the Rust reference exactly: an integer and a float are distinct (1 and 1.0 hash differently), and non-finite floats (NaN, Infinity) are rejected rather than silently coerced.
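Both number rules can be observed with Python's standard `json` module, which happens to behave the same way on these two points when `allow_nan=False` is set. This is only an illustration of the rules; the stdlib encoder is not the Rust canonical encoder and its bytes should not be assumed to match.

```python
import hashlib
import json
import math


def digest_of(value: object) -> str:
    # allow_nan=False makes json.dumps raise ValueError on NaN / Infinity.
    encoded = json.dumps(value, allow_nan=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


int_digest = digest_of({"n": 1})      # encodes as {"n": 1}
float_digest = digest_of({"n": 1.0})  # encodes as {"n": 1.0}

# Non-finite floats are refused instead of being written as NaN / Infinity.
rejected = []
for bad in (math.nan, math.inf):
    try:
        digest_of({"n": bad})
    except ValueError:
        rejected.append(True)

print(int_digest != float_digest, rejected)
```

The practical consequence is that a value which changes type between producer and verifier (for example, a token count that becomes a float after a round trip through another system) will no longer hash to the recorded digest.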
Verifying offline
Anyone with the public key can verify the ledger with no involvement from you:

    provedex verify --ledger ~/.provedex/ledger.ndjson

In Python, provedex.verify_file(path) returns a ChainReport (.ok, .event_count, .broken_at, .reason, .root_hash). On a valid chain .broken_at and .reason are None.
The signer's public key is keypair.pubkey_hex (64 hex chars). Publish it out of band - a trust page, a key registry, a signed disclosure - not only from the same service that produced the ledger, or a verifier is back to trusting the operator, which is the trust this design removes.
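One way a verifier might act on this is to compare the signer key recorded in the ledger against the key obtained out of band, in addition to checking the chain. The sketch below assumes each ledger line is a JSON object with `seq` and `signer_pubkey` keys (names taken from the `SignedEvent` attributes, not confirmed as on-disk field names); the key values are hypothetical, and this does not describe what `provedex verify` itself checks.

```python
import hmac
import json

# Hypothetical values: the key obtained out of band, and two ledger lines.
TRUSTED_PUBKEY_HEX = "ab" * 32
ledger_lines = [
    json.dumps({"seq": 0, "signer_pubkey": "ab" * 32}),
    json.dumps({"seq": 1, "signer_pubkey": "cd" * 32}),
]


def unexpected_signers(lines: list[str], trusted_hex: str) -> list[int]:
    """Return the seq of every event whose signer is not the trusted key."""
    flagged = []
    for line in lines:
        event = json.loads(line)
        # Case-insensitive hex comparison via a constant-time compare.
        if not hmac.compare_digest(event["signer_pubkey"].lower(), trusted_hex.lower()):
            flagged.append(event["seq"])
    return flagged


flagged = unexpected_signers(ledger_lines, TRUSTED_PUBKEY_HEX)
print(flagged)
```

A signature that verifies only shows that some key signed the event; matching that key to one published independently is what connects the ledger to a known signer.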
License
Apache-2.0.