AuditWeave
Tamper-evident, auditor-navigable evidence trails for AI-assisted and data-transformation workflows.
When an AI system or a data pipeline produces a conclusion that someone later has to defend — to an auditor, a regulator, a court, or an internal reviewer — the hard question is always the same: where did this come from, and can you prove it wasn't changed after the fact?
AuditWeave answers that. It records every step of a workflow — source documents, retrievals, data transformations, model inferences, decisions, and human sign-offs — into a single, append-only, cryptographically chained ledger. Any conclusion can be traced back to the exact evidence beneath it, and any tampering with the record is detectable.
```python
from auditweave import Recorder
from auditweave.adapters import RagTrace
from auditweave.navigator import evidence_report

rec = Recorder()
rag = RagTrace(rec, model="gpt-4o", model_version="2025-01")
ids = rag.record_turn(
    query="Why did Q3 revenue exceed forecast?",
    retrieved=[{"name": "contract #4471", "text": "Perpetual license, $0.7M"}],
    prompt="Explain using only the documents.",
    answer="A one-time $0.7M license sale drove the Q3 spike.",
)
print(evidence_report(rec.trail, ids["decision_id"]))
```
Why this exists
The fastest-growing use of AI in regulated industries is assistive: a model reads documents and proposes a conclusion a human then relies on. Auditors use it to navigate enormous datasets; banks use it for credit and fraud workflows; healthcare uses it for prior-authorization. In every one of these, regulators are converging on the same requirement — source attribution and traceability: you must be able to show what evidence informed an AI-influenced decision, and prove the record is intact.
Existing tools mostly solve a different problem. ML observability platforms (drift, latency, cost) and model-governance suites are built for the ML engineer. AuditWeave is built for the reviewer — the person who has to reconstruct one conclusion from the bottom up and trust that the trail hasn't been edited.
It is deliberately small, dependency-free, and framework-agnostic, because the only audit layer that gets adopted is the one that's trivial to add to a pipeline you already have.
What it does
- One vocabulary for two worlds. A RAG pipeline and a Spark/lakehouse pipeline write into the same trail, so a conclusion that depends on both AI inference and upstream data transformations is traceable end-to-end.
- Tamper-evident by construction. Every event is SHA-256 hash-chained to the previous one. Altering, inserting, deleting, or reordering any event breaks the chain, and `trail.verify()` reports exactly where.
- Confidential-data friendly. Bind the trail to the hash of a source document instead of its contents, so you can prove which bytes informed a decision without copying regulated data into the ledger.
- Navigable. `evidence_report(trail, decision_id)` renders the full evidence behind any single conclusion, in reading order, with an integrity stamp.
- Append-only persistence. The default JSON Lines store grows only by addition: diff it over time and you see additions, never rewrites.
- Zero runtime dependencies. Standard library only. PySpark is optional and never imported unless you use it.
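To make the hash-chaining and hash-binding ideas concrete, here is a minimal standalone sketch using only the standard library. It is not AuditWeave's implementation: the event layout, the all-zero genesis value, and the helper names `digest`, `append`, and `verify` are assumptions chosen for illustration. The source document is recorded by its SHA-256 digest rather than its contents.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first event


def digest(seq, kind, payload, prev_hash):
    """SHA-256 over a canonical JSON encoding of the event fields."""
    body = json.dumps(
        {"seq": seq, "kind": kind, "payload": payload, "prev": prev_hash},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain, kind, payload):
    """Append an event that is linked to the hash of the previous event."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    seq = len(chain)
    event = {"seq": seq, "kind": kind, "payload": payload, "prev": prev_hash}
    event["hash"] = digest(seq, kind, payload, prev_hash)
    chain.append(event)
    return event


def verify(chain):
    """Return the list positions whose sequence, link, or hash does not check out."""
    issues = []
    prev_hash = GENESIS
    for position, event in enumerate(chain):
        expected = digest(event["seq"], event["kind"], event["payload"], event["prev"])
        intact = (
            event["seq"] == position
            and event["prev"] == prev_hash
            and event["hash"] == expected
        )
        if not intact:
            issues.append(position)
        prev_hash = event["hash"]
    return issues


# Bind the trail to the document's hash, not its (possibly regulated) contents.
contract = b"Perpetual license, $0.7M"
chain = []
append(chain, "source", {"name": "contract #4471", "sha256": hashlib.sha256(contract).hexdigest()})
append(chain, "inference", {"answer": "A one-time $0.7M license sale drove the Q3 spike."})
append(chain, "decision", {"approved": True})
print(verify(chain))  # [] because the chain is intact

# Edit a recorded figure after the fact; the stored hash no longer matches.
chain[1]["payload"]["answer"] = "A one-time $0.9M license sale drove the Q3 spike."
print(verify(chain))  # [1] because the edited event is flagged
```

In this sketch the sequence number is part of the hashed body, so deleting or reordering events shows up as a sequence or link mismatch even when each individual hash is still valid.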
Install
```shell
pip install auditweave  # once published
```
From source (editable install, with the dev/test extras):
```shell
git clone https://github.com/vimalnakrani08/auditweave
cd auditweave
pip install -e ".[dev]"
```
The package is pure Python with zero runtime dependencies and builds with
the standard setuptools backend, so it installs cleanly anywhere Python 3.9+
runs — no special toolchain required.
Running the tests
The full suite covers chain integrity, tamper detection (modification and reordering), provenance tracing, append-only persistence round-trips, both adapters, and the cross-pipeline case where one conclusion depends on both a data pipeline and an AI inference:
```shell
pip install -e ".[dev]"
pytest -q
```
All tests should pass. Integrity and provenance are the whole point of this project, so every behavior that matters is covered by a test — including explicit tests that a tampered record (in memory and after being saved to disk and reloaded) is always detected.
Try it end-to-end without writing any code:
```shell
python examples/audit_demo.py
```
This runs a full audit scenario — a lakehouse aggregation plus a RAG explanation, signed off by a human — and prints the navigable evidence report with an integrity stamp.
Core concepts
| Concept | What it is |
|---|---|
| `Event` | One immutable record: a source, retrieval, transformation, inference, decision, or attestation. |
| `Trail` | The append-only, hash-chained sequence of events. Supports verify() and trace(). |
| `Recorder` | The one-call-per-step API your pipeline (or an adapter) uses. |
| Adapters | RagTrace for retrieval+LLM turns; TableTrace for tabular/Spark lineage. |
| `evidence_report` | Renders the evidence behind a conclusion for a human reviewer. |
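As a rough illustration of what tracing a conclusion back to its evidence can look like, the sketch below walks parent links from a decision to the sources beneath it. Everything here is hypothetical: the `parents` field, the event ids, and this standalone `trace` function are assumptions for the example and are not taken from AuditWeave's `Trail.trace()`.

```python
from collections import deque

# Hypothetical events, in recording order. Each one names the earlier
# events it was derived from.
events = {
    "src-1": {"kind": "source", "label": "contract #4471", "parents": []},
    "tx-1": {"kind": "transformation", "label": "Q3 revenue aggregation", "parents": ["src-1"]},
    "ret-1": {"kind": "retrieval", "label": "retrieved contract #4471", "parents": ["src-1"]},
    "inf-1": {"kind": "inference", "label": "model explanation", "parents": ["ret-1", "tx-1"]},
    "dec-1": {"kind": "decision", "label": "explanation accepted", "parents": ["inf-1"]},
    "src-2": {"kind": "source", "label": "unrelated memo", "parents": []},
}


def trace(events, decision_id):
    """Collect every event reachable from decision_id through parent links."""
    seen = {decision_id}
    queue = deque([decision_id])
    while queue:
        current = queue.popleft()
        for parent in events[current]["parents"]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    # Return ids in recording order so the result reads sources first.
    return [event_id for event_id in events if event_id in seen]


for event_id in trace(events, "dec-1"):
    event = events[event_id]
    print(f"{event_id:6} {event['kind']:15} {event['label']}")
```

The unrelated memo is recorded in the same trail but never appears in the output, because nothing on the path to the decision depends on it. Here both a data transformation and a retrieval feed the same inference, which is the cross-pipeline case described earlier.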
Verifying integrity
```python
result = rec.trail.verify()
if not result:
    for issue in result.issues:
        print(f"[seq {issue.sequence}] {issue.problem}")
```
If anyone edits a persisted trail — even a single digit in a recorded figure — reloading and verifying it fails, pinpointing the altered event.
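The persisted case can be sketched in the same spirit. The example below is a simplified stand-in, not AuditWeave's store: the record layout and the helpers `append_event` and `first_broken_line` are assumptions. It appends hash-chained events to a JSON Lines file, changes one digit on disk, and re-verifies after reloading.

```python
import hashlib
import json
import os
import tempfile

GENESIS = "0" * 64  # assumed "previous hash" for the first record


def event_hash(payload, prev_hash):
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(path, payload):
    """Append one event as a single JSON line; existing lines are not rewritten."""
    prev_hash = GENESIS
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
        if lines:
            prev_hash = json.loads(lines[-1])["hash"]
    record = {"payload": payload, "prev": prev_hash, "hash": event_hash(payload, prev_hash)}
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def first_broken_line(path):
    """Return the 0-based index of the first line that fails verification, else None."""
    prev_hash = GENESIS
    with open(path, encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            record = json.loads(line)
            link_ok = record["prev"] == prev_hash
            hash_ok = record["hash"] == event_hash(record["payload"], record["prev"])
            if not (link_ok and hash_ok):
                return index
            prev_hash = record["hash"]
    return None


path = os.path.join(tempfile.mkdtemp(), "trail.jsonl")
append_event(path, {"kind": "source", "figure": "0.7M"})
append_event(path, {"kind": "decision", "approved": True})
print(first_broken_line(path))  # None: the stored trail is intact

# Simulate an out-of-band edit: change one digit in the first stored record.
with open(path, encoding="utf-8") as handle:
    text = handle.read()
with open(path, "w", encoding="utf-8") as handle:
    handle.write(text.replace("0.7M", "0.9M", 1))
print(first_broken_line(path))  # 0: the altered line no longer matches its hash
```

Because each line carries the hash of the line before it, an editor who also recomputed the altered record's hash would break the link to the next record instead, so the change would still surface on verification.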
Roadmap
- Adapters for LangChain and LlamaIndex callbacks (auto-instrumentation)
- Optional signed attestations (per-reviewer keypairs)
- HTML / PDF evidence-report export
- Merkle-root anchoring for external timestamping
Contributions welcome — see CONTRIBUTING.md.
License
Apache-2.0. See LICENSE.
AuditWeave is infrastructure for accountability, not a compliance guarantee. Whether a given trail satisfies a specific regulation is a question for your auditors and counsel.