Tamper-evident, auditor-navigable evidence trails for AI-assisted and data-transformation workflows.

AuditWeave

When an AI system or a data pipeline produces a conclusion that someone later has to defend — to an auditor, a regulator, a court, or an internal reviewer — the hard question is always the same: where did this come from, and can you prove it wasn't changed after the fact?

AuditWeave answers that. It records every step of a workflow — source documents, retrievals, data transformations, model inferences, decisions, and human sign-offs — into a single, append-only, cryptographically chained ledger. Any conclusion can be traced back to the exact evidence beneath it, and any tampering with the record is detectable.

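from auditweave import Recorder
from auditweave.adapters import RagTrace
from auditweave.navigator import evidence_report

# The Recorder holds the trail; the RagTrace adapter writes RAG events into it.
rec = Recorder()
rag = RagTrace(rec, model="gpt-4o", model_version="2025-01")

# Record one retrieval + LLM turn: the query, the retrieved evidence,
# the prompt, and the model's answer.
ids = rag.record_turn(
    query="Why did Q3 revenue exceed forecast?",
    retrieved=[{"name": "contract #4471", "text": "Perpetual license, $0.7M"}],
    prompt="Explain using only the documents.",
    answer="A one-time $0.7M license sale drove the Q3 spike.",
)

# Render the evidence behind the decision recorded for this turn.
print(evidence_report(rec.trail, ids["decision_id"]))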

Why this exists

A fast-growing use of AI in regulated industries is assistive: a model reads documents and proposes a conclusion that a human then relies on. Auditors use it to navigate enormous datasets; banks use it for credit and fraud workflows; healthcare organizations use it for prior authorization. In each of these, regulators are converging on the same requirement, source attribution and traceability: you must be able to show what evidence informed an AI-influenced decision and prove the record is intact.

Existing tools mostly solve a different problem. ML observability platforms (drift, latency, cost) and model-governance suites are built for the ML engineer. AuditWeave is built for the reviewer — the person who has to reconstruct one conclusion from the bottom up and trust that the trail hasn't been edited.

It is deliberately small, dependency-free, and framework-agnostic, because the only audit layer that gets adopted is the one that's trivial to add to a pipeline you already have.

What it does

  • One vocabulary for two worlds. A RAG pipeline and a Spark/lakehouse pipeline write into the same trail, so a conclusion that depends on both AI inference and upstream data transformations is traceable end-to-end.
  • Tamper-evident by construction. Every event is SHA-256 hash-chained to the previous one. Altering, inserting, deleting, or reordering any event breaks the chain, and trail.verify() reports exactly where.
  • Confidential-data friendly. Bind the trail to the hash of a source document instead of its contents, so you can prove which bytes informed a decision without copying regulated data into the ledger.
  • Navigable. evidence_report(trail, decision_id) renders the full evidence behind any single conclusion, in reading order, with an integrity stamp.
  • Append-only persistence. The default JSON Lines store grows only by addition — diff it over time and you see additions, never rewrites.
  • Zero runtime dependencies. Standard library only. PySpark is optional and never imported unless you use it.
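
The confidential-data idea above, binding the trail to a document's hash rather than its contents, can be illustrated with the standard library alone. This is a minimal sketch, not AuditWeave's API: `fingerprint` and `source_reference` are hypothetical helper names, and the dictionary layout is an assumption made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def source_reference(path):
    """Build a hypothetical ledger entry that names a document by hash only."""
    p = Path(path)
    return {
        "name": p.name,
        "sha256": fingerprint(p),
        "size_bytes": p.stat().st_size,
    }


# Demo with a throwaway file standing in for a regulated document.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "contract_4471.txt"
    doc.write_bytes(b"Perpetual license, $0.7M")

    ref = source_reference(doc)  # this is all the ledger would need to store
    unchanged = fingerprint(doc) == ref["sha256"]

    doc.write_bytes(b"Perpetual license, $0.9M")  # one digit altered
    changed = fingerprint(doc) != ref["sha256"]

print(ref["name"], ref["size_bytes"], unchanged, changed)
```

The reference carries no document text, yet anyone holding the original file can recompute the digest and confirm it is the same set of bytes that informed the decision.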

Install

pip install auditweave

From source (editable install, with the dev/test extras):

git clone https://github.com/vimalnakrani08/auditweave
cd auditweave
pip install -e ".[dev]"

The package is pure Python with zero runtime dependencies and builds with the standard setuptools backend, so it installs cleanly anywhere Python 3.9+ runs — no special toolchain required.

Running the tests

The full suite covers chain integrity, tamper detection (modification and reordering), provenance tracing, append-only persistence round-trips, both adapters, and the cross-pipeline case where one conclusion depends on both a data pipeline and an AI inference:

pip install -e ".[dev]"
pytest -q

All tests should pass. Because integrity and provenance are the whole point of this project, the suite includes explicit tests that a tampered record is detected both in memory and after being saved to disk and reloaded.

Try it end-to-end without writing any code:

python examples/audit_demo.py

This runs a full audit scenario — a lakehouse aggregation plus a RAG explanation, signed off by a human — and prints the navigable evidence report with an integrity stamp.

Core concepts

| Concept | What it is |
| --- | --- |
| Event | One immutable record: a source, retrieval, transformation, inference, decision, or attestation. |
| Trail | The append-only, hash-chained sequence of events. Supports verify() and trace(). |
| Recorder | The one-call-per-step API your pipeline (or an adapter) uses. |
| Adapters | RagTrace for retrieval+LLM turns; TableTrace for tabular/Spark lineage. |
| evidence_report | Renders the evidence behind a conclusion for a human reviewer. |
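
To make the idea of tracing a conclusion back to its evidence concrete, here is a rough sketch of provenance tracing over a small event graph. It assumes each event lists the ids of the events it was derived from; the `parents` field, the event ids, and this standalone `trace` function are all hypothetical and are not taken from AuditWeave's actual data model.

```python
from collections import deque

# Hypothetical event graph, keyed by event id, in recording order.
# Each event lists the ids of the events it was derived from.
events = {
    "src-1": {"kind": "source", "parents": []},
    "ret-1": {"kind": "retrieval", "parents": ["src-1"]},
    "tab-1": {"kind": "source", "parents": []},
    "agg-1": {"kind": "transformation", "parents": ["tab-1"]},
    "src-9": {"kind": "source", "parents": []},  # unrelated to dec-1
    "inf-1": {"kind": "inference", "parents": ["ret-1", "agg-1"]},
    "dec-1": {"kind": "decision", "parents": ["inf-1"]},
}


def trace(events, target_id):
    """Return the target and everything it depends on, in recording order."""
    seen = {target_id}
    queue = deque([target_id])
    while queue:
        current = queue.popleft()
        for parent in events[current]["parents"]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    # Dicts preserve insertion order, so this yields reading order.
    return [event_id for event_id in events if event_id in seen]


evidence = trace(events, "dec-1")
for event_id in evidence:
    print(event_id, events[event_id]["kind"])
```

In this toy graph the decision depends on both a retrieval and a table aggregation, so both branches appear in the result, while the unrelated source is left out.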

Verifying integrity

result = rec.trail.verify()
if not result:
    for issue in result.issues:
        print(f"[seq {issue.sequence}] {issue.problem}")

If anyone edits a persisted trail, even a single digit in a recorded figure, verification fails when the trail is reloaded, and the reported issue pinpoints the altered event.
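
For readers curious how a hash chain can pinpoint an altered event, the following is a minimal, self-contained sketch of the general technique. It is not AuditWeave's implementation: the event layout, the `GENESIS` constant, the function names, and the issue messages are assumptions chosen for illustration.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first event


def event_hash(prev_hash, sequence, payload):
    """SHA-256 over a canonical JSON encoding of the event and its link."""
    body = json.dumps(
        {"prev": prev_hash, "seq": sequence, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain, payload):
    """Append an event whose hash covers the previous event's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    sequence = len(chain)
    chain.append({
        "seq": sequence,
        "prev": prev_hash,
        "payload": payload,
        "hash": event_hash(prev_hash, sequence, payload),
    })


def verify(chain):
    """Return (position, problem) pairs; an empty list means the chain is intact."""
    issues = []
    expected_prev = GENESIS
    for position, event in enumerate(chain):
        if event["seq"] != position:
            issues.append((position, "sequence number out of order"))
        if event["prev"] != expected_prev:
            issues.append((position, "previous-hash link broken"))
        if event_hash(event["prev"], event["seq"], event["payload"]) != event["hash"]:
            issues.append((position, "content does not match stored hash"))
        expected_prev = event["hash"]
    return issues


chain = []
append(chain, {"kind": "source", "name": "contract #4471", "amount": "0.7M"})
append(chain, {"kind": "inference", "answer": "A one-time license sale drove the spike."})
append(chain, {"kind": "decision", "approved_by": "reviewer"})
intact_issues = verify(chain)

tampered = copy.deepcopy(chain)
tampered[0]["payload"]["amount"] = "0.9M"  # change a single digit
tampered_issues = verify(tampered)

shortened = chain[:1] + chain[2:]  # silently drop the middle event
shortened_issues = verify(shortened)

print(intact_issues, tampered_issues, shortened_issues, sep="\n")
```

Because each hash covers the previous one, an edit shows up at the altered event and a deletion shows up at the first event after the gap. In a bare sketch like this, someone able to rewrite the entire file could recompute every hash, which is why recording the latest hash somewhere external is a common complement to this technique.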

Roadmap

  • Adapters for LangChain and LlamaIndex callbacks (auto-instrumentation)
  • Optional signed attestations (per-reviewer keypairs)
  • HTML / PDF evidence-report export
  • Merkle-root anchoring for external timestamping

Contributions welcome — see CONTRIBUTING.md.

License

Apache-2.0. See LICENSE.


AuditWeave is infrastructure for accountability, not a compliance guarantee. Whether a given trail satisfies a specific regulation is a question for your auditors and counsel.
