Paragraph-level compliance audit trails for RAG pipelines

TrailRAG

Compliance audit layer for RAG pipelines built on rag_control. Regulators in EU AI Act / GDPR / HIPAA-regulated industries need more than a response — they need to know which document, which page, and which model version produced it. TrailRAG intercepts every retrieval call, traces each chunk to its source, and writes an immutable audit record before returning the result.

The Problem

"The AI referenced this document" is not sufficient. Regulators need: the AI extracted $2.5M from page 7, paragraph 3 of insurance_policy_v2.pdf, version 2026-01-15, at 04:32:59 UTC, by user analyst_001.

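From August 2026, when the EU AI Act's high-risk obligations begin to apply, this level of traceability becomes a legal requirement alongside existing rules:

  • EU AI Act Art. 12: High-risk AI systems must automatically record events (logs) over their lifetime so that results can be traced. For a RAG pipeline this means the input, output, retrieved sources, model identity, timestamp, and the user who triggered the inference.
  • GDPR Art. 30: Records of processing activities must document the purposes of processing, the categories of data subjects and personal data, and the envisaged retention periods.
  • HIPAA §164.312(b): Audit controls must record and examine activity in systems that contain or use electronic PHI, including AI-generated responses citing medical records.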

Quick Start

pip install trailrag

from trailrag import TrailRAGEngine

engine = TrailRAGEngine(
    base_engine=your_rag_control_engine,
    jurisdiction="GDPR",                    # sets retention period automatically
    db_url="postgresql://user:pw@host/db",  # or "sqlite:///audit.db" for dev
)

result = engine.run(query="What is the fire coverage limit?", user_context=ctx)
print(result.audit_id)       # 01KNZZDS0161Z4VEWVVV0REP0A
print(result.chunk_records)  # [{chunk_id, doc_id, page_number, score, ...}, ...]

What Gets Logged

{
  "audit_id":        "01KNZZDS0161Z4VEWVVV0REP0A",
  "timestamp_utc":   "2026-04-12T04:32:59Z",
  "user_id":         "analyst_001",
  "jurisdiction":    "GDPR",
  "retention_until": "2026-10-09T04:32:59Z",
  "model_name":      "gpt-4o",
  "prompt_hash":     "880d6f35...",
  "query_text":      "What is the fire coverage limit?",
  "retrieved_chunks": [
    {
      "doc_id":           "insurance_policy_v2.pdf",
      "doc_version":      "2026-01-15",
      "page_number":      7,
      "similarity_score": 0.91,
      "retrieval_rank":   1
    }
  ],
  "response_hash":        "7eb675c8...",
  "total_tokens":         252,
  "retrieval_latency_ms": 13.4,
  "total_latency_ms":     161.2
}
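
The record above stores prompt_hash and response_hash as hex digests. The source does not say which hash function TrailRAG uses, so the following is only a sketch that assumes SHA-256 over UTF-8 text; the helper names sha256_hex and verify_hash are hypothetical and not part of the trailrag API. It shows how a retained prompt or response could be checked against a recorded digest.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of UTF-8 encoded text as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_hash(text: str, recorded_hash: str) -> bool:
    """Check retained text against a digest stored in an audit record."""
    return sha256_hex(text) == recorded_hash.lower()


# Assumption: the hash input is the raw prompt string with no extra framing.
prompt = "What is the fire coverage limit?"
digest = sha256_hex(prompt)
print(len(digest))  # 64 hex characters for SHA-256
```

Storing a digest instead of the full text lets an auditor confirm that a retained prompt or response is unchanged without the audit log holding a second copy of it.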

Audit records are append-only. The only deletion path is purge_expired(), which removes records past their retention_until. store.delete() always raises ComplianceViolationError.
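
The append-only behaviour described above can be illustrated with a small in-memory sketch. This is not TrailRAG's implementation: the AppendOnlyStore and AuditRecord classes, the append method, and the dry_run argument are assumptions made for illustration. Only the names purge_expired, delete, retention_until, and ComplianceViolationError come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


class ComplianceViolationError(Exception):
    """Raised when code tries to remove or overwrite a record directly."""


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AuditRecord:
    audit_id: str
    retention_until: datetime


class AppendOnlyStore:
    """Hypothetical in-memory store: append and purge are the only mutations."""

    def __init__(self) -> None:
        self._records: dict[str, AuditRecord] = {}

    def __len__(self) -> int:
        return len(self._records)

    def append(self, record: AuditRecord) -> None:
        if record.audit_id in self._records:
            raise ComplianceViolationError("records cannot be overwritten")
        self._records[record.audit_id] = record

    def delete(self, audit_id: str) -> None:
        # Direct deletion is always refused, whatever the record's state.
        raise ComplianceViolationError("direct deletion is not permitted")

    def purge_expired(self, now: datetime, dry_run: bool = False) -> int:
        """Remove records whose retention_until is before `now`; return the count."""
        expired = [k for k, r in self._records.items() if r.retention_until < now]
        if not dry_run:
            for key in expired:
                del self._records[key]
        return len(expired)


store = AppendOnlyStore()
store.append(AuditRecord("old", datetime(2020, 1, 1, tzinfo=timezone.utc)))
store.append(AuditRecord("new", datetime(2099, 1, 1, tzinfo=timezone.utc)))
now = datetime(2026, 4, 12, tzinfo=timezone.utc)
print(store.purge_expired(now, dry_run=True))  # 1
```

Routing every removal through a single time-based method keeps the deletion policy in one place, which makes it easier to audit.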

Compliance Coverage

Regulation | Key Requirement | TrailRAG Field
EU AI Act Art. 12 | Decision traceability per inference | retrieved_chunks: page_number, doc_id, similarity_score
GDPR Art. 30 | Records of processing activities | user_id, timestamp_utc, retention_until
HIPAA §164.312(b) | Audit controls for PHI systems | audit_id + append-only store
Basel III | Credit risk model documentation | model_version, prompt_hash, temperature

retention_until is derived automatically from the configured jurisdiction: GDPR → 180 days, HIPAA → 6 years, EU AI Act → 10 years, SOC 2 → 1 year.
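
As a rough illustration of the retention arithmetic, the sketch below maps each jurisdiction to a number of days and adds it to the record timestamp. The dictionary keys, the function name, and the 365-day year (which ignores leap days) are assumptions; the source does not say how TrailRAG counts years.

```python
from datetime import datetime, timedelta, timezone

# Assumed lookup table; the key spellings are hypothetical.
RETENTION_DAYS = {
    "GDPR": 180,
    "SOC2": 365,
    "HIPAA": 6 * 365,       # 6 years, approximated as 365-day years
    "EU_AI_ACT": 10 * 365,  # 10 years, approximated as 365-day years
}


def retention_until(timestamp_utc: datetime, jurisdiction: str) -> datetime:
    """Return the moment after which a record becomes eligible for purging."""
    return timestamp_utc + timedelta(days=RETENTION_DAYS[jurisdiction])


logged_at = datetime(2026, 4, 12, 4, 32, 59, tzinfo=timezone.utc)
print(retention_until(logged_at, "GDPR").isoformat())  # 2026-10-09T04:32:59+00:00
```

With the 180-day GDPR value, this reproduces the retention_until shown in the sample record under What Gets Logged.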

GDPR Subject Access Requests (Art. 15)

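# `store` is the audit store; `start` and `end` bound the date range of the request
records = store.get_by_user("analyst_001", from_date=start, to_date=end)
# Export a single record, identified by its audit_id, to a JSON file
store.export_json(audit_id, "/tmp/record_for_dpa.json")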
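
To make the lookup concrete, here is a minimal sketch of the same idea over plain dictionaries: filter records by user and time window, then serialize one of them. It does not use the trailrag store; the helpers parse_utc, records_for_user, and export_record are hypothetical, and the second sample record is invented for the example.

```python
import json
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse timestamps in the 'YYYY-MM-DDTHH:MM:SSZ' form used by the audit record."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def records_for_user(records, user_id, from_date, to_date):
    """Return the records for one user whose timestamp is inside the window."""
    return [
        r for r in records
        if r["user_id"] == user_id
        and from_date <= parse_utc(r["timestamp_utc"]) <= to_date
    ]


def export_record(record) -> str:
    """Serialize one record as stable, human-readable JSON."""
    return json.dumps(record, indent=2, sort_keys=True)


audit_log = [
    {"audit_id": "01KNZZDS0161Z4VEWVVV0REP0A", "user_id": "analyst_001",
     "timestamp_utc": "2026-04-12T04:32:59Z"},
    {"audit_id": "EXAMPLE-0002", "user_id": "analyst_002",  # invented sample
     "timestamp_utc": "2026-04-13T09:00:00Z"},
]
start = datetime(2026, 4, 1, tzinfo=timezone.utc)
end = datetime(2026, 4, 30, tzinfo=timezone.utc)

matches = records_for_user(audit_log, "analyst_001", start, end)
exported = export_record(matches[0])
print(exported)
```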

Self-hosting (Docker)

The fastest way to run the compliance dashboard on your own infrastructure.

Prerequisites

  • Docker ≥ 24 and Docker Compose ≥ 2

1 — Clone and configure

git clone https://github.com/LeQuocAnh123/RAG_control.git
cd RAG_control

Create a .env file in the project root (never commit this file):

# Required — set a strong random value; used for POST /api/purge
TRAILRAG_API_KEY=change-me-to-a-long-random-secret

# Optional — defaults shown below
TRAILRAG_DB_URL=sqlite:////data/audit.db
TRAILRAG_CORS_ORIGINS=*
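
One way to produce a strong value for TRAILRAG_API_KEY is Python's standard secrets module. This is a suggestion rather than a documented TrailRAG step, and the function name generate_api_key is hypothetical.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random token built from `num_bytes` of OS-level entropy."""
    return secrets.token_urlsafe(num_bytes)


api_key = generate_api_key()
# Paste the printed line into .env; avoid storing it in shell history or logs.
print(f"TRAILRAG_API_KEY={api_key}")
```

A URL-safe token contains no spaces, quotes, or "#" characters, so it can be placed in a .env file without quoting.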

2 — Build and start

docker compose up -d --build

3 — Open the dashboard

http://localhost:8080/dashboard

The API docs (OpenAPI / Swagger) are at http://localhost:8080/api/docs.

4 — Verify health

curl http://localhost:8080/health
# {"status": "ok", "total_records": 0}
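
The same check can be scripted, for example in a deployment smoke test. The sketch below uses only the standard library and assumes the response shape shown above; parse_health and check_health are hypothetical helper names, and "degraded" is an invented status value used only to show the negative case.

```python
import json
import urllib.request


def parse_health(body: str) -> bool:
    """Return True when the health payload reports status 'ok'."""
    payload = json.loads(body)
    return payload.get("status") == "ok"


def check_health(url: str = "http://localhost:8080/health", timeout: float = 5.0) -> bool:
    """Fetch the health endpoint and interpret its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status == 200 and parse_health(resp.read().decode("utf-8"))


# Parsing can be exercised without a running container:
print(parse_health('{"status": "ok", "total_records": 0}'))  # True
print(parse_health('{"status": "degraded"}'))                # False
```

Calling check_health() requires the container from step 2 to be running; it raises urllib.error.URLError if the service is unreachable.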

Persistent storage

Audit records are stored in a Docker named volume (trailrag_data) mounted at /data inside the container. Data survives container restarts and upgrades.

To back up:

docker run --rm -v trailrag_data:/data -v $(pwd):/backup busybox \
  tar czf /backup/trailrag_backup_$(date +%Y%m%d).tar.gz /data

Switching to PostgreSQL

Set TRAILRAG_DB_URL in your .env:

TRAILRAG_DB_URL=postgresql://user:password@postgres_host:5432/trailrag

TrailRAG will create the audit_log table automatically on first start.

Purging expired records

# Dry run — count eligible records
curl -X POST "http://localhost:8080/api/purge?dry_run=true" \
     -H "X-TrailRAG-Key: $TRAILRAG_API_KEY"

# Execute purge
curl -X POST "http://localhost:8080/api/purge" \
     -H "X-TrailRAG-Key: $TRAILRAG_API_KEY"
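
For scheduled purges, the curl calls above can be mirrored in Python. This sketch only builds the request so it can be inspected before sending; build_purge_request is a hypothetical helper, and "example-key" stands in for the real value of TRAILRAG_API_KEY. The endpoint, query parameter, and header name are taken from the commands above.

```python
import urllib.parse
import urllib.request


def build_purge_request(base_url: str, api_key: str, dry_run: bool = True) -> urllib.request.Request:
    """Build a POST to /api/purge; defaults to a dry run to avoid accidental deletion."""
    url = f"{base_url.rstrip('/')}/api/purge"
    if dry_run:
        url += "?" + urllib.parse.urlencode({"dry_run": "true"})
    return urllib.request.Request(
        url,
        method="POST",
        headers={"X-TrailRAG-Key": api_key},
    )


request = build_purge_request("http://localhost:8080", "example-key", dry_run=True)
print(request.get_method(), request.full_url)
# POST http://localhost:8080/api/purge?dry_run=true
```

To send it, pass the object to urllib.request.urlopen(request). Defaulting dry_run to True is a design choice of this sketch, so that an actual purge has to be requested explicitly.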

Built On

  • rag_control — runtime governance, RBAC, policy enforcement
  • SQLAlchemy 2.0 — SQLite (dev) / PostgreSQL (production)
  • Pydantic v2 — immutable, validated audit schema

License

Apache 2.0

Source Distribution

trailrag-0.1.0.tar.gz (91.3 kB)

Uploaded Source

Built Distribution

trailrag-0.1.0-py3-none-any.whl (63.9 kB)

Uploaded Python 3

File details

Details for the file trailrag-0.1.0.tar.gz.

File metadata

  • Download URL: trailrag-0.1.0.tar.gz
  • Upload date:
  • Size: 91.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.1 (uv publish, Ubuntu 24.04)

File hashes

Hashes for trailrag-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7820685bbe6a969ba886de474299fc13f1b86077da5ad12914ac2342a615ec47
MD5 a40dad87f03ab46adcfdee3442e0088b
BLAKE2b-256 51d376c2624c73819ffbd0f541a208fbafba870a9963f3b6a7662d577031181f

File details

Details for the file trailrag-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: trailrag-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 63.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.1 (uv publish, Ubuntu 24.04)

File hashes

Hashes for trailrag-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5052008d2000f6606d6e72bf07f146700bccd81469a2820ff98c31dad2d4cec0
MD5 723131ed15746503119224062d1aa774
BLAKE2b-256 bfbfb73d24be67fe10b44cd122693b8ba7aad32c0a710972c0b7c72561d8dadb
