Skip to main content

Verified data deletion and leak detection for RAG systems

Project description

sura-rag

PyPI version Python versions License: MIT Tests

Verified data deletion and runtime leak detection for RAG systems. GDPR Article 17 compliant forget pipeline with multi-strategy leak probing, runtime guardrailing, and signed compliance certificates. 100% local, zero cloud API required.

The Problem

RAG systems retrieve and present data from vector stores, but when a user exercises their GDPR Article 17 "right to be forgotten," simply deleting a document from the vector store is not enough. The LLM may have memorized fragments during retrieval, cached chunks may persist, and there is no way to verify that the data is truly gone. sura-rag closes this gap by providing a complete forget pipeline: delete → probe → guardrail → certify.

Quick Install

# Core (Ollama-based, no GPU required)
pip install sura-rag

# With CPU embeddings (sentence-transformers)
pip install sura-rag[cpu]

# With CUDA support (pre-install CUDA torch first)
pip install sura-rag[cuda]

# With framework connectors
pip install sura-rag[langchain]
pip install sura-rag[llamaindex]

# Everything
pip install sura-rag[all]

30-Second Quickstart

import sura_rag as sr

# Connect to your vector store
client = sr.SuraClient(
    vector_store=sr.adapters.ChromaDBAdapter("my_collection"),
    config=sr.SuraConfig(generator_model="llama3.2:3b"),
)

# Forget a document (GDPR Article 17)
result = client.forget(
    doc_ids=["doc_001"],
    subject="John Smith salary records",
    requestor_id="user_4821",
    regulation="GDPR_Art17",
)

print(f"Score: {result.forget_score.composite_score}")  # 0.0–1.0
print(f"Status: {result.status}")                       # "completed"
print(f"Certificate: {result.certificate_id}")          # UUID

Features

Feature Phase 1 (v0.1) Phase 2 (planned)
Vector store deletion
Fingerprint registry
Direct entity probes
Paraphrase probes
Contextual probes
Adversarial probes
Runtime guardrail (4 modes)
Audit logging (SQLite/Postgres)
PDF compliance certificates
LangChain connector
LlamaIndex connector
Parametric unlearning (LoRA)
TOFU benchmark evaluation
Multi-GPU training

Architecture

SURA-RAG follows a pipeline architecture: Delete → Probe → Guardrail → Certify. Documents are deleted from the vector store, their fingerprints are stored for runtime monitoring, multi-strategy probes verify the deletion, and a compliance certificate is generated. The runtime guardrail continuously scans all RAG responses against the fingerprint registry to catch any residual leakage.

Compatibility

Component Supported
ChromaDB ✅ ≥0.5.0
Qdrant ✅ ≥1.9.0
FAISS ✅ ≥1.8.0 (soft-delete)
LangChain ✅ ≥0.2.0
LlamaIndex ✅ ≥0.10.0
Ollama ✅ ≥0.2.0
HuggingFace ✅ ≥4.40.0
PyTorch ✅ ≥2.2.0
Pandas ✅ ≥2.0.0
Python 3.10
Python 3.11
Python 3.12
Windows
Linux
macOS

Local Setup

1. Install Ollama

# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows — download from https://ollama.com

2. Pull models

ollama pull llama3.2:3b
ollama pull nomic-embed-text

3. Start Ollama

ollama serve

4. Install sura-rag

pip install sura-rag
# or for development:
git clone https://github.com/SURA-RAG/sura-rag.git
cd sura-rag
pip install -e ".[dev,cpu]"

5. Run tests

pytest tests/unit/ -v

Environment Setup

Copy .env.example to .env and fill in your values:

cp .env.example .env

The .env file is in .gitignore and will never be committed. For Phase 1 (Ollama-based), no tokens are required. See .env.example for all available settings.

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Run tests: pytest tests/unit/ -v
  4. Run linting: ruff check sura_rag/
  5. Submit a pull request

License

MIT License. See LICENSE for details.

Citation

If you use sura-rag in academic research, please cite:

@software{sura_rag_2024,
  title = {sura-rag: Verified Data Deletion and Leak Detection for RAG Systems},
  author = {Saxena, Aditya},
  year = {2024},
  url = {https://github.com/SURA-RAG/sura-rag},
  license = {MIT},
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sura_rag-0.1.0.tar.gz (44.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sura_rag-0.1.0-py3-none-any.whl (50.3 kB view details)

Uploaded Python 3

File details

Details for the file sura_rag-0.1.0.tar.gz.

File metadata

  • Download URL: sura_rag-0.1.0.tar.gz
  • Upload date:
  • Size: 44.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for sura_rag-0.1.0.tar.gz
Algorithm Hash digest
SHA256 94b69b54854d4abe53242dc1624866e4314900ecbc16bd0bcd17a741e56f3c80
MD5 7b3ff6bacb5c3aaeef307bcf15a2532c
BLAKE2b-256 d80d848841a27f99c0b371affb8749127bcd2565f31b41ab058586bc234f0d23

See more details on using hashes here.

File details

Details for the file sura_rag-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: sura_rag-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 50.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for sura_rag-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 972faee248fec3e53d6a4790c6967faf18e85dc8b31790575daed35121a75113
MD5 207056e3f3128fab896f471fcf68dd97
BLAKE2b-256 c52cf4e96c59e255378eb42df0f56f71f84476b279326de153d1b58dd858560c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page