Verified data deletion and leak detection for RAG systems
sura-rag
Verified data deletion and runtime leak detection for RAG systems. A GDPR Article 17-compliant forget pipeline with multi-strategy leak probing, runtime guardrailing, and signed compliance certificates. Runs 100% locally; no cloud API is required.
The Problem
RAG systems retrieve and present data from vector stores, but when a user exercises their GDPR Article 17 "right to be forgotten," simply deleting a document from the vector store is not enough. The LLM may still reproduce fragments of the deleted content, cached chunks may persist, and a bare delete call offers no way to verify that the data is truly gone. sura-rag closes this gap by providing a complete forget pipeline: delete → probe → guardrail → certify.
Quick Install
# Core (Ollama-based, no GPU required)
pip install sura-rag
# With CPU embeddings (sentence-transformers)
pip install "sura-rag[cpu]"
# With CUDA support (install a CUDA-enabled PyTorch build first)
pip install "sura-rag[cuda]"
# With framework connectors
pip install "sura-rag[langchain]"
pip install "sura-rag[llamaindex]"
# Everything
pip install "sura-rag[all]"
30-Second Quickstart
import sura_rag as sr
# Connect to your vector store
client = sr.SuraClient(
    vector_store=sr.adapters.ChromaDBAdapter("my_collection"),
    config=sr.SuraConfig(generator_model="llama3.2:3b"),
)
# Forget a document (GDPR Article 17)
result = client.forget(
    doc_ids=["doc_001"],
    subject="John Smith salary records",
    requestor_id="user_4821",
    regulation="GDPR_Art17",
)
print(f"Score: {result.forget_score.composite_score}") # 0.0–1.0
print(f"Status: {result.status}") # "completed"
print(f"Certificate: {result.certificate_id}") # UUID
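The quickstart prints a composite score between 0.0 and 1.0 but does not say how it is derived. The sketch below shows one plausible way such a score could be combined from the four probe strategies listed under Features. The `ProbeOutcome` class, the weights, and the convention that 1.0 means "no leaks detected" are all assumptions made for illustration; none of them is taken from sura-rag's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeOutcome:
    """Hypothetical per-strategy result: how many probes ran and how many leaked."""
    strategy: str
    probes_run: int
    leaks_found: int

    @property
    def leak_rate(self) -> float:
        # A strategy that ran no probes contributes no evidence of leakage.
        return self.leaks_found / self.probes_run if self.probes_run else 0.0


# Illustrative weights only; they sum to 1.0 so the score stays in [0.0, 1.0].
WEIGHTS = {"direct": 0.4, "paraphrase": 0.3, "contextual": 0.2, "adversarial": 0.1}


def composite_score(outcomes: list[ProbeOutcome]) -> float:
    """Return 1.0 minus the weighted leak rate (assumed: higher means better forgotten)."""
    weighted_leak = sum(WEIGHTS[o.strategy] * o.leak_rate for o in outcomes)
    return round(1.0 - weighted_leak, 4)


outcomes = [
    ProbeOutcome("direct", probes_run=10, leaks_found=0),
    ProbeOutcome("paraphrase", probes_run=10, leaks_found=1),
    ProbeOutcome("contextual", probes_run=5, leaks_found=0),
    ProbeOutcome("adversarial", probes_run=5, leaks_found=1),
]
print(composite_score(outcomes))
```

Weighting direct probes most heavily reflects the idea that a verbatim leak is the clearest failure, but the right weights depend on the deployment and are a design choice, not a fixed rule.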
Features
| Feature | Phase 1 (v0.1) | Phase 2 (planned) |
|---|---|---|
| Vector store deletion | ✅ | ✅ |
| Fingerprint registry | ✅ | ✅ |
| Direct entity probes | ✅ | ✅ |
| Paraphrase probes | ✅ | ✅ |
| Contextual probes | ✅ | ✅ |
| Adversarial probes | ✅ | ✅ |
| Runtime guardrail (4 modes) | ✅ | ✅ |
| Audit logging (SQLite/Postgres) | ✅ | ✅ |
| PDF compliance certificates | ✅ | ✅ |
| LangChain connector | ✅ | ✅ |
| LlamaIndex connector | ✅ | ✅ |
| Parametric unlearning (LoRA) | — | ✅ |
| TOFU benchmark evaluation | — | ✅ |
| Multi-GPU training | — | ✅ |
Architecture
sura-rag follows a pipeline architecture: Delete → Probe → Guardrail → Certify. Documents are first deleted from the vector store and their fingerprints are stored for runtime monitoring. Multi-strategy probes then verify the deletion, and a compliance certificate is generated. After the forget request completes, the runtime guardrail continues to scan every RAG response against the fingerprint registry to catch any residual leakage.
Compatibility
| Component | Supported |
|---|---|
| ChromaDB | ✅ ≥0.5.0 |
| Qdrant | ✅ ≥1.9.0 |
| FAISS | ✅ ≥1.8.0 (soft-delete) |
| LangChain | ✅ ≥0.2.0 |
| LlamaIndex | ✅ ≥0.10.0 |
| Ollama | ✅ ≥0.2.0 |
| HuggingFace Transformers | ✅ ≥4.40.0 |
| PyTorch | ✅ ≥2.2.0 |
| Pandas | ✅ ≥2.0.0 |
| Python 3.10 | ✅ |
| Python 3.11 | ✅ |
| Python 3.12 | ✅ |
| Windows | ✅ |
| Linux | ✅ |
| macOS | ✅ |
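The table marks FAISS support as soft-delete. Some FAISS index types do not support removing vectors in place, so one common workaround is to keep a set of tombstoned IDs and filter them out of search results. The sketch below illustrates that idea with a stdlib-only brute-force index standing in for FAISS; the `SoftDeleteIndex` class and its methods are hypothetical and are not sura-rag's FAISS adapter.

```python
import math


class SoftDeleteIndex:
    """Hypothetical wrapper: vectors stay stored, tombstoned IDs are filtered out."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}
        self._tombstones: set[str] = set()

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._vectors[doc_id] = vector

    def soft_delete(self, doc_id: str) -> None:
        # The vector is not removed; it is only hidden from future searches.
        self._tombstones.add(doc_id)

    def search(self, query: list[float], k: int = 3) -> list[str]:
        """Return up to `k` nearest live document IDs by Euclidean distance."""
        live = (
            (math.dist(query, vec), doc_id)
            for doc_id, vec in self._vectors.items()
            if doc_id not in self._tombstones
        )
        return [doc_id for _, doc_id in sorted(live)[:k]]


index = SoftDeleteIndex()
index.add("doc_001", [0.0, 0.0])
index.add("doc_002", [1.0, 0.0])
index.add("doc_003", [5.0, 5.0])

print(index.search([0.1, 0.0], k=2))
index.soft_delete("doc_001")
print(index.search([0.1, 0.0], k=2))
```

With a soft delete, the underlying vector remains in storage until the index is rebuilt, so a deployment handling erasure requests would likely also need a periodic rebuild or compaction step.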
Local Setup
1. Install Ollama
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows — download from https://ollama.com
2. Pull models
ollama pull llama3.2:3b
ollama pull nomic-embed-text
3. Start Ollama
ollama serve
4. Install sura-rag
pip install sura-rag
# or for development:
git clone https://github.com/SURA-RAG/sura-rag.git
cd sura-rag
pip install -e ".[dev,cpu]"
5. Run tests
pytest tests/unit/ -v
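Once the steps above are done, it can be useful to confirm that Ollama is serving and that both models from step 2 are present. The sketch below queries Ollama's `/api/tags` endpoint on its default port, 11434. The helper names `missing_models` and `check_ollama` are hypothetical and not part of sura-rag, and the sample payload is a trimmed-down stand-in for a real response.

```python
import json
import urllib.request

REQUIRED_MODELS = ["llama3.2:3b", "nomic-embed-text"]


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required models absent from an Ollama /api/tags response."""
    installed = {m["name"] for m in tags_payload.get("models", [])}
    missing = []
    for name in required:
        # Ollama stores untagged pulls under the default ":latest" tag.
        candidates = {name} if ":" in name else {name, f"{name}:latest"}
        if not candidates & installed:
            missing.append(name)
    return missing


def check_ollama(base_url: str = "http://localhost:11434") -> list[str]:
    """Query a running Ollama server and report which required models are missing."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return missing_models(json.load(resp), REQUIRED_MODELS)


# Offline demonstration with a trimmed-down sample of the response shape.
sample = {"models": [{"name": "llama3.2:3b"}, {"name": "nomic-embed-text:latest"}]}
print(missing_models(sample, REQUIRED_MODELS))  # → []
```

Calling `check_ollama()` against a live server raises an `OSError` subclass if nothing is listening on the port, which usually means `ollama serve` is not running.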
Environment Setup
Copy .env.example to .env and fill in your values:
cp .env.example .env
The .env file is listed in .gitignore, so Git will not commit it by default. Phase 1 (Ollama-based) requires no tokens. See .env.example for all available settings.
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch: git checkout -b feature/my-feature
- Run tests: pytest tests/unit/ -v
- Run linting: ruff check sura_rag/
- Submit a pull request
License
MIT License. See LICENSE for details.
Citation
If you use sura-rag in academic research, please cite:
@software{sura_rag_2024,
title = {sura-rag: Verified Data Deletion and Leak Detection for RAG Systems},
author = {Saxena, Aditya},
year = {2024},
url = {https://github.com/SURA-RAG/sura-rag},
license = {MIT},
}