Scan your AI's vector database for exposed sensitive data.

RAGLeakGuard

Scan your AI's vector database for exposed sensitive data — before it becomes a breach you can't delete.

RAGLeakGuard is a CLI that connects to your vector store (Chroma today; more soon), reads what's stored, detects sensitive data (PII, health, financial), and writes a risk-scored report. No changes to your app — point it at the store and scan.
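The read, detect, score, report flow described above can be sketched in a few lines. This is a minimal illustration, not RAGLeakGuard's implementation: it scans an in-memory list of strings instead of a Chroma store, and the regex recognisers, severity weights, thresholds, and function names (`scan_documents`, `risk_level`, `render_report`) are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical recognisers and severities, chosen for illustration only.
RECOGNISERS = {
    "EMAIL": (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "medium"),
    "US_SSN": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "high"),
}
# Hypothetical weights used to turn findings into a single risk level.
SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 10}


def scan_documents(documents):
    """Return one finding per match as (doc_index, entity_type, severity)."""
    findings = []
    for index, text in enumerate(documents):
        for entity_type, (pattern, severity) in RECOGNISERS.items():
            for _ in pattern.finditer(text):
                findings.append((index, entity_type, severity))
    return findings


def risk_level(findings):
    """Map the summed severity weights onto a coarse risk label."""
    score = sum(SEVERITY_WEIGHT[severity] for _, _, severity in findings)
    if score >= 10:
        return "HIGH"
    if score > 0:
        return "MEDIUM"
    return "NONE"


def render_report(findings):
    """Render a small Markdown summary: risk level, then counts by type."""
    counts = Counter((etype, severity) for _, etype, severity in findings)
    lines = ["# Scan report", "", f"Risk level: {risk_level(findings)}", ""]
    for (etype, severity), count in sorted(counts.items()):
        lines.append(f"- {etype} ({severity}): {count}")
    return "\n".join(lines)


# Stand-in for documents read from a vector store (fake data).
docs = [
    "Patient contact: jane.doe@example.com",
    "SSN on file: 123-45-6789",
    "Follow-up booked for next week.",
]
findings = scan_documents(docs)
print(render_report(findings))
```

Nothing in the sketch writes back to its input, which mirrors the read-only design the project describes.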

What it is: a data-inventory & compliance scanner — it answers the question a compliance officer actually asks: "what regulated data is sitting in our vector store, and can we prove we can delete it?" Read-only; safe to run against production.

What it isn't: a red-team tool. It doesn't fire prompt-injection or jailbreak attacks — it audits the data at rest, not how the model responds under attack.

🚧 Early development — building in public. Not production-ready yet.

Why this matters

RAG systems embed your private data into vector databases. That data can be reconstructed from the vectors (embedding inversion), is hard to delete (backups, replicas, caches, fine-tuned models), and usually isn't inventoried. RAGLeakGuard finds it.

Install (from source)

git clone https://github.com/Agenvana/RAGLeakGuard.git
cd RAGLeakGuard
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip          # fresh venvs ship an old pip; the editable install needs a newer one
pip install -e ".[chroma,detect,dev]"
python -m spacy download en_core_web_sm

Python 3.9 note: dependencies are pinned (spaCy<3.8, numpy<2) so prebuilt wheels are used — no source build needed.

Quickstart (≈2 minutes)

# 1. Create a test vector store full of FAKE sensitive records
python scripts/seed_synthetic.py                          # -> ./sample_store (100 fake clinic records)

# 2. Scan it — global + US recognisers are on by default
ragleakguard scan --source chroma --path ./sample_store --report report.md

# 3. The fixture is Australian, so add the AU locale pack for full coverage
ragleakguard scan --source chroma --path ./sample_store --locale au --report report.md

# 4. Open report.md  (summary, findings by type + severity, risk level, remediation)

Detection

Default: global + US recognisers, covering SSN, bank account number, driver's license, credit card, email, phone, names, locations, dates, IP addresses, crypto…
Locale packs (--locale): au (Medicare / TFN / ABN), uk, sg, in. Each pack adds country-specific identifiers and is opt-in.
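One way to picture the default-plus-opt-in model is a registry where locale packs are merged into the default recognisers only when requested. The sketch below is an assumption about how such a design could look, not the project's code: `DEFAULT_PACK`, `LOCALE_PACKS`, and `detect` are hypothetical names, the patterns are simplified, and candidates are filtered with the Luhn check (credit cards) and the commonly documented TFN check-digit weights.

```python
import re


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used here to filter credit-card candidates."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Commonly documented weights for a 9-digit Australian Tax File Number.
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def tfn_valid(number: str) -> bool:
    """Weighted sum of the 9 digits must be divisible by 11."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) != 9:
        return False
    return sum(d * w for d, w in zip(digits, TFN_WEIGHTS)) % 11 == 0


# Hypothetical registry: entity type -> (candidate pattern, validator).
DEFAULT_PACK = {
    "CREDIT_CARD": (re.compile(r"\b(?:\d[ -]?){13,19}\b"), luhn_valid),
}
LOCALE_PACKS = {
    "au": {"AU_TFN": (re.compile(r"\b\d{3} ?\d{3} ?\d{3}\b"), tfn_valid)},
}


def detect(text, locales=()):
    """Run the default recognisers plus any opted-in locale packs."""
    recognisers = dict(DEFAULT_PACK)
    for locale in locales:
        recognisers.update(LOCALE_PACKS[locale])
    hits = []
    for entity_type, (pattern, validator) in recognisers.items():
        for match in pattern.finditer(text):
            if validator(match.group()):
                hits.append((entity_type, match.group().strip()))
    return hits


sample = "Card 4111 1111 1111 1111, TFN 123 456 782"  # well-known test values
print(detect(sample))                   # default pack only
print(detect(sample, locales=("au",)))  # default pack plus the AU pack
```

Keeping country identifiers opt-in avoids false positives from short digit patterns in stores where those identifiers cannot occur.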

Roadmap

See ROADMAP.md — next up includes a custom AU phone recogniser, more connectors (Pinecone, pgvector), and the Fix/Prove layers.

License

Apache-2.0

Project details

This version

0.1.0

Jun 25, 2026

Download files

Source Distribution

ragleakguard-0.1.0.tar.gz (17.6 kB)

Uploaded Jun 25, 2026 Source

Built Distribution

ragleakguard-0.1.0-py3-none-any.whl (14.6 kB)

Uploaded Jun 25, 2026 Python 3

File details

Details for the file ragleakguard-0.1.0.tar.gz.

File metadata

Download URL: ragleakguard-0.1.0.tar.gz
Upload date: Jun 25, 2026
Size: 17.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for ragleakguard-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a2dadb025fe65e7e2582b74356dc773399dda524488e27d872418f5b6e225de3`
MD5	`8370d23bac46bbd5a242a1592fbeb34e`
BLAKE2b-256	`5073a4757d9532850a67752f9562c2af7bce4bdfc14f34ab2900a2e4bf5d23bb`
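A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a small sketch; the local file path is an assumption about where the archive was saved, and the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for ragleakguard-0.1.0.tar.gz.
EXPECTED_SHA256 = "a2dadb025fe65e7e2582b74356dc773399dda524488e27d872418f5b6e225de3"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True when the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("ragleakguard-0.1.0.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if matches_published_hash(archive) else "MISMATCH")
```

The same check applies to the wheel by passing its path and its own published digest as `expected`.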

File details

Details for the file ragleakguard-0.1.0-py3-none-any.whl.

File metadata

Download URL: ragleakguard-0.1.0-py3-none-any.whl
Upload date: Jun 25, 2026
Size: 14.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for ragleakguard-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0ba250dd2a70b9666b02dbc6b18bb2d735dfd10ab5f489498b1be4699c9c2d2a`
MD5	`7c28dd176bd0e58b16a60e7c191bb5c8`
BLAKE2b-256	`3afd729fba78de1447212295c791cfe2f4cf6d3a3737edc520c84a6b4a8dba65`
