Permission-aware retrieval for RAG applications

Reason this release was yanked: no longer maintained

RAGGuard

The security layer your RAG application is missing.

┌──────────────────────────────────────────────────────────────────────────────┐
│                         BRING YOUR OWN PERMISSIONS                           │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   INLINE POLICIES     CUSTOM FILTERS     ACL DOCUMENTS     ENTERPRISE AUTH   │
│   ┌─────────────┐     ┌─────────────┐    ┌────────────┐    ┌─────────────┐   │
│   │ rules:      │     │ class My    │    │ {"acl": {  │    │    OPA      │   │
│   │  - allow:   │     │   Filter:   │    │   "users": │    │   Cerbos    │   │
│   │     dept    │     │   def build │    │   ["alice"]│    │   OpenFGA   │   │
│   │             │     │     ...     │    │  }}        │    │   Permit.io │   │
│   └─────────────┘     └─────────────┘    └────────────┘    └─────────────┘   │
│     Code/YAML           Full Control      Explicit Lists    Policy Engines   │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

The Problem: Most RAG systems retrieve documents first and filter by permissions afterwards. By then, unauthorized documents have already left the database and reached the retrieval layer. That's a data leak.

The Solution: RAGGuard applies the permission filter during vector search, not after, so unauthorized documents are never returned.

Works with any authorization system - use your existing permissions infrastructure (OPA, Cerbos, OpenFGA, custom RBAC, ACLs) or define policies inline. RAGGuard translates your authorization decisions into vector database filters.

┌─────────────────────────────────────────────────────────────────────────────┐
│   WITHOUT RAGGUARD                      WITH RAGGUARD                       │
├─────────────────────────────────────────────────────────────────────────────┤
│   Vector Search                         Vector Search                       │
│   Returns 10 docs ──────────┐           + Permission Filter                 │
│   (includes unauthorized)   │           Returns 10 docs                     │
│             │               │           (all authorized)                    │
│             ▼               │                  │                            │
│   Filter in Python          │                  │                            │
│   Remove 7 docs             │                  │                            │
│             │               │                  │                            │
│             ▼               │                  ▼                            │
│   Return 3 docs             │           Return 10 docs                      │
│   ❌ Data leaked            │           ✅ Zero exposure                    │
│   ❌ Wrong count            │           ✅ Correct count                    │
└─────────────────────────────────────────────────────────────────────────────┘
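The two columns of the diagram can be reproduced with a toy in-memory index. This is a minimal sketch and not RAGGuard code: the `search` helper, the vectors, and the per-document `allowed` flag are all invented for illustration.

```python
import math

# Toy corpus: (doc_id, embedding, allowed_for_current_user). All values are invented.
DOCS = [
    ("a", (1.0, 0.0), False),
    ("b", (0.9, 0.1), False),
    ("c", (0.8, 0.2), True),
    ("d", (0.1, 0.9), True),
    ("e", (0.0, 1.0), True),
]

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def search(query, k, where=None):
    """Return the ids of the k most similar docs, optionally pre-filtered by `where`."""
    candidates = [d for d in DOCS if where is None or where(d)]
    ranked = sorted(candidates, key=lambda d: cosine(query, d[1]), reverse=True)
    return [d[0] for d in ranked[:k]]

query = (1.0, 0.0)
allowed_ids = {doc_id for doc_id, _, allowed in DOCS if allowed}

# Post-filtering: unauthorized docs reach the application, and the result comes up short.
post = [doc_id for doc_id in search(query, k=3) if doc_id in allowed_ids]

# Pre-filtering: the permission check is part of the search itself.
pre = search(query, k=3, where=lambda d: d[2])

print(post)  # fewer than k results
print(pre)   # k results, all authorized
```

In the post-filter path the caller asked for three documents and received fewer, because the unauthorized hits occupied the top slots before being discarded.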

Quick Start

pip install ragguard[chromadb]

import chromadb
from ragguard import ChromaDBSecureRetriever, Policy

# 1. Your existing ChromaDB setup
client = chromadb.Client()
collection = client.create_collection("docs")
collection.add(
    ids=["1", "2", "3"],
    documents=["Finance Report", "Engineering Doc", "Public Blog"],
    metadatas=[
        {"department": "finance", "confidential": True},
        {"department": "engineering", "confidential": False},
        {"department": "public", "confidential": False}
    ]
)

# 2. Define access policy
policy = Policy.from_dict({
    "version": "1",
    "rules": [
        {"name": "same-dept", "allow": {"conditions": ["user.department == document.department"]}},
        {"name": "public", "match": {"confidential": False}, "allow": {"everyone": True}}
    ],
    "default": "deny"
})

# 3. Search with automatic permission filtering
retriever = ChromaDBSecureRetriever(collection=collection, policy=policy)

results = retriever.search(
    query="quarterly report",
    user={"id": "alice", "department": "finance"},
    limit=10
)
# Alice sees finance docs plus every non-confidential doc

That's it. Documents are filtered at the database level. No post-filtering. No data leaks.
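To make the translation step concrete, here is a rough sketch of how the Quick Start policy could become a ChromaDB-style `where` filter for one user. The `policy_to_where` helper is hypothetical and is not RAGGuard's implementation; it assumes one condition per rule and handles only the two rule shapes used above.

```python
# Policy copied from the Quick Start.
POLICY = {
    "version": "1",
    "rules": [
        {"name": "same-dept", "allow": {"conditions": ["user.department == document.department"]}},
        {"name": "public", "match": {"confidential": False}, "allow": {"everyone": True}},
    ],
    "default": "deny",
}

def policy_to_where(policy, user):
    """Hypothetical helper: build a ChromaDB-style `where` filter for one user.

    Handles only `user.X == document.Y` conditions (one per rule) and `match`
    rules that allow everyone. Anything else is ignored, which errs toward deny.
    """
    clauses = []
    for rule in policy["rules"]:
        allow = rule.get("allow", {})
        for condition in allow.get("conditions", []):
            left, right = (side.strip() for side in condition.split("=="))
            user_key = left.split(".", 1)[1]   # "user.department" -> "department"
            doc_key = right.split(".", 1)[1]   # "document.department" -> "department"
            if user_key in user:
                clauses.append({doc_key: {"$eq": user[user_key]}})
        if allow.get("everyone"):
            matches = [{field: {"$eq": value}} for field, value in rule.get("match", {}).items()]
            if len(matches) == 1:
                clauses.append(matches[0])
            elif matches:
                clauses.append({"$and": matches})
    if not clauses:
        # Default deny: no rule grants this user anything.
        raise PermissionError("no rule grants access")
    return clauses[0] if len(clauses) == 1 else {"$or": clauses}

where = policy_to_where(POLICY, {"id": "alice", "department": "finance"})
print(where)
```

Note that with the sample collection this filter also matches the Engineering doc, because that document is stored with `confidential: False`.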

Bring Your Own Authorization

RAGGuard doesn't force you into a specific permissions model. Use what you already have:

Option 1: Inline Policies (shown above)

Define policies directly in code or YAML - great for getting started or simple use cases.

Option 2: Custom Filter Builders

Plug in any authorization logic with full control:

from ragguard.filters import CustomFilterBuilder

class MyAuthFilter(CustomFilterBuilder):
    def build_filter(self, policy, user, backend):
        # Query your auth system, check ACLs, call APIs - whatever you need
        allowed_docs = my_auth_service.get_accessible_docs(user["id"])
        return {"doc_id": {"$in": allowed_docs}}

retriever = ChromaDBSecureRetriever(
    collection=collection,
    policy=policy,
    custom_filter_builder=MyAuthFilter()
)
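The snippet above leaves `my_auth_service` undefined. A self-contained sketch of the same idea follows, with an in-memory stand-in for the auth service. `InMemoryAuthService` is hypothetical, and the filter class here is a plain class rather than a `CustomFilterBuilder` subclass so that it runs without RAGGuard installed. It also guards the empty case, since some backends reject an empty `$in` list.

```python
class InMemoryAuthService:
    """Hypothetical stand-in for `my_auth_service`: maps user ids to doc ids."""

    def __init__(self, grants):
        self._grants = grants

    def get_accessible_docs(self, user_id):
        return list(self._grants.get(user_id, []))

my_auth_service = InMemoryAuthService({"alice": ["1", "3"]})

class MyAuthFilter:
    """Same method shape as the example above; would subclass CustomFilterBuilder in real use."""

    def build_filter(self, policy, user, backend):
        allowed_docs = my_auth_service.get_accessible_docs(user["id"])
        if not allowed_docs:
            # Fail closed instead of emitting {"$in": []}.
            raise PermissionError(f"user {user['id']!r} has no accessible documents")
        return {"doc_id": {"$in": allowed_docs}}

alice_filter = MyAuthFilter().build_filter(policy=None, user={"id": "alice"}, backend="chromadb")
print(alice_filter)
```

Raising on an empty grant list is one design choice; returning an empty result set before the search is issued would work as well.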

Option 3: ACL-Based Documents

For documents with explicit access control lists:

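from ragguard import QdrantSecureRetriever  # assumed to be a top-level export, like ChromaDBSecureRetriever
from ragguard.filters import ACLFilterBuilder

# Documents have: {"acl": {"users": ["alice"], "groups": ["eng"], "public": false}}
# fetch_groups_from_ldap is a placeholder for your own group lookup.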
retriever = QdrantSecureRetriever(
    collection=collection,
    policy=policy,
    custom_filter_builder=ACLFilterBuilder(
        get_user_groups=lambda user: fetch_groups_from_ldap(user["id"])
    )
)
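The ACL shape in the comment above implies a simple access rule: a document is visible if it is public, if the user is listed, or if the user shares a group with the ACL. The sketch below spells that rule out in plain Python. It is not `ACLFilterBuilder` itself, whose internals are not shown here, and `fetch_groups`, `acl_allows`, and the sample documents are hypothetical. The rule is evaluated in Python only to make it concrete; a real builder would express the same disjunction as a backend filter.

```python
def acl_allows(acl, user_id, user_groups):
    """Return True if the ACL grants access via the public flag, user list, or group list."""
    if acl.get("public", False):
        return True
    if user_id in acl.get("users", []):
        return True
    return not set(user_groups).isdisjoint(acl.get("groups", []))

def fetch_groups(user_id):
    """Hypothetical stand-in for an LDAP group lookup."""
    return {"alice": ["finance"], "bob": ["eng"]}.get(user_id, [])

# Sample documents, invented for illustration.
DOCS = {
    "design-doc": {"acl": {"users": ["alice"], "groups": ["eng"], "public": False}},
    "payroll": {"acl": {"users": [], "groups": ["finance"], "public": False}},
    "blog": {"acl": {"users": [], "groups": [], "public": True}},
}

def visible_docs(user_id):
    """Names of the documents the user may see, sorted for stable output."""
    groups = fetch_groups(user_id)
    return sorted(name for name, doc in DOCS.items() if acl_allows(doc["acl"], user_id, groups))

print(visible_docs("alice"))
print(visible_docs("bob"))
```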

Option 4: Enterprise Authorization Systems

Connect to dedicated authorization services (available in ragguard-enterprise):

| System | Description |
| --- | --- |
| OPA | Open Policy Agent - policy as code |
| Cerbos | Access control for cloud-native apps |
| OpenFGA | Google Zanzibar-inspired fine-grained auth |
| Permit.io | Permissions as a service |
| Auth0/Okta | Identity provider integration |

Supported Backends

| Vector DBs | Graph DBs |
| --- | --- |
| Qdrant, ChromaDB, Pinecone, pgvector, Weaviate, Milvus, FAISS, Elasticsearch, OpenSearch, Azure AI Search | Neo4j, Neptune, TigerGraph, ArangoDB |

Integrations

LangChain • LlamaIndex • LangGraph • CrewAI • DSPy • AWS Bedrock

Documentation

| Guide | Description |
| --- | --- |
| Getting Started | Installation and basic setup |
| Policy Format | Policy syntax and operators |
| Backends | Database-specific examples |
| Integrations | LangChain, LlamaIndex, etc. |
| Production | Health checks, logging, async |
| Kubernetes | K8s deployment guide |
| Security | Security testing & guarantees |
| Use Cases | Multi-tenant, healthcare, etc. |
| FAQ | Common questions & limitations |

Installation

# With a specific backend
pip install ragguard[qdrant]
pip install ragguard[chromadb]
pip install ragguard[pgvector]
pip install ragguard[pinecone]

# With framework integration
pip install ragguard[langchain]
pip install ragguard[llamaindex]

# Everything
pip install ragguard[all]

Python Compatibility: Fully tested on Python 3.9-3.13. Python 3.14 has limited support because upstream dependencies (chromadb, langchain) do not yet support it.

Why RAGGuard?

| Challenge | Without RAGGuard | With RAGGuard |
| --- | --- | --- |
| Data leaks | Filter after retrieval = data exposed | Filter during search = zero exposure |
| Authorization | Rebuild permission logic for RAG | Plug in your existing auth system |
| Multi-database | Custom filter code per DB | One integration, 14 databases |
| Setup time | Days/weeks | 5 minutes |
| Security testing | DIY | Comprehensive test suite |

License

Apache-2.0 - See LICENSE for details.


Built for the RAG community • Examples • GitHub
