Skip to main content

Cross-industry compliance patterns for RAG pipelines: FERPA, HIPAA, GDPR, NIST AI RMF, OWASP LLM Top 10, and more. Vector store adapters, framework integrations, and audit logging.

Project description

enterprise-rag-patterns

CI PyPI Python License Downloads


The problem this solves

Standard RAG implementations retrieve documents and pass them directly to an LLM — with no enforcement of who is allowed to see what. In regulated environments (higher education, healthcare, financial services, government), this creates a structural compliance failure: a student receives another student's records, a patient's ePHI leaks into an unrelated clinical query, prompt injection hides in a retrieved document, and no audit log is produced.

This library provides the missing compliance layer — a cross-industry framework of pre-filters, identity scopes, risk assessors, and audit records that enforce regulatory requirements at the retrieval layer, before any document reaches the LLM context window.

Regulations covered: FERPA · HIPAA · GDPR · NIST AI RMF · OWASP LLM Top 10


Architecture

Session Token
     │
     ▼
StudentIdentityScope
(student_id + institution_id + authorized_categories + disclosure_reason)
     │
     ├─ Vector Store Pre-filter ──────────────────────────────────┐
     │   student_id + institution_id + categories checked here   │
     │   Only authorized documents enter the ranking stage       │
     │                                                            │
     ├─ Policy Layer Filter (defense-in-depth) ──────────────────┤
     │   Application-level identity re-check                     │
     │   Blocks any document that escaped the vector filter      │
     │                                                            │
     ├─ Audit Record ─────────────────────────────────────────────┤
     │   34 CFR § 99.32 Disclosure Log                           │
     │   Emitted before LLM sees any document                    │
     │                                                            │
     └─ LLM Context (authorized documents only) ─────────────────┘

Why pre-filter, not post-filter? Post-filtering is a UI concern, not a compliance control — the LLM has already processed the unauthorized record. FERPA and HIPAA require that disclosure not occur, not that unauthorized data be hidden after the fact. See docs/adr/ for the full architecture decision record.


Installation

pip install enterprise-rag-patterns

With framework extras:

pip install 'enterprise-rag-patterns[langchain]'
pip install 'enterprise-rag-patterns[llama-index]'
pip install 'enterprise-rag-patterns[haystack]'

60-second example

from enterprise_rag_patterns.compliance import (
    StudentIdentityScope,
    RecordCategory,
    FERPAContextPolicy,
    DisclosureReason,
)

# Build a verified scope from your session token — never from user input
scope = StudentIdentityScope(
    student_id="stu_001",
    institution_id="univ_abc",
    requesting_user_id="advisor_007",
    authorized_categories={RecordCategory.ACADEMIC_RECORD},
    disclosure_reason=DisclosureReason.SCHOOL_OFFICIAL,
)
policy = FERPAContextPolicy(scope=scope)

# Your retriever returns docs — filter before the LLM sees them
safe_docs = policy.filter_retrieved_documents(
    retrieved_docs,
    student_id_field="student_id",
    institution_id_field="institution_id",
    category_field="category",
)

# Emit a 34 CFR § 99.32 disclosure log entry
audit = policy.record_access(categories_accessed={RecordCategory.ACADEMIC_RECORD})
print(audit.to_log_entry())
# → {"record_id": "...", "student_id": "stu_001", "regulation": "FERPA",
#    "categories": ["academic_record"], "permitted": true, "timestamp": "..."}

See examples/ferpa_rag_pipeline.py for a complete runnable pipeline.


Framework integrations

Framework Integration Class Install Extra
LangChain FERPAComplianceCallbackHandler [langchain]
LlamaIndex FERPANodePostprocessor [llama-index]
Haystack 2.x FERPAHaystackFilter [haystack]
Pinecone PineconeComplianceFilter [pinecone]
Weaviate WeaviateComplianceFilter [weaviate]
Qdrant QdrantComplianceFilter [qdrant]
ChromaDB ChromaComplianceFilter [chromadb]

Cross-industry compliance coverage

Regulation / Framework Status Primary Sector RAG Controls
FERPA (34 CFR § 99) ✅ Implemented Education Identity scoping, 34 CFR § 99.32 audit log
GDPR (Articles 17, 32) ✅ Implemented EU / Global Right-to-erasure, data subject rights
HIPAA (45 CFR §§ 164.312, 164.502) ✅ Implemented Healthcare ePHI minimum-necessary, audit controls
NIST AI RMF 1.0 + AI 600-1 ✅ Implemented All sectors MAP/MEASURE/MANAGE risk assessment
OWASP LLM Top 10 (2025) ✅ Implemented Software / AI LLM01 injection, LLM02 PII disclosure
GLBA (16 CFR § 314) 🗓 Planned Financial services Customer record safeguards
SOC 2 Type II 🗓 Planned SaaS / Enterprise Context-based access control
EU AI Act 🗓 Planned EU / Global Article 12 tamper-evident audit logs

Four-layer defense-in-depth model

Layer 0: Query-time security    → OWASP (PII redaction, injection scanning)
Layer 1: Identity scoping       → FERPA / HIPAA (namespace + metadata filter)
Layer 2: Compliance filtering   → FERPA / HIPAA / GDPR (document-level rules)
Layer 3: Risk assessment + audit→ NIST AI RMF / HIPAA (structured audit records)

See docs/architecture.md for the full layered model.


Repository structure

src/enterprise_rag_patterns/
├── compliance.py               # FERPA identity scoping + 34 CFR § 99.32 audit
├── context.py                  # Multi-source context envelope assembly
├── session.py                  # Cross-channel session continuity
├── policy.py                   # Escalation and action-boundary policy objects
├── async_compliance.py         # Async wrappers for asyncio/FastAPI environments
├── regulations/
│   ├── gdpr.py                 # GDPR Article 17 right-to-erasure patterns
│   ├── hipaa.py                # HIPAA ePHI minimum-necessary + audit (NEW)
│   ├── nist_ai_rmf.py          # NIST AI RMF 1.0 + AI 600-1 risk assessment (NEW)
│   └── owasp_llm.py            # OWASP LLM Top 10 (2025) — LLM01/LLM02 (NEW)
├── vector_stores/
│   ├── pinecone_adapter.py     # PineconeComplianceFilter + namespace isolation
│   ├── weaviate_adapter.py     # WeaviateComplianceFilter
│   ├── qdrant_adapter.py       # QdrantComplianceFilter
│   └── chroma_adapter.py       # ChromaComplianceFilter
└── integrations/
    ├── langchain.py            # FERPAComplianceCallbackHandler (LangChain 0.3+)
    ├── langchain_lcel.py       # FERPAFilterRunnable + make_ferpa_chain (LCEL)
    ├── llama_index.py          # FERPANodePostprocessor (LlamaIndex)
    ├── llama_index_workflow.py # FERPAWorkflowStep (LlamaIndex 0.12+ Workflows)
    ├── haystack.py             # FERPAHaystackFilter (Haystack 2.x)
    └── maf.py                  # FERPAAgentMiddleware (Microsoft Agent Framework)
docs/
├── architecture.md             # Four-layer defense-in-depth model
├── adr/                        # Architecture decision records
└── implementation-note-*.md    # Implementation notes
examples/
└── ferpa_rag_pipeline.py       # Complete runnable FERPA-compliant pipeline

Published notes


Near-term roadmap

  • regulations/soc2.py — SOC 2 Type II context-based access control (CBAC)
  • regulations/eu_ai_act.py — EU AI Act Article 12 tamper-evident audit log with cryptographic signing
  • regulations/glba.py — GLBA Safeguards Rule financial record access controls
  • integrations/crewai.py — CrewAI policy-gated tool wrapper
  • Async vector store adapters for FastAPI/asyncio environments
  • ECOSYSTEM.md: compatibility matrix with current ecosystem versions

Contributing

Contributions are welcome. Please read CONTRIBUTING.md for guidelines and GOVERNANCE.md for the governance model. Run pytest tests/ -v to verify your changes before opening a pull request.


Citation

If you use these patterns in research or production, please cite:

@software{rana2026erp,
  author    = {Rana, Ashutosh},
  title     = {enterprise-rag-patterns: FERPA-compliant retrieval-augmented generation patterns},
  year      = {2026},
  url       = {https://github.com/ashutoshrana/enterprise-rag-patterns},
  license   = {MIT}
}

Or use GitHub's "Cite this repository" button above (reads CITATION.cff).


Part of the enterprise AI patterns trilogy

Library Focus Compliance
enterprise-rag-patterns What to retrieve FERPA, HIPAA, GDPR, NIST AI RMF, OWASP LLM
regulated-ai-governance What agents may do FERPA, HIPAA, GLBA policy enforcement
integration-automation-patterns How data flows Event-driven enterprise integration

License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

enterprise_rag_patterns-0.5.0.tar.gz (62.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

enterprise_rag_patterns-0.5.0-py3-none-any.whl (60.4 kB view details)

Uploaded Python 3

File details

Details for the file enterprise_rag_patterns-0.5.0.tar.gz.

File metadata

  • Download URL: enterprise_rag_patterns-0.5.0.tar.gz
  • Upload date:
  • Size: 62.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for enterprise_rag_patterns-0.5.0.tar.gz
Algorithm Hash digest
SHA256 784fe2ce2dc67bf81255aa6a0dd8da700d0215523f479a11e02014d0c75fef4c
MD5 deb3d82b179227a2cdacc58b48ee74a7
BLAKE2b-256 1e642178a3c34fa1e6e4ad76e953dc233131f3b7e2fb0a6bd5382e782ed3a89d

See more details on using hashes here.

Provenance

The following attestation bundles were made for enterprise_rag_patterns-0.5.0.tar.gz:

Publisher: publish.yml on ashutoshrana/enterprise-rag-patterns

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file enterprise_rag_patterns-0.5.0-py3-none-any.whl.

File metadata

File hashes

Hashes for enterprise_rag_patterns-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a94022b72259fb4b9e1ebb6389bee4cbc889ef00c97fbecbf91758719f2c2d5a
MD5 6ffc3c7772d53e95764233b048e3c558
BLAKE2b-256 c64085a1f633576e0804a627dd699cfb39f16e4a3dbb56a4e8c4e98ba47f2a42

See more details on using hashes here.

Provenance

The following attestation bundles were made for enterprise_rag_patterns-0.5.0-py3-none-any.whl:

Publisher: publish.yml on ashutoshrana/enterprise-rag-patterns

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page