Cross-industry compliance patterns for RAG pipelines: FERPA, HIPAA, GDPR, NIST AI RMF, OWASP LLM Top 10, and more. Vector store adapters, framework integrations, and audit logging.
Project description
enterprise-rag-patterns
The problem this solves
Standard RAG implementations retrieve documents and pass them directly to an LLM — with no enforcement of who is allowed to see what. In regulated environments (higher education, healthcare, financial services, government), this creates a structural compliance failure: a student receives another student's records, a patient's ePHI leaks into an unrelated clinical query, prompt injection hides in a retrieved document, and no audit log is produced.
This library provides the missing compliance layer — a cross-industry framework of pre-filters, identity scopes, risk assessors, and audit records that enforce regulatory requirements at the retrieval layer, before any document reaches the LLM context window.
Regulations covered: FERPA · HIPAA · GDPR · NIST AI RMF · OWASP LLM Top 10
Architecture
Session Token
│
▼
StudentIdentityScope
(student_id + institution_id + authorized_categories + disclosure_reason)
│
├─ Vector Store Pre-filter ──────────────────────────────────┐
│ student_id + institution_id + categories checked here │
│ Only authorized documents enter the ranking stage │
│ │
├─ Policy Layer Filter (defense-in-depth) ──────────────────┤
│ Application-level identity re-check │
│ Blocks any document that escaped the vector filter │
│ │
├─ Audit Record ─────────────────────────────────────────────┤
│ 34 CFR § 99.32 Disclosure Log │
│ Emitted before LLM sees any document │
│ │
└─ LLM Context (authorized documents only) ─────────────────┘
Why pre-filter, not post-filter? Post-filtering is a UI concern, not a compliance control — the LLM has already processed the unauthorized record. FERPA and HIPAA require that disclosure not occur, not that unauthorized data be hidden after the fact. See docs/adr/ for the full architecture decision record.
Installation
pip install enterprise-rag-patterns
With framework extras:
pip install 'enterprise-rag-patterns[langchain]'
pip install 'enterprise-rag-patterns[llama-index]'
pip install 'enterprise-rag-patterns[haystack]'
60-second example
from enterprise_rag_patterns.compliance import (
StudentIdentityScope,
RecordCategory,
FERPAContextPolicy,
DisclosureReason,
)
# Build a verified scope from your session token — never from user input
scope = StudentIdentityScope(
student_id="stu_001",
institution_id="univ_abc",
requesting_user_id="advisor_007",
authorized_categories={RecordCategory.ACADEMIC_RECORD},
disclosure_reason=DisclosureReason.SCHOOL_OFFICIAL,
)
policy = FERPAContextPolicy(scope=scope)
# Your retriever returns docs — filter before the LLM sees them
safe_docs = policy.filter_retrieved_documents(
retrieved_docs,
student_id_field="student_id",
institution_id_field="institution_id",
category_field="category",
)
# Emit a 34 CFR § 99.32 disclosure log entry
audit = policy.record_access(categories_accessed={RecordCategory.ACADEMIC_RECORD})
print(audit.to_log_entry())
# → {"record_id": "...", "student_id": "stu_001", "regulation": "FERPA",
# "categories": ["academic_record"], "permitted": true, "timestamp": "..."}
See examples/ferpa_rag_pipeline.py for a complete runnable pipeline.
Framework integrations
| Framework | Integration Class | Install Extra |
|---|---|---|
| LangChain | FERPAComplianceCallbackHandler |
[langchain] |
| LlamaIndex | FERPANodePostprocessor |
[llama-index] |
| Haystack 2.x | FERPAHaystackFilter |
[haystack] |
| Pinecone | PineconeComplianceFilter |
[pinecone] |
| Weaviate | WeaviateComplianceFilter |
[weaviate] |
| Qdrant | QdrantComplianceFilter |
[qdrant] |
| ChromaDB | ChromaComplianceFilter |
[chromadb] |
Cross-industry compliance coverage
| Regulation / Framework | Status | Primary Sector | RAG Controls |
|---|---|---|---|
| FERPA (34 CFR § 99) | ✅ Implemented | Education | Identity scoping, 34 CFR § 99.32 audit log |
| GDPR (Articles 17, 32) | ✅ Implemented | EU / Global | Right-to-erasure, data subject rights |
| HIPAA (45 CFR §§ 164.312, 164.502) | ✅ Implemented | Healthcare | ePHI minimum-necessary, audit controls |
| NIST AI RMF 1.0 + AI 600-1 | ✅ Implemented | All sectors | MAP/MEASURE/MANAGE risk assessment |
| OWASP LLM Top 10 (2025) | ✅ Implemented | Software / AI | LLM01 injection, LLM02 PII disclosure |
| SOC 2 Type II | ✅ Implemented | SaaS / Enterprise | Tenant isolation, CBAC, CC7.2 audit log |
| GLBA (16 CFR § 314) | 🗓 Planned | Financial services | Customer record safeguards |
| EU AI Act | 🗓 Planned | EU / Global | Article 12 tamper-evident audit logs |
Four-layer defense-in-depth model
Layer 0: Query-time security → OWASP (PII redaction, injection scanning)
Layer 1: Identity scoping → FERPA / HIPAA (namespace + metadata filter)
Layer 2: Compliance filtering → FERPA / HIPAA / GDPR (document-level rules)
Layer 3: Risk assessment + audit→ NIST AI RMF / HIPAA (structured audit records)
See docs/architecture.md for the full layered model.
Repository structure
src/enterprise_rag_patterns/
├── compliance.py # FERPA identity scoping + 34 CFR § 99.32 audit
├── context.py # Multi-source context envelope assembly
├── session.py # Cross-channel session continuity
├── policy.py # Escalation and action-boundary policy objects
├── async_compliance.py # Async wrappers for asyncio/FastAPI environments
├── regulations/
│ ├── gdpr.py # GDPR Article 17 right-to-erasure patterns
│ ├── hipaa.py # HIPAA ePHI minimum-necessary + audit (NEW)
│ ├── nist_ai_rmf.py # NIST AI RMF 1.0 + AI 600-1 risk assessment
│ ├── owasp_llm.py # OWASP LLM Top 10 (2025) — LLM01/LLM02
│ └── soc2.py # SOC 2 Type II CBAC — CC6.1/CC6.6/C1.1/CC7.2 (NEW)
├── vector_stores/
│ ├── pinecone_adapter.py # PineconeComplianceFilter + namespace isolation
│ ├── weaviate_adapter.py # WeaviateComplianceFilter
│ ├── qdrant_adapter.py # QdrantComplianceFilter
│ └── chroma_adapter.py # ChromaComplianceFilter
└── integrations/
├── langchain.py # FERPAComplianceCallbackHandler (LangChain 0.3+)
├── langchain_lcel.py # FERPAFilterRunnable + make_ferpa_chain (LCEL)
├── llama_index.py # FERPANodePostprocessor (LlamaIndex)
├── llama_index_workflow.py # FERPAWorkflowStep (LlamaIndex 0.12+ Workflows)
├── haystack.py # FERPAHaystackFilter (Haystack 2.x)
└── maf.py # FERPAAgentMiddleware (Microsoft Agent Framework)
docs/
├── architecture.md # Four-layer defense-in-depth model
├── adr/ # Architecture decision records
└── implementation-note-*.md # Implementation notes
examples/
└── ferpa_rag_pipeline.py # Complete runnable FERPA-compliant pipeline
Published notes
- Implementation Note 01 — Cross-channel continuity problem and solution
- Implementation Note 02 — FERPA boundaries in retrieval-augmented generation
- Production-Grade RAG in Regulated Enterprise Environments
Near-term roadmap
regulations/eu_ai_act.py— EU AI Act Article 12 tamper-evident audit log with cryptographic signingregulations/glba.py— GLBA Safeguards Rule financial record access controlsintegrations/crewai.py— CrewAI policy-gated tool wrapper- Async vector store adapters for FastAPI/asyncio environments
- ECOSYSTEM.md: compatibility matrix with current ecosystem versions
Contributing
Contributions are welcome. Please read CONTRIBUTING.md for guidelines and GOVERNANCE.md for the governance model. Run pytest tests/ -v to verify your changes before opening a pull request.
Citation
If you use these patterns in research or production, please cite:
@software{rana2026erp,
author = {Rana, Ashutosh},
title = {enterprise-rag-patterns: FERPA-compliant retrieval-augmented generation patterns},
year = {2026},
url = {https://github.com/ashutoshrana/enterprise-rag-patterns},
license = {MIT}
}
Or use GitHub's "Cite this repository" button above (reads CITATION.cff).
Part of the enterprise AI patterns trilogy
| Library | Focus | Compliance |
|---|---|---|
| enterprise-rag-patterns | What to retrieve | FERPA, HIPAA, GDPR, NIST AI RMF, OWASP LLM |
| regulated-ai-governance | What agents may do | FERPA, HIPAA, GLBA policy enforcement |
| integration-automation-patterns | How data flows | Event-driven enterprise integration |
License
MIT — see LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file enterprise_rag_patterns-0.5.1.tar.gz.
File metadata
- Download URL: enterprise_rag_patterns-0.5.1.tar.gz
- Upload date:
- Size: 69.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cc288ac110398ecd77ad3f6380220f43dd8afcecd71848b256deee1656355e2a
|
|
| MD5 |
f30c0c0538d71d253b8651973792e8ff
|
|
| BLAKE2b-256 |
fe62b087f1c92969825df0da61987f04d631aa923ce8166fa3647ff5fc9974c8
|
Provenance
The following attestation bundles were made for enterprise_rag_patterns-0.5.1.tar.gz:
Publisher:
publish.yml on ashutoshrana/enterprise-rag-patterns
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
enterprise_rag_patterns-0.5.1.tar.gz -
Subject digest:
cc288ac110398ecd77ad3f6380220f43dd8afcecd71848b256deee1656355e2a - Sigstore transparency entry: 1282414712
- Sigstore integration time:
-
Permalink:
ashutoshrana/enterprise-rag-patterns@329bb935bbb134c364e4aa9b8b37f555ebfcf425 -
Branch / Tag:
refs/tags/v0.5.1 - Owner: https://github.com/ashutoshrana
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@329bb935bbb134c364e4aa9b8b37f555ebfcf425 -
Trigger Event:
release
-
Statement type:
File details
Details for the file enterprise_rag_patterns-0.5.1-py3-none-any.whl.
File metadata
- Download URL: enterprise_rag_patterns-0.5.1-py3-none-any.whl
- Upload date:
- Size: 66.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f232b88c7b8c83b2c8bc121fa444734d1e4e3a85bfb1b81b2a200b9000d1808f
|
|
| MD5 |
e8fae2285ddfa8e9416dcf49ba388ee1
|
|
| BLAKE2b-256 |
d5b8120f9ab0ff498bd695a16cb3644ec9609bb7fc030fd7dbfafbcebf41d6a1
|
Provenance
The following attestation bundles were made for enterprise_rag_patterns-0.5.1-py3-none-any.whl:
Publisher:
publish.yml on ashutoshrana/enterprise-rag-patterns
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
enterprise_rag_patterns-0.5.1-py3-none-any.whl -
Subject digest:
f232b88c7b8c83b2c8bc121fa444734d1e4e3a85bfb1b81b2a200b9000d1808f - Sigstore transparency entry: 1282414771
- Sigstore integration time:
-
Permalink:
ashutoshrana/enterprise-rag-patterns@329bb935bbb134c364e4aa9b8b37f555ebfcf425 -
Branch / Tag:
refs/tags/v0.5.1 - Owner: https://github.com/ashutoshrana
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@329bb935bbb134c364e4aa9b8b37f555ebfcf425 -
Trigger Event:
release
-
Statement type: