Skip to main content

Cross-industry compliance patterns for RAG pipelines: FERPA, HIPAA, GDPR, NIST AI RMF, OWASP LLM Top 10, and more. Vector store adapters, framework integrations, and audit logging.

Project description

enterprise-rag-patterns

CI PyPI Python License Downloads


The problem this solves

Standard RAG implementations retrieve documents and pass them directly to an LLM — with no enforcement of who is allowed to see what. In regulated environments (higher education, healthcare, financial services, government), this creates a structural compliance failure: a student receives another student's records, a patient's ePHI leaks into an unrelated clinical query, prompt injection hides in a retrieved document, and no audit log is produced.

This library provides the missing compliance layer — a cross-industry framework of pre-filters, identity scopes, risk assessors, and audit records that enforce regulatory requirements at the retrieval layer, before any document reaches the LLM context window.

Regulations covered: FERPA · HIPAA · GDPR · NIST AI RMF · OWASP LLM Top 10


Architecture

Session Token
     │
     ▼
StudentIdentityScope
(student_id + institution_id + authorized_categories + disclosure_reason)
     │
     ├─ Vector Store Pre-filter ──────────────────────────────────┐
     │   student_id + institution_id + categories checked here   │
     │   Only authorized documents enter the ranking stage       │
     │                                                            │
     ├─ Policy Layer Filter (defense-in-depth) ──────────────────┤
     │   Application-level identity re-check                     │
     │   Blocks any document that escaped the vector filter      │
     │                                                            │
     ├─ Audit Record ─────────────────────────────────────────────┤
     │   34 CFR § 99.32 Disclosure Log                           │
     │   Emitted before LLM sees any document                    │
     │                                                            │
     └─ LLM Context (authorized documents only) ─────────────────┘

Why pre-filter, not post-filter? Post-filtering is a UI concern, not a compliance control — the LLM has already processed the unauthorized record. FERPA and HIPAA require that disclosure not occur, not that unauthorized data be hidden after the fact. See docs/adr/ for the full architecture decision record.


Installation

pip install enterprise-rag-patterns

With framework extras:

pip install 'enterprise-rag-patterns[langchain]'
pip install 'enterprise-rag-patterns[llama-index]'
pip install 'enterprise-rag-patterns[haystack]'

60-second example

from enterprise_rag_patterns.compliance import (
    StudentIdentityScope,
    RecordCategory,
    FERPAContextPolicy,
    DisclosureReason,
)

# Build a verified scope from your session token — never from user input
scope = StudentIdentityScope(
    student_id="stu_001",
    institution_id="univ_abc",
    requesting_user_id="advisor_007",
    authorized_categories={RecordCategory.ACADEMIC_RECORD},
    disclosure_reason=DisclosureReason.SCHOOL_OFFICIAL,
)
policy = FERPAContextPolicy(scope=scope)

# Your retriever returns docs — filter before the LLM sees them
safe_docs = policy.filter_retrieved_documents(
    retrieved_docs,
    student_id_field="student_id",
    institution_id_field="institution_id",
    category_field="category",
)

# Emit a 34 CFR § 99.32 disclosure log entry
audit = policy.record_access(categories_accessed={RecordCategory.ACADEMIC_RECORD})
print(audit.to_log_entry())
# → {"record_id": "...", "student_id": "stu_001", "regulation": "FERPA",
#    "categories": ["academic_record"], "permitted": true, "timestamp": "..."}

See examples/ferpa_rag_pipeline.py for a complete runnable pipeline.


Framework integrations

Framework Integration Class Install Extra
LangChain FERPAComplianceCallbackHandler [langchain]
LlamaIndex FERPANodePostprocessor [llama-index]
Haystack 2.x FERPAHaystackFilter [haystack]
Pinecone PineconeComplianceFilter [pinecone]
Weaviate WeaviateComplianceFilter [weaviate]
Qdrant QdrantComplianceFilter [qdrant]
ChromaDB ChromaComplianceFilter [chromadb]

Cross-industry compliance coverage

Regulation / Framework Status Primary Sector RAG Controls
FERPA (34 CFR § 99) ✅ Implemented Education Identity scoping, 34 CFR § 99.32 audit log
GDPR (Articles 17, 32) ✅ Implemented EU / Global Right-to-erasure, data subject rights
HIPAA (45 CFR §§ 164.312, 164.502) ✅ Implemented Healthcare ePHI minimum-necessary, audit controls
NIST AI RMF 1.0 + AI 600-1 ✅ Implemented All sectors MAP/MEASURE/MANAGE risk assessment
OWASP LLM Top 10 (2025) ✅ Implemented Software / AI LLM01 injection, LLM02 PII disclosure
SOC 2 Type II ✅ Implemented SaaS / Enterprise Tenant isolation, CBAC, CC7.2 audit log
GLBA (16 CFR § 314) 🗓 Planned Financial services Customer record safeguards
EU AI Act 🗓 Planned EU / Global Article 12 tamper-evident audit logs

Four-layer defense-in-depth model

Layer 0: Query-time security    → OWASP (PII redaction, injection scanning)
Layer 1: Identity scoping       → FERPA / HIPAA (namespace + metadata filter)
Layer 2: Compliance filtering   → FERPA / HIPAA / GDPR (document-level rules)
Layer 3: Risk assessment + audit→ NIST AI RMF / HIPAA (structured audit records)

See docs/architecture.md for the full layered model.


Repository structure

src/enterprise_rag_patterns/
├── compliance.py               # FERPA identity scoping + 34 CFR § 99.32 audit
├── context.py                  # Multi-source context envelope assembly
├── session.py                  # Cross-channel session continuity
├── policy.py                   # Escalation and action-boundary policy objects
├── async_compliance.py         # Async wrappers for asyncio/FastAPI environments
├── regulations/
│   ├── gdpr.py                 # GDPR Article 17 right-to-erasure patterns
│   ├── hipaa.py                # HIPAA ePHI minimum-necessary + audit (NEW)
│   ├── nist_ai_rmf.py          # NIST AI RMF 1.0 + AI 600-1 risk assessment
│   ├── owasp_llm.py            # OWASP LLM Top 10 (2025) — LLM01/LLM02
│   └── soc2.py                 # SOC 2 Type II CBAC — CC6.1/CC6.6/C1.1/CC7.2 (NEW)
├── vector_stores/
│   ├── pinecone_adapter.py     # PineconeComplianceFilter + namespace isolation
│   ├── weaviate_adapter.py     # WeaviateComplianceFilter
│   ├── qdrant_adapter.py       # QdrantComplianceFilter
│   └── chroma_adapter.py       # ChromaComplianceFilter
└── integrations/
    ├── langchain.py            # FERPAComplianceCallbackHandler (LangChain 0.3+)
    ├── langchain_lcel.py       # FERPAFilterRunnable + make_ferpa_chain (LCEL)
    ├── llama_index.py          # FERPANodePostprocessor (LlamaIndex)
    ├── llama_index_workflow.py # FERPAWorkflowStep (LlamaIndex 0.12+ Workflows)
    ├── haystack.py             # FERPAHaystackFilter (Haystack 2.x)
    └── maf.py                  # FERPAAgentMiddleware (Microsoft Agent Framework)
docs/
├── architecture.md             # Four-layer defense-in-depth model
├── adr/                        # Architecture decision records
└── implementation-note-*.md    # Implementation notes
examples/
└── ferpa_rag_pipeline.py       # Complete runnable FERPA-compliant pipeline

Published notes


Near-term roadmap

  • regulations/eu_ai_act.py — EU AI Act Article 12 tamper-evident audit log with cryptographic signing
  • regulations/glba.py — GLBA Safeguards Rule financial record access controls
  • integrations/crewai.py — CrewAI policy-gated tool wrapper
  • Async vector store adapters for FastAPI/asyncio environments
  • ECOSYSTEM.md: compatibility matrix with current ecosystem versions

Contributing

Contributions are welcome. Please read CONTRIBUTING.md for guidelines and GOVERNANCE.md for the governance model. Run pytest tests/ -v to verify your changes before opening a pull request.


Citation

If you use these patterns in research or production, please cite:

@software{rana2026erp,
  author    = {Rana, Ashutosh},
  title     = {enterprise-rag-patterns: FERPA-compliant retrieval-augmented generation patterns},
  year      = {2026},
  url       = {https://github.com/ashutoshrana/enterprise-rag-patterns},
  license   = {MIT}
}

Or use GitHub's "Cite this repository" button above (reads CITATION.cff).


Part of the enterprise AI patterns trilogy

Library Focus Compliance
enterprise-rag-patterns What to retrieve FERPA, HIPAA, GDPR, NIST AI RMF, OWASP LLM
regulated-ai-governance What agents may do FERPA, HIPAA, GLBA policy enforcement
integration-automation-patterns How data flows Event-driven enterprise integration

License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

enterprise_rag_patterns-0.5.1.tar.gz (69.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

enterprise_rag_patterns-0.5.1-py3-none-any.whl (66.2 kB view details)

Uploaded Python 3

File details

Details for the file enterprise_rag_patterns-0.5.1.tar.gz.

File metadata

  • Download URL: enterprise_rag_patterns-0.5.1.tar.gz
  • Upload date:
  • Size: 69.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for enterprise_rag_patterns-0.5.1.tar.gz
Algorithm Hash digest
SHA256 cc288ac110398ecd77ad3f6380220f43dd8afcecd71848b256deee1656355e2a
MD5 f30c0c0538d71d253b8651973792e8ff
BLAKE2b-256 fe62b087f1c92969825df0da61987f04d631aa923ce8166fa3647ff5fc9974c8

See more details on using hashes here.

Provenance

The following attestation bundles were made for enterprise_rag_patterns-0.5.1.tar.gz:

Publisher: publish.yml on ashutoshrana/enterprise-rag-patterns

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file enterprise_rag_patterns-0.5.1-py3-none-any.whl.

File metadata

File hashes

Hashes for enterprise_rag_patterns-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f232b88c7b8c83b2c8bc121fa444734d1e4e3a85bfb1b81b2a200b9000d1808f
MD5 e8fae2285ddfa8e9416dcf49ba388ee1
BLAKE2b-256 d5b8120f9ab0ff498bd695a16cb3644ec9609bb7fc030fd7dbfafbcebf41d6a1

See more details on using hashes here.

Provenance

The following attestation bundles were made for enterprise_rag_patterns-0.5.1-py3-none-any.whl:

Publisher: publish.yml on ashutoshrana/enterprise-rag-patterns

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page