
Cross-industry compliance patterns for RAG pipelines: FERPA, HIPAA, GDPR, NIST AI RMF, OWASP LLM Top 10, and more. Vector store adapters, framework integrations, and audit logging.


enterprise-rag-patterns



The problem this solves

Standard RAG implementations retrieve documents and pass them directly to an LLM, with no enforcement of who is allowed to see what. In regulated environments (higher education, healthcare, financial services, government), this is a structural compliance failure: a student can receive another student's records, a patient's ePHI can leak into an unrelated clinical query, a prompt injection can hide in a retrieved document, and no audit log records any of it.

This library provides the missing compliance layer — a cross-industry framework of pre-filters, identity scopes, risk assessors, and audit records that enforce regulatory requirements at the retrieval layer, before any document reaches the LLM context window.

Regulations covered: FERPA · HIPAA · GDPR · NIST AI RMF · OWASP LLM Top 10 · SOC 2 Type II · ISO/IEC 27001:2022 · PCI DSS v4.0


Architecture

Session Token
     │
     ▼
StudentIdentityScope
(student_id + institution_id + authorized_categories + disclosure_reason)
     │
     ├─ Vector Store Pre-filter ──────────────────────────────────┐
     │   student_id + institution_id + categories checked here   │
     │   Only authorized documents enter the ranking stage       │
     │                                                            │
     ├─ Policy Layer Filter (defense-in-depth) ──────────────────┤
     │   Application-level identity re-check                     │
     │   Blocks any document that escaped the vector filter      │
     │                                                            │
     ├─ Audit Record ─────────────────────────────────────────────┤
     │   34 CFR § 99.32 Disclosure Log                           │
     │   Emitted before LLM sees any document                    │
     │                                                            │
     └─ LLM Context (authorized documents only) ─────────────────┘

Why pre-filter, not post-filter? Post-filtering is a UI concern, not a compliance control — the LLM has already processed the unauthorized record. FERPA and HIPAA require that disclosure not occur, not that unauthorized data be hidden after the fact. See docs/adr/ for the full architecture decision record.
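To make the distinction concrete, here is a minimal sketch that uses an in-memory list in place of a vector store. The `Doc` type, the toy `score` function, and both retrieval helpers are illustrative assumptions and are not part of this library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    student_id: str
    text: str


def score(query: str, doc: Doc) -> int:
    """Toy relevance score: how many query words appear in the document."""
    words = set(doc.text.lower().split())
    return sum(1 for w in query.lower().split() if w in words)


def retrieve_prefiltered(query: str, corpus: list[Doc], student_id: str, k: int = 2) -> list[Doc]:
    """Restrict the candidate set by identity first, then rank what is left."""
    candidates = [d for d in corpus if d.student_id == student_id]
    return sorted(candidates, key=lambda d: score(query, d), reverse=True)[:k]


def retrieve_postfiltered(query: str, corpus: list[Doc], student_id: str, k: int = 2) -> list[Doc]:
    """Anti-pattern: rank every record, then hide the unauthorized hits."""
    top = sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]
    return [d for d in top if d.student_id == student_id]


corpus = [
    Doc("d1", "stu_001", "fall transcript"),
    Doc("d2", "stu_002", "fall transcript gpa 3.9"),
    Doc("d3", "stu_002", "transcript hold gpa review"),
    Doc("d4", "stu_001", "advising notes"),
]

pre = retrieve_prefiltered("transcript gpa", corpus, "stu_001")
post = retrieve_postfiltered("transcript gpa", corpus, "stu_001")
print([d.doc_id for d in pre])   # only stu_001 documents were ever ranked
print([d.doc_id for d in post])  # other students' records filled the top-k first
```

In this toy corpus the post-filtered path ranks two of another student's records highest and then discards them, so the caller gets nothing back and the ranking stage has already handled records it should never have seen. The pre-filtered path never touches them.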


Installation

pip install enterprise-rag-patterns

With framework extras:

pip install 'enterprise-rag-patterns[langchain]'
pip install 'enterprise-rag-patterns[llama-index]'
pip install 'enterprise-rag-patterns[haystack]'

60-second example

from enterprise_rag_patterns.compliance import (
    StudentIdentityScope,
    RecordCategory,
    FERPAContextPolicy,
    DisclosureReason,
)

# Build a verified scope from your session token — never from user input
scope = StudentIdentityScope(
    student_id="stu_001",
    institution_id="univ_abc",
    requesting_user_id="advisor_007",
    authorized_categories={RecordCategory.ACADEMIC_RECORD},
    disclosure_reason=DisclosureReason.SCHOOL_OFFICIAL,
)
policy = FERPAContextPolicy(scope=scope)

# retrieved_docs is the list your retriever returned; it is not defined in this
# snippet. Filter it before the LLM sees anything.
safe_docs = policy.filter_retrieved_documents(
    retrieved_docs,
    student_id_field="student_id",
    institution_id_field="institution_id",
    category_field="category",
)

# Emit a 34 CFR § 99.32 disclosure log entry
audit = policy.record_access(categories_accessed={RecordCategory.ACADEMIC_RECORD})
print(audit.to_log_entry())
# → {"record_id": "...", "student_id": "stu_001", "regulation": "FERPA",
#    "categories": ["academic_record"], "permitted": true, "timestamp": "..."}

See the examples/ directory for complete runnable pipelines:

| Example | Regulation | What it shows |
|---|---|---|
| ferpa_rag_pipeline.py | FERPA | Four-layer FERPA-compliant pipeline |
| 05_hipaa_rag_pipeline.py | HIPAA | Minimum-necessary ePHI filter + SHA-256 tamper-evidence |
| 06_owasp_security_scan.py | OWASP LLM01/LLM02 | PII redaction + prompt injection scan |
| 07_soc2_cbac_pipeline.py | SOC 2 Type II | Multi-tenant CBAC: tenant isolation, confidentiality tiers, role-based access |
| 08_nist_ai_rmf_assessment.py | NIST AI RMF | MAP/MEASURE/MANAGE risk assessment + incident recording |
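The HIPAA example is described as using SHA-256 for tamper-evidence. One common way to get that property is a hash chain, where each audit entry's hash covers the previous entry's hash. The sketch below shows that general technique with the standard library only. It is an assumption about the approach, not a copy of what the example file does, and every function and field name in it is hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def chain_hash(prev_hash: str, entry: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the entry."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], entry: dict) -> None:
    """Append an entry whose hash is linked to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "prev_hash": prev_hash, "hash": chain_hash(prev_hash, entry)})


def verify(log: list[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != chain_hash(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True


log: list[dict] = []
append_entry(log, {"user": "clinician_42", "patient": "pt_001", "permitted": True})
append_entry(log, {"user": "clinician_42", "patient": "pt_002", "permitted": False})
intact = verify(log)

log[0]["entry"]["permitted"] = False  # simulate after-the-fact tampering
still_valid = verify(log)
print(intact, still_valid)
```

A chain like this detects edits but does not prevent them. Anyone who can rewrite the whole log can recompute every hash, so the latest hash usually needs to be anchored somewhere the writer cannot modify.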

Framework integrations

| Framework | Integration Class | Install Extra |
|---|---|---|
| LangChain | FERPAComplianceCallbackHandler | [langchain] |
| LlamaIndex | FERPANodePostprocessor | [llama-index] |
| Haystack 2.x | FERPAHaystackFilter | [haystack] |
| Pinecone | PineconeComplianceFilter | [pinecone] |
| Weaviate | WeaviateComplianceFilter | [weaviate] |
| Qdrant | QdrantComplianceFilter | [qdrant] |
| ChromaDB | ChromaComplianceFilter | [chromadb] |
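The vector store adapters apply the identity scope as a metadata filter at query time. As a rough illustration of what such a filter can look like, the sketch below builds one in the MongoDB-style operator syntax (`$and`, `$eq`, `$in`) that Pinecone, for example, accepts for metadata filtering. `build_scope_filter` and `matches` are hypothetical helpers written for this example; they are not the adapters' real methods, and the field names are assumed to match the ones used in the 60-second example.

```python
def build_scope_filter(student_id: str, institution_id: str, categories: set[str]) -> dict:
    """Build a metadata filter that fails closed when any scope field is missing."""
    if not student_id or not institution_id:
        raise ValueError("student_id and institution_id are required")
    if not categories:
        raise ValueError("at least one authorized category is required")
    return {
        "$and": [
            {"student_id": {"$eq": student_id}},
            {"institution_id": {"$eq": institution_id}},
            {"category": {"$in": sorted(categories)}},
        ]
    }


def matches(metadata: dict, flt: dict) -> bool:
    """Local evaluator for the operators used above, handy in unit tests."""
    for clause in flt["$and"]:
        (field, cond), = clause.items()
        (op, operand), = cond.items()
        value = metadata.get(field)
        if op == "$eq" and value != operand:
            return False
        if op == "$in" and value not in operand:
            return False
    return True


flt = build_scope_filter("stu_001", "univ_abc", {"academic_record"})
own = {"student_id": "stu_001", "institution_id": "univ_abc", "category": "academic_record"}
other = {"student_id": "stu_002", "institution_id": "univ_abc", "category": "academic_record"}
print(matches(own, flt), matches(other, flt))
```

Raising on an empty scope rather than returning an empty filter is deliberate: an empty filter means "match everything" in most stores, which is the opposite of the intended default.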

Cross-industry compliance coverage

| Regulation / Framework | Status | Primary Sector | RAG Controls |
|---|---|---|---|
| FERPA (34 CFR § 99) | ✅ Implemented | Education | Identity scoping, 34 CFR § 99.32 audit log |
| GDPR (Articles 17, 32) | ✅ Implemented | EU / Global | Right-to-erasure, data subject rights |
| HIPAA (45 CFR §§ 164.312, 164.502) | ✅ Implemented | Healthcare | ePHI minimum-necessary, audit controls |
| NIST AI RMF 1.0 + AI 600-1 | ✅ Implemented | All sectors | MAP/MEASURE/MANAGE risk assessment |
| OWASP LLM Top 10 (2025) | ✅ Implemented | Software / AI | LLM01 injection, LLM02 PII disclosure |
| SOC 2 Type II | ✅ Implemented | SaaS / Enterprise | Tenant isolation, CBAC, CC7.2 audit log |
| ISO/IEC 27001:2022 | ✅ Implemented | All sectors | ISMS classification, org isolation, CBAC (Annex A.5.12/A.5.15/A.8.2) |
| PCI DSS v4.0 | ✅ Implemented | Payments / Finance | Merchant isolation, CHD CBAC, PAN masking (Req 3.4/7.2/7.2.1) |
| GLBA (16 CFR § 314) | 🗓 Planned | Financial services | Customer record safeguards |
| EU AI Act | 🗓 Planned | EU / Global | Article 12 tamper-evident audit logs |
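The PCI DSS row lists PAN masking among its controls. As an illustration of the idea, the sketch below masks card-number-shaped digit runs in retrieved text before it reaches the LLM, keeping only the last four digits and using a Luhn check to avoid masking arbitrary long numbers. The regular expression, the function names, and the last-four policy are assumptions made for this example; `regulations/pci_dss.py` may work differently.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
PAN_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def mask_pans(text: str) -> str:
    """Replace Luhn-valid card numbers with asterisks plus the last four digits."""
    def _mask(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        if not luhn_valid(digits):
            return match.group()  # not a plausible PAN, leave untouched
        return "*" * (len(digits) - 4) + digits[-4:]

    return PAN_CANDIDATE.sub(_mask, text)


masked = mask_pans("Card 4111 1111 1111 1111 was charged.")
untouched = mask_pans("Order 1234 5678 9012 3456 shipped.")
print(masked)
print(untouched)
```

The Luhn check trades a small risk of missing a mistyped card number for far fewer false positives on order IDs and tracking numbers. A stricter deployment might mask every candidate instead.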

Four-layer defense-in-depth model

Layer 0: Query-time security    → OWASP (PII redaction, injection scanning)
Layer 1: Identity scoping       → FERPA / HIPAA (namespace + metadata filter)
Layer 2: Compliance filtering   → FERPA / HIPAA / GDPR (document-level rules)
Layer 3: Risk assessment + audit→ NIST AI RMF / HIPAA (structured audit records)

See docs/architecture.md for the full layered model.
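The four layers can be read as functions applied in order. The sketch below is a simplified, standard-library-only illustration of that ordering. The function names, the dictionary document shape, and the two regular expressions are all assumptions for this example, and the real modules cover far more cases than one SSN pattern and one injection phrase.

```python
import re
from datetime import datetime, timezone

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
INJECTION = re.compile(r"ignore (?:all )?(?:previous|prior) instructions", re.IGNORECASE)


def layer0_query_security(query: str) -> str:
    """Layer 0: reject an obvious injection phrase, redact SSN-shaped strings."""
    if INJECTION.search(query):
        raise ValueError("possible prompt injection in query")
    return SSN.sub("[REDACTED-SSN]", query)


def layer1_identity_scope(docs: list[dict], student_id: str, institution_id: str) -> list[dict]:
    """Layer 1: keep only documents that belong to the scoped identity."""
    return [
        d for d in docs
        if d["student_id"] == student_id and d["institution_id"] == institution_id
    ]


def layer2_compliance_filter(docs: list[dict], categories: set[str]) -> list[dict]:
    """Layer 2: keep only documents in an authorized record category."""
    return [d for d in docs if d["category"] in categories]


def layer3_audit(requesting_user_id: str, student_id: str, docs: list[dict]) -> dict:
    """Layer 3: build a structured record of what is about to be disclosed."""
    return {
        "requesting_user_id": requesting_user_id,
        "student_id": student_id,
        "document_ids": [d["id"] for d in docs],
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


retrieved = [
    {"id": "a", "student_id": "stu_001", "institution_id": "univ_abc", "category": "academic_record"},
    {"id": "b", "student_id": "stu_002", "institution_id": "univ_abc", "category": "academic_record"},
    {"id": "c", "student_id": "stu_001", "institution_id": "univ_abc", "category": "financial_aid"},
]

clean_query = layer0_query_security("GPA for SSN 123-45-6789")
scoped = layer1_identity_scope(retrieved, "stu_001", "univ_abc")
allowed = layer2_compliance_filter(scoped, {"academic_record"})
audit = layer3_audit("advisor_007", "stu_001", allowed)  # emitted before the LLM call
print(clean_query, [d["id"] for d in allowed], audit["document_ids"])
```

Each layer only narrows what the previous one passed along, so a bug in one layer cannot widen access beyond what the layers before it allowed.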


Repository structure

src/enterprise_rag_patterns/
├── compliance.py               # FERPA identity scoping + 34 CFR § 99.32 audit
├── context.py                  # Multi-source context envelope assembly
├── session.py                  # Cross-channel session continuity
├── policy.py                   # Escalation and action-boundary policy objects
├── async_compliance.py         # Async wrappers for asyncio/FastAPI environments
├── regulations/
│   ├── gdpr.py                 # GDPR Article 17 right-to-erasure patterns
│   ├── hipaa.py                # HIPAA ePHI minimum-necessary + audit (NEW)
│   ├── iso27001.py             # ISO/IEC 27001:2022 ISMS CBAC — A.5.12/A.5.15/A.8.2/A.8.15
│   ├── nist_ai_rmf.py          # NIST AI RMF 1.0 + AI 600-1 risk assessment
│   ├── owasp_llm.py            # OWASP LLM Top 10 (2025) — LLM01/LLM02
│   ├── pci_dss.py              # PCI DSS v4.0 — Req 3.4/7.2/7.2.1/10.2.1 + PAN masking
│   └── soc2.py                 # SOC 2 Type II CBAC — CC6.1/CC6.6/C1.1/CC7.2
├── vector_stores/
│   ├── pinecone_adapter.py     # PineconeComplianceFilter + namespace isolation
│   ├── weaviate_adapter.py     # WeaviateComplianceFilter
│   ├── qdrant_adapter.py       # QdrantComplianceFilter
│   └── chroma_adapter.py       # ChromaComplianceFilter
└── integrations/
    ├── langchain.py            # FERPAComplianceCallbackHandler (LangChain 0.3+)
    ├── langchain_lcel.py       # FERPAFilterRunnable + make_ferpa_chain (LCEL)
    ├── llama_index.py          # FERPANodePostprocessor (LlamaIndex)
    ├── llama_index_workflow.py # FERPAWorkflowStep (LlamaIndex 0.12+ Workflows)
    ├── haystack.py             # FERPAHaystackFilter (Haystack 2.x)
    └── maf.py                  # FERPAAgentMiddleware (Microsoft Agent Framework)
docs/
├── architecture.md             # Four-layer defense-in-depth model
├── adr/                        # Architecture decision records
└── implementation-note-*.md    # Implementation notes
examples/
└── ferpa_rag_pipeline.py       # Complete runnable FERPA-compliant pipeline
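The tree lists `async_compliance.py` as async wrappers for asyncio/FastAPI environments. A common way to wrap a synchronous filter for async callers is `asyncio.to_thread` (Python 3.9+), sketched below. This shows the general pattern only: `filter_documents` is a hypothetical stand-in for a synchronous policy call, and nothing here is taken from the module itself.

```python
import asyncio


def filter_documents(docs: list[dict], student_id: str) -> list[dict]:
    """Hypothetical synchronous filter standing in for a policy call."""
    return [d for d in docs if d.get("student_id") == student_id]


async def filter_documents_async(docs: list[dict], student_id: str) -> list[dict]:
    """Run the synchronous filter in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(filter_documents, docs, student_id)


async def main() -> list[dict]:
    docs = [
        {"id": "a", "student_id": "stu_001"},
        {"id": "b", "student_id": "stu_002"},
    ]
    return await filter_documents_async(docs, "stu_001")


result = asyncio.run(main())
print(result)
```

Offloading to a thread mainly helps when the wrapped call blocks on I/O, such as writing an audit record. A purely in-memory filter this small could simply be called directly from the coroutine.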

Published notes


Near-term roadmap

  • regulations/eu_ai_act.py — EU AI Act Article 12 tamper-evident audit log with cryptographic signing
  • regulations/glba.py — GLBA Safeguards Rule financial record access controls
  • integrations/crewai.py — CrewAI policy-gated tool wrapper
  • Async vector store adapters for FastAPI/asyncio environments
  • ECOSYSTEM.md: compatibility matrix with current ecosystem versions

Contributing

Contributions are welcome. Please read CONTRIBUTING.md for guidelines and GOVERNANCE.md for the governance model. Run pytest tests/ -v to verify your changes before opening a pull request.


Citation

If you use these patterns in research or production, please cite:

@software{rana2026erp,
  author    = {Rana, Ashutosh},
  title     = {enterprise-rag-patterns: FERPA-compliant retrieval-augmented generation patterns},
  year      = {2026},
  url       = {https://github.com/ashutoshrana/enterprise-rag-patterns},
  license   = {MIT}
}

Or use GitHub's "Cite this repository" button above (reads CITATION.cff).


Part of the enterprise AI patterns trilogy

| Library | Focus | Compliance |
|---|---|---|
| enterprise-rag-patterns | What to retrieve | FERPA, HIPAA, GDPR, NIST AI RMF, OWASP LLM |
| regulated-ai-governance | What agents may do | FERPA, HIPAA, GLBA policy enforcement |
| integration-automation-patterns | How data flows | Event-driven enterprise integration |

License

MIT — see LICENSE.
