FERPA-compliant document filter for Haystack RAG pipelines — enforces identity-scoped access control before documents reach the LLM

ferpa-haystack

FERPA-compliant document filtering for Haystack RAG pipelines.

Enforces identity-scoped access control under 34 CFR Part 99 at the retrieval layer, before any document reaches the LLM context window.

The Problem

Standard Haystack pipelines retrieve documents and pass them directly to the LLM with no enforcement of who is allowed to see what. In higher-education deployments, this creates a structural FERPA compliance gap: a student advising chatbot may return another student's academic record, financial aid details, or disciplinary history in response to a query.

This component closes that gap by adding a two-layer compliance filter between your retriever and your LLM.

Architecture

Haystack Pipeline
     │
     ▼
InMemoryEmbeddingRetriever (or any retriever)
     │  documents (all retrieved)
     ▼
FERPAMetadataFilter
     │  Layer 1: Identity pre-filter (student_id + institution_id)
     │  Layer 2: Category authorization (academic_record, financial_aid, ...)
     │
     ├── documents ──────────────► LLM (only authorized records)
     └── disclosure_record ──────► Audit log (34 CFR § 99.32)

Documents without identity metadata (course catalogues, policy handbooks) pass through both layers unchanged — shared knowledge-base content is never blocked.

Installation

pip install ferpa-haystack

Quick Start

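from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.filters.ferpa_filter import FERPAMetadataFilter

doc_store = InMemoryDocumentStore()

ferpa_filter = FERPAMetadataFilter(
    student_id="stu_001",
    institution_id="univ_abc",
    authorized_categories=["academic_record", "financial_aid"],
    requesting_user_id="advisor_007",
)

# OpenAIGenerator accepts a prompt string, not documents, so a PromptBuilder
# renders the filtered documents into the prompt.
template = """
Answer the question using only these records.
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Question: {{ question }}
"""

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryEmbeddingRetriever(doc_store))
pipeline.add_component("ferpa_filter", ferpa_filter)
pipeline.add_component("prompt_builder", PromptBuilder(template=template))
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o"))

pipeline.connect("retriever.documents", "ferpa_filter.documents")
pipeline.connect("ferpa_filter.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.prompt")

# question is the user's query text; query_emb is its embedding, computed with
# the same embedding model used to index doc_store.
result = pipeline.run(
    {
        "retriever": {"query_embedding": query_emb},
        "prompt_builder": {"question": question},
    },
    # ferpa_filter.documents is consumed downstream, so request it explicitly
    include_outputs_from={"ferpa_filter"},
)

# Only stu_001's authorized records reached the LLM
authorized_docs = result["ferpa_filter"]["documents"]

# 34 CFR § 99.32 audit entry: log this to your compliance system
audit_record = result["ferpa_filter"]["disclosure_record"]
print(audit_record.to_log_entry())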

Filtering Layers

Layer 1 — Identity Pre-Filter

Documents are matched against student_id and institution_id metadata fields.

Document metadata	Outcome
No `student_id` or `institution_id`	Pass — treated as shared content
`student_id` matches	Continue to Layer 2
`student_id` does not match	Blocked
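
The outcomes in the table can be sketched as a small pure-Python function. This is a minimal illustration, not the package's implementation: `identity_outcome` and its return labels are hypothetical names, and treating a mismatched `institution_id` as blocked is an assumption the table does not spell out.

```python
from typing import Any, Mapping


def identity_outcome(
    meta: Mapping[str, Any],
    student_id: str,
    institution_id: str,
    student_id_field: str = "student_id",
    institution_id_field: str = "institution_id",
) -> str:
    """Classify one document's metadata as 'pass', 'continue', or 'blocked'."""
    doc_student = meta.get(student_id_field)
    doc_institution = meta.get(institution_id_field)

    # No identity metadata at all: shared content such as a course catalogue.
    if doc_student is None and doc_institution is None:
        return "pass"

    # Assumption: every identity field that is present must match the request.
    if doc_student is not None and doc_student != student_id:
        return "blocked"
    if doc_institution is not None and doc_institution != institution_id:
        return "blocked"

    return "continue"  # identity matches, hand over to Layer 2


shared = {"title": "Course catalogue"}
own = {"student_id": "stu_001", "institution_id": "univ_abc"}
other = {"student_id": "stu_002", "institution_id": "univ_abc"}

for meta in (shared, own, other):
    print(identity_outcome(meta, "stu_001", "univ_abc"))
```

The field-name parameters mirror the `student_id_field` and `institution_id_field` options described under Configuration.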

Layer 2 — Category Authorization

When authorized_categories is non-empty, the document's category field must be in the authorized set. An empty list allows all categories.

# Only academic records and financial aid — disciplinary records are blocked
FERPAMetadataFilter(
    student_id="stu_001",
    institution_id="univ_abc",
    authorized_categories=["academic_record", "financial_aid"],
    # "disciplinary" is blocked even if identity matches
)
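
The category rule can be sketched as a single predicate. This is a hedged illustration only: `category_allowed` is a hypothetical helper, not part of the package, and blocking a document whose category field is missing is an assumption.

```python
from typing import Any, Mapping, Sequence


def category_allowed(
    meta: Mapping[str, Any],
    authorized_categories: Sequence[str],
    category_field: str = "category",
) -> bool:
    """Return True when the document's category may be disclosed."""
    # An empty list means no category restriction.
    if not authorized_categories:
        return True
    # Assumption: a missing category is never in the authorized set.
    return meta.get(category_field) in authorized_categories


authorized = ["academic_record", "financial_aid"]
print(category_allowed({"category": "academic_record"}, authorized))
print(category_allowed({"category": "disciplinary"}, authorized))
print(category_allowed({"category": "disciplinary"}, []))
```

Only documents that already passed the identity check would reach this predicate; shared content without identity metadata bypasses it.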

Audit Record (34 CFR § 99.32)

Every call to run() produces a FERPADisclosureRecord regardless of how many documents are authorized:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class FERPADisclosureRecord:
    student_id: str
    institution_id: str
    requesting_user_id: str
    disclosed_at: datetime          # UTC timestamp
    total_retrieved: int            # documents from retriever
    total_disclosed: int            # documents that passed filtering
    categories_disclosed: list[str] # record categories in result
    pipeline_context: str           # pipeline/workflow label

Send it to your compliance logger:

import logging

compliance_logger = logging.getLogger("ferpa.audit")
compliance_logger.info(result["ferpa_filter"]["disclosure_record"].to_log_entry())
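
To make the shape of a log entry concrete, here is a minimal stand-in for the record. It is a sketch under stated assumptions: `DisclosureRecordSketch` is a hypothetical class that copies the fields listed above, and the one-line JSON format produced by its `to_log_entry()` is assumed, since the real output format is not documented here.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DisclosureRecordSketch:
    student_id: str
    institution_id: str
    requesting_user_id: str
    total_retrieved: int
    total_disclosed: int
    categories_disclosed: list[str]
    pipeline_context: str = ""
    # Timestamp is taken in UTC when the record is created.
    disclosed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def to_log_entry(self) -> str:
        """Serialize to one JSON line (assumed format, for illustration)."""
        payload = asdict(self)
        payload["disclosed_at"] = self.disclosed_at.isoformat()
        return json.dumps(payload, sort_keys=True)


record = DisclosureRecordSketch(
    student_id="stu_001",
    institution_id="univ_abc",
    requesting_user_id="advisor_007",
    total_retrieved=5,
    total_disclosed=2,
    categories_disclosed=["academic_record"],
    pipeline_context="advising_pipeline",
)
print(record.to_log_entry())
```

A single JSON line per disclosure keeps entries easy to ship to a log aggregator and to query later by `student_id` or `requesting_user_id`.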

Configuration

FERPAMetadataFilter(
    student_id="stu_001",
    institution_id="univ_abc",
    authorized_categories=["academic_record"],   # empty = all categories allowed
    requesting_user_id="advisor_007",            # recorded in audit log
    student_id_field="student_id",               # custom meta key
    institution_id_field="institution_id",       # custom meta key
    category_field="category",                   # custom meta key
    pipeline_context="advising_pipeline",        # audit label
    raise_on_violation=False,                    # True = raise PermissionError
)
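
The difference between the two `raise_on_violation` modes can be sketched as follows. This is an assumption-laden illustration, not the component's code: `filter_or_raise` is a hypothetical helper, and it assumes a "violation" means a retrieved document whose student ID belongs to someone else.

```python
from typing import Any, Mapping


def filter_or_raise(
    docs: list[Mapping[str, Any]],
    student_id: str,
    raise_on_violation: bool = False,
    student_id_field: str = "student_id",
) -> list[Mapping[str, Any]]:
    """Drop other students' documents, or raise if strict mode is on."""
    kept = []
    for meta in docs:
        owner = meta.get(student_id_field)
        if owner is not None and owner != student_id:
            if raise_on_violation:
                # The message deliberately omits the other student's ID.
                raise PermissionError("retrieved document belongs to another student")
            continue  # lenient mode: drop the document and keep going
        kept.append(meta)
    return kept


docs = [
    {"student_id": "stu_001", "category": "academic_record"},
    {"student_id": "stu_002", "category": "academic_record"},
    {"title": "Policy handbook"},
]
print(len(filter_or_raise(docs, "stu_001")))
```

Lenient mode suits interactive chat, where a partial answer is acceptable; strict mode suits tests and batch jobs, where a leaked record in the retrieval set should stop the run.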

Custom Field Names

If your document store uses different metadata keys:

FERPAMetadataFilter(
    student_id="stu_001",
    institution_id="univ_abc",
    student_id_field="learner_id",        # your custom key
    institution_id_field="campus_code",   # your custom key
    category_field="record_type",         # your custom key
)

Pipeline Serialization

The component is fully serializable for YAML/JSON pipeline storage:

with open("advising_pipeline.yaml", "w") as fp:
    pipeline.dump(fp)  # Haystack 2.x writes YAML by default
with open("advising_pipeline.yaml") as fp:
    pipeline_restored = Pipeline.load(fp)

Regulatory Basis

Regulation	Section	What this component enforces
FERPA	34 CFR § 99.31(a)(1)	Legitimate educational interest — only authorized roles access records
FERPA	34 CFR § 99.32	Record of disclosures — structured audit entry on every access

Related Projects

enterprise-rag-patterns — FERPA, HIPAA, GDPR compliance patterns for RAG across 50+ regulated sectors
regulated-ai-governance — Policy enforcement for AI agents across 25 jurisdictions

License

Apache License 2.0 — see LICENSE

Release history

Version	Release date
0.2.0 (this version)	May 24, 2026
0.1.0	Apr 18, 2026

Download files

Source Distribution

ferpa_haystack-0.2.0.tar.gz (26.6 kB)

Uploaded May 24, 2026 Source

Built Distribution

ferpa_haystack-0.2.0-py3-none-any.whl (15.7 kB)

Uploaded May 24, 2026 Python 3

File details

Details for the file ferpa_haystack-0.2.0.tar.gz.

File metadata

Download URL: ferpa_haystack-0.2.0.tar.gz
Upload date: May 24, 2026
Size: 26.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for ferpa_haystack-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`b33d9a7b4b9adcc7de1f8315cee2287f9e481ac0016295570e56947398648d6b`
MD5	`95bdf9ca7b0a1a8185994f42edf1ac67`
BLAKE2b-256	`223b4077b92ba37a665cd9ad9cf34c7f9d093954e385c7df6105953a4c9d47a0`

File details

Details for the file ferpa_haystack-0.2.0-py3-none-any.whl.

File metadata

Download URL: ferpa_haystack-0.2.0-py3-none-any.whl
Upload date: May 24, 2026
Size: 15.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for ferpa_haystack-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bcee172c16fba51b5f9279073bfb64c8360498841171ee88a4bbc19a4dc4f77f`
MD5	`d5e09bc17698171500a77b34dc920a01`
BLAKE2b-256	`032e145bc90c7d9dee3ff3a6bfcc42c65dfaea7b47b0a12653e182503ad684bb`
