PII anonymization middleware for AI agent conversations using LangChain integration.
PIIGhost
piighost is a Python library that detects personally identifiable information (PII), extracts it, applies corrections, and automatically anonymizes and deanonymizes sensitive entities (names, locations, etc.). Its modules support bidirectional anonymization in AI agent conversations, and it integrates through a LangChain middleware without requiring changes to your existing agent code.
Features
- Detection: Detect PII with NER models or algorithms, and combine them into a custom configuration with the detector composition component
- Span resolution: Resolve overlapping or nested detected spans to guarantee clean, non-redundant entities, especially when using multiple detectors
- Entity linking: Link different detections together, enabling typo tolerance and catching mentions that an NER model might miss
- Entity resolution: Resolve linked entity conflicts (e.g., one detector links A and B, another links B and C) to guarantee coherent final entities
- Anonymization: Anonymize detected entities with customizable placeholders (e.g., <<PERSON_1>>, <<LOCATION_1>>) to protect privacy while preserving text structure. A cache system remembers the applied anonymization and can reverse it for deanonymization
- Placeholder Factory: Create custom placeholders for anonymization, with flexible naming strategies (counters, UUID, etc.) to fit your specific needs
- Middleware: Easily integrate piighost into your LangChain agents for transparent anonymization before and after model calls, without modifying your existing agent code
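To illustrate how counter-style placeholders and a reversible cache might fit together, here is a minimal sketch. It is not piighost's implementation: `PlaceholderCache`, `anonymize`, and `deanonymize` are hypothetical names, and spans are assumed to be plain `(start, end, label)` tuples.

```python
from dataclasses import dataclass, field

Span = tuple[int, int, str]  # (start, end, label): a hypothetical shape for this sketch


@dataclass
class PlaceholderCache:
    """Hypothetical cache: remembers which placeholder stands for which value."""

    forward: dict[tuple[str, str], str] = field(default_factory=dict)
    reverse: dict[str, str] = field(default_factory=dict)
    counters: dict[str, int] = field(default_factory=dict)

    def token_for(self, label: str, value: str) -> str:
        """Return the placeholder for (label, value), creating it on first use."""
        key = (label, value)
        if key not in self.forward:
            self.counters[label] = self.counters.get(label, 0) + 1
            token = f"<<{label}_{self.counters[label]}>>"
            self.forward[key] = token
            self.reverse[token] = value
        return self.forward[key]


def anonymize(text: str, spans: list[Span], cache: PlaceholderCache) -> str:
    ordered = sorted(spans)
    # Assign placeholders in reading order so numbering follows the text.
    tokens = [cache.token_for(label, text[start:end]) for start, end, label in ordered]
    # Substitute right to left so earlier offsets stay valid.
    for (start, end, _), token in zip(reversed(ordered), reversed(tokens)):
        text = text[:start] + token + text[end:]
    return text


def deanonymize(text: str, cache: PlaceholderCache) -> str:
    # Reverse lookup: every placeholder seen so far maps back to its original value.
    for token, value in cache.reverse.items():
        text = text.replace(token, value)
    return text


cache = PlaceholderCache()
source = "Patrick lives in Paris."
spans = [(0, 7, "PERSON"), (17, 22, "LOCATION")]
masked = anonymize(source, spans, cache)
print(masked)
print(deanonymize(masked, cache))
```

Because the same `(label, value)` pair always yields the same placeholder, repeated mentions stay consistent within one cache, which is what makes the reverse step unambiguous.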
Installation
Basic installation
This project uses uv for dependency management.
uv add piighost
uv pip install piighost
Development installation
Clone the repository and install with dev dependencies:
git clone https://github.com/Athroniaeth/piighost.git
cd piighost
uv sync
Makefile helpers
Run the full lint suite with the provided Makefile:
make lint
This runs Ruff (format + lint) and PyReFly (type-check) through uv run.
Quick start
Standalone pipeline
import asyncio
from piighost.anonymizer import Anonymizer
from piighost.detector.gliner2 import Gliner2Detector
from piighost.pipeline import AnonymizationPipeline
from gliner2 import GLiNER2
model = GLiNER2.from_pretrained("urchade/gliner_multi-v2.1")
detector = Gliner2Detector(model=model, labels=["PERSON", "LOCATION"])
pipeline = AnonymizationPipeline(detector=detector, anonymizer=Anonymizer())
async def main():
    text = "Patrick lives in Paris. Patrick loves Paris."
    anonymized, entities = await pipeline.anonymize(text)
    print(anonymized)
    # <<PERSON_1>> lives in <<LOCATION_1>>. <<PERSON_1>> loves <<LOCATION_1>>.
    original, _ = await pipeline.deanonymize(anonymized)
    print(original)
    # Patrick lives in Paris. Patrick loves Paris.

asyncio.run(main())
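In the output above, both occurrences of each name receive the same placeholder. One way to find every mention of an already detected entity is a word-boundary regex search. The sketch below illustrates that idea only; `find_mentions` is a hypothetical helper, not part of piighost's API.

```python
import re


def find_mentions(text: str, surface: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of every whole-word occurrence of `surface`."""
    # re.escape keeps characters such as "." in a name from acting as regex syntax.
    pattern = re.compile(rf"\b{re.escape(surface)}\b")
    return [match.span() for match in pattern.finditer(text)]


text = "Patrick lives in Paris. Patrick loves Paris."
print(find_mentions(text, "Patrick"))
# [(0, 7), (24, 31)]

# Word boundaries prevent partial matches inside longer words.
print(find_mentions("Parisian cafes in Paris", "Paris"))
# [(18, 23)]
```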
With LangChain middleware
from langchain.agents import create_agent
from langchain_core.tools import tool
from piighost.anonymizer import Anonymizer
from piighost.detector.gliner2 import Gliner2Detector
from piighost.pipeline import ThreadAnonymizationPipeline
from piighost.middleware import PIIAnonymizationMiddleware
from gliner2 import GLiNER2
@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to a given address."""
    return f"Email successfully sent to {to}."
model = GLiNER2.from_pretrained("urchade/gliner_multi-v2.1")
detector = Gliner2Detector(model=model, labels=["PERSON", "LOCATION"])
pipeline = ThreadAnonymizationPipeline(detector=detector, anonymizer=Anonymizer())
middleware = PIIAnonymizationMiddleware(pipeline=pipeline)
graph = create_agent(
    model="openai:gpt-5.4",
    system_prompt="You are a helpful assistant.",
    tools=[send_email],
    middleware=[middleware],
)
The middleware intercepts every agent turn: the LLM only sees anonymized text, tools receive real values, and user-facing messages are deanonymized automatically.
Pipeline components
The pipeline runs 5 stages. Only detector and anonymizer are required — the others have sensible defaults:
| Stage | Default | Role | Without it |
|---|---|---|---|
| Detect | (required) | Finds PII spans via NER | - |
| Resolve Spans | ConfidenceSpanConflictResolver | Deduplicates overlapping detections (keeps highest confidence) | Overlapping spans from multiple detectors cause garbled replacements |
| Link Entities | ExactEntityLinker | Finds all occurrences of each entity via word-boundary regex | Only NER-detected mentions are anonymized; other occurrences leak through |
| Resolve Entities | MergeEntityConflictResolver | Merges entity groups that share a mention (union-find) | Same entity could get two different placeholders |
| Anonymize | (required) | Replaces entities with placeholders (<<PERSON_1>>) | - |
Each stage is a protocol — swap any default for your own implementation.
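The two resolution stages in the table can be sketched in a few lines. This is an illustration under assumptions, not piighost's code: `Span`, `resolve_spans`, and `merge_groups` are hypothetical names, the span resolver is a simple greedy "highest confidence wins" pass, and entity groups are modeled as sets of mention strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str
    confidence: float


def resolve_spans(spans: list[Span]) -> list[Span]:
    """Greedy pass: visit spans by descending confidence, drop any that overlap a kept one."""
    kept: list[Span] = []
    for span in sorted(spans, key=lambda s: s.confidence, reverse=True):
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


def merge_groups(groups: list[set[str]]) -> list[set[str]]:
    """Union-find: merge mention groups that share at least one mention."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for group in groups:
        if not group:
            continue
        first, *rest = sorted(group)
        find(first)  # registers single-mention groups too
        for other in rest:
            parent[find(other)] = find(first)

    merged: dict[str, set[str]] = {}
    for mention in list(parent):
        merged.setdefault(find(mention), set()).add(mention)
    return list(merged.values())


overlapping = [
    Span(0, 7, "PERSON", 0.9),
    Span(0, 13, "LOCATION", 0.4),  # overlaps the first span with lower confidence
    Span(17, 22, "LOCATION", 0.8),
]
print(resolve_spans(overlapping))
print(merge_groups([{"A", "B"}, {"B", "C"}, {"D"}]))
```

In the second call, one group links A with B and another links B with C, so the merge yields a single group containing A, B, and C, while D stays on its own.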
How it works
Anonymization pipeline
---
title: "piighost AnonymizationPipeline.anonymize() flow"
---
flowchart LR
classDef stage fill:#90CAF9,stroke:#1565C0,color:#000
classDef protocol fill:#FFF9C4,stroke:#F9A825,color:#000
classDef data fill:#A5D6A7,stroke:#2E7D32,color:#000
INPUT(["`**Input text**
_'Patrick lives in Paris.
Patrick loves Paris.'_`"]):::data
DETECT["`**1. Detect**
_AnyDetector_`"]:::stage
RESOLVE_SPANS["`**2. Resolve Spans**
_AnySpanConflictResolver_`"]:::stage
LINK["`**3. Link Entities**
_AnyEntityLinker_`"]:::stage
RESOLVE_ENTITIES["`**4. Resolve Entities**
_AnyEntityConflictResolver_`"]:::stage
ANONYMIZE["`**5. Anonymize**
_AnyAnonymizer_`"]:::stage
OUTPUT(["`**Output**
_'<<PERSON_1>> lives in <<LOCATION_1>>.
<<PERSON_1>> loves <<LOCATION_1>>.'_`"]):::data
INPUT --> DETECT
DETECT -- "list[Detection]" --> RESOLVE_SPANS
RESOLVE_SPANS -- "deduplicated detections" --> LINK
LINK -- "list[Entity]" --> RESOLVE_ENTITIES
RESOLVE_ENTITIES -- "merged entities" --> ANONYMIZE
ANONYMIZE --> OUTPUT
P_DETECT["`GlinerDetector
_(GLiNER2 NER)_`"]:::protocol
P_RESOLVE_SPANS["`ConfidenceSpanConflictResolver
_(highest confidence wins)_`"]:::protocol
P_LINK["`ExactEntityLinker
_(word-boundary regex)_`"]:::protocol
P_RESOLVE_ENTITIES["`MergeEntityConflictResolver
_(union-find merge)_`"]:::protocol
P_ANONYMIZE["`Anonymizer + CounterPlaceholderFactory
_(<<LABEL_N>> tags)_`"]:::protocol
P_DETECT -. "implements" .-> DETECT
P_RESOLVE_SPANS -. "implements" .-> RESOLVE_SPANS
P_LINK -. "implements" .-> LINK
P_RESOLVE_ENTITIES -. "implements" .-> RESOLVE_ENTITIES
P_ANONYMIZE -. "implements" .-> ANONYMIZE
Each stage uses a protocol (structural subtyping): swap GlinerDetector for spaCy, a remote API, or an ExactMatchDetector for tests. The same applies to every other stage.
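As a rough illustration of structural subtyping, the sketch below defines a protocol and a test double that satisfies it without inheriting from it. All names and signatures here (`Detector`, `Detection` fields, `WordListDetector`, `count_pii`) are assumptions made for the example; piighost's real protocols may differ, for instance by being asynchronous.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Detection:
    text: str
    label: str
    start: int
    end: int


class Detector(Protocol):
    """Anything with a matching `detect` method counts as a Detector."""

    def detect(self, text: str) -> list[Detection]: ...


class WordListDetector:
    """Hypothetical test double: flags fixed words, no model download needed.

    Note that it does not inherit from Detector.
    """

    def __init__(self, words: dict[str, str]) -> None:
        self.words = words  # surface form -> label

    def detect(self, text: str) -> list[Detection]:
        found: list[Detection] = []
        for word, label in self.words.items():
            if not word:
                continue  # an empty string would match everywhere
            start = text.find(word)
            while start != -1:
                found.append(Detection(word, label, start, start + len(word)))
                start = text.find(word, start + len(word))
        return sorted(found, key=lambda d: d.start)


def count_pii(detector: Detector, text: str) -> int:
    """Accepts any object that structurally matches the Detector protocol."""
    return len(detector.detect(text))


detector = WordListDetector({"Patrick": "PERSON", "Paris": "LOCATION"})
print(count_pii(detector, "Patrick lives in Paris. Patrick loves Paris."))
# 4
```

A type checker accepts `WordListDetector` wherever `Detector` is expected because the method signature matches, which is what makes swapping implementations cheap.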
Middleware integration
---
title: "piighost PIIAnonymizationMiddleware in an agent loop"
---
sequenceDiagram
participant U as User
participant M as Middleware
participant L as LLM
participant T as Tool
U->>M: "Send an email to Patrick in Paris"
M->>M: abefore_model()<br/>NER detect + anonymize
M->>L: "Send an email to <<PERSON_1>> in <<LOCATION_1>>"
L->>M: tool_call(send_email, to=<<PERSON_1>>)
M->>M: awrap_tool_call()<br/>deanonymize args
M->>T: send_email(to="Patrick")
T->>M: "Email sent to Patrick"
M->>M: awrap_tool_call()<br/>reanonymize result
M->>L: "Email sent to <<PERSON_1>>"
L->>M: "Done! Email sent to <<PERSON_1>>."
M->>M: aafter_model()<br/>deanonymize for user
M->>U: "Done! Email sent to Patrick."
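The sequence above can be approximated with a small, dependency-free simulation. This is only a sketch of the data flow: the model is replaced by hard-coded stand-ins, the placeholder mapping is given up front, and `run_turn`, `anonymize`, and `deanonymize` are hypothetical helpers rather than the middleware's real hooks.

```python
from typing import Callable


def anonymize(text: str, mapping: dict[str, str]) -> str:
    """Replace real values with placeholders (mapping: real value -> placeholder)."""
    for real, token in mapping.items():
        text = text.replace(real, token)
    return text


def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Replace placeholders with the real values they stand for."""
    for real, token in mapping.items():
        text = text.replace(token, real)
    return text


def send_email(to: str) -> str:
    return f"Email sent to {to}"


def run_turn(
    user_message: str,
    mapping: dict[str, str],
    tool: Callable[[str], str],
) -> tuple[str, list[str]]:
    llm_view: list[str] = []  # everything the model gets to read

    prompt = anonymize(user_message, mapping)      # before the model call
    llm_view.append(prompt)

    tool_arg = "<<PERSON_1>>"                      # stand-in for the model's tool call
    result = tool(deanonymize(tool_arg, mapping))  # the tool receives the real value
    llm_view.append(anonymize(result, mapping))    # the result is re-anonymized

    answer = f"Done! {llm_view[-1]}."              # stand-in for the model's reply
    return deanonymize(answer, mapping), llm_view  # the user sees real values


mapping = {"Patrick": "<<PERSON_1>>", "Paris": "<<LOCATION_1>>"}
reply, llm_view = run_turn("Send an email to Patrick in Paris", mapping, send_email)
print(reply)
print(llm_view)
```

The point of the sketch is the invariant: nothing appended to `llm_view` contains a real name, while both the tool and the user work with real values.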
Development
uv sync # Install dependencies
make lint # Format (ruff), lint (ruff), type-check (pyrefly)
uv run pytest # Run all tests
uv run pytest tests/ -k "test_name" # Run a single test
Contributing
- Commits: Conventional Commits via Commitizen (feat:, fix:, refactor:, etc.)
- Type checking: PyReFly (not mypy)
- Formatting/linting: Ruff
- Package manager: uv (not pip)
- Python: 3.12+
Ecosystem
- piighost-api: REST API server for PII anonymization inference. Loads a piighost pipeline once server-side and exposes anonymize/deanonymize via HTTP, so clients only need a lightweight HTTP client instead of embedding the NER model.
- piighost-chat: Demo chat app showcasing privacy-preserving AI conversations. Uses PIIAnonymizationMiddleware with LangChain to anonymize messages before the LLM and deanonymize responses transparently. Built with SvelteKit, Litestar, and Docker Compose.
Additional notes
- The GLiNER2 model is downloaded from HuggingFace on first use (~500 MB)
- All data models are frozen dataclasses, so they are safe to share across threads
- Tests use ExactMatchDetector to avoid loading the real GLiNER2 model in CI