PII anonymization middleware for AI agent conversations, with LangChain integration.

Maintainers

Athroniaeth

These details have not been verified by PyPI

Project description

PIIGhost


piighost is a Python library that detects PII (personally identifiable information), extracts and corrects the detected entities, and automatically anonymizes and deanonymizes sensitive values such as names and locations. Its modules for bidirectional anonymization in AI agent conversations plug into LangChain through a middleware, so your existing agent code does not need to change.

Features

Detection: Detect PII with NER models or algorithmic detectors, and combine them into a custom configuration with our detector composition component
Span resolution: Resolve overlapping or nested detected spans to guarantee clean, non-redundant entities, especially when using multiple detectors
Entity linking: Link different detections together, enabling typo tolerance and catching mentions that an NER model might miss
Entity resolution: Resolve linked entity conflicts (e.g., one detector links A and B, another links B and C) to guarantee coherent final entities
Anonymization: Anonymize detected entities with customizable placeholders (e.g., <<PERSON_1>>, <<LOCATION_1>>) to protect privacy while preserving text structure. A cache system remembers the applied anonymization and can reverse it for deanonymization
Placeholder Factory: Create custom placeholders for anonymization, with flexible naming strategies (counters, UUID, etc.) to fit your specific needs
Middleware: Easily integrate piighost into your LangChain agents for transparent anonymization before and after model calls, without modifying your existing agent code
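
To make the placeholder and cache ideas concrete, here is a minimal, self-contained sketch of two naming strategies (a per-label counter and a short UUID) plus a reversible replacement cache. The names `CounterPlaceholders`, `UUIDPlaceholders`, `anonymize` and `deanonymize` are hypothetical and do not reflect piighost's actual Placeholder Factory API; the sketch ignores entity positions and replaces text with a naive `str.replace`.

```python
import uuid
from collections import defaultdict


class CounterPlaceholders:
    """Hypothetical counter strategy: <<LABEL_N>>, numbered per label."""

    def __init__(self) -> None:
        self._counts: defaultdict[str, int] = defaultdict(int)
        self._assigned: dict[tuple[str, str], str] = {}

    def placeholder(self, surface: str, label: str) -> str:
        key = (surface, label)
        if key not in self._assigned:  # same entity -> same placeholder
            self._counts[label] += 1
            self._assigned[key] = f"<<{label}_{self._counts[label]}>>"
        return self._assigned[key]


class UUIDPlaceholders:
    """Hypothetical UUID strategy: <<LABEL_xxxxxxxx>>, not guessable from order."""

    def __init__(self) -> None:
        self._assigned: dict[tuple[str, str], str] = {}

    def placeholder(self, surface: str, label: str) -> str:
        key = (surface, label)
        if key not in self._assigned:
            self._assigned[key] = f"<<{label}_{uuid.uuid4().hex[:8]}>>"
        return self._assigned[key]


def anonymize(text, entities, factory):
    """Hypothetical helper: replace each (surface, label) pair, return text + reverse cache."""
    cache: dict[str, str] = {}
    for surface, label in entities:
        tag = factory.placeholder(surface, label)
        cache[tag] = surface  # remember how to undo this replacement
        text = text.replace(surface, tag)  # naive: no word boundaries
    return text, cache


def deanonymize(text: str, cache: dict[str, str]) -> str:
    """Hypothetical helper: undo the replacements recorded in the cache."""
    for tag, surface in cache.items():
        text = text.replace(tag, surface)
    return text


text = "Patrick lives in Paris. Patrick loves Paris."
entities = [("Patrick", "PERSON"), ("Paris", "LOCATION")]
anonymized, cache = anonymize(text, entities, CounterPlaceholders())
print(anonymized)
# <<PERSON_1>> lives in <<LOCATION_1>>. <<PERSON_1>> loves <<LOCATION_1>>.
print(deanonymize(anonymized, cache) == text)  # True
```

Counters keep placeholders short and readable for the LLM; UUID-style tags avoid leaking how many entities of a label appeared, at the cost of readability.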

Installation

Basic installation

This project uses uv for dependency management.

uv add piighost            # add piighost to a uv-managed project
uv pip install piighost    # or install it into the current environment

Development installation

Clone the repository and install with dev dependencies:

git clone https://github.com/Athroniaeth/piighost.git
cd piighost
uv sync

Makefile helpers

Run the full lint suite with the provided Makefile:

make lint

This runs Ruff (format + lint) and Pyrefly (type-check) through uv run.

Quick start

Standalone pipeline

import asyncio

from piighost.anonymizer import Anonymizer
from piighost.detector.gliner2 import Gliner2Detector
from piighost.pipeline import AnonymizationPipeline

from gliner2 import GLiNER2

model = GLiNER2.from_pretrained("urchade/gliner_multi-v2.1")
detector = Gliner2Detector(model=model, labels=["PERSON", "LOCATION"])
pipeline = AnonymizationPipeline(detector=detector, anonymizer=Anonymizer())


async def main():
    text = "Patrick lives in Paris. Patrick loves Paris."
    anonymized, entities = await pipeline.anonymize(text)
    print(anonymized)
    # <<PERSON_1>> lives in <<LOCATION_1>>. <<PERSON_1>> loves <<LOCATION_1>>.

    original, _ = await pipeline.deanonymize(anonymized)
    print(original)
    # Patrick lives in Paris. Patrick loves Paris.


asyncio.run(main())

With LangChain middleware

from langchain.agents import create_agent
from langchain_core.tools import tool

from piighost.anonymizer import Anonymizer
from piighost.detector.gliner2 import Gliner2Detector
from piighost.pipeline import ThreadAnonymizationPipeline
from piighost.middleware import PIIAnonymizationMiddleware

from gliner2 import GLiNER2


@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to a given address."""
    return f"Email successfully sent to {to}."


model = GLiNER2.from_pretrained("urchade/gliner_multi-v2.1")
detector = Gliner2Detector(model=model, labels=["PERSON", "LOCATION"])
pipeline = ThreadAnonymizationPipeline(detector=detector, anonymizer=Anonymizer())
middleware = PIIAnonymizationMiddleware(pipeline=pipeline)

graph = create_agent(
    model="openai:gpt-5.4",
    system_prompt="You are a helpful assistant.",
    tools=[send_email],
    middleware=[middleware],
)

The middleware intercepts every agent turn: the LLM only sees anonymized text, tools receive the real values, and user-facing messages are deanonymized automatically.

Pipeline components

The pipeline runs five stages. Only the detector and the anonymizer are required; the others have sensible defaults:

Stage	Default	Role	Without it
Detect	(required)	Finds PII spans via NER	—
Resolve Spans	`ConfidenceSpanConflictResolver`	Deduplicates overlapping detections (keeps highest confidence)	Overlapping spans from multiple detectors cause garbled replacements
Link Entities	`ExactEntityLinker`	Finds all occurrences of each entity via word-boundary regex	Only NER-detected mentions are anonymized; other occurrences leak through
Resolve Entities	`MergeEntityConflictResolver`	Merges entity groups that share a mention (union-find)	Same entity could get two different placeholders
Anonymize	(required)	Replaces entities with placeholders (`<<PERSON_1>>`)	—
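
The "Resolve Spans" row can be illustrated with a greedy sketch: walk the detections from most to least confident, accept each one that does not overlap an already accepted span, and drop the rest. It assumes half-open character offsets `[start, end)`; the `Detection` dataclass and `resolve_spans` function are illustrative stand-ins, not the source of ConfidenceSpanConflictResolver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Illustrative detection (hypothetical): half-open span [start, end), label, confidence."""

    start: int
    end: int
    label: str
    confidence: float


def overlaps(a: Detection, b: Detection) -> bool:
    # Two half-open intervals overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def resolve_spans(detections: list[Detection]) -> list[Detection]:
    """Greedy: accept from most to least confident, skipping anything overlapping an accepted span."""
    kept: list[Detection] = []
    for det in sorted(detections, key=lambda d: d.confidence, reverse=True):
        if not any(overlaps(det, k) for k in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: d.start)  # back to reading order


# "Patrick lives in Paris." seen by two detectors that disagree on the first span.
detections = [
    Detection(0, 7, "PERSON", 0.95),      # "Patrick"
    Detection(0, 13, "PERSON", 0.60),     # "Patrick lives" (too long)
    Detection(17, 22, "LOCATION", 0.90),  # "Paris"
]
for det in resolve_spans(detections):
    print(det.start, det.end, det.label)
# 0 7 PERSON
# 17 22 LOCATION
```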

Each stage is a protocol — swap any default for your own implementation.
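
Because the stages are protocols (structural subtyping, as with Python's `typing.Protocol`), a replacement only needs the right method shape and does not inherit from anything. The sketch below assumes a hypothetical synchronous `detect(text) -> list[Detection]` signature, which may not match piighost's real detector protocol (its pipeline is async, for instance), and shows a deterministic word-list detector of the kind that is handy in tests.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Detection:
    """Illustrative detection record (hypothetical, not piighost's data model)."""

    text: str
    label: str
    start: int
    end: int
    confidence: float


class Detector(Protocol):
    """Hypothetical detector protocol: anything with this detect() method fits."""

    def detect(self, text: str) -> list[Detection]: ...


class WordListDetector:
    """Hypothetical deterministic test detector: matches a fixed word list, loads no model.

    It never subclasses Detector; it satisfies the protocol by shape alone.
    """

    def __init__(self, words: dict[str, str]) -> None:
        self.words = words  # surface form -> label

    def detect(self, text: str) -> list[Detection]:
        found = [
            Detection(word, label, m.start(), m.end(), 1.0)
            for word, label in self.words.items()
            for m in re.finditer(rf"\b{re.escape(word)}\b", text)
        ]
        return sorted(found, key=lambda d: d.start)


def run_detection(detector: Detector, text: str) -> list[Detection]:
    # A type checker accepts any structurally compatible object here.
    return detector.detect(text)


detector = WordListDetector({"Patrick": "PERSON", "Paris": "LOCATION"})
for det in run_detection(detector, "Patrick lives in Paris, not in Parisian suburbs."):
    print(det.text, det.label, det.start, det.end)
# Patrick PERSON 0 7
# Paris LOCATION 17 22
```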

How it works

Anonymization pipeline

---
title: "piighost AnonymizationPipeline.anonymize() flow"
---
flowchart LR
    classDef stage fill:#90CAF9,stroke:#1565C0,color:#000
    classDef protocol fill:#FFF9C4,stroke:#F9A825,color:#000
    classDef data fill:#A5D6A7,stroke:#2E7D32,color:#000

    INPUT(["`**Input text**
    _'Patrick lives in Paris.
    Patrick loves Paris.'_`"]):::data

    DETECT["`**1. Detect**
    _AnyDetector_`"]:::stage
    RESOLVE_SPANS["`**2. Resolve Spans**
    _AnySpanConflictResolver_`"]:::stage
    LINK["`**3. Link Entities**
    _AnyEntityLinker_`"]:::stage
    RESOLVE_ENTITIES["`**4. Resolve Entities**
    _AnyEntityConflictResolver_`"]:::stage
    ANONYMIZE["`**5. Anonymize**
    _AnyAnonymizer_`"]:::stage

    OUTPUT(["`**Output**
    _'<<PERSON_1>> lives in <<LOCATION_1>>.
    <<PERSON_1>> loves <<LOCATION_1>>.'_`"]):::data

    INPUT --> DETECT
    DETECT -- "list[Detection]" --> RESOLVE_SPANS
    RESOLVE_SPANS -- "deduplicated detections" --> LINK
    LINK -- "list[Entity]" --> RESOLVE_ENTITIES
    RESOLVE_ENTITIES -- "merged entities" --> ANONYMIZE
    ANONYMIZE --> OUTPUT

    P_DETECT["`Gliner2Detector
    _(GLiNER2 NER)_`"]:::protocol
    P_RESOLVE_SPANS["`ConfidenceSpanConflictResolver
    _(highest confidence wins)_`"]:::protocol
    P_LINK["`ExactEntityLinker
    _(word-boundary regex)_`"]:::protocol
    P_RESOLVE_ENTITIES["`MergeEntityConflictResolver
    _(union-find merge)_`"]:::protocol
    P_ANONYMIZE["`Anonymizer + CounterPlaceholderFactory
    _(<<LABEL_N>> tags)_`"]:::protocol

    P_DETECT -. "implements" .-> DETECT
    P_RESOLVE_SPANS -. "implements" .-> RESOLVE_SPANS
    P_LINK -. "implements" .-> LINK
    P_RESOLVE_ENTITIES -. "implements" .-> RESOLVE_ENTITIES
    P_ANONYMIZE -. "implements" .-> ANONYMIZE

Each stage is defined by a protocol (structural subtyping), so you can swap Gliner2Detector for a spaCy-based detector, a remote API, or an ExactMatchDetector in tests. The same goes for every other stage.
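
Stages 3 and 4 of the diagram can be approximated in a few lines: find every whole-word occurrence of a detected surface form with a `\b`-delimited regex, then merge mention groups that share a member with union-find, which covers the "one detector links A and B, another links B and C" case from the feature list. `link_mentions`, `UnionFind` and `merge_groups` are hypothetical helpers, not the code of ExactEntityLinker or MergeEntityConflictResolver.

```python
import re


def link_mentions(text: str, surface: str) -> list[tuple[int, int]]:
    """Hypothetical linker: all whole-word occurrences of a surface form, as (start, end)."""
    # \b assumes the surface starts and ends with word characters.
    return [m.span() for m in re.finditer(rf"\b{re.escape(surface)}\b", text)]


class UnionFind:
    """Disjoint-set forest over mention strings."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_groups(groups: list[set[str]]) -> list[set[str]]:
    """Hypothetical resolver: merge groups sharing a mention, directly or via a chain."""
    uf = UnionFind()
    for group in groups:
        members = sorted(group)
        for member in members:
            uf.union(members[0], member)  # also registers singletons
    merged: dict[str, set[str]] = {}
    for group in groups:
        for member in group:
            merged.setdefault(uf.find(member), set()).add(member)
    return list(merged.values())


text = "Patrick lives in Paris. Patrick loves Paris."
print(link_mentions(text, "Patrick"))  # [(0, 7), (24, 31)]
print(link_mentions(text, "Paris"))    # [(17, 22), (38, 43)]

# One detector links A and B, another links B and C: all three are one entity.
groups = [{"A", "B"}, {"B", "C"}, {"D"}]
print(sorted(sorted(g) for g in merge_groups(groups)))  # [['A', 'B', 'C'], ['D']]
```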

Middleware integration

---
title: "piighost PIIAnonymizationMiddleware in an agent loop"
---
sequenceDiagram
    participant U as User
    participant M as Middleware
    participant L as LLM
    participant T as Tool

    U->>M: "Send an email to Patrick in Paris"
    M->>M: abefore_model()<br/>NER detect + anonymize
    M->>L: "Send an email to <<PERSON_1>> in <<LOCATION_1>>"
    L->>M: tool_call(send_email, to=<<PERSON_1>>)
    M->>M: awrap_tool_call()<br/>deanonymize args
    M->>T: send_email(to="Patrick")
    T->>M: "Email sent to Patrick"
    M->>M: awrap_tool_call()<br/>reanonymize result
    M->>L: "Email sent to <<PERSON_1>>"
    L->>M: "Done! Email sent to <<PERSON_1>>."
    M->>M: aafter_model()<br/>deanonymize for user
    M->>U: "Done! Email sent to Patrick."
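
Stripped of LangChain types, the tool-call step in the diagram is a round trip through a placeholder-to-value mapping: deanonymize the arguments the LLM produced, run the tool on real values, then reanonymize its result before it goes back to the LLM. The helpers `deanonymize`, `reanonymize` and `wrap_tool_call` below are a hypothetical, synchronous sketch that assumes `<<LABEL_N>>` placeholders; they are not the actual `awrap_tool_call` implementation of PIIAnonymizationMiddleware, which operates on LangChain messages and tool calls.

```python
import re
from typing import Callable

# Hypothetical placeholder shape <<LABEL_N>>; labels may contain underscores.
PLACEHOLDER = re.compile(r"<<[A-Z_]+_\d+>>")


def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Swap known placeholders for real values; leave unknown ones untouched."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def reanonymize(text: str, mapping: dict[str, str]) -> str:
    """Swap real values back for their placeholders, longest value first."""
    for tag, value in sorted(mapping.items(), key=lambda kv: len(kv[1]), reverse=True):
        text = re.sub(rf"\b{re.escape(value)}\b", tag, text)
    return text


def wrap_tool_call(tool: Callable[..., str], args: dict[str, str], mapping: dict[str, str]) -> str:
    """Hypothetical wrapper mirroring the diagram's awrap_tool_call step."""
    real_args = {name: deanonymize(value, mapping) for name, value in args.items()}
    result = tool(**real_args)           # the tool sees real values
    return reanonymize(result, mapping)  # the LLM sees placeholders again


def send_email(to: str, subject: str, body: str) -> str:
    return f"Email successfully sent to {to}."


mapping = {"<<PERSON_1>>": "Patrick", "<<LOCATION_1>>": "Paris"}
llm_args = {"to": "<<PERSON_1>>", "subject": "Trip", "body": "See you in <<LOCATION_1>>."}
print(wrap_tool_call(send_email, llm_args, mapping))
# Email successfully sent to <<PERSON_1>>.
```

Replacing the longest real value first keeps a short value such as "Paris" from clobbering part of a longer one such as "Paris Hilton".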

Development

uv sync                      # Install dependencies
make lint                    # Format (ruff), lint (ruff), type-check (pyrefly)
uv run pytest                # Run all tests
uv run pytest tests/ -k "test_name"  # Run a single test

Contributing

Commits: Conventional Commits via Commitizen (feat:, fix:, refactor:, etc.)
Type checking: Pyrefly (not mypy)
Formatting/linting: Ruff
Package manager: uv (not pip)
Python: 3.12+

Ecosystem

piighost-api — REST API server for PII anonymization inference. Loads a piighost pipeline once server-side and exposes anonymize/deanonymize via HTTP, so clients only need a lightweight HTTP client instead of embedding the NER model.
piighost-chat — Demo chat app showcasing privacy-preserving AI conversations. Uses PIIAnonymizationMiddleware with LangChain to anonymize messages before the LLM and deanonymize responses transparently. Built with SvelteKit, Litestar, and Docker Compose.

Additional notes

The GLiNER2 model is downloaded from HuggingFace on first use (~500 MB)
All data models are frozen dataclasses, which makes them safe to share across threads
Tests use ExactMatchDetector to avoid loading the real GLiNER2 model in CI
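
As a reminder of what "frozen" buys you, the sketch below uses a hypothetical `Entity` dataclass whose fields are illustrative, not piighost's data model. Assigning to a field raises `FrozenInstanceError`, updates go through `dataclasses.replace`, and equal instances hash alike. Note that `frozen=True` only blocks attribute reassignment; using tuples rather than lists for nested fields is what keeps the whole value immutable, so no thread can change an instance another thread is reading.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity record; field names are illustrative only."""

    text: str
    label: str
    # Tuples rather than lists, so nested data cannot be mutated either.
    mentions: tuple[tuple[int, int], ...]


patrick = Entity("Patrick", "PERSON", ((0, 7), (24, 31)))

try:
    patrick.label = "LOCATION"  # any attribute assignment is rejected
except FrozenInstanceError:
    print("cannot modify a frozen instance")

# "Updating" means building a new instance; the original is untouched.
relabeled = replace(patrick, label="PER")
print(patrick.label, relabeled.label)  # PERSON PER

# frozen=True with the default eq=True generates __hash__, so equal entities dedupe in a set.
print(len({patrick, Entity("Patrick", "PERSON", ((0, 7), (24, 31)))}))  # 1
```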


Release history

0.11.0 (May 4, 2026)
0.10.0 (Apr 30, 2026)
0.9.1 (Apr 25, 2026)
0.9.0 (Apr 25, 2026)
0.8.0 (Apr 23, 2026)
0.7.0 (Apr 16, 2026)
0.6.0 (Apr 16, 2026)
0.5.1 (Apr 6, 2026)
0.5.0 (Mar 31, 2026), this version
0.4.2 (Mar 30, 2026)
0.4.1 (Mar 30, 2026)
0.4.0 (Mar 30, 2026)
0.3.0 (Mar 29, 2026)
0.2.0 (Mar 28, 2026)
0.1.0 (Mar 23, 2026)

Download files


Source Distribution

piighost-0.5.0.tar.gz (539.2 kB)

Uploaded Mar 31, 2026 Source

Built Distribution


piighost-0.5.0-py3-none-any.whl (38.4 kB)

Uploaded Mar 31, 2026 Python 3

File details

Details for the file piighost-0.5.0.tar.gz.

File metadata

Download URL: piighost-0.5.0.tar.gz
Upload date: Mar 31, 2026
Size: 539.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for piighost-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`4376d1b37b668b69a5229748e78b10f712f9d5f1d785e15ceaac774791f0d760`
MD5	`d04b99a82b79cb420e15c233c5d01ae6`
BLAKE2b-256	`0673f8811c4eb92026a658c9d3d284f0d7627e83a276e33c184ed676d34605f9`


File details

Details for the file piighost-0.5.0-py3-none-any.whl.

File metadata

Download URL: piighost-0.5.0-py3-none-any.whl
Upload date: Mar 31, 2026
Size: 38.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for piighost-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cb72f00007e176fe994138837d5f0a6f779905c155a6b571d155104eac6eabf0`
MD5	`c84a33223991de168cd7049f155a99eb`
BLAKE2b-256	`9c712b7114d600542720f3d1825c6337101c2a2689625184d9af2ca5e73ff060`


piighost 0.5.0
