
Self-hosted PII redaction step for LLM observability pipelines.


any-lang-anonymizer

Drop-in PII redaction for teams that already use Langfuse.

pip install any-lang-anonymizer

from langfuse import Langfuse
from pii_redactor.integrations.langfuse import make_mask

langfuse = Langfuse(mask=make_mask())

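That is the main API. Note that the distribution is installed as any-lang-anonymizer but imported as pii_redactor. make_mask() returns a callback that recursively scans every string Langfuse passes through its mask hook, including nested input, output, metadata, messages, tool calls, and custom fields.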
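
The recursive scan described above can be pictured with a minimal sketch. It is not the library's implementation: `redact_text` is a hypothetical stand-in that masks email addresses with a regex instead of running the ONNX model, and `mask_sketch` is an illustrative name. The only thing borrowed from Langfuse is the general shape of a mask callback, a function that receives `data` plus keyword arguments and returns the masked value.

```python
import re
from typing import Any

# Hypothetical stand-in for the PII model: it only recognizes email addresses.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_text(text: str) -> str:
    """Replace every email address in `text` with a placeholder."""
    return _EMAIL.sub("[EMAIL]", text)


def mask_sketch(data: Any, **kwargs: Any) -> Any:
    """Walk dicts, lists, and tuples, redacting every string found on the way."""
    if isinstance(data, str):
        return redact_text(data)
    if isinstance(data, dict):
        return {key: mask_sketch(value) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(mask_sketch(item) for item in data)
    return data  # numbers, booleans, None, and other types pass through unchanged


payload = {
    "input": {"messages": [{"role": "user", "content": "Mail jan.kowalski@example.com"}]},
    "metadata": {"attempt": 1},
}
masked = mask_sketch(payload)
```

Because the walk rebuilds each container instead of mutating it, the original payload stays untouched in the application while only the redacted copy is handed on.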

Supported Integrations

Framework              Status       How to use
Langfuse SDK           Ready        Langfuse(mask=make_mask())
LangChain / LangGraph  Ready        make_langfuse_callback() with config={"callbacks": [...]}
Pydantic AI            Coming soon  Planned example
OpenAI SDK             Coming soon  Planned example

How It Works

any-lang-anonymizer runs the Bards AI ONNX PII model locally. Raw observability data is redacted in your app before it is sent to Langfuse.

The model is downloaded from Hugging Face on first use and cached locally by huggingface-hub.

Optional model config:

export PII_MODEL_ID=bardsai/eu-pii-anonimization-multilang
export PII_MODEL_CACHE_DIR=/path/to/model-cache
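
As a rough illustration of how these two variables could be consumed, the sketch below resolves them with fallbacks and passes them to `huggingface_hub.snapshot_download`. The helper names `resolve_model_config` and `download_model` are hypothetical, and the use of `snapshot_download` is an assumption about how the download might be done, not a description of the package's internals.

```python
import os
from typing import Optional, Tuple

# Default model ID as documented in this README.
DEFAULT_MODEL_ID = "bardsai/eu-pii-anonimization-multilang"


def resolve_model_config() -> Tuple[str, Optional[str]]:
    """Return (model_id, cache_dir) from the environment, with fallbacks."""
    model_id = os.environ.get("PII_MODEL_ID") or DEFAULT_MODEL_ID
    # None lets huggingface-hub fall back to its own default cache location.
    cache_dir = os.environ.get("PII_MODEL_CACHE_DIR") or None
    return model_id, cache_dir


def download_model() -> str:
    """Fetch the model snapshot (needs network access) and return its local path."""
    # Imported lazily so that resolving the config does not require the dependency.
    from huggingface_hub import snapshot_download

    model_id, cache_dir = resolve_model_config()
    return snapshot_download(repo_id=model_id, cache_dir=cache_dir)
```

Calling `download_model()` a second time with the same cache directory should reuse the cached files rather than download them again, which matches the first-use caching described above.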

You can narrow or exclude JSON paths if needed:

langfuse = Langfuse(
    mask=make_mask(
        include_paths=["input", "output", "metadata", "messages.*.content", "tool_calls.*.args"],
        exclude_paths=["metadata.trace_id", "metadata.model", "usage"],
    )
)

Path configuration is optional. By default, every text-like value is scanned.
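
To make the path options more concrete, here is a small sketch of one way such dotted patterns could be evaluated. The semantics are assumptions, not documented behavior: a pattern is taken to match a path when it matches the path's leading segments, `*` stands for exactly one segment (such as a list index), and exclusions win over inclusions. `should_scan` and `_prefix_match` are hypothetical names.

```python
from typing import Iterable, Sequence


def _prefix_match(pattern: str, path: Sequence[str]) -> bool:
    """True if `pattern` matches the leading segments of `path` ('*' = any one segment)."""
    parts = pattern.split(".")
    if len(parts) > len(path):
        return False
    return all(part == "*" or part == segment for part, segment in zip(parts, path))


def should_scan(
    path: Sequence[str],
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> bool:
    """Decide whether the string value found at `path` should be redacted."""
    include = list(include)
    if any(_prefix_match(pattern, path) for pattern in exclude):
        return False  # assumed: exclusions take priority
    # With no include patterns, everything that is not excluded is scanned.
    return not include or any(_prefix_match(pattern, path) for pattern in include)


include = ["input", "output", "metadata", "messages.*.content", "tool_calls.*.args"]
exclude = ["metadata.trace_id", "metadata.model", "usage"]

scan_content = should_scan(["messages", "0", "content"], include, exclude)
scan_role = should_scan(["messages", "0", "role"], include, exclude)
scan_trace_id = should_scan(["metadata", "trace_id"], include, exclude)
```

Under these assumptions `scan_content` is true, while `scan_role` and `scan_trace_id` are false: the role is not covered by any include pattern, and the trace ID is explicitly excluded.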

LangChain / LangGraph

Langfuse traces LangChain and LangGraph through a callback handler. Use the same callback, but create the Langfuse client with the anonymizer mask first.

pip install 'any-lang-anonymizer[langchain]'

from langchain.agents import create_agent
from pii_redactor.integrations.langchain import make_langfuse_callback

langfuse_handler = make_langfuse_callback()
agent = create_agent(model="groq:llama-3.1-8b-instant", tools=[])

agent.invoke(
    {"messages": [{"role": "user", "content": "Jan Kowalski, jan.kowalski@example.com"}]},
    config={"callbacks": [langfuse_handler]},
)

create_agent runs on LangGraph internally, so this is the shortest LangGraph-backed agent path. A runnable example is in examples/langchain_langgraph_langfuse.py.

Model

The default model is bardsai/eu-pii-anonimization-multilang. The project is Apache-2.0 licensed, matching the model license. The model files are not bundled in this package.

Local Playground

The FastAPI playground is a demo tool, not part of the core library.

python3 -m pip install -e '.[server]'
python3 -m uvicorn playground.app:app --reload

Then open http://127.0.0.1:8000 in your browser.

Examples

Runnable demos live in examples/.

python3 -m pip install -e '.[demo]'
PII_LLM_PROVIDER=demo python3 examples/langfuse_demo.py

For real LLM demo traces:

export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_BASE_URL=https://cloud.langfuse.com
export GROQ_API_KEY=gsk_...
python3 examples/langfuse_txt_demo.py --allow-leaks

The long TXT demo uses Groq llama-3.1-8b-instant by default when GROQ_API_KEY is set. It asks the LLM to append the suffix "| checked" to each non-empty line, then verifies locally whether any known raw values remain after masking.
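
The local verification step can be approximated as follows. This is a sketch of the idea only, not the code in examples/: `iter_strings` and `find_leaks` are hypothetical helpers, and the check is a plain substring search for each known raw value.

```python
from typing import Any, Iterable, Iterator, List


def iter_strings(data: Any) -> Iterator[str]:
    """Yield every string nested anywhere inside dicts, lists, and tuples."""
    if isinstance(data, str):
        yield data
    elif isinstance(data, dict):
        for value in data.values():
            yield from iter_strings(value)
    elif isinstance(data, (list, tuple)):
        for item in data:
            yield from iter_strings(item)


def find_leaks(masked: Any, known_values: Iterable[str]) -> List[str]:
    """Return the known raw values that still appear verbatim in the masked data."""
    texts = list(iter_strings(masked))
    return [value for value in known_values if any(value in text for text in texts)]


masked_trace = {
    "output": "[PERSON], [EMAIL] | checked",
    "metadata": {"note": "follow up with Jan Kowalski"},
}
leaks = find_leaks(masked_trace, ["Jan Kowalski", "jan.kowalski@example.com"])
```

Here `leaks` contains only "Jan Kowalski", because the name survived in the metadata note while the email address was replaced everywhere.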
