# AnonLM

Open-source PII anonymization agent with reproducible benchmarking for OpenAI-compatible models.
AnonLM is an open-source Python library for LLM-based PII anonymization with reproducible benchmarking.
It provides:
- A configurable anonymization engine for OpenAI-compatible providers.
- A stable Python API for anonymize/deanonymize workflows.
- A unified CLI for anonymization and benchmark execution.
- Benchmark history artifacts for auditability and experiment tracking.
## Installation

```shell
pip install anonlm-pii
```

For development:

```shell
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,test]"
```
## Quickstart (Python API)

```python
from anonlm import anonymize

result = anonymize("Contact Jane Doe at jane.doe@example.com or +34 600 123 456.")
print(result.anonymized_text)
print(result.mapping_forward)
print(result.chunking.chunk_count)
print(result.chunking.chunks)
```
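To illustrate what the forward and reverse mappings represent, here is a minimal sketch of the anonymize/deanonymize round trip using plain dicts and string replacement. The placeholder format (`[PERSON_1]`, `[EMAIL_1]`) and the mapping shapes are assumptions for illustration, not AnonLM's actual output.

```python
# Hypothetical round-trip sketch; AnonLM's real placeholders may differ.
original = "Contact Jane Doe at jane.doe@example.com"

# Forward mapping: original value -> placeholder (assumed shape).
mapping_forward = {
    "Jane Doe": "[PERSON_1]",
    "jane.doe@example.com": "[EMAIL_1]",
}
# Reverse mapping: placeholder -> original value.
mapping_reverse = {v: k for k, v in mapping_forward.items()}

# Anonymize: substitute each sensitive value with its placeholder.
anonymized = original
for value, placeholder in mapping_forward.items():
    anonymized = anonymized.replace(value, placeholder)

# Deanonymize: substitute each placeholder back.
restored = anonymized
for placeholder, value in mapping_reverse.items():
    restored = restored.replace(placeholder, value)

print(anonymized)  # Contact [PERSON_1] at [EMAIL_1]
assert restored == original
```

The reverse mapping is what `anonlm.deanonymize(text, mapping_reverse)` consumes to restore the original text.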
## Quickstart (CLI)

```shell
# Text input
anonlm anonymize --text "Contact Jane Doe at jane.doe@example.com"

# File input -> JSON output
anonlm anonymize --file input.txt --output output.json

# Benchmark run
anonlm benchmark run --dataset datasets/pii_mvp_dataset.csv --split dev
```
## Configuration

Configuration precedence (highest first):

1. Explicit CLI flags
2. Environment variables (`ANONLM_*`)
3. Provider defaults
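The precedence chain above can be sketched as a simple first-match lookup. The `resolve` helper below is hypothetical, written only to show the resolution order; it is not part of AnonLM's API.

```python
import os

def resolve(cli_value, env_name, default):
    """Return the first value found: CLI flag, then env var, then provider default."""
    if cli_value is not None:
        return cli_value
    env_value = os.environ.get(env_name)
    if env_value is not None:
        return env_value
    return default

os.environ["ANONLM_TEMPERATURE"] = "0.2"

# An explicit CLI flag wins over the environment variable:
print(resolve("0.0", "ANONLM_TEMPERATURE", "0.7"))  # 0.0
# With no CLI flag, the environment variable wins over the default:
print(resolve(None, "ANONLM_TEMPERATURE", "0.7"))   # 0.2
```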
Core environment variables:

| Variable | Description |
|---|---|
| `ANONLM_PROVIDER` | `openai`, `openrouter`, `groq`, or `custom` |
| `ANONLM_MODEL_NAME` | Model identifier |
| `ANONLM_BASE_URL` | OpenAI-compatible base URL |
| `ANONLM_API_KEY_ENV` | Name of the env var containing the API key |
| `ANONLM_API_KEY` | API key value |
| `ANONLM_TEMPERATURE` | LLM sampling temperature |
| `ANONLM_MAX_CHUNK_CHARS` | Maximum chunk size in characters |
| `ANONLM_CHUNK_OVERLAP_CHARS` | Chunk overlap in characters |
Provider examples:

```shell
# OpenAI
export ANONLM_PROVIDER=openai
export ANONLM_API_KEY=sk-...

# OpenRouter
export ANONLM_PROVIDER=openrouter
export ANONLM_API_KEY=...
export ANONLM_MODEL_NAME=openai/gpt-4o-mini

# Groq
export ANONLM_PROVIDER=groq
export ANONLM_API_KEY=...
export ANONLM_MODEL_NAME=llama-3.3-70b-versatile

# Custom OpenAI-compatible endpoint
export ANONLM_PROVIDER=custom
export ANONLM_BASE_URL=https://your.endpoint/v1
export ANONLM_API_KEY=...
```
## Benchmarking

Run a benchmark with deterministic document-based splits (`dev`, `val`, `final`):

```shell
anonlm benchmark run --dataset datasets/pii_mvp_dataset.csv --split dev --verbose
```

Optional benchmark controls:

```shell
anonlm benchmark run \
  --dataset datasets/pii_mvp_dataset.csv \
  --split val \
  --history-dir runs/benchmarks \
  --threshold-f1 0.80
```
Artifacts:

- JSON run detail: `runs/benchmarks/<timestamp>__<split>.json`
- CSV summary index: `runs/benchmarks/index.csv`

See `docs/benchmarking.md` for the protocol and interpretation guidelines.
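The CSV summary index is plain tabular data, so it can be inspected with the standard library. The column names in the snippet below (`timestamp`, `split`, `f1`, `threshold_f1`, `passed`) are invented for illustration; the real schema is defined by AnonLM and may differ.

```python
import csv
import io

# Hypothetical sample of runs/benchmarks/index.csv (column names assumed).
sample = """timestamp,split,f1,threshold_f1,passed
2024-06-01T12:00:00,dev,0.84,0.80,true
2024-06-02T09:30:00,val,0.78,0.80,false
"""

# In practice you would open the real file:
#   with open("runs/benchmarks/index.csv") as f: rows = list(csv.DictReader(f))
rows = list(csv.DictReader(io.StringIO(sample)))
passing = [r for r in rows if r["passed"] == "true"]
print(len(rows), len(passing))  # 2 1
```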
## Public API

```python
anonlm.anonymize(text: str, config: AnonLMConfig | None = None) -> AnonymizationResult
anonlm.deanonymize(text: str, mapping_reverse: dict[str, str]) -> str
anonlm.create_engine(config: AnonLMConfig | None = None) -> AnonymizationEngine
```

`AnonymizationResult` includes chunking metadata in `result.chunking` (and in `result.to_dict()["chunking"]`):

- `chunk_count`: total chunks processed
- `chunks`: chunk contents in processing order
- `max_chunk_chars`: chunk size setting used
- `chunk_overlap_chars`: overlap setting used
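To make the two chunking settings concrete, here is a minimal sliding-window sketch of how `max_chunk_chars` and `chunk_overlap_chars` interact. This is an illustration of the general technique, not AnonLM's actual splitting algorithm, which may break on word or sentence boundaries instead.

```python
def chunk_text(text: str, max_chunk_chars: int, chunk_overlap_chars: int) -> list[str]:
    """Split text into fixed-size windows, each overlapping the previous one."""
    step = max_chunk_chars - chunk_overlap_chars
    return [text[i : i + max_chunk_chars] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij", max_chunk_chars=4, chunk_overlap_chars=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap means each boundary region appears in two chunks, which helps avoid splitting a PII entity exactly at a chunk edge.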
## Project status

Current status: 0.x (early API hardening). Expect minor breaking changes until 1.0.0.
## Next objectives

- Reach >90% reliability with `gpt-oss-20b` on the current baseline dataset (`datasets/pii_mvp_dataset.csv`).
- Build a stronger benchmark dataset, likely by adapting a PII dataset from Hugging Face and normalizing it to AnonLM's benchmark format.
- Reach >=90% reliability with `gpt-oss-20b` on the new dataset.
## License

Apache-2.0