Mask sensitive data in documents using a local OpenAI-compatible LLM
llm-mask
A Python library that masks sensitive data in documents (PII, tokens, URLs, company names, etc.) using a local OpenAI-compatible LLM, and restores the original text from a saved mapping. Masking runs against your own LLM server, so no data leaves your infrastructure.
Installation
pip install llm-mask
Requirements
A running local LLM server with an OpenAI-compatible API, e.g. vLLM, LM Studio, or Ollama.
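Before pointing the client at a server, it can help to confirm that the endpoint answers. OpenAI-compatible servers generally expose a model listing at `GET {base_url}/models`. The sketch below uses only the standard library; the helper names `models_url` and `list_models` are hypothetical and not part of llm-mask, and some servers may additionally require an API key header.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL for an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def list_models(base_url: str, timeout: float = 5.0) -> list:
    """Return the model ids the server reports (raises if it is unreachable)."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-style listings put the models under a top-level "data" array.
    return [item["id"] for item in payload.get("data", [])]


# Example (needs a running server, so it is left commented out):
# print(list_models("http://localhost:8001/v1"))
```

One of the returned ids is what you would pass as `model` when creating the client.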
Quick start
from llm_mask import MaskingClient
client = MaskingClient(
    base_url="http://localhost:8001/v1",  # your LLM server
    model="local-model",
    language="ru",  # "ru" or "en"
)
# Russian sample text: "Hi, my name is Ivan, I work at Apple. Email: ivan@apple.com"
text = "Привет, меня зовут Иван, работаю в Apple. Email: ivan@apple.com"
# ── mask ──────────────────────────────────────────────────────────────
masked_text, mapping = client.mask(text)
# masked_text → "Привет, меня зовут <person_1>, работаю в <company_1>. Email: <email_1>"
# mapping → {"Иван": "<person_1>", "Apple": "<company_1>", "ivan@apple.com": "<email_1>"}
# ── unmask (no LLM call) ───────────────────────────────────────────────
original = client.unmask(masked_text, mapping)
# original → the input text, restored exactly
Attribute-style access also works:
result = client.mask(text)
result.masked_text
result.mapping
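Unmasking needs no LLM call because the mapping already records which placeholder stands for which original string. The following is a minimal sketch of that idea in plain Python, not the library's actual implementation; the function name `restore` is hypothetical, and the sample uses an English version of the text above.

```python
import re


def restore(masked_text: str, mapping: dict) -> str:
    """Replace every placeholder in masked_text with its original value."""
    # The mapping is original -> placeholder, so invert it first.
    inverse = {placeholder: original for original, placeholder in mapping.items()}
    if not inverse:
        return masked_text
    # Longest placeholders first, so one placeholder that happens to be a
    # prefix of another can never match in its place.
    ordered = sorted(inverse, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in ordered))
    # A single regex pass avoids re-scanning text that was already restored.
    return pattern.sub(lambda m: inverse[m.group(0)], masked_text)


masked = "Hello, my name is <person_1>, I work at <company_1>. Email: <email_1>"
mapping = {"Ivan": "<person_1>", "Apple": "<company_1>", "ivan@apple.com": "<email_1>"}
restored = restore(masked, mapping)
```

Doing the substitution in one pass means a restored value that itself looks like a placeholder is left alone.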
File & directory helpers
# Mask a file (nothing written to disk by default)
result = client.mask_file("document.md")
# Write masked file + mapping JSON to disk
result = client.mask_file(
    "document.md",
    save_masked=True,   # → document_masked.md
    save_mapping=True,  # → document_mapping.json
    mapping_dir="./mappings",
)
# Restore from files (no LLM)
original = client.unmask_file("document_masked.md", "document_mapping.json")
# Mask a whole directory
results = client.mask_directory(
    "./docs",
    pattern="*.md",
    overwrite_originals=False,  # writes *_masked.md next to originals
    mapping_store_path="./mappings.json",
)
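The comments above suggest a naming scheme of `<stem>_masked<suffix>` for masked files and `<stem>_mapping.json` for mappings. As a rough sketch of that scheme with `pathlib`: the helper names are hypothetical, and placing the mapping inside `mapping_dir` is an assumption about what that parameter does, not confirmed behaviour.

```python
from pathlib import Path
from typing import Optional


def masked_path(source: Path) -> Path:
    """document.md -> document_masked.md, next to the original."""
    return source.with_name(f"{source.stem}_masked{source.suffix}")


def mapping_path(source: Path, mapping_dir: Optional[Path] = None) -> Path:
    """document.md -> document_mapping.json (in mapping_dir if given; assumed)."""
    directory = mapping_dir if mapping_dir is not None else source.parent
    return directory / f"{source.stem}_mapping.json"


source = Path("docs/document.md")
masked_file = masked_path(source)
mapping_file = mapping_path(source, Path("mappings"))
```

Computing the paths up front makes it easy to check for collisions before a directory run writes anything.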
Configuration
| Parameter | Default | Description |
|---|---|---|
| base_url | http://localhost:8001/v1 | LLM server base URL |
| model | local-model | Model identifier |
| api_key | EMPTY | API key (ignored by most local servers) |
| language | ru | Built-in prompt language: "ru" or "en" |
| chunk_size | 6000 | Max characters per LLM call |
| temperature | 0.0 | Sampling temperature |
| judge_model | None | Optional second LLM pass to catch missed entities |
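To illustrate what a `chunk_size` limit implies for long documents, here is one way to split text into pieces of at most that many characters while preferring line boundaries. This is a sketch of the general technique only; how llm-mask actually chunks text is not documented here, and `split_into_chunks` is a hypothetical name.

```python
def split_into_chunks(text: str, chunk_size: int = 6000) -> list:
    """Split text into pieces of at most chunk_size characters, preferring line breaks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit has to be hard-split.
        while len(line) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:chunk_size])
            line = line[chunk_size:]
        # Start a new chunk when this line would overflow the current one.
        if len(current) + len(line) > chunk_size:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 10 + "\n" + "b" * 10 + "\n" + "c" * 25
chunks = split_into_chunks(sample, chunk_size=12)
```

Joining the chunks reproduces the input exactly, so masking chunk by chunk does not lose or reorder text.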
Entity types
| Entity | Placeholder |
|---|---|
| URLs / domains | <url_1> |
| Service names | <service_1> |
| Company / brand names | <company_1> |
| Person names / usernames | <person_1> |
| Email addresses | <email_1> |
| Phone numbers | <phone_1> |
| IP addresses | <ip_1> |
| Tokens / secrets / keys | <secret_1> |
| Numeric IDs | <id_1> |
| File paths | <path_1> |
| Project / code names | <project_1> |
| Infrastructure names | <env_1>, <host_1> |
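When reviewing masked output, it can be useful to count which placeholder types appear. The sketch below assumes placeholders follow the `<type_N>` shape used by most rows of the table (lowercase type, underscore, number, angle brackets); the regex and the function name `placeholder_counts` are illustrative, not part of llm-mask.

```python
import re
from collections import Counter

# Assumed placeholder shape: <type_N>, e.g. <person_1> or <url_12>.
PLACEHOLDER = re.compile(r"<([a-z]+)_(\d+)>")


def placeholder_counts(masked_text: str) -> Counter:
    """Count placeholder occurrences per entity type."""
    return Counter(m.group(1) for m in PLACEHOLDER.finditer(masked_text))


counts = placeholder_counts("Contact <person_1> at <email_1> or <person_2> via <url_1>.")
```

The same pattern can be run on unmasked output: any match left over points to a placeholder that was missing from the mapping.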
Mapping file format
{
  "source_file": "document.md",
  "masked_at": "2026-03-05T14:22:00Z",
  "mapping": {
    "Apple": "<company_1>",
    "https://api.company.com": "<url_1>"
  }
}
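Because the mapping file is plain JSON, it can be inspected or validated without the library. A minimal sketch, assuming the layout shown above with the pairs under a top-level "mapping" key; `load_mapping` is a hypothetical helper, and the duplicate check is an extra safeguard added here rather than documented behaviour.

```python
import json
import tempfile
from pathlib import Path


def load_mapping(path: Path) -> dict:
    """Read a mapping file and return its original -> placeholder pairs."""
    data = json.loads(path.read_text(encoding="utf-8"))
    mapping = data["mapping"]
    # Two originals sharing one placeholder could not be restored unambiguously.
    placeholders = list(mapping.values())
    if len(placeholders) != len(set(placeholders)):
        raise ValueError("duplicate placeholder in mapping")
    return mapping


example = {
    "source_file": "document.md",
    "masked_at": "2026-03-05T14:22:00Z",
    "mapping": {"Apple": "<company_1>", "https://api.company.com": "<url_1>"},
}

# Round-trip through a temporary file to show the expected layout.
with tempfile.TemporaryDirectory() as tmp:
    mapping_file = Path(tmp) / "document_mapping.json"
    mapping_file.write_text(json.dumps(example, ensure_ascii=False), encoding="utf-8")
    loaded = load_mapping(mapping_file)
```

Keep in mind that the mapping file contains the original sensitive values, so it should be stored as carefully as the unmasked document.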
Development
git clone https://github.com/KodakV/llm-mask
cd llm-mask
pip install -e ".[dev]"
pytest
See CONTRIBUTING.md for contribution guidelines.
License
MIT