GDPR-compliant detection and replacement of PII in LLM prompts
privacy-guard
GDPR/DSGVO-compliant PII anonymisation for LLM workflows.
privacy-guard detects personal data in German-language text, replaces it with stable placeholders, and restores the original values after processing. Most detectors are regex- or rule-based, so they add no ML-inference overhead at runtime; the name detector is the exception and uses a spaCy model.
Highlights
- 🔒 Compliance-first: protect sensitive data before it reaches external LLMs
- ⚡ Runtime-friendly: regex/rule-based detectors without a heavy inference stack
- 🔁 Deterministic: stable placeholders plus lossless restoration
- 🐳 Deploy-ready: Python package and FastAPI/Docker available out of the box
Why privacy-guard?
- Protects sensitive data before sending it to external models
- Replaces PII with deterministic placeholders such as [NAME_1], [IBAN_1]
- Restores original values via ScanResult.restore()
- Resolves overlapping matches with priority logic (e.g. SECRET > IBAN > SOCIAL_SECURITY > EMAIL > …)
- Supports Python-package and FastAPI/Docker operation
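To make the placeholder idea concrete, here is a minimal sketch of how stable placeholders and lossless restoration could work. It is not the package's implementation: the helper names `anonymise` and `restore`, and the `(start, end, pii_type)` span format, are assumptions made for illustration only.

```python
import re
from collections import defaultdict


def anonymise(text, spans):
    """Replace each (start, end, pii_type) span with a placeholder.

    Assumes the spans do not overlap. Identical values of the same type
    share one placeholder, which keeps the output deterministic.
    """
    counters = defaultdict(int)   # next index per PII type
    by_value = {}                 # (pii_type, value) -> placeholder
    mapping = {}                  # placeholder -> original value
    parts = []
    cursor = 0
    for start, end, pii_type in sorted(spans):
        value = text[start:end]
        key = (pii_type, value)
        if key not in by_value:
            counters[pii_type] += 1
            by_value[key] = f"[{pii_type}_{counters[pii_type]}]"
            mapping[by_value[key]] = value
        parts.append(text[cursor:start])
        parts.append(by_value[key])
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), mapping


def restore(text, mapping):
    """Swap known placeholders back; unknown ones are left untouched."""
    pattern = re.compile(r"\[[A-Z_]+_\d+\]")
    return pattern.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


text = "Hans Müller zahlt an Hans Müller."
spans = [(m.start(), m.end(), "NAME") for m in re.finditer("Hans Müller", text)]
masked, mapping = anonymise(text, spans)
print(masked)                    # [NAME_1] zahlt an [NAME_1].
print(restore(masked, mapping))  # Hans Müller zahlt an Hans Müller.
```

Keeping the mapping outside the anonymised text is what makes the round trip lossless: the external model only ever sees the placeholders.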
Detected PII Types
| Type | Example | Method |
|---|---|---|
| NAME | Dr. Anna Schmidt | spaCy NER (de_core_news_sm) |
| IBAN | DE89 3704 0044 0532 0130 00 | Regex + ISO 7064 check digit |
| CREDIT_CARD | 4111 1111 1111 1111 | Regex + Luhn algorithm |
| PERSONAL_ID | C22990047 | Regex: Personalausweis & Reisepass (same format) |
| SOCIAL_SECURITY | 12 345678 X 123 | Regex: Rentenversicherungsnummer |
| TAX_ID | 12 345 678 903 | Regex + mod-11 check digit (§ 139b AO) |
| PHONE | +49 89 12345678 | Regex: DACH formats |
| EMAIL | kontakt@example.de | Regex |
| ADDRESS | Hauptstraße 12, 79100 Freiburg | Regex built from data files |
| SECRET | AWS key, GitHub PAT, … | 100+ pattern rules (TOML) |
| URL_SECRET | ?token=abc123def456 | Regex: query parameter values |
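The check-digit methods named in the table can be sketched in a few lines each. The functions below are illustrative stand-ins, not the package's detectors; their names are hypothetical. They show the ISO 7064 mod-97 test used for IBANs, the Luhn test for card numbers, and the iterative mod-11/mod-10 check commonly described for the German tax ID (only the checksum, not the additional digit-repetition rules).

```python
def iban_is_valid(iban: str) -> bool:
    """ISO 7064 mod-97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and expect remainder 1."""
    s = "".join(iban.split()).upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def luhn_is_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def tax_id_checksum_ok(tax_id: str) -> bool:
    """Iterative mod-11/mod-10 check over the first ten digits;
    the eleventh digit is the check digit."""
    digits = [int(ch) for ch in tax_id if ch in "0123456789"]
    if len(digits) != 11:
        return False
    product = 10
    for d in digits[:10]:
        total = (d + product) % 10
        if total == 0:
            total = 10
        product = (total * 2) % 11
    check = 11 - product
    if check == 10:
        check = 0
    return check == digits[10]


print(iban_is_valid("DE89 3704 0044 0532 0130 00"))  # True
print(luhn_is_valid("4111 1111 1111 1111"))          # True
print(tax_id_checksum_ok("12 345 678 903"))          # True
```

Running a checksum after the regex match is a cheap way to cut false positives: a random 22-character string that merely looks like an IBAN will almost always fail the mod-97 test.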
Overlap priority: SECRET = URL_SECRET > IBAN = CREDIT_CARD = SOCIAL_SECURITY > PERSONAL_ID = TAX_ID = EMAIL > PHONE > ADDRESS > NAME
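The priority order above can be turned into a small resolver. This is a sketch under stated assumptions, not the library's code: the `Match` class, the numeric ranks, and the tie-break (longer span wins, then earlier start) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Numeric ranks derived from the documented order; equal types share a rank.
PRIORITY = {
    "SECRET": 5, "URL_SECRET": 5,
    "IBAN": 4, "CREDIT_CARD": 4, "SOCIAL_SECURITY": 4,
    "PERSONAL_ID": 3, "TAX_ID": 3, "EMAIL": 3,
    "PHONE": 2,
    "ADDRESS": 1,
    "NAME": 0,
}


@dataclass(frozen=True)
class Match:
    pii_type: str
    start: int
    end: int  # exclusive


def resolve_overlaps(matches):
    """Keep the highest-priority match wherever spans overlap."""
    ranked = sorted(
        matches,
        key=lambda m: (-PRIORITY[m.pii_type], -(m.end - m.start), m.start),
    )
    kept = []
    for m in ranked:
        # Accept a match only if it is disjoint from everything kept so far.
        if all(m.end <= k.start or m.start >= k.end for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)


candidates = [
    Match("PHONE", 5, 20),   # digits inside the IBAN that look like a phone number
    Match("IBAN", 0, 22),
    Match("NAME", 30, 40),
]
print(resolve_overlaps(candidates))
```

In this example the PHONE candidate is dropped because it lies inside the higher-ranked IBAN span, while the non-overlapping NAME match is kept.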
Public figures are excluded from masking by default via an internal whitelist (~1,000 entries).
Installation
Python Package
pip install privacy-guard-scanner
The name detector requires a spaCy model:
pip install "de_core_news_sm @ https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-3.8.0/de_core_news_sm-3.8.0-py3-none-any.whl"
# or:
python -m spacy download de_core_news_sm
API Stack (local)
pip install -e ".[api]"
uvicorn api.main:app --reload --port 8000
Quickstart (Python)
from privacy_guard import PrivacyScanner
scanner = PrivacyScanner()
result = scanner.scan(
    "Bitte überweise 500 EUR an Hans Müller, "
    "IBAN DE89 3704 0044 0532 0130 00. "
    "Rückfragen an h.mueller@example.de oder +49 89 123456."
)
print(result.anonymised_text)
# Bitte überweise 500 EUR an [NAME_1], IBAN [IBAN_1]. Rückfragen an [EMAIL_1] oder [PHONE_1].
print(result.mapping)
# {'[NAME_1]': 'Hans Müller', '[IBAN_1]': 'DE89 3704 0044 0532 0130 00', ...}
llm_answer = "Vielen Dank, [NAME_1]. Die Daten zu [IBAN_1] sind verarbeitet."
print(result.restore(llm_answer))
# Vielen Dank, Hans Müller. Die Daten zu DE89 3704 0044 0532 0130 00 sind verarbeitet.
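The quickstart steps combine naturally into a single round-trip helper. The sketch below relies only on the calls shown above (`scan`, `anonymised_text`, `restore`); the helper name `ask_llm_privately` and the `call_llm` callable are hypothetical and stand in for whatever LLM client you use.

```python
from typing import Callable, Protocol


class ScanResultLike(Protocol):
    anonymised_text: str

    def restore(self, text: str) -> str: ...


class ScannerLike(Protocol):
    def scan(self, text: str) -> ScanResultLike: ...


def ask_llm_privately(
    scanner: ScannerLike,
    prompt: str,
    call_llm: Callable[[str], str],
) -> str:
    """Anonymise the prompt, query the model, then restore the answer."""
    result = scanner.scan(prompt)
    # Only the placeholder version of the prompt leaves the process.
    answer = call_llm(result.anonymised_text)
    # Placeholders the model echoed back are mapped to the original values.
    return result.restore(answer)
```

With the real package, `scanner` would be a `PrivacyScanner()` instance. Restoration can only work for placeholders the model returns unchanged, so it may help to instruct the model to keep bracketed tokens such as [NAME_1] verbatim.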
Configuring the Scanner
from privacy_guard import PiiType, PrivacyScanner
# Extend the built-in whitelist with additional names.
scanner = PrivacyScanner(extra_whitelist_names=["Erika Musterfrau"])
# Detectors can be switched off and on again per PII type.
scanner.disable_detector(PiiType.NAME)
scanner.enable_detector(PiiType.NAME)
result = scanner.scan("Contact: erika@example.de")
Filtering specific findings:
from privacy_guard import PiiType
secrets = [f for f in result.findings if f.pii_type == PiiType.SECRET]
for finding in secrets:
    print(finding.rule_id, finding.text, finding.confidence)
Web UI
The API server includes a built-in HTMX interface — no separate process, no CDN dependencies.
uvicorn api.main:app --reload
# → http://localhost:8000
Login
An admin account with password admin is created by default (change via UI_ADMIN_PASSWORD).
After login, three tabs are available:
| Tab | Description |
|---|---|
| Live Test | Enter text, select detectors, run a scan — view original and anonymised text side by side |
| History | All your own scans (admins see all users); click a row to see finding details |
| Dashboard | Overall statistics, PII-type bar chart, scans-per-day line chart (Chart.js) |
Admins additionally see the API Keys tab.
API Key Management (Admin)
Use the 🔑 API Keys tab to create and revoke any number of API keys:
- Enter a name → Generate key
- Copy the full key (pg_…); it is shown only once
- Only the SHA-256 hash is stored; the prefix (pg_xxxxxxxxx…) remains visible
- Keys can be revoked individually at any time
The key set via the API_KEY environment variable remains valid in parallel (backwards compatibility).
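The key handling described above (show once, store only the hash, keep a visible prefix) follows a common pattern that can be sketched with the standard library. This is not the server's code: the function names, the record layout, and the prefix length of 12 characters are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

PREFIX_LEN = 12  # assumed length of the visible prefix, including "pg_"


def generate_api_key():
    """Return the full key (to show once) and the record to persist."""
    key = "pg_" + secrets.token_urlsafe(32)
    record = {
        "prefix": key[:PREFIX_LEN],
        "sha256": hashlib.sha256(key.encode("utf-8")).hexdigest(),
    }
    return key, record


def verify_api_key(candidate: str, record: dict) -> bool:
    """Hash the presented key and compare in constant time."""
    digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, record["sha256"])


key, record = generate_api_key()
print(record["prefix"] + "…")       # what a key list could display
print(verify_api_key(key, record))  # True
```

Because only the digest is persisted, a leaked database does not reveal usable keys, and a lost key cannot be recovered, only revoked and replaced.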
REST API (Docker)
docker run -p 8000:8000 noxway/privacy-guard:latest
Or via Compose:
docker compose up
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /health | Liveness check |
| POST | /scan | Full scan (findings + mapping + anonymised text) |
| POST | /anonymize | Return anonymised text only |
Request Body
{
"text": "Hans Müller, IBAN DE89370400440532013000",
"detectors": ["IBAN", "EMAIL"],
"whitelist": ["Hans Müller"]
}
Example with curl
curl -X POST http://localhost:8000/scan \
-H "Content-Type: application/json" \
-d '{"text": "Contact: hans@example.de, IBAN DE89370400440532013000", "detectors": ["EMAIL", "IBAN"]}'
With API key authentication:
curl -X POST http://localhost:8000/scan \
-H "Content-Type: application/json" \
-H "X-API-Key: pg_…" \
-d '{"text": "hans@example.de"}'
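The same requests can be issued from Python without extra dependencies. The sketch below uses only the endpoint, request fields, and header documented above; the helper names `build_scan_request` and `scan_remote` are hypothetical, and the response is returned as parsed JSON without assuming specific field names.

```python
import json
import urllib.request


def build_scan_request(base_url, text, detectors=None, whitelist=None, api_key=None):
    """Build a POST /scan request; optional fields are sent only if given."""
    payload = {"text": text}
    if detectors is not None:
        payload["detectors"] = detectors
    if whitelist is not None:
        payload["whitelist"] = whitelist
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        base_url.rstrip("/") + "/scan",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def scan_remote(base_url, text, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_scan_request(base_url, text, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

A call such as `scan_remote("http://localhost:8000", "hans@example.de", detectors=["EMAIL"])` assumes the container from the Docker section is running locally.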
Configuration
| Variable | Default | Description |
|---|---|---|
| API_KEY | empty | If set, X-API-Key must be sent with every request (env-var key or DB key) |
| CORS_ORIGINS | * | Comma-separated origins, e.g. https://app.example.com |
| UI_DB_PATH | ui.db | Path to the SQLite database (users, scans, API keys) |
| UI_ADMIN_PASSWORD | admin | Password for the automatically created admin account |
Example:
services:
  api:
    image: noxway/privacy-guard:latest
    ports:
      - "8000:8000"
    environment:
      API_KEY: my-secret-key
      CORS_ORIGINS: https://app.example.com
      UI_DB_PATH: /data/ui.db
      UI_ADMIN_PASSWORD: secure123
    volumes:
      - ui_data:/data
volumes:
  ui_data:
Roadmap Ideas
- Improved entity recognition for DACH address variants
- Optional audit logging for compliance reports
- Extended multilingual support beyond German
- Check-digit validation for Personalausweis/Reisepass
License
MIT. See LICENSE.