GDPR-compliant detection and replacement of PII in LLM prompts
privacy-guard
GDPR-compliant PII anonymisation for LLM workflows.
privacy-guard reliably detects personal data in German text,
replaces it with stable placeholders, and lets you restore the original values after processing.
Most detectors are regex- or rule-based and add no ML inference overhead at runtime (the name detector uses spaCy). Results are clear, and the package is API-ready.
Highlights
- 🔒 Compliance-first: protects sensitive data from external LLMs
- ⚡ Runtime-friendly: regex/rule-based detectors without a heavy inference stack
- 🔁 Deterministic: stable placeholders plus lossless restoration
- 🐳 Deploy-ready: Python package and FastAPI/Docker usable out of the box
Why privacy-guard?
- Protects sensitive data before it is sent to external models
- Replaces PII with deterministic placeholders such as `[NAME_1]`, `[IBAN_1]`
- Restores original values with `ScanResult.restore()`
- Resolves overlapping matches with priority logic (e.g. `SECRET > IBAN > EMAIL > ...`)
- Supports use as a Python package and as a FastAPI/Docker service
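The priority logic for overlapping matches can be pictured with a small standalone sketch. It is not privacy-guard's actual implementation: the `Finding` class, the `resolve_overlaps` helper, and the order after `EMAIL` are assumptions made for illustration; only `SECRET > IBAN > EMAIL` is stated above.

```python
from dataclasses import dataclass

# Assumed order for illustration; only SECRET > IBAN > EMAIL comes from the README.
PRIORITY = ["SECRET", "IBAN", "EMAIL", "PHONE", "ADDRESS", "NAME"]


@dataclass(frozen=True)
class Finding:  # hypothetical stand-in, not the library's class
    pii_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def resolve_overlaps(findings: list[Finding]) -> list[Finding]:
    """Keep the highest-priority finding wherever spans overlap."""
    rank = {name: i for i, name in enumerate(PRIORITY)}
    # Visit higher-priority types first; within a type prefer longer, then earlier spans.
    ordered = sorted(
        findings,
        key=lambda f: (rank[f.pii_type], -(f.end - f.start), f.start),
    )
    kept: list[Finding] = []
    for cand in ordered:
        # Accept the candidate only if it touches none of the spans kept so far.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda f: f.start)


findings = [
    Finding("PHONE", 5, 20),   # digits inside an IBAN misread as a phone number
    Finding("IBAN", 0, 27),
    Finding("EMAIL", 40, 58),
]
resolved = resolve_overlaps(findings)
print([f.pii_type for f in resolved])  # ['IBAN', 'EMAIL']
```

The greedy pass keeps the result deterministic: the same findings always yield the same surviving spans, regardless of the order in which detectors reported them.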
Detected PII types
| Type | Example | Note |
|---|---|---|
| `NAME` | Dr. Anna Schmidt | via spaCy NER (`de_core_news_sm`) |
| `IBAN` | DE89 3704 0044 0532 0130 00 | incl. ISO 7064 check |
| `PHONE` | +49 89 12345678 | German-language formats |
| `EMAIL` | kontakt@example.de | RFC-like patterns |
| `ADDRESS` | Hauptstraße 12, 79100 Freiburg | rule-based |
| `SECRET` | API keys, tokens, passwords | 100+ pattern rules |
In addition, public figures on an internal list are not masked by default.
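For readers unfamiliar with the ISO 7064 check mentioned in the IBAN row, the standard MOD 97-10 validation looks roughly like this. The function name `iban_checksum_ok` is made up for this sketch, and it does not claim to mirror how privacy-guard implements its detector; country-specific length rules are left out.

```python
def iban_checksum_ok(iban: str) -> bool:
    """Return True if the IBAN passes the ISO 7064 MOD 97-10 check."""
    compact = "".join(iban.split()).upper()
    # IBANs consist of 15 to 34 ASCII letters and digits.
    if not (15 <= len(compact) <= 34) or not (compact.isascii() and compact.isalnum()):
        return False
    # Move country code and check digits to the end ...
    rearranged = compact[4:] + compact[:4]
    # ... map letters to numbers (A=10 ... Z=35), digits stay as they are ...
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # ... and the resulting integer must leave remainder 1 modulo 97.
    return int(digits) % 97 == 1


print(iban_checksum_ok("DE89 3704 0044 0532 0130 00"))  # True
print(iban_checksum_ok("DE88 3704 0044 0532 0130 00"))  # False
```

A checksum like this lets a regex-based detector reject digit runs that merely look like an IBAN, which reduces false positives without any model inference.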
Installation
Python Package
pip install privacy-guard-scanner
The name detector requires a spaCy model:
pip install "de_core_news_sm @ https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-3.8.0/de_core_news_sm-3.8.0-py3-none-any.whl"
# or:
python -m spacy download de_core_news_sm
Local API stack
pip install -e ".[api]"
uvicorn api.main:app --reload --port 8000
Quickstart (Python)
from privacy_guard import PrivacyScanner

scanner = PrivacyScanner()
result = scanner.scan(
    "Bitte überweise 500 EUR an Hans Müller, "
    "IBAN DE89 3704 0044 0532 0130 00. "
    "Rückfragen an h.mueller@example.de oder +49 89 123456."
)

print(result.anonymised_text)
# Bitte überweise 500 EUR an [NAME_1], IBAN [IBAN_1]. Rückfragen an [EMAIL_1] oder [PHONE_1].

print(result.mapping)
# {'[NAME_1]': 'Hans Müller', '[IBAN_1]': 'DE89 3704 0044 0532 0130 00', ...}

llm_answer = "Vielen Dank, [NAME_1]. Die Daten zu [IBAN_1] sind verarbeitet."
print(result.restore(llm_answer))
# Vielen Dank, Hans Müller. Die Daten zu DE89 3704 0044 0532 0130 00 sind verarbeitet.
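To make the placeholder round trip more concrete, here is a minimal standalone sketch of how numbered placeholders and a mapping could be built and reversed. The `anonymise` and `restore` functions are hypothetical and independent of the package. Reusing one placeholder for a repeated value is an assumption about what "stable" means, not confirmed library behaviour.

```python
import re


def anonymise(text: str, findings: list[tuple[int, int, str]]) -> tuple[str, dict[str, str]]:
    """Replace each non-overlapping (start, end, pii_type) span with a placeholder."""
    counters: dict[str, int] = {}
    by_value: dict[tuple[str, str], str] = {}
    mapping: dict[str, str] = {}
    parts: list[str] = []
    cursor = 0
    for start, end, pii_type in sorted(findings):
        value = text[start:end]
        key = (pii_type, value)
        if key not in by_value:
            # First time this value is seen: allocate the next number for its type.
            counters[pii_type] = counters.get(pii_type, 0) + 1
            by_value[key] = f"[{pii_type}_{counters[pii_type]}]"
            mapping[by_value[key]] = value
        parts.append(text[cursor:start])
        parts.append(by_value[key])
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Swap known placeholders back in one pass; unknown ones stay untouched."""
    return re.sub(r"\[[A-Z]+_\d+\]", lambda m: mapping.get(m.group(0), m.group(0)), text)


text = "Hans Müller, h.mueller@example.de, nochmals Hans Müller"
name, email = "Hans Müller", "h.mueller@example.de"
first = text.index(name)
second = text.index(name, first + 1)
mail_at = text.index(email)
findings = [
    (first, first + len(name), "NAME"),
    (mail_at, mail_at + len(email), "EMAIL"),
    (second, second + len(name), "NAME"),
]
masked, mapping = anonymise(text, findings)
print(masked)  # [NAME_1], [EMAIL_1], nochmals [NAME_1]
print(restore(masked, mapping) == text)  # True
```

Because the mapping never leaves the local process, the external model only ever sees the placeholders.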
Configuring the scanner
from privacy_guard import PiiType, PrivacyScanner

# Additional names to exempt from masking (whitelist).
scanner = PrivacyScanner(extra_whitelist_names=["Erika Musterfrau"])

# Detectors can be switched off and on again per PII type.
scanner.disable_detector(PiiType.NAME)
scanner.enable_detector(PiiType.NAME)

result = scanner.scan("Kontakt: erika@example.de")
Evaluate only specific findings:
from privacy_guard import PiiType

secrets = [f for f in result.findings if f.pii_type == PiiType.SECRET]
for finding in secrets:
    print(finding.rule_id, finding.text, finding.confidence)
REST API (Docker)
docker run -p 8000:8000 noxway/privacy-guard:latest
Alternatively, via Compose:
docker compose up
Endpoints
| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Liveness check |
| `POST` | `/scan` | Full scan (findings + mapping + anonymised text) |
| `POST` | `/anonymize` | Return only the anonymised text |
Request body
{
  "text": "Hans Müller, IBAN DE89370400440532013000",
  "detectors": ["IBAN", "EMAIL"],
  "whitelist": ["Hans Müller"]
}
Example with curl
curl -X POST http://localhost:8000/scan \
-H "Content-Type: application/json" \
-d '{"text": "Kontakt: hans@example.de, IBAN DE89370400440532013000", "detectors": ["EMAIL", "IBAN"]}'
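The same call can be made from Python with only the standard library. This is a sketch under assumptions: the helper names `build_scan_request` and `scan` are made up here, the base URL matches the Docker example, and the response is returned as parsed JSON without assuming any particular field names.

```python
import json
import urllib.request


def build_scan_request(text, detectors=None, base_url="http://localhost:8000", api_key=None):
    """Build a POST request for the /scan endpoint without sending it."""
    payload = {"text": text}
    if detectors:
        payload["detectors"] = detectors
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the server was started with API_KEY set.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        f"{base_url}/scan",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def scan(text, **kwargs):
    """Send the request; this needs a running privacy-guard API."""
    with urllib.request.urlopen(build_scan_request(text, **kwargs), timeout=10) as resp:
        return json.load(resp)


req = build_scan_request("Kontakt: hans@example.de", detectors=["EMAIL"])
print(req.get_method(), req.full_url)  # POST http://localhost:8000/scan
```

Separating request construction from sending keeps the payload easy to inspect and test without a server.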
API configuration
| Variable | Default | Meaning |
|---|---|---|
| `API_KEY` | empty | If set, the `X-API-Key` header must be sent |
| `CORS_ORIGINS` | `*` | Comma-separated origins, e.g. `https://app.example.com` |
Example:
services:
  api:
    image: noxway/privacy-guard:latest
    ports:
      - "8000:8000"
    environment:
      API_KEY: my-secret-key
      CORS_ORIGINS: https://app.example.com
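How a service might interpret these two variables can be sketched as follows. This only illustrates the documented semantics (empty key means no check, origins are comma-separated). The helper names are hypothetical, and the bundled API may handle these settings differently.

```python
import hmac


def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated CORS_ORIGINS value; fall back to '*' if empty."""
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    return origins or ["*"]


def api_key_ok(configured_key: str, sent_key) -> bool:
    """An empty configured key disables the check, as described in the table."""
    if not configured_key:
        return True
    if sent_key is None:
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(configured_key.encode(), sent_key.encode())


print(parse_cors_origins("https://app.example.com, https://admin.example.com"))
# ['https://app.example.com', 'https://admin.example.com']
print(api_key_ok("my-secret-key", "my-secret-key"))  # True
print(api_key_ok("my-secret-key", None))  # False
```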
Roadmap ideas
- Improved entity detection for addresses in DACH variants
- Optional audit logging for compliance reports
- Extended multilingual support beyond German
License
MIT. See LICENSE for details.