GDPR-compliant detection and replacement of PII in LLM prompts

privacy-guard

GDPR/DSGVO-compliant PII anonymisation for LLM workflows.

privacy-guard detects personal data in German-language text, replaces it with stable placeholders, and restores the original values after processing. Most detectors are regex- or rule-based and add no ML inference at runtime; only name detection relies on a spaCy model. It is available as a Python package and as a REST API.

Highlights

  • 🔒 Compliance-first: protect sensitive data before it reaches external LLMs
  • ⚡ Runtime-friendly: regex/rule-based detectors without a heavy inference stack
  • 🔁 Deterministic: stable placeholders plus lossless restoration
  • 🐳 Deploy-ready: Python package and FastAPI/Docker available out of the box

Why privacy-guard?

  • Protects sensitive data before sending it to external models
  • Replaces PII with deterministic placeholders such as [NAME_1], [IBAN_1]
  • Restores original values via ScanResult.restore()
  • Resolves overlapping matches with priority logic (e.g. SECRET > IBAN > EMAIL > PHONE > …)
  • Supports Python-package and FastAPI/Docker operation
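
The placeholder and restore behaviour listed above can be illustrated with a small, self-contained sketch. It is not privacy-guard's implementation: the email pattern and the `anonymise` and `restore` functions are hypothetical and cover only one PII type. The point is that the same value always maps to the same placeholder, so the mapping can be reversed without loss.

```python
import re

# Hypothetical, deliberately simple email pattern; the real detector may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymise(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email address with a numbered placeholder."""
    mapping: dict[str, str] = {}  # placeholder -> original value
    seen: dict[str, str] = {}     # original value -> placeholder

    def substitute(match: re.Match) -> str:
        value = match.group(0)
        if value not in seen:
            placeholder = f"[EMAIL_{len(seen) + 1}]"
            seen[value] = placeholder
            mapping[placeholder] = value
        return seen[value]

    return EMAIL_RE.sub(substitute, text), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back in place of their placeholders."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

Because a repeated address reuses its placeholder, a model that sees the masked text can still tell that two mentions refer to the same entity.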

Detected PII Types

| Type | Example | Method |
| --- | --- | --- |
| NAME | Dr. Anna Schmidt | spaCy NER (de_core_news_sm) |
| IBAN | DE89 3704 0044 0532 0130 00 | Regex + ISO 7064 check digit |
| CREDIT_CARD | 4111 1111 1111 1111 | Regex + Luhn algorithm |
| PERSONAL_ID | C22990047 | Regex (Personalausweis and Reisepass share one format) |
| SOCIAL_SECURITY | 12 345678 X 123 | Regex (Rentenversicherungsnummer) |
| TAX_ID | 12 345 678 903 | Regex + mod-11 check digit (§ 139b AO) |
| PHONE | +49 89 12345678 | Regex (DACH formats) |
| EMAIL | kontakt@example.de | Regex |
| ADDRESS | Hauptstraße 12, 79100 Freiburg | Regex built from data files |
| SECRET | AWS key, GitHub PAT, … | 100+ pattern rules (TOML) |
| URL_SECRET | ?token=abc123def456 | Regex (query parameter values) |
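
The check-digit methods named in the table can be sketched as follows. These are textbook versions written for illustration, not code taken from privacy-guard; the function names are hypothetical, and the tax-ID routine assumes the ISO 7064 MOD 11,10 scheme that is commonly documented for the German tax identification number.

```python
def iban_is_valid(iban: str) -> bool:
    """ISO 7064 mod 97-10 check used by IBANs."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move country code and check digits to the end, then map A-Z to 10-35.
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def luhn_is_valid(number: str) -> bool:
    """Luhn checksum used by credit card numbers."""
    compact = number.replace(" ", "")
    if len(compact) < 2 or not (compact.isascii() and compact.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second one.
    for index, ch in enumerate(reversed(compact)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def tax_id_check_digit(first_ten: str) -> int:
    """Check digit over the first ten digits (assumed ISO 7064 MOD 11,10)."""
    product = 10
    for ch in first_ten:
        total = (int(ch) + product) % 10
        if total == 0:
            total = 10
        product = (total * 2) % 11
    check = 11 - product
    return 0 if check == 10 else check
```

Validating the checksum after the regex match filters out digit sequences that merely look like an IBAN or card number, which keeps false positives down without any model inference.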

Overlap priority: SECRET = URL_SECRET > IBAN = CREDIT_CARD = SOCIAL_SECURITY > PERSONAL_ID = TAX_ID = EMAIL > PHONE > ADDRESS > NAME
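
One way to apply such a priority order is a greedy pass over the candidate matches, sketched below. The tiers are copied from the line above; the `Span` type, the `resolve_overlaps` function, and the tie-break within a tier (longer match first) are assumptions made for this example, not privacy-guard's documented behaviour.

```python
from dataclasses import dataclass

# Tiers copied from the overlap-priority line; a lower number wins.
PRIORITY = {
    "SECRET": 0, "URL_SECRET": 0,
    "IBAN": 1, "CREDIT_CARD": 1, "SOCIAL_SECURITY": 1,
    "PERSONAL_ID": 2, "TAX_ID": 2, "EMAIL": 2,
    "PHONE": 3,
    "ADDRESS": 4,
    "NAME": 5,
}


@dataclass(frozen=True)
class Span:
    pii_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def resolve_overlaps(spans: list[Span]) -> list[Span]:
    """Keep the highest-priority span wherever candidates overlap."""
    # Assumed tie-break inside a tier: prefer the longer match, then the earlier one.
    ranked = sorted(
        spans,
        key=lambda s: (PRIORITY[s.pii_type], -(s.end - s.start), s.start),
    )
    kept: list[Span] = []
    for span in ranked:
        # Accept a span only if it does not intersect anything already kept.
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)
```

For instance, digits inside an IBAN that also match the phone pattern would be reported once, as IBAN, because PHONE sits in a lower tier.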

Public figures are excluded from masking by default via an internal whitelist (~1,000 entries).

Installation

Python Package

pip install privacy-guard-scanner

The name detector requires a spaCy model:

pip install "de_core_news_sm @ https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-3.8.0/de_core_news_sm-3.8.0-py3-none-any.whl"
# or:
python -m spacy download de_core_news_sm

API Stack (local)

pip install -e ".[api]"
uvicorn api.main:app --reload --port 8000

Quickstart (Python)

from privacy_guard import PrivacyScanner

scanner = PrivacyScanner()

result = scanner.scan(
    "Bitte überweise 500 EUR an Hans Müller, "
    "IBAN DE89 3704 0044 0532 0130 00. "
    "Rückfragen an h.mueller@example.de oder +49 89 123456."
)

print(result.anonymised_text)
# Bitte überweise 500 EUR an [NAME_1], IBAN [IBAN_1]. Rückfragen an [EMAIL_1] oder [PHONE_1].

print(result.mapping)
# {'[NAME_1]': 'Hans Müller', '[IBAN_1]': 'DE89 3704 0044 0532 0130 00', ...}

llm_answer = "Vielen Dank, [NAME_1]. Die Daten zu [IBAN_1] sind verarbeitet."
print(result.restore(llm_answer))
# Vielen Dank, Hans Müller. Die Daten zu DE89 3704 0044 0532 0130 00 sind verarbeitet.
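
In an LLM workflow the three steps above (scan, call the model, restore) are typically wrapped in one helper. The sketch below uses only the `scan`, `anonymised_text`, and `restore` names shown in the quickstart; `ask_llm_safely` and the `call_llm` callback are hypothetical and stand in for whatever model client is in use.

```python
from typing import Callable


def ask_llm_safely(scanner, call_llm: Callable[[str], str], prompt: str) -> str:
    """Send only anonymised text to the model, then restore the answer."""
    result = scanner.scan(prompt)              # detect PII and build the mapping
    answer = call_llm(result.anonymised_text)  # the model sees placeholders only
    return result.restore(answer)              # swap placeholders back in
```

Passing the scanner and the model call as arguments keeps the helper easy to test with stand-ins and avoids tying it to one LLM provider.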

Configuring the Scanner

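from privacy_guard import PiiType, PrivacyScanner

# Additional names to exclude from masking, on top of the built-in whitelist.
scanner = PrivacyScanner(extra_whitelist_names=["Erika Musterfrau"])

# Detectors can be switched off and back on at runtime.
scanner.disable_detector(PiiType.NAME)
scanner.enable_detector(PiiType.NAME)

result = scanner.scan("Contact: erika@example.de")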

Filtering specific findings:

from privacy_guard import PiiType

secrets = [f for f in result.findings if f.pii_type == PiiType.SECRET]
for finding in secrets:
    print(finding.rule_id, finding.text, finding.confidence)

Web UI

The API server includes a built-in HTMX interface — no separate process, no CDN dependencies.

uvicorn api.main:app --reload
# → http://localhost:8000

Login

An admin account with the password admin is created by default; set UI_ADMIN_PASSWORD to change it. After login, three tabs are available:

| Tab | Description |
| --- | --- |
| Live Test | Enter text, select detectors, run a scan; view original and anonymised text side by side |
| History | All your own scans (admins see all users); click a row to see finding details |
| Dashboard | Overall statistics, PII-type bar chart, scans-per-day line chart (Chart.js) |

Admins additionally see the API Keys tab.

API Key Management (Admin)

Use the 🔑 API Keys tab to create and revoke any number of API keys:

  1. Enter a name → Generate key
  2. Copy the full key (pg_…) — it is shown only once
  3. Only the SHA-256 hash is stored; the prefix (pg_xxxxxxxxx…) remains visible
  4. Keys can be revoked individually at any time

The key set via the API_KEY environment variable remains valid in parallel (backwards compatibility).
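
The "store only the hash" scheme from the steps above can be sketched with the standard library. This is an illustration under assumptions, not the server's code: the function names, the random part's length, and the 12-character display prefix are chosen for the example.

```python
import hashlib
import hmac
import secrets


def generate_api_key() -> tuple[str, str, str]:
    """Return (full_key, display_prefix, sha256_hex).

    Only the prefix and the digest would be persisted; the full key
    is shown to the user once and then discarded.
    """
    full_key = "pg_" + secrets.token_urlsafe(32)  # assumed key length
    prefix = full_key[:12] + "…"
    digest = hashlib.sha256(full_key.encode("utf-8")).hexdigest()
    return full_key, prefix, digest


def key_matches(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare it in constant time."""
    presented_digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_digest, stored_digest)
```

Since the database never holds the plain key, a leaked database does not reveal usable credentials, and revoking a key amounts to deleting or flagging its digest.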

REST API (Docker)

docker run -p 8000:8000 noxway/privacy-guard:latest

Or via Compose:

docker compose up

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Liveness check |
| POST | /scan | Full scan (findings + mapping + anonymised text) |
| POST | /anonymize | Return anonymised text only |

Request Body

{
  "text": "Hans Müller, IBAN DE89370400440532013000",
  "detectors": ["IBAN", "EMAIL"],
  "whitelist": ["Hans Müller"]
}

Example with curl

curl -X POST http://localhost:8000/scan \
  -H "Content-Type: application/json" \
  -d '{"text": "Contact: hans@example.de, IBAN DE89370400440532013000", "detectors": ["EMAIL", "IBAN"]}'

With API key authentication:

curl -X POST http://localhost:8000/scan \
  -H "Content-Type: application/json" \
  -H "X-API-Key: pg_…" \
  -d '{"text": "hans@example.de"}'
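
The same call can be made from Python with the standard library only. The sketch below mirrors the curl examples; `build_scan_request` and `scan` are hypothetical helper names, and because the response schema is not specified here, the parsed JSON is returned as-is.

```python
import json
import urllib.request


def build_scan_request(base_url, text, detectors=None, api_key=None):
    """Build a POST /scan request matching the documented request body."""
    payload = {"text": text}
    if detectors is not None:
        payload["detectors"] = detectors
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        f"{base_url}/scan",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def scan(base_url, text, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_scan_request(base_url, text, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the container from the Docker section running, `scan("http://localhost:8000", "hans@example.de", detectors=["EMAIL"])` would issue the request.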

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| API_KEY | empty | If set, X-API-Key must be sent with every request (env-var key or DB key) |
| CORS_ORIGINS | * | Comma-separated origins, e.g. https://app.example.com |
| UI_DB_PATH | ui.db | Path to the SQLite database (users, scans, API keys) |
| UI_ADMIN_PASSWORD | admin | Password for the automatically created admin account |

Example:

services:
  api:
    image: noxway/privacy-guard:latest
    ports:
      - "8000:8000"
    environment:
      API_KEY: my-secret-key
      CORS_ORIGINS: https://app.example.com
      UI_DB_PATH: /data/ui.db
      UI_ADMIN_PASSWORD: secure123
    volumes:
      - ui_data:/data

volumes:
  ui_data:
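
For readers who want to see how these variables and defaults fit together, here is a minimal sketch of reading them in Python. The `Settings` class and `load_settings` function are hypothetical and only reflect the defaults from the table; the server's actual loading code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: Optional[str]
    cors_origins: list[str]
    ui_db_path: str
    ui_admin_password: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the table defaults."""
    origins = env.get("CORS_ORIGINS", "*")
    return Settings(
        api_key=env.get("API_KEY") or None,  # an empty value means no env-var key
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
        ui_db_path=env.get("UI_DB_PATH", "ui.db"),
        ui_admin_password=env.get("UI_ADMIN_PASSWORD", "admin"),
    )
```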

Roadmap Ideas

  • Improved entity recognition for DACH address variants
  • Optional audit logging for compliance reports
  • Extended multilingual support beyond German
  • Check-digit validation for Personalausweis/Reisepass

License

MIT. See LICENSE.
