GDPR-compliant detection and replacement of PII in LLM prompts

privacy-guard

GDPR (DSGVO)-compliant detection and replacement of personally identifiable information (PII) in text. Detection is rule- and pattern-based and needs no ML inference at runtime, except for the name detector, which uses spaCy NER.

Designed to run in front of LLM prompts: sensitive data is replaced with stable, reversible placeholders and can be restored once the LLM has responded.
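To make "stable, reversible placeholders" concrete, here is a minimal standalone sketch of the idea for a single PII type. It is not privacy-guard's implementation: the regex, `mask_emails`, and `restore` are hypothetical names chosen for illustration, and the pattern is deliberately simplistic.

```python
import re

# Simplistic e-mail pattern, for illustration only (assumption, not the library's rule).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text):
    """Replace each e-mail address with a numbered placeholder.

    Returns the masked text and a mapping placeholder -> original value.
    """
    mapping = {}  # placeholder -> original
    seen = {}     # original -> placeholder, so repeats reuse the same placeholder

    def _substitute(match):
        original = match.group(0)
        if original not in seen:
            placeholder = f"[EMAIL_{len(seen) + 1}]"
            seen[original] = placeholder
            mapping[placeholder] = original
        return seen[original]

    return EMAIL_RE.sub(_substitute, text), mapping


def restore(text, mapping):
    """Put the original values back wherever a placeholder appears."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


masked, mapping = mask_emails("Mail a@example.de and b@example.de, again a@example.de.")
print(masked)
print(restore(masked, mapping))
```

Because the mapping lives outside the masked text, the LLM only ever sees the placeholders, and the caller can reverse the substitution afterwards.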

Features

Type    | Examples
Names   | "Hans Müller", "Dr. Anna Schmidt" (via spaCy NER)
IBAN    | DE89 3704 0044 0532 0130 00 (+ ISO 7064 checksum)
Phone   | +49 89 12345678, 0800 123456
Email   | kontakt@example.de
Address | Hauptstraße 12, 79100 Freiburg
Secrets | API keys, tokens, passwords (122 patterns)
  • Public figures (politicians, CEOs, celebrities) are not masked
  • The same original text always maps to the same placeholder (deduplication)
  • ScanResult.restore() substitutes the original values back into the LLM output
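The ISO 7064 checksum mentioned for IBANs is the standard mod-97 check. The following is a small sketch of that check, not the package's own detector; `iban_checksum_ok` is a hypothetical helper, and it only applies a generic length range rather than country-specific lengths.

```python
def iban_checksum_ok(iban: str) -> bool:
    """Return True if the IBAN passes the ISO 7064 mod-97 check."""
    compact = "".join(iban.split()).upper()
    # Generic sanity checks only; country-specific lengths are not verified.
    if not (15 <= len(compact) <= 34):
        return False
    if not (compact.isascii() and compact.isalnum()):
        return False
    # Move country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


print(iban_checksum_ok("DE89 3704 0044 0532 0130 00"))
print(iban_checksum_ok("DE88 3704 0044 0532 0130 00"))
```

A checksum test like this helps reject digit strings that merely look like an IBAN, which reduces false positives compared with a pattern match alone.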

Installation

pip install privacy-guard

The name detector additionally requires a spaCy model:

pip install "de_core_news_sm @ https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-3.8.0/de_core_news_sm-3.8.0-py3-none-any.whl"
# or:
python -m spacy download de_core_news_sm

All other detectors (IBAN, phone, email, address, secrets) work without the model.
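Since the model is an optional extra, it can be useful to check at startup whether it is installed. The sketch below uses only the standard library; `model_available` is a hypothetical helper, and it relies on the fact that spaCy models installed via pip are importable Python packages.

```python
import importlib.util


def model_available(name: str = "de_core_news_sm") -> bool:
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(name) is not None


if not model_available():
    print("de_core_news_sm is not installed; name detection will not be available.")
```

How the scanner itself behaves when the model is missing is not stated here. One cautious option is to call scanner.disable_detector(PiiType.NAME) explicitly in that case.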

Quick start

from privacy_guard import PrivacyScanner

scanner = PrivacyScanner()

result = scanner.scan(
    "Bitte überweise 500 € an Hans Müller, IBAN DE89 3704 0044 0532 0130 00. "
    "Rückfragen an h.mueller@example.de oder +49 89 123456."
)

print(result.anonymised_text)
# → "Bitte überweise 500 € an [NAME_1], IBAN [IBAN_1]. Rückfragen an [EMAIL_1] oder [PHONE_1]."

print(result.mapping)
# → {"[NAME_1]": "Hans Müller", "[IBAN_1]": "DE89 3704 0044 0532 0130 00", ...}

# Restore the original values in the LLM response
llm_response = "Vielen Dank, [NAME_1]! Ihre Überweisung von [IBAN_1] wurde verarbeitet."
print(result.restore(llm_response))
# → "Vielen Dank, Hans Müller! Ihre Überweisung von DE89 3704 0044 0532 0130 00 wurde verarbeitet."

Controlling individual detectors

from privacy_guard import PrivacyScanner, PiiType

scanner = PrivacyScanner()
scanner.disable_detector(PiiType.NAME)   # turn the name detector off
scanner.enable_detector(PiiType.NAME)    # turn it back on

# Custom whitelist entries (these names are not masked)
scanner = PrivacyScanner(extra_whitelist_names=["Erika Musterfrau"])

Evaluating only specific findings

# `result` is the ScanResult from the quick start; PiiType is imported from privacy_guard
secrets = [f for f in result.findings if f.pii_type == PiiType.SECRET]
for s in secrets:
    print(f"  {s.rule_id}: {s.text!r}  (confidence={s.confidence})")
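Beyond filtering for one type, findings can be summarised per type. This sketch assumes each finding exposes `pii_type` and `confidence` as in the snippet above, and additionally assumes that `confidence` is numeric. `count_by_type` is a hypothetical helper, and the sample objects are stand-ins rather than real findings.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_type(findings, min_confidence=0.0):
    """Count findings per PII type, ignoring those below min_confidence."""
    return Counter(
        f.pii_type for f in findings if f.confidence >= min_confidence
    )


# Stand-in findings for illustration; real ones come from result.findings.
sample = [
    SimpleNamespace(pii_type="SECRET", confidence=0.9),
    SimpleNamespace(pii_type="SECRET", confidence=0.4),
    SimpleNamespace(pii_type="EMAIL", confidence=1.0),
]

print(count_by_type(sample, min_confidence=0.5))
```

A summary like this is handy for logging what was masked without writing the sensitive values themselves to the log.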

License

MIT; see LICENSE.
