Automatic PII redaction for Pydantic v2 — masks sensitive data in logs and print statements. GDPR/HIPAA-friendly.

GhostPII 👻

Automatic PII redaction for Pydantic v2 — zero-config, GDPR/HIPAA-friendly.

Note: This project is published on PyPI as ghost-pii-pydantic.

GhostPII solves the "Logged Secret" problem: sensitive fields (emails, SSNs, credit card numbers, API keys) leaking into logs and tracebacks. It provides a smart string proxy that automatically redacts itself in unsafe contexts (logging, print, tracebacks) while remaining fully functional for business logic, databases, and APIs.

  • Drop-in Pydantic v2 Annotated type — no middleware, no post-processing
  • Tainted memory propagation — concatenated strings stay redacted
  • Strict mode for FinTech / HealthTech / high-compliance environments
  • Works with sync and async Python services

Features

| Feature | Description |
| --- | --- |
| Auto-Magical Redaction | Automatically detects print(), logging, structlog, loguru, and more. |
| Partial Masking | Show jo***@ex***.com instead of [REDACTED]; ideal for UIs and audit logs. |
| Pydantic Native | First-class support for Pydantic v2 Annotated types. |
| Strict Mode | Opt-in for 100% redaction everywhere unless explicitly unmasked. |
| Tainted Memory | Operations on PII (like concatenation) stay PII. No accidental leaks. |
| Context Aware | unmask_pii() context manager with optional audit callback. |
| asyncio Safe | Uses contextvars.ContextVar, so state is isolated per thread and per async task. |
| pytest Plugin | Built-in ghost_pii_strict fixture and --ghost-pii-strict CLI flag. |
| Extensible | Register custom unsafe modules (OpenTelemetry, Datadog, etc.) at runtime. |

Installation

pip install ghost-pii-pydantic

Quick Start

from pydantic import BaseModel, EmailStr
from ghost_pii import PII, unmask_pii

class User(BaseModel):
    name: PII[str]
    email: PII[EmailStr] # Validates as email (via Pydantic), redacts in logs

user = User(name="John Doe", email="john@example.com")

# 1. Safe by Default: Redacts in logs/prints
print(user)
# Output: name=GhostString('[REDACTED]') email=GhostString('[REDACTED]')

# 2. Functional: Works in business logic/DBs
# (Internal calls to user.email return the real string;
#  db is a placeholder for your own database connection)
db.execute("INSERT INTO users VALUES (?)", [user.email])
# Successfully inserts "john@example.com"

# 3. Explicit: Use context manager for sensitive tasks
with unmask_pii():
    print(user) 
    # Output: name=GhostString('John Doe') email=GhostString('john@example.com')

Advanced Scenarios

Nested Models and Collections

GhostPII seamlessly handles nested Pydantic models and lists of PII.

from typing import List

from pydantic import BaseModel, EmailStr
from ghost_pii import PII

class Address(BaseModel):
    street: PII[str]
    city: str

class Organization(BaseModel):
    name: str
    admin_emails: List[PII[EmailStr]]
    headquarters: Address

org = Organization(
    name="Acme Corp",
    admin_emails=["admin@acme.com", "sec@acme.com"],
    headquarters=Address(street="123 Secret Lane", city="New York")
)

print(org.model_dump())
# Output: {
#   'name': 'Acme Corp', 
#   'admin_emails': ['[REDACTED]', '[REDACTED]'], 
#   'headquarters': {'street': '[REDACTED]', 'city': 'New York'}
# }

Tainted Memory (Concatenation)

PII "infects" any string it touches. If you combine a PII field with a normal string, the result is a new GhostString that is also redacted by default.

labeled_name = "User: " + user.name
print(labeled_name) # Output: [REDACTED]

with unmask_pii():
    print(labeled_name) # Output: User: John Doe
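
The propagation idea can be illustrated with a plain str subclass. This is a minimal sketch, not GhostPII's implementation: the class name TaintedStr and the reveal() method are hypothetical, and it only covers the + operator.

```python
class TaintedStr(str):
    """Toy stand-in for a redacting string (hypothetical, not GhostPII's class)."""

    def __str__(self) -> str:
        # print() and str() go through __str__, so they see the placeholder.
        return "[REDACTED]"

    def reveal(self) -> str:
        # Return the underlying characters as a plain str.
        return str.__str__(self)

    def __add__(self, other: str) -> "TaintedStr":
        # tainted + plain stays tainted
        return TaintedStr(str.__add__(self, other))

    def __radd__(self, other: str) -> "TaintedStr":
        # plain + tainted stays tainted
        return TaintedStr(str.__add__(other, self))


name = TaintedStr("John Doe")
labeled = "User: " + name
print(labeled)           # [REDACTED]
print(labeled.reveal())  # User: John Doe
```

Note that this toy does not override __format__, join(), or slicing, so an f-string would still expose the raw value. A production proxy has to cover those paths as well.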

Partial Masking

Use masked_pii() when you need identifiable-but-safe values — customer service UIs, audit logs, support dashboards.

from pydantic import BaseModel, EmailStr
from ghost_pii import masked_pii, MaskStrategy, unmask_pii

class User(BaseModel):
    email: masked_pii(EmailStr, MaskStrategy.EMAIL)   # jo***@ex***.com
    ssn:   masked_pii(str,      MaskStrategy.SSN)     # ***-**-6789
    card:  masked_pii(str,      MaskStrategy.LAST4)   # ************1111
    phone: masked_pii(str,      MaskStrategy.PHONE)   # +44*****456

user = User(email="john@example.com", ssn="123-45-6789",
            card="4111111111111111", phone="+447911123456")

print(user.email)  # jo***@ex***.com
print(user.ssn)    # ***-**-6789

with unmask_pii():
    print(user.email)  # john@example.com

Audit Hook

Pass on_access to unmask_pii() to emit a compliance trail whenever PII is deliberately exposed, which is useful when SOC 2 or GDPR audits call for a record of each access.

import logging
audit = logging.getLogger("audit")

with unmask_pii(on_access=lambda: audit.info("PII accessed by service X")):
    send_email(to=str(user.email))
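
The pattern behind an audited unmask block can be sketched with contextlib and contextvars. This is an illustration under assumptions, not the library's code: the names unmask, render, and _unmasked are hypothetical, and the sketch assumes the callback fires once when the block is entered (the README does not say whether GhostPII calls it on entry or on every access).

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Callable, Iterator, Optional

# Hypothetical flag; False means "redact".
_unmasked: ContextVar[bool] = ContextVar("_unmasked", default=False)


@contextmanager
def unmask(on_access: Optional[Callable[[], None]] = None) -> Iterator[None]:
    """Temporarily allow raw values, notifying an audit callback first."""
    if on_access is not None:
        on_access()  # assumption: called once, on entry
    token = _unmasked.set(True)
    try:
        yield
    finally:
        # Restore the previous state even if the block raises.
        _unmasked.reset(token)


def render(secret: str) -> str:
    """Return the raw value only inside an unmask() block."""
    return secret if _unmasked.get() else "[REDACTED]"


audit_trail = []
with unmask(on_access=lambda: audit_trail.append("PII accessed by service X")):
    inside = render("john@example.com")
outside = render("john@example.com")
```

Resetting via the token rather than setting the flag back to False keeps nested unmask() blocks correct: the inner block restores whatever state the outer block established.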

Extending Unsafe Modules

GhostPII covers logging, structlog, loguru, rich, print, and test runners out of the box. Add your own:

from ghost_pii import add_unsafe_module

add_unsafe_module("opentelemetry")
add_unsafe_module("datadog")
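
To show what registering an unsafe module could mean in practice, the following sketch walks the call stack and checks each caller's module name against a registry. It is a simplified illustration, not GhostPII's detection logic: UNSAFE_MODULES, register_unsafe, called_from_unsafe_module, and the Secret class are all hypothetical, and sys._getframe is CPython-specific.

```python
import logging
import sys

# Hypothetical registry of top-level module names treated as unsafe.
UNSAFE_MODULES = {"logging"}


def register_unsafe(module_name: str) -> None:
    """Add a top-level module name to the unsafe set."""
    UNSAFE_MODULES.add(module_name)


def called_from_unsafe_module(max_depth: int = 25) -> bool:
    """Report whether any of the nearest callers lives in an unsafe module."""
    frame = sys._getframe(1)  # start at our caller
    depth = 0
    while frame is not None and depth < max_depth:
        top_level = frame.f_globals.get("__name__", "").split(".")[0]
        if top_level in UNSAFE_MODULES:
            return True
        frame = frame.f_back
        depth += 1
    return False


class Secret:
    """Toy value that redacts itself when rendered from an unsafe module."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __str__(self) -> str:
        return "[REDACTED]" if called_from_unsafe_module() else self._value


class ListHandler(logging.Handler):
    """Collects formatted messages so the effect can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger = logging.getLogger("ghost_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

secret = Secret("john@example.com")
logger.info("email=%s", secret)   # formatted inside the logging module
register_unsafe("opentelemetry")  # mirrors the add_unsafe_module() idea
```

Here the %s formatting happens inside logging's own code, so the stack walk finds a logging frame and the value is redacted. Stack inspection costs time on every render, which is one reason a bounded depth is used in this sketch.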

Async Support

GhostPII works transparently in async services. unmask_pii() is an ordinary synchronous context manager (a plain with block, not async with), and it can be used inside async functions:

import asyncio
import logging

from pydantic import BaseModel
from ghost_pii import PII, unmask_pii

logger = logging.getLogger(__name__)

class UserEvent(BaseModel):
    user_id: str
    email: PII[str]

async def send_confirmation(event: UserEvent):
    # Logging is safe: email is auto-redacted
    logger.info("Sending confirmation to %s", event.email)

    # smtp_client is a placeholder for your own async mail client
    with unmask_pii():
        await smtp_client.send(to=str(event.email), subject="Confirm your account")
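
The per-task isolation that contextvars.ContextVar provides can be checked with the standard library alone. The sketch below does not use GhostPII; the variable name unmasked and the two coroutines are hypothetical. It shows that a flag set in one asyncio task is not visible in a sibling task or in the parent.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical flag standing in for an "unmasked" state.
unmasked: ContextVar[bool] = ContextVar("unmasked", default=False)


async def privileged(seen: dict) -> None:
    unmasked.set(True)      # affects only this task's context copy
    await asyncio.sleep(0)  # let the other task run in between
    seen["privileged"] = unmasked.get()


async def bystander(seen: dict) -> None:
    await asyncio.sleep(0)
    await asyncio.sleep(0)  # by now privileged() has set its flag
    seen["bystander"] = unmasked.get()


async def main() -> dict:
    seen: dict = {}
    # gather() wraps each coroutine in a task with its own copy of the context.
    await asyncio.gather(privileged(seen), bystander(seen))
    seen["parent"] = unmasked.get()
    return seen


result = asyncio.run(main())
print(result)
```

Only the task that set the flag observes True. With a module-level global instead of a ContextVar, the bystander task would have seen the unmasked state too.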

Enterprise Strategy

GhostPII is designed to adapt to different compliance levels:

| Mode | Recommended For | Mechanism |
| --- | --- | --- |
| Auto-Magical | General microservices, high developer velocity. | Uses stack inspection to detect logging, print, etc. |
| Strict Mode | FinTech, HealthTech, High-Compliance environments. | Redacts everywhere. Requires explicit unmask_pii() to access data. |

Enabling Strict Mode

from ghost_pii import set_strict_mode

set_strict_mode(True) # Best practice for production PII handling
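
The difference between the two modes can be summarized as a small decision function. This is a reading of the behavior described in this README, written as a hypothetical helper (should_redact is not part of GhostPII): an explicit unmask always reveals, strict mode redacts everything else, and auto mode redacts only when the caller is detected as unsafe.

```python
def should_redact(strict: bool, unmasked: bool, unsafe_caller: bool) -> bool:
    """Decide whether a PII value should be shown as redacted."""
    if unmasked:
        # Inside an explicit unmask block the real value is always shown.
        return False
    # Strict mode redacts everywhere; auto mode only for unsafe callers.
    return strict or unsafe_caller


# Auto mode: business logic sees the value, logging does not.
print(should_redact(strict=False, unmasked=False, unsafe_caller=False))  # False
print(should_redact(strict=False, unmasked=False, unsafe_caller=True))   # True
# Strict mode: redacted even in business logic until explicitly unmasked.
print(should_redact(strict=True, unmasked=False, unsafe_caller=False))   # True
print(should_redact(strict=True, unmasked=True, unsafe_caller=False))    # False
```

Strict mode removes the dependence on caller detection, so a logging library that is not in the unsafe list cannot leak a value by accident.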

pytest Plugin

GhostPII ships a built-in pytest plugin (auto-registered via pytest11 entry point).

Per-test strict mode:

from ghost_pii import unmask_pii

# User is the model defined in Quick Start.
def test_no_pii_in_logs(ghost_pii_strict):
    user = User(name="John Doe", email="john@example.com")
    assert str(user.email) == "[REDACTED]"   # strict: always redacted
    with unmask_pii():
        assert str(user.email) == "john@example.com"

Session-wide (CI enforcement):

pytest --ghost-pii-strict

Disable the plugin:

pytest -p no:ghost-pii

Why GhostPII vs Alternatives

| | GhostPII | presidio | scrubadub | Manual field redaction |
| --- | --- | --- | --- | --- |
| Integration model | Pydantic Annotated type | NLP pipeline / scrubber | String scrubber | Ad-hoc |
| Auto-redacts in logs | Yes, zero config | No | No | No |
| Preserves value for DB/API | Yes | No (destructive) | No (destructive) | Depends |
| Tainted memory propagation | Yes | No | No | No |
| Strict / audit mode | Yes | No | No | Manual |
| Setup overhead | pip install + type annotation | NER models, language packs | Pattern config | High |
| Best for | Pydantic services, FastAPI, microservices | Bulk text anonymisation | Legacy string scrubbing | Simple one-off cases |

TL;DR: presidio and scrubadub are great for scrubbing free-text blobs. GhostPII is purpose-built for Pydantic models where you need the real value to flow through your app but never appear in logs.

Contributing

We follow strict engineering standards. Please ensure you run linters and tests before submitting PRs.

pip install -e ".[dev]"
pytest                        # run test suite
ruff check src/ghost_pii      # lint
mypy src/ghost_pii            # type-check

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Copyright (c) 2026 Sthitaprajna Sahoo and contributors.
