Sensitive-data classification and masking for Python frozen dataclasses

sensitivity-mixin

Decorator-based sensitivity classification and masking for Python frozen dataclasses.

Accidentally logging sensitive data, such as API tokens, passwords, session IDs, PII, healthcare data (PHI), credit card numbers, and other secrets, is a common source of security incidents and compliance violations. sensitivity-mixin addresses this with a lightweight @sensitive decorator and a taxonomy-based classifier: fields you tag are masked whenever the object's repr is rendered, including when the object itself is logged.

Why?

When you log a dataclass instance or its repr, sensitive fields leak unless you explicitly redact them everywhere:

import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass(frozen=True)
class APICredentials:
    user_id: int
    api_token: str

creds = APICredentials(user_id=1, api_token="sk-abc123xyz")
logger.info("Creds: %s", creds)  # logs: "Creds: APICredentials(user_id=1, api_token='sk-abc123xyz')"
                                  # OOPS! Token is exposed.

This library reduces that to one line per field: tag each sensitive field in its metadata, apply the decorator to the class, and let the classifier introspect and mask the tagged fields automatically.

from dataclasses import dataclass, field
from sensitivity_mixin import sensitive, classify

@sensitive
@dataclass(frozen=True, slots=True)
class APICredentials:
    user_id: int
    api_token: str = field(metadata={"sensitivity": "secret"})

creds = APICredentials(user_id=1, api_token="sk-abc123xyz")

# Introspect sensitivity:
profile = classify(creds)
# → SensitivityProfile(classes=(('api_token', Sensitivity.SECRET),))

# Safe for reprs / tracebacks:
logger.error("Error: %s", repr(creds))
# → "APICredentials(user_id=1, api_token=***)"

Installation

pip install sensitivity-mixin

or with uv:

uv add sensitivity-mixin

Requires Python 3.11+.

Quick Start

1. Import the decorator and classifier

from dataclasses import dataclass, field
from sensitivity_mixin import sensitive, classify, Sensitivity

2. Decorate and mark sensitive fields

Use the @sensitive decorator on a frozen dataclass and tag fields with a sensitivity taxonomy:

@sensitive
@dataclass(frozen=True, slots=True)
class User:
    id: int
    api_token: str = field(metadata={"sensitivity": "secret"})
    email: str = field(metadata={"sensitivity": "pii"})
    ssn: str = field(metadata={"sensitivity": "phi"})
    name: str

Supported sensitivity tags: "phi" (healthcare data), "pii" (personal info), "pci" (payment card data), "secret" (credentials/tokens), or omit for non-sensitive. (Alternatively, use the Sensitivity enum: Sensitivity.PHI, Sensitivity.PII, Sensitivity.PCI, Sensitivity.SECRET.)
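
The relationship between string tags and enum members can be sketched with a string-valued enum. The Tag class and normalize helper below are hypothetical stand-ins written for this illustration; how the library itself resolves tags is not shown here and may differ.

```python
from enum import Enum


class Tag(str, Enum):
    """Hypothetical stand-in for the library's Sensitivity enum."""

    PHI = "phi"
    PII = "pii"
    PCI = "pci"
    SECRET = "secret"


def normalize(value):
    """Return a Tag for a string or a Tag; None means non-sensitive."""
    if value is None:
        return None
    return Tag(value)  # raises ValueError for unknown tags such as "secrt"


print(normalize("secret") is Tag.SECRET)  # → True
print(normalize(Tag.PII) is Tag.PII)      # → True
print(normalize(None))                    # → None
```

Resolving tags through an enum means a misspelled tag fails loudly instead of silently leaving a field unmasked.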

3. Use in your code

user = User(
    id=1,
    api_token="sk-123456",
    email="alice@example.com",
    ssn="123-45-6789",
    name="Alice"
)

# Introspect sensitivity:
profile = classify(user)
print(profile.has(Sensitivity.SECRET))  # → True
print(profile.fields_of(Sensitivity.PII))  # → ('email',)

# Masked for repr (safe in tracebacks, error messages):
print(repr(user))
# → User(id=1, api_token=***, email=***, ssn=***, name='Alice')

4. Use policy-driven masking (optional)

Wire per-class policies to customize masking placeholders:

from sensitivity_mixin import SensitiveDecorator
from sensitivity_mixin.decorators.classes.secret_aware import SecretPolicyAware
from sensitivity_mixin.decorators.classes.compliance import Compliance

secret_policy = SecretPolicyAware(
    compliance=Compliance.NONE,
    detection_hints=("api_token", "secret", "token", "password"),
    placeholder="[REDACTED]"
)

decorator = SensitiveDecorator(policies=((Sensitivity.SECRET, secret_policy),))

@decorator
@dataclass(frozen=True, slots=True)
class ApiClient:
    client_id: str
    api_token: str = field(metadata={"sensitivity": "secret"})

client = ApiClient(client_id="c1", api_token="sk-secret")
print(repr(client))
# → ApiClient(client_id='c1', api_token=[REDACTED])

API Reference

@sensitive decorator

Adds a sensitivity-aware __repr__() to a frozen dataclass. Fields marked with a sensitivity tag in metadata are redacted in repr output using a default placeholder (***).

Usage:

@sensitive
@dataclass(frozen=True, slots=True)
class Patient:
    name: str
    ssn: str = field(metadata={"sensitivity": "phi"})

repr(Patient(name="Alice", ssn="123"))
# → "Patient(name='Alice', ssn=***)"

With policies:

from dataclasses import dataclass, field
from sensitivity_mixin import Sensitivity, SensitiveDecorator
from sensitivity_mixin.decorators.classes.phi_aware import PhiPolicyAware
from sensitivity_mixin.decorators.classes.compliance import Compliance

phi_policy = PhiPolicyAware(
    compliance=Compliance.HIPAA,
    detection_hints=("ssn", "name"),
    placeholder="[REDACTED]"
)

decorator = SensitiveDecorator(policies=((Sensitivity.PHI, phi_policy),))

@decorator
@dataclass(frozen=True, slots=True)
class Patient:
    name: str
    ssn: str = field(metadata={"sensitivity": "phi"})

repr(Patient(name="Alice", ssn="123"))
# → "Patient(name=[REDACTED], ssn=[REDACTED])"

classify(instance) → SensitivityProfile

Introspects a dataclass and returns a SensitivityProfile documenting all sensitivity-tagged fields.

Use case: Compliance auditing, field-level sensitivity introspection

@sensitive
@dataclass(frozen=True, slots=True)
class Credentials:
    username: str
    password: str = field(metadata={"sensitivity": "secret"})
    api_key: str = field(metadata={"sensitivity": "secret"})

creds = Credentials(username="alice", password="secret", api_key="sk-123")
profile = classify(creds)
# → SensitivityProfile(classes=(('password', Sensitivity.SECRET), ('api_key', Sensitivity.SECRET)))

# Query the profile:
print(profile.has(Sensitivity.SECRET))  # → True
print(profile.fields_of(Sensitivity.SECRET))  # → ('password', 'api_key')
print(profile.sensitivity_of('username'))  # → None (unclassified)

SensitivityProfile provides:

  • classes: tuple[tuple[str, Sensitivity], ...] — field name → sensitivity mapping
  • has(kind: Sensitivity) → bool — check for a sensitivity class
  • fields_of(kind: Sensitivity) → tuple[str, ...] — get field names of a class
  • sensitivity_of(name: str) → Sensitivity | None — get the class of a field
  • is_empty → bool — True when no fields are tagged
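
The query methods listed above can be pictured with a small standard-library model. ProfileSketch and Kind are hypothetical names for this sketch; it mirrors the documented interface but is not the library's SensitivityProfile.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):  # hypothetical stand-in for Sensitivity
    PII = "pii"
    SECRET = "secret"


@dataclass(frozen=True)
class ProfileSketch:
    # (field name, kind) pairs, in field declaration order
    classes: tuple[tuple[str, Kind], ...] = ()

    def has(self, kind):
        return any(k is kind for _, k in self.classes)

    def fields_of(self, kind):
        return tuple(name for name, k in self.classes if k is kind)

    def sensitivity_of(self, name):
        return dict(self.classes).get(name)  # None for untagged fields

    @property
    def is_empty(self):
        return not self.classes


profile = ProfileSketch(classes=(("password", Kind.SECRET), ("api_key", Kind.SECRET)))
print(profile.has(Kind.SECRET))            # → True
print(profile.has(Kind.PII))               # → False
print(profile.fields_of(Kind.SECRET))      # → ('password', 'api_key')
print(profile.sensitivity_of("username"))  # → None
print(profile.is_empty)                    # → False
```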

Field Metadata

Mark a field sensitive by adding metadata={"sensitivity": "<TAG>"} to field():

from dataclasses import dataclass, field
from sensitivity_mixin import sensitive

@sensitive
@dataclass(frozen=True, slots=True)
class Credentials:
    username: str
    password: str = field(metadata={"sensitivity": "secret"})
    email: str = field(metadata={"sensitivity": "pii"})
    created_at: str  # not sensitive — no metadata needed

Supported tags:

  • "phi" — Protected Health Information (healthcare/medical records)
  • "pii" — Personally Identifiable Information (names, emails, SSNs)
  • "pci" — Payment Card Industry data (credit card numbers)
  • "secret" — API tokens, passwords, secrets
  • Omitted — non-sensitive (passes through unmasked)

Any field without metadata or with metadata={"sensitivity": None} is treated as non-sensitive and passes through unmasked.
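
Field metadata is a standard dataclasses feature, so the tags can be inspected without the library at all. The sketch below uses only dataclasses.fields() and shows that a missing tag and an explicit None look the same to a reader of the metadata.

```python
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class Credentials:
    username: str
    password: str = field(metadata={"sensitivity": "secret"})
    email: str = field(metadata={"sensitivity": "pii"})
    created_at: str = field(metadata={"sensitivity": None})


# f.metadata is a read-only mapping; .get() returns None when the key is absent.
tags = {f.name: f.metadata.get("sensitivity") for f in fields(Credentials)}
print(tags)
# → {'username': None, 'password': 'secret', 'email': 'pii', 'created_at': None}
```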

Security Boundary: What This Does and Does NOT Protect

@sensitive is a repr-layer masking tool, not a complete confidentiality boundary. It masks sensitive fields when you log or print the object itself, but does not protect against direct field access or serialization bypass.

Protected (Repr Layer Only)

  • ✓ repr(obj) — sensitive fields masked
  • ✓ str(obj) / print(obj) — uses masked repr
  • ✓ Logging the object: logger.info("Object: %s", obj) — masked
  • ✓ F-string with object: f"Object: {obj}" — masked

NOT Protected (Bypass Methods)

  • ✗ Direct field access: obj.api_token returns the full unmasked value
  • ✗ dataclasses.asdict(obj) returns a dict with full unmasked values
  • ✗ json.dumps(asdict(obj)) contains full unmasked values in JSON
  • ✗ Logging a field directly: logger.info(f"Token: {obj.api_token}") exposes the full value
  • ✗ Attribute introspection: getattr(obj, 'api_token') returns full unmasked value
  • ✗ Untagged fields are not masked — classification is explicit/opt-in

Example: Correct and Incorrect Usage

from dataclasses import dataclass, field
from sensitivity_mixin import sensitive
import logging

logger = logging.getLogger(__name__)

@sensitive
@dataclass(frozen=True, slots=True)
class APIKey:
    name: str
    secret: str = field(metadata={"sensitivity": "secret"})

key = APIKey(name="prod-key", secret="sk-abc123xyz")

# ✓ SAFE: logging the object uses masked repr
logger.info("API Key: %s", key)
# Output: "API Key: APIKey(name='prod-key', secret=***)"

# ✗ UNSAFE: logging a field directly bypasses the decorator
logger.warning("Secret: %s", key.secret)
# Output: "Secret: sk-abc123xyz"  ← FULL VALUE EXPOSED!

# ✗ UNSAFE: serializing with asdict() bypasses the decorator
from dataclasses import asdict
logger.debug("Data: %s", asdict(key))
# Output: "Data: {'name': 'prod-key', 'secret': 'sk-abc123xyz'}"  ← FULL VALUES EXPOSED!

Use case: @sensitive is ideal for DTOs at the logging boundary. Keep sensitive fields wrapped in the dataclass; avoid field-level logging. For applications requiring stronger confidentiality guarantees, apply field-level masking at the serialization boundary or use dedicated encryption libraries.
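
Masking at the serialization boundary can be sketched with the standard library alone. masked_asdict below is a hypothetical helper written for this illustration, not part of sensitivity-mixin; it replaces tagged top-level fields only and does not descend into nested dataclasses.

```python
from dataclasses import asdict, dataclass, field, fields


def masked_asdict(obj, placeholder="***"):
    """Hypothetical helper: like asdict(), but tagged top-level fields are replaced."""
    data = asdict(obj)
    for f in fields(obj):
        if f.metadata.get("sensitivity") is not None:
            data[f.name] = placeholder
    return data


@dataclass(frozen=True)
class APIKey:
    name: str
    secret: str = field(metadata={"sensitivity": "secret"})


key = APIKey(name="prod-key", secret="sk-abc123xyz")
print(masked_asdict(key))  # → {'name': 'prod-key', 'secret': '***'}
```

Logging or serializing the result of such a helper, rather than asdict(key) directly, keeps the raw value out of the output.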

Logging Integration

Pair with standard library logging for clean, safe logs:

import logging
from dataclasses import dataclass, field
from sensitivity_mixin import sensitive

logger = logging.getLogger(__name__)

@sensitive
@dataclass(frozen=True, slots=True)
class LoginAttempt:
    username: str
    password: str = field(metadata={"sensitivity": "secret"})
    ip_address: str

def handle_login(username, password, ip):
    attempt = LoginAttempt(username=username, password=password, ip_address=ip)
    logger.info("Login attempt: %s", repr(attempt))
    # Logs: "Login attempt: LoginAttempt(username='alice', password=***, ip_address='192.168.1.1')"
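
One way to guard against regressions is a test that captures log output and asserts the raw value never appears. The class below hand-writes a masked __repr__ as a stand-in for a @sensitive-decorated class so the sketch runs without the library; capture_log and the logger name are hypothetical.

```python
import io
import logging
from dataclasses import dataclass, field


@dataclass(frozen=True, repr=False)
class LoginAttempt:
    username: str
    password: str = field(metadata={"sensitivity": "secret"})
    ip_address: str = "unknown"

    # Hand-written stand-in for the repr that @sensitive would install.
    def __repr__(self):
        return (
            f"LoginAttempt(username={self.username!r}, "
            f"password=***, ip_address={self.ip_address!r})"
        )


def capture_log(obj):
    """Log obj to an in-memory stream and return everything that was written."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("login_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the sketch's output out of the root logger
    logger.addHandler(handler)
    try:
        logger.info("Login attempt: %s", obj)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


attempt = LoginAttempt(username="alice", password="hunter2", ip_address="192.168.1.1")
output = capture_log(attempt)
assert "hunter2" not in output
print(output.strip())
# → Login attempt: LoginAttempt(username='alice', password=***, ip_address='192.168.1.1')
```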

Mask Strategies

By default, @sensitive masks all sensitive fields with *** (DEFAULT_PLACEHOLDER).

For customized masking, instantiate policy value objects and wire them into SensitiveDecorator:

from dataclasses import dataclass, field
from sensitivity_mixin import Sensitivity, SensitiveDecorator
from sensitivity_mixin.decorators.classes.secret_aware import SecretPolicyAware
from sensitivity_mixin.decorators.classes.compliance import Compliance

secret_policy = SecretPolicyAware(
    compliance=Compliance.NONE,
    detection_hints=("api_key", "secret", "token"),
    placeholder="***REDACTED***"
)

decorator = SensitiveDecorator(policies=((Sensitivity.SECRET, secret_policy),))

@decorator
@dataclass(frozen=True, slots=True)
class Config:
    api_key: str = field(metadata={"sensitivity": "secret"})

repr(Config(api_key="sk-123"))
# → "Config(api_key=***REDACTED***)"

See docs/apps/decorators/policies.md for policy customization details.
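
The idea of a per-class placeholder can be modelled as a lookup from tag to replacement text. The decorator factory below is a hypothetical standard-library sketch, not the SensitiveDecorator API; it only illustrates how a fallback placeholder and per-tag overrides might combine.

```python
from dataclasses import dataclass, field, fields


def masked_repr_with(placeholders, default="***"):
    """Hypothetical decorator factory: placeholders maps a tag to replacement text."""

    def decorate(cls):
        def __repr__(self):
            parts = []
            for f in fields(self):
                tag = f.metadata.get("sensitivity")
                if tag is None:
                    parts.append(f"{f.name}={getattr(self, f.name)!r}")
                else:
                    # Use the per-tag placeholder, falling back to the default.
                    parts.append(f"{f.name}={placeholders.get(tag, default)}")
            return f"{type(self).__name__}({', '.join(parts)})"

        cls.__repr__ = __repr__
        return cls

    return decorate


@masked_repr_with({"secret": "***REDACTED***"})
@dataclass(frozen=True)
class Config:
    api_key: str = field(metadata={"sensitivity": "secret"})
    owner_email: str = field(metadata={"sensitivity": "pii"})
    region: str = "eu-west-1"


config = Config(api_key="sk-123", owner_email="alice@example.com")
print(repr(config))
# → Config(api_key=***REDACTED***, owner_email=***, region='eu-west-1')
```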

Migration from Earlier Versions

v0.3.0 introduces a taxonomy-driven architecture with broadened sensitivity classification.

Earlier versions (v0.1, v0.2)

from pii_aware_mixin import phi_aware

@phi_aware
@dataclass(frozen=True, slots=True)
class User:
    id: int
    api_token: str = field(metadata={"phi": True})

v0.3.0 (current)

from dataclasses import dataclass, field
from sensitivity_mixin import sensitive, classify

@sensitive
@dataclass(frozen=True, slots=True)
class User:
    id: int
    api_token: str = field(metadata={"sensitivity": "secret"})
    email: str = field(metadata={"sensitivity": "pii"})

user = User(id=1, api_token="sk-123456", email="alice@example.com")
profile = classify(user)  # introspect sensitivity

Key improvements:

  • Broadened taxonomy: PHI, PII, PCI, SECRET (not just phi)
  • Classification introspection: classify() returns a SensitivityProfile
  • Per-class policy value objects for specialized masking customization
  • Foundation for compliance-aware field governance

Design Principles

  • Decorator-based: Simple, non-intrusive. Works on plain frozen dataclasses.
  • Taxonomy-driven: Classify sensitivity at the field level: PHI, PII, PCI, or SECRET.
  • Introspectable: classify() exposes field-level sensitivity for compliance audits.
  • Type-safe: Works with frozen dataclasses, slots, and type hints.
  • Zero-cost: Minimal introspection overhead at decoration time.
  • Canonical: Compatible with the "no mixin inheritance on data DTOs" pattern.

License

Apache 2.0 — see LICENSE file.

Contributing

This library is maintained by James Ekhator. Contributions welcome via pull requests.
