Utility to remove or replace sensitive data from complex structures.

Sanitary

Sanitary is a simple utility that removes or masks sensitive information, such as personally identifiable information (PII), from any data structure. It also includes a structlog-compatible processor for cleaning up structured log messages.

It masks every value marked as sensitive. By default, each masked value is replaced by a generic string; a hashing function can be configured instead.

Installation

Sanitary is installed like any other Python package:

> pip install sanitary

Base Usage

The first step is to instantiate a Sanitizer object with the keys to mask, then pass the data to its sanitize method:

>>> from sanitary import Sanitizer
>>> sanitizer = Sanitizer(keys={"foo", "bar"})
>>> sanitizer.sanitize({"foo": 123, "bar": "abc", "baz": "boom"})
{'foo': '********', 'bar': '********', 'baz': 'boom'}
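To make the recursive search concrete, here is a minimal, self-contained sketch of how key-based masking over nested containers could work. It is an illustration only, not Sanitary's actual implementation: the `mask_keys` helper is hypothetical, and the case-insensitive key comparison is an assumption.

```python
from typing import Any, Iterable

MASK = "********"  # placeholder string mirroring the output shown above


def mask_keys(data: Any, keys: Iterable[str], mask: str = MASK) -> Any:
    """Return a copy of `data` in which values of matching keys are masked."""
    # Assumption: keys are compared case-insensitively.
    wanted = {key.lower() for key in keys}

    def walk(node: Any) -> Any:
        if isinstance(node, dict):
            return {
                k: mask if isinstance(k, str) and k.lower() in wanted else walk(v)
                for k, v in node.items()
            }
        if isinstance(node, (list, tuple)):
            # Rebuild the same container type with processed items.
            return type(node)(walk(item) for item in node)
        return node  # scalars are returned unchanged

    return walk(data)


result = mask_keys({"foo": 123, "bar": "abc", "baz": {"Foo": [1, 2]}}, {"foo", "bar"})
print(result)
```

The sketch builds a new structure instead of mutating its input, so the original data remains available to the caller.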

Configuration

The Sanitizer class accepts the following arguments:

  • keys: An iterable of key names that will be searched for recursively. The value of any matching key is replaced by the replacement value.
  • patterns: An iterable of regular expression patterns that are matched against textual values. A value that matches any of the patterns is replaced entirely by the message value.
  • replacement: Can be any of the following types of values:
    1. A plain text, which will simply replace the sensitive value.
    2. A callable which takes a string as its single argument and returns another string, which will replace the value.
    3. A callable which takes a bytes object as its single argument and returns a "hash object"; this allows using the hashlib functions to mask the data.
  • message: The textual message which will replace the value that matches any of the defined patterns.
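The three kinds of replacement value can be told apart at call time. The sketch below shows one possible way to do so; the `apply_replacement` helper and its try/except dispatch are assumptions made for illustration and do not describe how Sanitary itself decides.

```python
import hashlib


def apply_replacement(value, replacement):
    """Resolve a replacement (plain text, str callable, or hash constructor)."""
    if not callable(replacement):
        return str(replacement)  # case 1: plain text

    text = str(value)
    try:
        result = replacement(text)  # case 2: callable taking a str
    except TypeError:
        # case 3: hashlib constructors reject str and require bytes.
        # Caveat of this sketch: a str callable that itself raises
        # TypeError would also end up here.
        result = replacement(text.encode("utf-8"))

    # Hash objects expose hexdigest(); plain strings do not.
    if hasattr(result, "hexdigest"):
        return result.hexdigest()
    return result


plain = apply_replacement("secret", "********")
upper = apply_replacement("secret", str.upper)
hashed = apply_replacement("secret", hashlib.sha256)
```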

Data Hashing

If the replacement argument is a callable, the value of each sensitive key is replaced with the callable's return value (or, when the callable returns a hash object, with that object's hexdigest). This way, the sanitized data can still be tracked (e.g. an email address will always have the same hash value) without exposing the actual value.

>>> import hashlib
>>> from sanitary import Sanitizer
>>> sanitizer = Sanitizer(keys={"password", "email"}, replacement=hashlib.sha256)
>>> sanitizer.sanitize({"event": "clean password", "password": "blabla", "foo": {"Email": "test@example.com"}})
{
    'event': 'clean password',
    'password': 'ccadd99b16cd3d200c22d6db45d8b6630ef3d936767127347ec8a76ab992c2ea',
    'foo': {'Email': '973dfe463ec85785f5f95af5ba3906eedb2d931c24e69824a89ea65dba4e813b'}
}
>>>
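The tracking property can be demonstrated with hashlib alone. The following sketch does not use Sanitary; the `digest` helper, the sample records, and the UTF-8 encoding are assumptions chosen for illustration.

```python
import hashlib


def digest(value: str) -> str:
    # Assumption: values are encoded as UTF-8 before hashing.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


records = [
    {"event": "login", "email": "test@example.com"},
    {"event": "logout", "email": "test@example.com"},
    {"event": "login", "email": "other@example.com"},
]

# Replace each email with its digest; equal inputs give equal digests.
masked = [{**record, "email": digest(record["email"])} for record in records]

same_user = masked[0]["email"] == masked[1]["email"]
different_user = masked[0]["email"] != masked[2]["email"]
print(same_user, different_user)
```

The first two records can still be correlated after masking, while the address itself no longer appears in the output.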

Sensitive Text Values

Sanitizer can also clean up text values that match specific regular expression patterns; any such value is replaced in its entirety with a warning message, which can be changed through the message argument.

>>> from sanitary import Sanitizer
>>> sanitizer = Sanitizer(patterns={r"""'Authentication':"""})
>>> sanitizer.sanitize("'Authentication': 1234")
"#### WARNING: Message replaced due to sensitive pattern: 'Authentication':"
>>> sanitizer.sanitize({"example": "'Authentication': 1234"})
{'example': "#### WARNING: Message replaced due to sensitive pattern: 'Authentication':"}
>>>
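A minimal sketch of this whole-value replacement, written with the standard re module, might look as follows. The `scrub_text` helper, the use of `re.search`, and the `{pattern}` placeholder in the message template are assumptions for illustration, not Sanitary's API.

```python
import re

# Hypothetical template modeled on the warning text shown above.
WARNING = "#### WARNING: Message replaced due to sensitive pattern: {pattern}"


def scrub_text(value: str, patterns, message: str = WARNING) -> str:
    """Replace the whole value if any pattern is found anywhere in it."""
    for pattern in patterns:
        if re.search(pattern, value):
            return message.format(pattern=pattern)
    return value  # no pattern matched, keep the text as-is


patterns = [r"'Authentication':"]
flagged = scrub_text("'Authentication': 1234", patterns)
untouched = scrub_text("nothing sensitive here", patterns)
```

Replacing the whole value, not only the matched fragment, avoids leaking the sensitive part that follows the match (here, the token after the header name).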

Structlog Processor

A dedicated subclass, StructlogSanitizer, sanitizes the logging context managed by the structlog library. Instantiate it and add it to the list of configured processors, ahead of the renderer:

import hashlib
import structlog
from sanitary import StructlogSanitizer

structlog.configure(
    processors=[
        StructlogSanitizer(keys={"foo", "bar", "baz"}, replacement=hashlib.sha256),
        structlog.processors.JSONRenderer(),
    ],
    logger_factory=structlog.stdlib.LoggerFactory(),
)
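For context, a structlog processor is any callable that accepts `(logger, method_name, event_dict)` and returns the event dictionary for the next processor. The sketch below shows that shape with a hypothetical `MaskingProcessor` class; it masks top-level keys only and is not the StructlogSanitizer implementation.

```python
class MaskingProcessor:
    """Hypothetical processor that masks selected top-level keys."""

    def __init__(self, keys, mask="********"):
        # Assumption: keys are compared case-insensitively.
        self.keys = {key.lower() for key in keys}
        self.mask = mask

    def __call__(self, logger, method_name, event_dict):
        # Return a new event dict; later processors (such as a renderer)
        # only ever see the masked values.
        return {
            key: self.mask if key.lower() in self.keys else value
            for key, value in event_dict.items()
        }


processor = MaskingProcessor(keys={"password"})
# Processors can be called directly, which makes them easy to test.
event = processor(None, "info", {"event": "login", "password": "blabla"})
print(event)
```

Because a renderer turns the event dictionary into its final output, a masking processor has to run before it in the processor list.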

