Skip to main content

Redact data in logs based on regex filters and keys

Project description

Logging Redactor

PyPI version Supported Python versions

Logging Redactor is a Python library designed to redact sensitive data in logs based on regex mask_patterns or dictionary keys. It supports JSON logging formats and handles nested data at the message level, at the positional argument level and also in the extra keyword argument.

Installation

You can install Logging Redactor via pip:

pip install loggingredactor

Illustrative Examples

Below is a basic example that illustrates how to redact any digits in a logger message:

import re
import logging
import loggingredactor

# Create a logger
logger = logging.getLogger()
# Add the redact filter to the logger with your custom filters
redact_mask_patterns = [re.compile(r'\d+')]

# if no `mask` is passed in, 4 asterisks will be used
logger.addFilter(loggingredactor.RedactingFilter(redact_mask_patterns, mask='xx'))

logger.warning("This is a test 123...")
# Output: This is a test xx...

Python only applies the filter on that logger, so any other files using logging will not get the filter applied. To have this filter applied to all loggers do the following

import re
import logging
import loggingredactor
from pythonjsonlogger import jsonlogger

# Create a pattern to hide api key in url. This uses a _Positive Lookbehind_
redact_mask_patterns = [re.compile(r'(?<=api_key=)[\w-]+')]

# Override the logging handler that you want redacted
class RedactStreamHandler(logging.StreamHandler):
    def __init__(self, *args, **kwargs):
        logging.StreamHandler.__init__(self, *args, **kwargs)
        self.addFilter(loggingredactor.RedactingFilter(redact_mask_patterns))

root_logger = logging.getLogger()

sys_stream = RedactStreamHandler()
# Also set the formatter to use json, this is optional and all nested keys will get redacted too
sys_stream.setFormatter(jsonlogger.JsonFormatter('%(name)s %(message)s'))
root_logger.addHandler(sys_stream)

logger = logging.getLogger(__name__)

logger.error("Request Failed", extra={'url': 'https://example.com?api_key=my-secret-key'})
# Output: {"name": "__main__", "message": "Request Failed", "url": "https://example.com?api_key=****"}

You can also redact by dictionary keys, rather than by regex, in cases where certain fields should always be redacted. To achieve this, you can provide any iterable representing the keys that you would like to redact on. An example is shown below (this time with a different default mask):

import re
import logging
import loggingredactor
from pythonjsonlogger import jsonlogger

# This list now contains all the dictioanry keys that will have their values redacted in the logger object
redact_keys = ['email', 'password']

# Override the logging handler that you want redacted
class RedactStreamHandler(logging.StreamHandler):
    def __init__(self, *args, **kwargs):
        logging.StreamHandler.__init__(self, *args, **kwargs)
        self.addFilter(loggingredactor.RedactingFilter(mask='REDACTED', mask_keys=redact_keys))

root_logger = logging.getLogger()

sys_stream = RedactStreamHandler()
# Also set the formatter to use json, this is optional and all nested keys will get redacted too
sys_stream.setFormatter(jsonlogger.JsonFormatter('%(name)s %(message)s'))
root_logger.addHandler(sys_stream)

logger = logging.getLogger(__name__)

logger.warning("User %(firstname)s with email: %(email)s and password: %(password)s bought some food!", {'firstname': 'Arman', 'email': 'arman_jasuja@yahoo.com', 'password': '1234567'})
# Output: {"name": "__main__", "message": "User Arman with email: REDACTED and password: REDACTED bought some food"}

The above example also illustrates the logger redacting positional arguments provided to the message.

Integrating with already built logger configs

Logging Redactor also integrates quite well with already created logging configurations, for example, say you have your logging config set up in the following format:

import re
import logging.config
... # Other imports
LOGGING = {
    ... # Your other configs
    'filters':{ 
        ... # Some configs
        'pii': {
            '()': 'loggingredactor.RedactingFilter',
            'mask_keys': ('password', 'email', 'last_name', 'first_name', 'gender', 'lastname', 'firstname',),
            'mask_patterns': (re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'), ) # email regex
            'mask': 'REDACTED',
        },
        ... # Some other configs
    }
    'handlers': {
        ... # Some handlers
        'stdout': {
            ... # Some configs
            'filters': ['pii', ...],
        },
        ... # Other handlers (add pii as a filter to all the ones where you want the appropriate information to be redacted)
    }
    ... # Rest of your configs
}

logging.config.dictConfig(LOGGING)
... # Use your logger as normal, the redaction will now be applied.

The essence boils down to adding the RedactingFilter to your logging config, and to the filters section of the associated handlers to which you want to apply the redaction.

Release Notes - v0.0.6:

Improvements and Changes

  • Allow redaction of any generic mapping type, including:
    1. dict
    2. collections.OrderedDict
    3. frozendict.frozendict
    4. collections.ChainMap
    5. types.MappingProxyType
    6. collections.UserDict and any other mapping class that inherits from collections.Mapping

Bug Fixes

  • Fix bug that was converting non-string data types to strings. (Reported in issue #7)

A Note about the Motivation behind Logging Redactor:

Logging Redactor started as a fork of logredactor. However, due to the bugs present in the original (specifically the data mutations), it was not usable in production environments where the purpose was to only redact variables in the logs, not in their usage in the code. This, along with the fact that the original package is no longer maintained lead to the creation of Logging Redactor.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

loggingredactor-0.0.6.tar.gz (5.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

loggingredactor-0.0.6-py3-none-any.whl (5.6 kB view details)

Uploaded Python 3

File details

Details for the file loggingredactor-0.0.6.tar.gz.

File metadata

  • Download URL: loggingredactor-0.0.6.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for loggingredactor-0.0.6.tar.gz
Algorithm Hash digest
SHA256 dee02bf684ab03f841d17e8afccaaedf6d79bc9099458c6c830537fd6f28388d
MD5 05f93175070efb2a6799d3cf4f05acef
BLAKE2b-256 1de73b7f75ef38eb0c889258e501e35423be05e5c09ab4fb40d36a25699000dd

See more details on using hashes here.

File details

Details for the file loggingredactor-0.0.6-py3-none-any.whl.

File metadata

File hashes

Hashes for loggingredactor-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 3fddce91e5719a03548968ddcd07eaa462148bfffabf579c7d856497e86b741b
MD5 51cfbb76156fc2e0a51733d5d8cebb4d
BLAKE2b-256 7a825817041ee7b77e4aa4fa0857eff4581d394e852006b84c92e4cf81b970d9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page