
filter-url


A simple, fast, and configurable Python utility to censor sensitive data (passwords, API keys, tokens) from URLs, making them safe for logging, monitoring, and debugging.

Key Features

  • Comprehensive Censoring: Censors passwords in userinfo (user:[...]@host), query parameter values, and parts of the URL path.
  • Flexible Rules: Filter query parameters by exact key names or by powerful regular expressions.
  • Advanced Path Filtering: Use regex with named capture groups to censor specific dynamic parts of a URL path while leaving the rest intact.
  • Order Preserving: Guarantees that the order of query parameters in the output is identical to the input.
  • Logging Integration: Provides a ready-to-use logging.Filter subclass for seamless integration into your application's logging setup.
  • Lightweight: Zero external dependencies.

Installation

pip install filter-url

Quick Start

The quickest way to use the library is the standalone filter_url() function, which uses a default set of rules to catch common sensitive keys.

from filter_url import filter_url

dirty_url = "https://user:my-secret-password@example.com/data?token=abc-123-xyz"

# Use the function with default filters
clean_url = filter_url(dirty_url)

print(clean_url)
# >> https://user:[...]@example.com/data?token=[...]
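Conceptually, this kind of censoring can be sketched with nothing but the standard library. The snippet below is an illustrative approximation, not filter-url's actual implementation; BAD_KEYS here is a hand-picked subset of the library's defaults:

from urllib.parse import urlsplit, urlunsplit, parse_qsl

BAD_KEYS = {"password", "token", "key", "secret"}  # subset of the defaults

def censor(url: str, placeholder: str = "[...]") -> str:
    parts = urlsplit(url)
    netloc = parts.netloc
    # Censor the password in the userinfo section, if present.
    if "@" in netloc:
        userinfo, host = netloc.rsplit("@", 1)
        if ":" in userinfo:
            user, _ = userinfo.split(":", 1)
            netloc = f"{user}:{placeholder}@{host}"
    # Censor values of sensitive query keys, preserving parameter order.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = "&".join(
        f"{k}={placeholder if k.lower() in BAD_KEYS else v}" for k, v in pairs
    )
    return urlunsplit((parts.scheme, netloc, parts.path, query, parts.fragment))

print(censor("https://user:hunter2@example.com/data?token=abc&id=7"))
# >> https://user:[...]@example.com/data?token=[...]&id=7

Note how joining the pairs back manually, rather than rebuilding the query with an encoder, is what keeps the parameter order identical to the input.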

Usage & Examples

Basic Filtering (Standalone Function)

The filter_url() function is great for one-off tasks. You can pass your own filtering rules directly to it. If a rule is not provided, a sensible default is used.

from filter_url import filter_url

# Define custom rules
custom_path_re = r'/user/(?P<user_id>\d+)/profile'

dirty_url = "https://example.com/user/123456/profile?credit_card_number=5555"

# Censor using a custom path regex
clean_url = filter_url(
    url=dirty_url,
    bad_path_re=custom_path_re
)

print(clean_url)
# >> https://example.com/user/[...]/profile?credit_card_number=5555
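A hedged sketch of how named-capture-group path censoring can work, using only the standard library re module (filter-url's actual implementation may differ):

import re

def censor_path(path: str, pattern: str, placeholder: str = "[...]") -> str:
    """Replace the span of each named group in the first match with a placeholder."""
    match = re.search(pattern, path)
    if not match:
        return path
    result, offset = path, 0
    # groupdict() preserves definition order, so spans are processed left to right.
    for name in match.groupdict():
        start, end = match.span(name)
        if start == -1:  # group did not participate in the match
            continue
        result = result[:start + offset] + placeholder + result[end + offset:]
        offset += len(placeholder) - (end - start)
    return result

print(censor_path("/user/123456/profile", r"/user/(?P<user_id>\d+)/profile"))
# >> /user/[...]/profile

Because only the named groups are replaced, the static parts of the path (/user/ and /profile) survive intact.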

Advanced: Using the FilterURL Class for Performance

When you need to filter a large number of URLs with the same configuration, it's much more efficient to instantiate the FilterURL class once. This pre-compiles the regular expressions and avoids redundant work in a loop.

from filter_url import FilterURL

# Create the filter instance ONCE with your custom rules.
# The regexes are compiled here.
my_filter = FilterURL(
    bad_keys={'api_key'},
    bad_keys_re=[r'session']
)

urls_to_process = [
    "https://service.com/api?api_key=key-1",
    "https://service.com/api?user_session=sess-2",
    "https://service.com/api?id=3"
]

# Reuse the same instance in a loop for high performance
clean_urls = [my_filter.remove_sensitive(url) for url in urls_to_process]

# clean_urls will be:
# [
#   'https://service.com/api?api_key=[...]',
#   'https://service.com/api?user_session=[...]',
#   'https://service.com/api?id=3'
# ]

The class keeps an internal cache of filtered URLs; you can tune its size, or disable it entirely, with the cache_size parameter (see the API reference below).
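The effect of such a cache can be illustrated with functools.lru_cache, which is a reasonable mental model for (though not necessarily identical to) FilterURL's internal cache:

from functools import lru_cache

@lru_cache(maxsize=512)
def expensive_filter(url: str) -> str:
    # Stand-in for the real regex-based filtering work.
    return url.replace("secret", "[...]")

# Filtering the same URL repeatedly only pays the cost once.
for _ in range(1000):
    expensive_filter("https://x.com/?k=secret")

info = expensive_filter.cache_info()
print(info.misses, info.hits)  # >> 1 999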

Integration with Python's logging Module

This is the most powerful feature for real-world applications. The URLFilter automatically censors URLs in your logs. The filter works in two ways:

  1. (Preferred) It looks for a url key in the extra dictionary of your logging call.
  2. (Fallback) If fallback=True (the default), it searches for URLs in the positional arguments of the log message.
import logging
import sys
from filter_url import URLFilter

# 1. Configure a logger

logger = logging.getLogger('my_app')
logger.setLevel(logging.INFO)
if logger.hasHandlers():
    logger.handlers.clear()

# 2. Simply add our filter. Let's use custom rules for this example

custom_filter = URLFilter(
    bad_keys={'access_token'},
    fallback=True  # Default, but shown for clarity
)
logger.addFilter(custom_filter)

# 3. Use a standard Formatter. No special formatter is needed

handler = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter('%(levelname)s: %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)

# --- Usage Examples ---

# Case 1: (Preferred) Pass the URL via 'extra'

logger.info(
    "User login attempt failed",
    extra={'url': "https://auth.service.com/login?access_token=12345"}
)

# Case 2: (Fallback) The URL is an argument in the message string

logger.info(
    "API call to %s resulted in a 404 error.",
    "https://api.service.com/data/v1/user?password=abc"
)

# Case 3: No URL in the message. Nothing extra is added

logger.info("Application started successfully.")

Be aware of a minor trade-off between the logging filter and the FilterURL class. If each URL is logged only once, the logging filter is the perfect solution: it keeps your code straightforward and clean. If you process URLs and output them multiple times at different stages, prepare them in advance with the FilterURL class to save CPU cycles. FilterURL's internal cache of filtered URLs mitigates this difference, but it can still be noticeable under load.

Expected Output:

INFO: User login attempt failed | (URL data: https://auth.service.com/login?access_token=[...])
INFO: API call to https://api.service.com/data/v1/user?password=[...] resulted in a 404 error. | (URL data: https://api.service.com/data/v1/user?password=[...])
INFO: Application started successfully.
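Internally, a censoring log filter boils down to a logging.Filter whose filter() method rewrites the record before handlers see it. Below is a deliberately tiny, self-contained sketch of that mechanism; URLFilter's real behavior (the extra['url'] path, configurable rules, caching) is richer:

import io
import logging
import re

class TinyURLFilter(logging.Filter):
    SECRET_RE = re.compile(r"(password|token)=[^&\s]+")

    def filter(self, record: logging.LogRecord) -> bool:
        # Rewrite sensitive query values in the positional arguments.
        if record.args:
            record.args = tuple(
                self.SECRET_RE.sub(r"\1=[...]", a) if isinstance(a, str) else a
                for a in record.args
            )
        return True  # never drop the record, only rewrite it

stream = io.StringIO()
logger = logging.getLogger("sketch")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream)
logger.addHandler(handler)
logger.addFilter(TinyURLFilter())

logger.info("GET %s -> 404", "https://api.example.com/u?password=abc")
print(stream.getvalue().strip())
# >> GET https://api.example.com/u?password=[...] -> 404

Because filters attached to the logger run before any handler formats the record, the secret never reaches the output stream.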

Corner Cases & Considerations

  • Log String vs. Valid URL: The primary goal of this library is to produce a human-readable, safe string for logging. The output string containing [...] in the userinfo (password) section is not a valid URL according to RFC standards and may fail if you try to parse it again with urllib.parse.
  • Performance: For filtering a large number of URLs, always instantiate the FilterURL class once and reuse the instance. The standalone filter_url() function re-compiles regexes on every call and is less performant for batch jobs.
  • Logging Filter Precedence: When using URLFilter, providing the URL in the extra dictionary is always the preferred method; the fallback search only triggers when no url key is found in extra. The fallback search also costs extra CPU cycles, which may be unwanted.

API Reference

  • filter_url(url, censored, bad_keys, bad_keys_re, bad_path_re): A standalone function for one-off URL censoring.

    • url:str - (required) the URL to censor
    • censored:str - (optional) the placeholder used in place of redacted parts, '[...]' by default
    • bad_keys:list - (optional) a list of query-string keys that may contain sensitive data. Default:

    [ "password", "token", "key", "secret", "auth", "apikey", "credentials", ]

    • bad_keys_re:list - (optional) a list of regexes matching query-string keys that may contain sensitive data. Default:

    [ r"session", r"csrf", r".*_secret", r".*_token", r".*_key", ]

    • bad_path_re:str - (optional) a regex matched against the path part of the URL; each group it defines will be redacted. Default: None. Examples:

    custom_path_re_named = r"/api/v1/(?P<api_key>[^/]+)/resource"
    custom_path_re_simple = r"(?<=/user/)\d+(?=/delete)"

  • FilterURL(bad_keys, bad_keys_re, bad_path_re, cache_size): A class that holds a compiled filter configuration for efficient, repeated use. The meaning of bad_keys:list, bad_keys_re:list, bad_path_re:str and their defaults are the same as for filter_url() (see above).

    • cache_size:int - (optional) Size of the cache of filtered URLs; 0 or None disables caching. Default: 512
    • .remove_sensitive(url, censored): The method that performs the censoring.
      • censored:str - (optional) the placeholder used in place of redacted parts, '[...]' by default
  • URLFilter(bad_keys, bad_keys_re, bad_path_re, fmt, url_filter_instance, fallback, cache_size, name): A logging.Filter subclass for easy integration with Python's logging module.

    • bad_keys:list, bad_keys_re:list, bad_path_re:str are the same as for filter_url() (see above)
    • fmt:str - (optional) Format used to append the filtered URL to the log message; default: ' | (URL={filtered_url})' ({filtered_url} is replaced with the filtered URL)
    • url_filter_instance:FilterURL - (optional) A pre-configured FilterURL-like instance to use for filtering. Default: None (one is created by the filter)
    • fallback:bool - (optional) Whether to search the message arguments for a URL when one is not passed explicitly via extra={'url': ...}. Default: True
    • cache_size:int - (optional) Size of the cache of filtered URLs; 0 or None disables caching. Default: 512
    • name:str - (optional) The name of the filter (inherited from logging.Filter)
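For illustration, here is how the two example bad_path_re styles behave with plain re: the named-group style redacts only the group's span, while the lookaround style matches the secret itself and leaves the surrounding context to zero-width assertions. This demonstrates the regex mechanics only, not filter-url's exact algorithm:

import re

named = r"/api/v1/(?P<api_key>[^/]+)/resource"
lookaround = r"(?<=/user/)\d+(?=/delete)"

# Named group: the group's span is what gets redacted.
path = "/api/v1/sk-live-42/resource"
m = re.search(named, path)
s, e = m.span("api_key")
print(path[:s] + "[...]" + path[e:])
# >> /api/v1/[...]/resource

# Lookaround: the whole match is the secret, so a plain substitution works.
print(re.sub(lookaround, "[...]", "/user/123/delete"))
# >> /user/[...]/delete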

License

This project is licensed under the MIT License.
