
Deterministic privacy-preserving logger for Python.


BlindLog v1.1


BlindLog is a zero-dependency, production-ready Privacy-Preserving Observability SDK for Python.

It addresses a fundamental conflict in backend engineering: developers need to see everything to fix bugs, but compliance requirements (GDPR, HIPAA, SOC 2) dictate that they must not see anything personal.

By replacing raw Personally Identifiable Information (PII) with consistent, structure-preserving deterministic hashes, BlindLog lets developers keep correlating events across their logs without leaking actual identities into them.


💡 The "Why": Why Use BlindLog?

The Problem with Redaction

Legacy redaction tools look for emails or credit cards and replace them with static text like ***** or [REDACTED]. The fatal flaw here is context destruction. If your logs show [REDACTED] failed to purchase order [REDACTED], you cannot trace a specific user's journey through your microservices when every trace of their identity maps to the exact same generic string.

The BlindLog Solution: Deterministic Pseudonymization

BlindLog uses BLAKE2b in its native keyed mode to map each value to the same pseudonym every time:

  • user1@gmail.com always logs as blnd_ref_8ax92bfac000...@masked.com.
  • user2@gmail.com always logs as blnd_ref_1c89f81ba000...@masked.com.

You can tell at a glance whether the same user triggered 50 errors across 4 microservices over a week, while staying compliant, because the raw identity is replaced by a one-way keyed hash before it reaches the logs.
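The mapping described above can be sketched with the standard library alone. This is a minimal illustration of keyed BLAKE2b pseudonymization, not BlindLog's actual implementation: the function name, the demo key, the digest length, and the exact token layout are all assumptions made for the example.

```python
import hashlib

# Hypothetical key for illustration only; never hard-code a real secret.
DEMO_KEY = b"demo-secret-not-for-production"

def pseudonymize_email(email: str, key: bytes = DEMO_KEY) -> str:
    """Map an email to a stable token using keyed BLAKE2b (illustrative format)."""
    digest = hashlib.blake2b(
        email.encode("utf-8"), key=key, digest_size=8
    ).hexdigest()
    return f"blnd_ref_{digest}@masked.com"

first = pseudonymize_email("user1@gmail.com")
again = pseudonymize_email("user1@gmail.com")
other = pseudonymize_email("user2@gmail.com")

print(first == again)  # True: the same input always yields the same token
print(first == other)  # False: different users get different tokens
```

Because the hash is keyed, someone without the secret cannot precompute a table of candidate emails and look the tokens up.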


🚀 Installation

BlindLog has zero external dependencies and runs natively on Python 3.8+.

pip install blindlog

🛠️ Exactly How to Use It

BlindLog is built on an extensible architecture that operates automatically once plugged into your existing application. It intercepts data, traverses JSON payloads recursively, and masks strings without breaking schema.

1. Mandatory Security Configuration

BlindLog operates on keyed hashes. To prevent rainbow-table reverse-engineering, you must supply a cryptographic secret.

Set the following environment variables on your production servers:

export BLINDLOG_SECRET="your-super-strong-random-secret-key"
export BLINDLOG_SALT="optional-additional-salt"

Warning: If BLINDLOG_SECRET is missing, BlindLog deliberately fails at startup rather than generate reversible, unkeyed hashes. For local development, you can set export BLINDLOG_DEBUG="true" to bypass this check.
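The fail-fast behavior can be pictured roughly as follows. This is a sketch under assumptions, not BlindLog's code: load_secret, the RuntimeError, and the debug fallback key are hypothetical; only the two environment variable names come from the text above.

```python
import os
from typing import Mapping

def load_secret(environ: Mapping[str, str] = os.environ) -> bytes:
    """Return the hashing key, or refuse to continue if none is configured."""
    secret = environ.get("BLINDLOG_SECRET", "")
    if secret:
        return secret.encode("utf-8")
    if environ.get("BLINDLOG_DEBUG", "").lower() == "true":
        # Hypothetical fallback for local development only.
        return b"insecure-debug-key"
    raise RuntimeError(
        "BLINDLOG_SECRET is not set; refusing to produce unkeyed hashes"
    )

# Passing plain dicts keeps the example independent of the real environment.
print(load_secret({"BLINDLOG_SECRET": "s3cr3t"}))  # b's3cr3t'
print(load_secret({"BLINDLOG_DEBUG": "true"}))     # b'insecure-debug-key'
```

Failing at startup is the safer default: a missing key is caught at deploy time instead of silently weakening every log line written afterwards.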


2. Standard Python Logging Interception

BlindLog ships with a logging.Formatter that hooks directly into Python's native logging module. It intercepts all string messages, dictionary args, and even Exception tracebacks to scrub PII before it hits your terminal or logging aggregator (like Datadog/Elasticsearch).

import logging
from blindlog.formatters import BlindLogFormatter

# 1. Initialize your logger
logger = logging.getLogger("my_application")
logger.setLevel(logging.INFO)

# 2. Create a handler
handler = logging.StreamHandler()

# 3. Attach the BlindLogFormatter!
handler.setFormatter(BlindLogFormatter())
logger.addHandler(handler)

# Usage A: Standard Strings (Slow Path - Regex Scanning)
logger.info("Failed login for akhand@gmail.com on card 4111-2222-3333-4444")
# Output: Failed login for blnd_ref_8a9df2c00000...@masked.com on card 4111-c918a2-f8b1c4-4444

# Usage B: Dictionary Arguments (Fast Path - Key Matching)
# BlindLog detects keys like 'password' or 'email' instantly.
logger.info("User created", {"email": "ceo@corp.com", "password": "super-secret"})
# Output: User created {'email': 'blnd_ref_9bf... masked', 'password': 'blind:838ab...'}

# Usage C: Safe Exceptions
try:
    raise ValueError("User akhand@gmail.com exhausted their API quota")
except ValueError:
    logger.exception("A system error occurred")
    # Output: The stack trace is fully processed, and akhand@gmail.com is masked inside the Traceback!
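
To see why a formatter is a convenient interception point, here is a minimal stand-in written only with the standard library. MaskingFormatter, the email regex, and the demo key are hypothetical and far simpler than BlindLogFormatter; the sketch only shows that masking the fully rendered record also covers the traceback text.

```python
import hashlib
import io
import logging
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def _mask_email(match: re.Match) -> str:
    # Demo key only; a real deployment would load the key from configuration.
    digest = hashlib.blake2b(
        match.group(0).encode("utf-8"), key=b"demo-key", digest_size=6
    ).hexdigest()
    return f"blnd_ref_{digest}@masked.com"

class MaskingFormatter(logging.Formatter):
    """Hypothetical stand-in: masks emails in the final rendered text."""

    def format(self, record: logging.LogRecord) -> str:
        # super().format() renders the message and appends any traceback,
        # so one substitution pass covers both.
        rendered = super().format(record)
        return EMAIL_RE.sub(_mask_email, rendered)

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(MaskingFormatter("%(levelname)s %(message)s"))
demo_logger = logging.getLogger("masking_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
demo_logger.addHandler(handler)

demo_logger.info("Failed login for %s", "someone@example.com")
try:
    raise ValueError("User someone@example.com exhausted their API quota")
except ValueError:
    demo_logger.exception("A system error occurred")

output = stream.getvalue()
print("someone@example.com" in output)  # False
```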

3. FastAPI & Starlette Middleware

BlindLog runs as a pure ASGI middleware. It intercepts raw HTTP traffic before it reaches your application routers, safely buffers HTTP payloads (up to 5 MB, to prevent out-of-memory denial-of-service attacks), and logs masked request bodies, response bodies, and headers.

from fastapi import FastAPI
from blindlog.integrations.fastapi import BlindLogFastAPIMiddleware

app = FastAPI()

# Attach the middleware
app.add_middleware(BlindLogFastAPIMiddleware)

@app.post("/checkout")
async def checkout(payload: dict):
    # If the user sends {"credit_card": "4111-...", "cookie": "session_123"},
    # The middleware automatically logs the sanitized payload to standard out.
    return {"status": "success"}

What the Middleware handles automatically:

  • Request/Response Bodies: Deeply nested JSON is recursively traversed and masked.
  • HTTP Headers: Sensitive headers (such as Authorization, Cookie, X-API-Key) are extracted and masked deterministically, so repeated values still correlate across requests.
  • Streaming Protections: WebSocket and Server-Sent Events (SSE) traffic passes through safely.
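
The recursive body traversal can be sketched as a small pure function. The names mask_payload and SENSITIVE_KEYS, the demo key, and the token length are assumptions made for illustration; the point is that the shape of the payload survives while sensitive leaves are replaced.

```python
import hashlib
from typing import Any

SENSITIVE_KEYS = frozenset({"email", "password", "credit_card"})  # illustrative subset

def _token(value: str) -> str:
    # Demo key only; the real key would come from configuration.
    digest = hashlib.blake2b(
        value.encode("utf-8"), key=b"demo-key", digest_size=6
    ).hexdigest()
    return f"blind:{digest}"

def mask_payload(node: Any, parent_key: str = "") -> Any:
    """Return a masked copy of a JSON-like structure, preserving its shape."""
    if isinstance(node, dict):
        return {k: mask_payload(v, str(k)) for k, v in node.items()}
    if isinstance(node, list):
        return [mask_payload(item, parent_key) for item in node]
    if isinstance(node, str) and parent_key in SENSITIVE_KEYS:
        return _token(node)
    # Numbers, booleans, None, and non-sensitive strings pass through.
    return node

payload = {
    "user": {"email": "ceo@corp.com", "tags": ["vip"], "age": 41},
    "cards": [{"credit_card": "4111-2222-3333-4444", "active": True}],
}
masked = mask_payload(payload)
print(masked["user"]["tags"], masked["cards"][0]["active"])  # ['vip'] True
```

Building a new structure instead of mutating the input keeps the application's own copy of the request untouched.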

4. Customizing the Configuration (BlindLogConfig)

You can tune BlindLog's rules by defining a BlindLogConfig. Once created, the config is frozen (immutable) to prevent runtime tampering.

from blindlog.core import BlindLogger
from blindlog.config import BlindLogConfig

# 1. Define custom sensitive keys. 
# Note: This overwrites the defaults, so add your specific database fields.
custom_keys = frozenset({"internal_db_id", "auth_token", "email"})

config = BlindLogConfig(
    secret_key="my-custom-key", # Will fall back to ENV var if omitted
    sensitive_keys=custom_keys,
    debug_mode=False
)

logger = BlindLogger(config=config)

Key Matching Engine (Fast Path): BlindLog checks dictionary keys via:

  1. Exact Match: e.g., "email" == "email".
  2. Suffix Match: e.g., "user_password" ends with "_password".
  3. Normalization: Hyphenated keys (x-api-key) and camelCase or PascalCase keys (APIKey) are normalized to snake_case before matching, so the same rules cover differently styled schemas.
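
The three matching steps above can be sketched as follows. This is one plausible way to implement them, not BlindLog's actual engine: normalize_key, is_sensitive, and the boundary regex are hypothetical, and the key set is a small illustrative subset.

```python
import re

SENSITIVE_KEYS = frozenset({"email", "password", "api_key"})  # illustrative subset

# Word boundaries: "userPassword" -> "user|Password", "APIKey" -> "API|Key".
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")

def normalize_key(key: str) -> str:
    """Convert hyphenated, camelCase, or PascalCase keys to snake_case."""
    key = key.replace("-", "_")
    return _BOUNDARY.sub("_", key).lower()

def is_sensitive(key: str) -> bool:
    """Exact match first, then suffix match, both on the normalized key."""
    norm = normalize_key(key)
    if norm in SENSITIVE_KEYS:
        return True
    return any(norm.endswith("_" + name) for name in SENSITIVE_KEYS)

print(normalize_key("APIKey"))       # api_key
print(is_sensitive("x-api-key"))     # True  (suffix match on "_api_key")
print(is_sensitive("userPassword"))  # True  (suffix match on "_password")
print(is_sensitive("emailed_at"))    # False
```

Requiring the underscore in the suffix check avoids false positives on keys that merely end with the same letters.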

5. Custom Format Registration (Adding Regex)

BlindLog implements a Registry Pattern. If you have custom internal tokens (e.g., specific AWS KMS IDs or internal Employee IDs) you can teach BlindLog to find and format them dynamically in free-text.

import re
from blindlog.core import BlindLogger

logger = BlindLogger()

# 1. Define a regex pattern
aws_pattern = re.compile(r"AWS-KMS-\d{6}")

# 2. Define a callback function that takes the matched string and returns a safe string
# Note: You can also hash it dynamically inside the callback if you wish!
def mask_aws(matched_string: str) -> str:
    return "blnd_aws_TOKEN_REDACTED"

# 3. Register the rule
logger.registry.register(aws_pattern, mask_aws)

masked = logger.mask("Exception: AWS-KMS-123456 failed to load.")
# Output: "Exception: blnd_aws_TOKEN_REDACTED failed to load."
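
As the note in step 2 suggests, the callback can hash the match instead of returning a fixed string, so that the same token stays traceable across log lines. The sketch below does not call BlindLog at all: apply_rule is a hypothetical stand-in for what a registry might do with a registered pattern and callback, and the key and token layout are assumptions.

```python
import hashlib
import re

AWS_PATTERN = re.compile(r"AWS-KMS-\d{6}")
DEMO_KEY = b"demo-key"  # illustration only; load a real key from configuration

def mask_aws_hashed(matched_string: str) -> str:
    """Callback with the same shape as mask_aws, but deterministic per input."""
    digest = hashlib.blake2b(
        matched_string.encode("utf-8"), key=DEMO_KEY, digest_size=6
    ).hexdigest()
    return f"blnd_aws_{digest}"

def apply_rule(text: str) -> str:
    # Hypothetical stand-in for the registry applying one registered rule.
    return AWS_PATTERN.sub(lambda m: mask_aws_hashed(m.group(0)), text)

line = "AWS-KMS-123456 failed; retried AWS-KMS-123456, then AWS-KMS-654321."
masked_line = apply_rule(line)
tokens = re.findall(r"blnd_aws_[0-9a-f]{12}", masked_line)
print(tokens[0] == tokens[1], tokens[0] == tokens[2])  # True False
```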

🛡️ Default Out-Of-The-Box Protections

BlindLog protects the following data types out of the box, using regex scanning and key matching:

  • Emails: Truncated and hashed (blnd_ref_HASH...@masked.com)
  • Credit Cards: Preserves major industry format (4111-HASH-HASH-1234)
  • API Keys & Tokens: Covers OpenAI, Stripe, AWS, Slack, and GitHub PATs natively.
  • Phone Numbers: International and NANP (North American Numbering Plan) formats.
  • Social Security Numbers (SSN): US formats.
  • IPv4 Addresses: Matched only when the octets are valid.
  • HTTP/Web Standard Keys: cookie, set_cookie, authorization, password, secret, private_key, credentials.

🧮 Idempotency & System Guarantees

  • Idempotent Masking: BlindLog recognizes its own output with a strict structural regex (MASKED_PATTERN). If already-masked data passes through the engine again, it is skipped, so a log is never double-hashed.
  • ReDoS Mitigation: Slow-path regex scanning cuts off after 10,000 characters, preventing CPU-exhaustion denial-of-service attacks.
  • OOM Prevention: Middleware buffers are strictly capped at 5 MB per payload.
  • Type Safety Checks: Gracefully handles None, True, and circular references in nested dictionaries without crashing or returning unmasked memory addresses.
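
The first two guarantees can be illustrated together. This is a sketch under stated assumptions: MASKED_RE stands in for MASKED_PATTERN, whose real definition is not shown here, and dropping the text beyond the scan limit (with a marker) is this sketch's choice, since the list above does not say what happens to the unscanned tail.

```python
import hashlib
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MASKED_RE = re.compile(r"blnd_ref_[0-9a-f]{12}@masked\.com")  # stand-in pattern
MAX_SCAN_CHARS = 10_000
TRUNCATION_MARKER = "...[truncated]"  # hypothetical marker

def _replace(match: re.Match) -> str:
    found = match.group(0)
    if MASKED_RE.fullmatch(found):
        return found  # already masked: leave it alone
    digest = hashlib.blake2b(
        found.encode("utf-8"), key=b"demo-key", digest_size=6
    ).hexdigest()
    return f"blnd_ref_{digest}@masked.com"

def mask_text(text: str) -> str:
    """Mask emails in at most MAX_SCAN_CHARS characters of free text."""
    if len(text) > MAX_SCAN_CHARS:
        # Assumption: never emit unscanned text, so the tail is dropped.
        return EMAIL_RE.sub(_replace, text[:MAX_SCAN_CHARS]) + TRUNCATION_MARKER
    return EMAIL_RE.sub(_replace, text)

once = mask_text("login by user1@gmail.com")
twice = mask_text(once)
print(once == twice)  # True: a second pass changes nothing
```

Without the MASKED_RE check, the masked token would itself look like an email address and be hashed again on every pass.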

For further exploration, please review our Architecture Guide and the CHANGELOG.md!
