Cloud-agnostic Python audit logger for emitting PHI-safe behavioral healthcare audit events conforming to bh-audit-schema v1.0

bh-audit-logger

Cloud-agnostic Python utilities for emitting privacy-preserving audit events for behavioral healthcare systems.

Events conform to bh-audit-schema v1.0: https://github.com/bh-healthcare/bh-audit-schema

Why

Audit logging in healthcare is often inconsistent across services and jobs. This library provides a small, boring, correct baseline for emitting structured audit events from any Python code — Lambdas, workers, CLIs, ETL jobs, cron scripts — without logging raw PHI.

It is not tied to FastAPI (see bh-fastapi-audit for middleware-based logging).

Quickstart

pip install bh-audit-logger

from bh_audit_logger import AuditLogger, AuditLoggerConfig

logger = AuditLogger(
    config=AuditLoggerConfig(
        service_name="overstory-datalake",
        service_environment="prod",
    )
)

logger.audit(
    "READ",
    actor={"subject_id": "service_lambda", "subject_type": "service"},
    resource={"type": "Patient", "id": "patient_123"},
    outcome={"status": "SUCCESS"},
    correlation={"request_id": "req_abc"},
)

By default, each event is emitted as one compact JSON line via Python logging (stdout-friendly).

Example output

{"schema_version":"1.0","event_id":"6d3f0f6b-0c1a-4b9f-9d6f-9f6f7f5b2b0a","timestamp":"2026-02-17T12:00:00Z","service":{"name":"overstory-datalake","environment":"prod"},"actor":{"subject_id":"service_lambda","subject_type":"service"},"action":{"type":"READ","data_classification":"UNKNOWN"},"resource":{"type":"Patient","id":"patient_123"},"outcome":{"status":"SUCCESS"},"correlation":{"request_id":"req_abc"}}
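As a rough illustration of what "one compact JSON line via Python logging" means, the sketch below serializes an event dict without extra whitespace and routes it through a standard logging handler. It uses only the standard library and is not the library's own implementation; the `to_compact_line` helper and the `demo.audit` logger name are hypothetical.

```python
import io
import json
import logging

# Event dict mirroring the example output (values are illustrative).
event = {
    "schema_version": "1.0",
    "event_id": "6d3f0f6b-0c1a-4b9f-9d6f-9f6f7f5b2b0a",
    "timestamp": "2026-02-17T12:00:00Z",
    "service": {"name": "overstory-datalake", "environment": "prod"},
    "actor": {"subject_id": "service_lambda", "subject_type": "service"},
    "action": {"type": "READ", "data_classification": "UNKNOWN"},
    "resource": {"type": "Patient", "id": "patient_123"},
    "outcome": {"status": "SUCCESS"},
    "correlation": {"request_id": "req_abc"},
}


def to_compact_line(evt: dict) -> str:
    """Serialize to a single line; the separators drop json.dumps' default spaces."""
    return json.dumps(evt, separators=(",", ":"), ensure_ascii=False)


# A StringIO stream stands in for stdout so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))

demo_logger = logging.getLogger("demo.audit")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False

demo_logger.info(to_compact_line(event))
line = stream.getvalue().strip()
```

Because each event occupies exactly one line, downstream collectors can split on newlines and parse every record with `json.loads`.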

Production usage: container logging

from bh_audit_logger import AuditLogger, AuditLoggerConfig, LoggingSink

logger = AuditLogger(
    config=AuditLoggerConfig(
        service_name="my-service",
        service_environment="prod",
    ),
    sink=LoggingSink(logger_name="bh.audit", level="INFO"),
)

Works anywhere stdout is collected: CloudWatch, GCP Cloud Logging, Azure Monitor, Kubernetes logging pipelines.

AWS Lambda / serverless

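import logging
from bh_audit_logger import AuditLogger, AuditLoggerConfig, LoggingSink

# basicConfig() is a no-op when the root logger already has a handler, as it does in
# the Lambda Python runtime, so also set the root level explicitly. This lets
# INFO-level audit lines reach CloudWatch.
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)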

audit = AuditLogger(
    config=AuditLoggerConfig(
        service_name="patient-export-lambda",
        service_environment="prod",
        service_version="2026.02.17.1",
    ),
    sink=LoggingSink(logger_name="bh.audit", level="INFO"),
)

def handler(event, context):
    audit.audit_access(
        "EXPORT",
        actor={"subject_id": "service_lambda", "subject_type": "service"},
        resource={"type": "PatientExport", "id": event.get("export_id", "unknown")},
        phi_touched=True,
        data_classification="PHI",
        correlation={"request_id": context.aws_request_id},
    )
    # ... do work ...

Each invocation emits one compact JSON line to stdout. Most platforms ingest stdout by default; configure your runtime logging pipeline as needed.

Production hardening

Sink failure isolation

By default, sink failures are logged but never propagate to your application logic:

config = AuditLoggerConfig(
    service_name="my-service",
    emit_failure_mode="log",       # "silent", "log" (default), or "raise"
    failure_logger_name="bh.audit.internal",
)
  • "silent" — swallow errors, increment counter only
  • "log" — log a compact summary (event_id, service, action, resource) without the full payload
  • "raise" — re-raise the original exception (use in dev/test)
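The three modes can be pictured with a small standard-library sketch. This is an illustration of the behavior described above, not the library's code: `safe_emit`, `broken_sink`, and the module-level `counters` dict are hypothetical names, and whether the real counter is incremented in "raise" mode is an assumption.

```python
import logging
from typing import Callable

failure_log = logging.getLogger("bh.audit.internal")
counters = {"emit_failures_total": 0}


def safe_emit(sink_emit: Callable[[dict], None], event: dict, mode: str = "log") -> bool:
    """Call sink_emit(event) and handle failures per mode. Returns True on success."""
    try:
        sink_emit(event)
        return True
    except Exception as exc:
        counters["emit_failures_total"] += 1  # assumption: counted in every mode
        if mode == "raise":
            raise
        if mode == "log":
            # Compact summary only; the full payload is never logged.
            failure_log.warning(
                "audit emit failed: event_id=%s service=%s action=%s resource=%s error=%s",
                event.get("event_id"),
                event.get("service", {}).get("name"),
                event.get("action", {}).get("type"),
                event.get("resource", {}).get("type"),
                type(exc).__name__,
            )
        return False  # "silent" and "log" both swallow the error


def broken_sink(event: dict) -> None:
    """Stand-in for a sink whose write fails."""
    raise OSError("disk full")
```

Only the exception type is logged in this sketch, since exception messages can themselves carry sensitive values.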

Metadata restrictions

Metadata values must be scalar JSON types (str, int, float, bool, None). Dict, list, and tuple values are silently dropped, and long strings are truncated:

config = AuditLoggerConfig(
    service_name="my-service",
    metadata_allowlist={"batch_id", "region"},
    max_metadata_value_length=200,   # default; truncated strings end with "..."
)
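The rules above (allowlisted keys, scalar values only, truncation) can be sketched in a few lines of plain Python. `filter_metadata` is a hypothetical helper written for illustration; in particular, whether the library counts the "..." suffix toward the length cap is not stated, so this sketch assumes the suffix is appended after the cap.

```python
from typing import Any

SCALAR_TYPES = (str, int, float, bool, type(None))


def filter_metadata(
    metadata: dict[str, Any], allowlist: set[str], max_len: int = 200
) -> dict[str, Any]:
    """Keep allowlisted keys with scalar values; truncate long strings."""
    cleaned: dict[str, Any] = {}
    for key, value in metadata.items():
        if key not in allowlist:
            continue  # key was never opted in
        if not isinstance(value, SCALAR_TYPES):
            continue  # dict, list, tuple, and other non-scalars are dropped
        if isinstance(value, str) and len(value) > max_len:
            value = value[:max_len] + "..."  # assumption: suffix added after the cap
        cleaned[key] = value
    return cleaned


result = filter_metadata(
    {"batch_id": "b-1", "region": "us-east-1", "patient_name": "Jane Doe", "tags": ["a"]},
    allowlist={"batch_id", "region", "tags"},
)
```

Here `patient_name` is dropped because it is not allowlisted, and `tags` is dropped because its value is a list even though the key is allowed.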

Internal counters

Track emission health via lightweight counters:

logger = AuditLogger(config=config)
# ... emit events ...
print(logger.stats.snapshot())
# {"events_emitted_total": 42, "emit_failures_total": 0, "events_dropped_total": 0, "validation_failures_total": 0}
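One way to use such snapshots is to compare two of them taken at different times, for example from a periodic health check. The sketch below works on plain dicts with the keys shown above; `snapshot_delta` and `has_new_failures` are hypothetical helpers, and the assumption is that the counters only ever increase.

```python
def snapshot_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Per-counter change between two snapshots (counters assumed monotonic)."""
    return {key: after[key] - before.get(key, 0) for key in after}


def has_new_failures(delta: dict[str, int]) -> bool:
    """True if any failure-type counter moved since the previous snapshot."""
    return any(
        delta.get(key, 0) > 0
        for key in ("emit_failures_total", "events_dropped_total", "validation_failures_total")
    )


before = {"events_emitted_total": 40, "emit_failures_total": 0,
          "events_dropped_total": 0, "validation_failures_total": 0}
after = {"events_emitted_total": 42, "emit_failures_total": 1,
         "events_dropped_total": 0, "validation_failures_total": 0}

delta = snapshot_delta(before, after)
alert = has_new_failures(delta)
```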

Synchronous emission

Audit emission is synchronous in v0.2.x. For high-throughput systems, use LoggingSink (which defers I/O to your logging pipeline) or plan for async sinks in v0.3.
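If the handler behind your logging pipeline is slow, the standard library's QueueHandler and QueueListener can move handler I/O to a background thread. The sketch below is one possible setup and does not use the library itself: it attaches a queue to the "bh.audit" logger name from the earlier examples and uses a StringIO stream as a stand-in for the real destination.

```python
import io
import logging
import logging.handlers
import queue

log_queue: queue.Queue = queue.Queue(-1)  # unbounded queue

# Stand-in for a slow destination (stdout, file, network handler).
stream = io.StringIO()
slow_handler = logging.StreamHandler(stream)
slow_handler.setFormatter(logging.Formatter("%(message)s"))

# The listener drains the queue on its own thread and calls slow_handler.
listener = logging.handlers.QueueListener(log_queue, slow_handler)
listener.start()

audit_log = logging.getLogger("bh.audit")
audit_log.setLevel(logging.INFO)
audit_log.addHandler(logging.handlers.QueueHandler(log_queue))
audit_log.propagate = False

# The calling thread only enqueues the record; the write happens in the listener.
audit_log.info('{"event_id":"demo"}')

# stop() processes everything already queued before returning.
listener.stop()
output = stream.getvalue()
```

Call `listener.stop()` during shutdown so queued events are flushed. This pattern suits long-running services better than short-lived functions, where background threads may not get time to drain the queue.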

Sinks

Sink                   Use case          Notes
---------------------  ----------------  --------------------------------------------------------------------
LoggingSink (default)  Production        One compact JSON line per event via Python logging; stdout-friendly
JsonlFileSink          Local dev, demos  Appends to a .jsonl file; thread-safe, flush-on-write by default
MemorySink             Tests             Stores events in a list; use len(sink) and sink.events in assertions
Pass any sink to AuditLogger(config=..., sink=...). Omit sink to get LoggingSink by default.
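When working with a .jsonl file during local development, the file can be read back with the standard library alone. The sketch below is a hypothetical `read_jsonl` helper; the file it reads is written by hand here as a stand-in for sink output, so the event contents are illustrative only.

```python
import json
import os
import tempfile


def read_jsonl(path: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hand-written file standing in for what a JSONL sink would append.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "audit.jsonl")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write('{"action":{"type":"READ"},"outcome":{"status":"SUCCESS"}}\n')
        fh.write('{"action":{"type":"EXPORT"},"outcome":{"status":"SUCCESS"}}\n')
    events = read_jsonl(path)

action_types = [e["action"]["type"] for e in events]
```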

Configuration

AuditLoggerConfig fields:

Field                  Type        Default    Description
---------------------  ----------  ---------  ------------------------------------------------
service_name           str         required   Name of the service emitting events
service_environment    str         "unknown"  Deployment environment (prod, staging, dev)
service_version        str | None  None       Service version/build identifier
default_actor_id       str         "unknown"  Default actor when none provided
default_actor_type     str         "service"  Default actor type (human/service)
metadata_allowlist     set[str]    set()      Allowed metadata keys (empty = no metadata)
sanitize_errors        bool        True       Sanitize error messages (redact SSN/email/phone)
error_message_max_len  int         200        Max length for sanitized error messages
time_source            Callable    utcnow     Injectable time source for testing
id_factory             Callable    uuid4      Injectable ID factory for testing
schema_version         str         "1.0"      Locked to 1.0 unless overridden

PHI-safe by default (via allowlists and error sanitization)

  • No request/response bodies — the library never tries to capture payloads
  • Metadata is opt-in and strictly allowlisted — only keys in metadata_allowlist pass through; values must be scalar JSON types (str, int, float, bool, null)
  • Error messages are sanitized — SSN, email, phone patterns are redacted and messages are length-capped
  • PHI safety is enforced by tests that assert synthetic PHI tokens never appear in emitted events
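To make the error-sanitization bullet concrete, here is a minimal regex-based sketch of pattern redaction plus a length cap. The patterns, the "[REDACTED]" placeholder, and the `sanitize_error` name are assumptions chosen for illustration; the library's actual patterns are not shown in this document and may differ.

```python
import re

# Illustrative US-centric patterns; real-world formats vary more than this.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def sanitize_error(message: str, max_len: int = 200) -> str:
    """Redact SSN, email, and phone patterns, then cap the length."""
    for pattern in (SSN_RE, EMAIL_RE, PHONE_RE):
        message = pattern.sub("[REDACTED]", message)
    return message[:max_len]


cleaned = sanitize_error(
    "lookup failed for 123-45-6789, contact jane@example.com or 555-867-5309"
)
```

Pattern matching of this kind catches well-formed identifiers only. Names, addresses, and free-text clinical details pass straight through, which is why the library keeps the note below about not relying on it for total PHI stripping.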

Important: This library does not attempt to detect or remove PHI from user-supplied IDs or free-text fields beyond the configured allowlist and error-message sanitization. Treat resource IDs (e.g. patient_id) as sensitive and prefer surrogate identifiers wherever possible. The goal is safe defaults, not total PHI stripping.

Do not do this

# BAD: patient name in metadata
logger.audit("READ", resource={"type": "Patient"}, metadata={"patient_name": "Jane Doe"})

# BAD: full stack trace in error (may contain PHI from variables)
logger.audit("READ", resource={"type": "Patient"}, error=traceback.format_exc())

# BAD: MRN or SSN as a resource ID
logger.audit("READ", resource={"type": "Patient", "id": "123-45-6789"})

Instead, use surrogate IDs, keep metadata to operational keys (job name, batch ID, region), and let sanitize_errors=True (the default) handle error messages.
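One common way to derive a surrogate ID is a keyed hash of the real identifier, sketched below with the standard library's hmac module. This is a general technique, not something the library provides: `surrogate_id`, the "pt_" prefix, and the 16-character truncation are all hypothetical choices, and the key must be stored and rotated like any other secret.

```python
import hashlib
import hmac


def surrogate_id(raw_id: str, secret_key: bytes, prefix: str = "pt_") -> str:
    """Derive a stable, non-reversible token from an identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; keep more characters if collisions are a concern.
    return prefix + digest[:16]


key = b"example-key-load-from-a-secret-store"  # placeholder, never hard-code a real key
token = surrogate_id("MRN-0012345", key)
same_token = surrogate_id("MRN-0012345", key)
other_token = surrogate_id("MRN-0012345", b"a-different-key")
```

The same input and key always yield the same token, so events for one patient can still be correlated. A keyed hash pseudonymizes rather than anonymizes; whether that satisfies your compliance requirements is a separate question.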

Schema conformance

All events conform to bh-audit-schema v1.0. Required fields:

  • schema_version = "1.0"
  • event_id (UUID)
  • timestamp (UTC ISO 8601)
  • service (name, environment)
  • actor (subject_id, subject_type)
  • action (type)
  • resource (type)
  • outcome (status)
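For a quick check without extra dependencies, the required fields listed above can be verified with a few lines of plain Python. This sketch only tests for presence; it is not a substitute for validating against the JSON schema, and `missing_required` is a hypothetical helper.

```python
# Top-level field -> required sub-fields (None means the field is a plain value).
REQUIRED = {
    "schema_version": None,
    "event_id": None,
    "timestamp": None,
    "service": ("name", "environment"),
    "actor": ("subject_id", "subject_type"),
    "action": ("type",),
    "resource": ("type",),
    "outcome": ("status",),
}


def missing_required(event: dict) -> list[str]:
    """Return dotted names of required fields absent from the event."""
    missing: list[str] = []
    for field, subfields in REQUIRED.items():
        if field not in event:
            missing.append(field)
            continue
        for sub in subfields or ():
            value = event[field]
            if not isinstance(value, dict) or sub not in value:
                missing.append(f"{field}.{sub}")
    return missing


incomplete = {
    "schema_version": "1.0",
    "event_id": "6d3f0f6b-0c1a-4b9f-9d6f-9f6f7f5b2b0a",
    "timestamp": "2026-02-17T12:00:00Z",
    "service": {"name": "overstory-datalake", "environment": "prod"},
    "actor": {"subject_id": "service_lambda"},
    "action": {"type": "READ"},
    "outcome": {"status": "SUCCESS"},
}
problems = missing_required(incomplete)
```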

Optional schema validation

pip install bh-audit-logger[jsonschema]

from bh_audit_logger import validate_event

event = {...}
validate_event(event)  # raises ValidationError on failure

Validates against the vendored bh-audit-schema v1.0 JSON schema included in the package.

License

Apache 2.0
