Production-grade logging for Spark data platforms (Fabric & Databricks)

LakeTrace Logger

Production-grade logging for Spark data platforms (Microsoft Fabric & Databricks)

LakeTrace is a cross-platform Python logging module designed specifically for Spark data platforms. It provides safe, performant logging with structured output, local file rotation, and optional lakehouse storage integration.

Installation

pip install laketrace

✅ Why LakeTrace (vs SparkLogger)

LakeTrace is built for Spark data platforms and removes the common failure modes of basic Spark logging:

Driver-safe: no executor logging, no distributed file writes.
No remote appends: all logging stays local with rotation and retention.
Structured JSON: consistent records with runtime metadata and bound context.
Fabric + Databricks aware: platform detection built in.
Crash‑safe logging: optional catch prevents formatter errors from breaking jobs.
Scalable I/O: enqueue mode for high-throughput workloads.

✨ Key Features

Cross‑platform: Fabric notebooks, Fabric Spark jobs, Databricks notebooks/jobs
Structured JSON with context binding and runtime metadata
Local rotation, retention, and compression
Stdout emission for job logs
Optional end‑of‑run lakehouse upload
Thread‑safe and notebook re‑execution safe
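To make "structured JSON with runtime metadata" concrete, here is a minimal sketch built only on the standard library's logging module. It is not LakeTrace's implementation: JsonFormatter, build_json_logger, and the field names (hostname, python_version, context) are illustrative assumptions.

```python
import json
import logging
import platform
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Runtime metadata; these field names are illustrative only.
            "hostname": platform.node(),
            "python_version": platform.python_version(),
        }
        # Fields passed as extra={"context": {...}} are merged into the record.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, default=str)


def build_json_logger(name: str, stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()  # avoid duplicate handlers when a notebook cell is re-run
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```

Clearing the handlers before adding a new one is one simple way to stay safe under notebook re-execution: running the same cell twice does not double every log line.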

🚀 Quick Start

from laketrace import get_logger

logger = get_logger("my_job")
logger.info("Starting data processing")

stage = logger.bind(stage="extract", dataset="sales")
stage.info("Extracting sales data")

logger.upload_log_to_lakehouse("Files/logs/my_job.log")
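The bind() call above returns a logger that carries extra key/value context. The following sketch shows one way such binding could work, using the standard library's LoggerAdapter. BoundLogger and the context attribute are hypothetical names for illustration, not LakeTrace internals.

```python
import logging


class BoundLogger(logging.LoggerAdapter):
    """Carry key/value context and attach it to every record it emits."""

    def bind(self, **context) -> "BoundLogger":
        # Return a new adapter so the parent logger keeps its own context.
        return BoundLogger(self.logger, {**self.extra, **context})

    def process(self, msg, kwargs):
        # Expose the bound context on the record as `record.context`.
        kwargs["extra"] = {"context": dict(self.extra)}
        return msg, kwargs


# Usage: the stage logger adds fields; the job logger is left unchanged.
job_log = BoundLogger(logging.getLogger("bind_sketch"), {"job": "my_job"})
stage_log = job_log.bind(stage="extract", dataset="sales")
```

Returning a new object from bind() rather than mutating the logger in place means context set for one stage cannot leak into records logged by another.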

⚙️ Configuration Highlights

logger = get_logger(
    "my_job",
    config={
        "log_dir": "/tmp/laketrace_logs",
        "rotation": "500 MB",
        "retention": "7 days",
        "compression": "gz",
        "level": "INFO",
        "json": True,
        "stdout": True,
        "serialize": True,
        "enqueue": False,
        "filter": None,
        "formatter": None,
        "catch": True,
    }
)
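For readers who want to see what size-based rotation, count-based retention, and gzip compression amount to, here is a rough standard-library equivalent. It is a sketch under assumptions, not how LakeTrace implements these options: build_rotating_handler and gzip_rotator are hypothetical helpers, and the time-based "7 days" retention from the config above is not reproduced.

```python
import gzip
import logging
import logging.handlers
import os
import shutil


def gzip_rotator(source: str, dest: str) -> None:
    """Compress the rotated file to `dest` and delete the uncompressed original."""
    with open(source, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(source)


def build_rotating_handler(path: str, max_bytes: int, backup_count: int) -> logging.Handler:
    """Rotate when the file reaches max_bytes; keep at most backup_count archives."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
    )
    handler.namer = lambda name: name + ".gz"  # job.log.1 becomes job.log.1.gz
    handler.rotator = gzip_rotator
    return handler
```

With this handler, older archives are shifted to higher numbers on each rollover and anything beyond backup_count is overwritten, which gives count-based cleanup without a separate retention job.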

Supported Features

LakeTrace provides comprehensive logging capabilities organized by feature category:

Core Features (Proven & Stable)

Rotation: Size-based (MB), time-based (hourly/daily/weekly/monthly), interval-based, and callable rotation strategies
Retention: File count-based and time-based cleanup policies
Compression: Gzip, bzip2, and ZIP archive support for rotated logs
Handler Management: Track and manage multiple log file handlers with unique IDs
Async I/O: Enqueue mode for high-throughput workloads with background thread writing
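The enqueue idea (the logging call only puts a record on a queue, and a background thread does the slow write) can be sketched with the standard library's QueueHandler and QueueListener. This illustrates the general technique only; start_enqueued_logging is a hypothetical helper, and LakeTrace's own enqueue mode may work differently.

```python
import logging
import logging.handlers
import queue


def start_enqueued_logging(
    logger: logging.Logger, target: logging.Handler
) -> logging.handlers.QueueListener:
    """Route the logger's records through a queue drained by a background thread."""
    log_queue: queue.Queue = queue.Queue(-1)  # unbounded queue
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(
        log_queue, target, respect_handler_level=True
    )
    listener.start()
    return listener
```

Call stop() on the returned listener at the end of the run: it processes the records still in the queue before the thread exits, so nothing is lost when the job finishes.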

Advanced Features

Custom Formatters: Apply custom message formatting rules
Custom Filters: Control which records get logged
Callbacks: Hook into log lifecycle events
Multiprocessing Safety: Safe concurrent logging from multiple threads and processes on the driver
Error Catching: Optional exception handler prevents formatter errors from breaking jobs
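As an illustration of the error-catching idea, the sketch below wraps any formatter so that a formatting failure produces a fallback line instead of raising. SafeFormatter is a hypothetical name, and this shows the general pattern rather than LakeTrace's catch option itself.

```python
import logging


class SafeFormatter(logging.Formatter):
    """Delegate to an inner formatter; never let a formatting error escape."""

    def __init__(self, inner: logging.Formatter):
        super().__init__()
        self._inner = inner

    def format(self, record: logging.LogRecord) -> str:
        try:
            return self._inner.format(record)
        except Exception as exc:
            # Fall back to the raw, unformatted message so the event is not lost.
            return f"[formatter error: {type(exc).__name__}] {record.msg!r}"
```

A typical trigger is a mismatched format argument, such as logging "count=%d" with a string value: the fallback line records that the event happened while the job keeps running.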

Performance Features

Throughput Optimization: Enqueue mode moves file writes to a background thread so high-volume logging does not block the job
Memory Efficiency: Low memory overhead during execution
Concurrency Support: Safe operation with concurrent logging from multiple threads

Security Features

Message Sanitization: Remove or mask sensitive data from logs
PII Masking: Automatic detection and redaction of personally identifiable information
Format String Escaping: Prevent format string vulnerabilities
Newline Escaping: Sanitize log content to prevent log injection attacks
Secure Permissions: Control file access in shared environments
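Masking and newline escaping can be pictured with a few lines of Python. This is a simplified sketch: sanitize and EMAIL_RE are hypothetical, email addresses stand in for PII in general, and LakeTrace's own detection rules are not documented here.

```python
import re

# Deliberately simple email pattern; real PII detection needs more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def sanitize(message: str) -> str:
    """Mask email addresses and escape line breaks in a log message."""
    masked = EMAIL_RE.sub("<email>", message)
    # Escape backslashes first so the escaped output stays unambiguous.
    return (
        masked.replace("\\", "\\\\")
        .replace("\r", "\\r")
        .replace("\n", "\\n")
    )
```

Escaping line breaks keeps every event on a single physical line, so user-supplied text cannot forge what looks like a separate log entry.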

🔄 Migration Guides

From SparkLogger

SparkLogger often leads to executor logging overhead and cross-partition serialization issues. LakeTrace moves all logging to the driver:

Before (SparkLogger):

from pyspark.taskcontext import TaskContext
from delta.tables import DeltaTable

# Problem: Executors attempt to log, causing distributed serialization
for partition in range(num_partitions):
    df.filter(...).collect()  # Executor logs serialized back to driver

After (LakeTrace):

from laketrace import get_logger

logger = get_logger("my_job")  # Driver only

# Log from the driver; use print() in executors
df = spark.read.parquet(path)
logger.info(f"Loaded {df.count()} rows")  # Clean, structured, driver-safe

From notebookutils.fs.append

Using notebookutils.fs.append() for logging causes performance degradation and can hang Spark jobs due to repeated remote I/O per log line. LakeTrace uses local rotation instead:

Before (notebookutils.fs.append):

from notebookutils.mssparkutils import fs

# Problem: Each log line triggers remote I/O → job hangs
for i in range(1000):
    fs.append("/mnt/logs/job.log", f"Processing {i}\n")  # Remote write per line
    process_data(i)

After (LakeTrace):

from laketrace import get_logger

logger = get_logger("my_job")

# Local rotation, zero remote I/O during execution
for i in range(1000):
    logger.info(f"Processing {i}")  # Fast local write, no hangs
    process_data(i)

# Upload once at the end
logger.upload_log_to_lakehouse("/Files/logs/job.log")
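The "write locally, upload once" pattern can also be expressed as a context manager so the single transfer happens even if the job raises. In this sketch a plain file copy stands in for the lakehouse upload; upload_once_at_exit and both paths are hypothetical and are not part of the LakeTrace API.

```python
import contextlib
import shutil
from pathlib import Path


@contextlib.contextmanager
def upload_once_at_exit(local_log: Path, destination: Path):
    """Run the wrapped block, then copy the local log to its destination once."""
    try:
        yield
    finally:
        # One transfer for the whole run, including runs that fail.
        if local_log.exists():
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(local_log, destination)
```

Putting the copy in a finally block means a failed run still ships its log, which is usually when the log is needed most.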

🧪 Workload Test Runner

Run the consolidated workload tests:

python tests/run_workloads.py

Workload groups live under:

✅ Safety Notes

Driver only: use print() in executors.
Single upload: call upload_log_to_lakehouse() once at the end.
No remote append: avoids Spark job hangs and retries.

📝 License

MIT License - see LICENSE file for details.

Release history

1.0.2 (Feb 5, 2026)
1.0.1 (Feb 4, 2026)



