A lightweight Python package for collecting LLM traces
Burt Logger
A lightweight, production-ready Python package for collecting LLM training data. Automatically pipe LLM request/response data to your backend for model fine-tuning and dataset creation.
Features
✨ Non-blocking & Asynchronous - Uses a background thread and a queue so that logging never blocks your application's code path
🔄 Intelligent Batching - Automatically batches logs by size or time interval for optimal network efficiency
🛡️ Production-Ready - Thread-safe, graceful error handling, and automatic retry with exponential backoff
🚀 Minimal Dependencies - Only requires the `requests` library; everything else comes from the Python standard library
⚙️ Highly Configurable - Customize batch sizes, flush intervals, queue sizes, retry logic, and more
🔌 Provider Agnostic - Works with OpenAI, Anthropic, or any LLM provider
Installation
```bash
pip install burt-logger
```
Or install from source:
```bash
git clone https://github.com/trainburt/burt-logger-python.git
cd burt-logger-python
pip install -e .
```
Quick Start
```python
import openai  # pre-1.0 openai SDK; ChatCompletion was removed in openai>=1.0
from burt_logger import LLMLogger

# Initialize the logger
logger = LLMLogger(
    endpoint="https://your-api.com/logs",
    api_key="your-api-key",
)

# Log your LLM requests and responses
response = openai.ChatCompletion.create(...)  # Your existing LLM call
usage = response["usage"]  # token counts reported by the provider

logger.log(
    request={
        "model": "gpt-3.5-turbo",
        "messages": [...],
    },
    response={
        "content": response.choices[0].message.content,
        "usage": {
            "prompt_tokens": usage.get("prompt_tokens", 0),
            "completion_tokens": usage.get("completion_tokens", 0),
            "total_tokens": usage.get("total_tokens", 0),
        },
    },
)

# Gracefully shut down (flushes remaining logs)
logger.shutdown()
```
That's it! The logger handles everything asynchronously in the background.
Using Context Manager
The logger supports context managers for automatic cleanup:
```python
with LLMLogger(endpoint="...", api_key="...") as logger:
    # Your code here
    logger.log(request=..., response=...)
# Automatic shutdown and flush on exit
```
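Under the hood, `with` relies on Python's context-manager protocol: `__enter__` runs when the block starts and `__exit__` runs when it ends, even if the block raises. The sketch below uses a hypothetical `ToyLogger` stand-in (not the package's class) and assumes that the real `__exit__` simply calls `shutdown()`.

```python
class ToyLogger:
    """Hypothetical stand-in illustrating the context-manager protocol."""

    def __init__(self) -> None:
        self.pending: list = []
        self.sent: list = []
        self.closed = False

    def log(self, request: dict, response: dict) -> bool:
        self.pending.append({"request": request, "response": response})
        return True

    def shutdown(self) -> None:
        # Flush whatever is still pending, then mark the logger closed.
        self.sent.extend(self.pending)
        self.pending.clear()
        self.closed = True

    def __enter__(self) -> "ToyLogger":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.shutdown()  # runs on normal exit and when an exception escapes
        return False     # do not suppress exceptions raised in the with-block


with ToyLogger() as toy:
    toy.log(request={"prompt": "hi"}, response={"completion": "hello"})
print(toy.closed, len(toy.sent))  # True 1
```

Returning `False` from `__exit__` means an error inside the block still propagates after the flush, which is usually what you want for a logging side channel.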
Configuration
The LLMLogger class accepts the following parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `endpoint` | str | Required | Backend API endpoint to send logs to |
| `api_key` | str | Required | API key for authentication |
| `batch_size` | int | 10 | Number of logs to batch before sending |
| `flush_interval` | float | 5.0 | Seconds to wait before flushing an incomplete batch |
| `max_queue_size` | int | 10000 | Maximum number of logs to queue |
| `max_retries` | int | 3 | Maximum number of retry attempts |
| `initial_retry_delay` | float | 1.0 | Initial delay for exponential backoff (seconds) |
| `max_retry_delay` | float | 60.0 | Maximum retry delay (seconds) |
| `timeout` | float | 10.0 | HTTP request timeout (seconds) |
| `debug` | bool | False | Enable debug logging |
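To see how `max_retries`, `initial_retry_delay`, and `max_retry_delay` interact, here is a small sketch that lists the wait before each retry. It assumes the delay doubles on every attempt and is capped at `max_retry_delay`; the package's actual growth factor, and whether it adds jitter, are not documented on this page. `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(max_retries: int = 3,
                   initial_retry_delay: float = 1.0,
                   max_retry_delay: float = 60.0) -> list:
    """Seconds to wait before each retry, assuming a doubling factor."""
    return [min(initial_retry_delay * 2 ** attempt, max_retry_delay)
            for attempt in range(max_retries)]


print(backoff_delays())               # [1.0, 2.0, 4.0]
print(backoff_delays(max_retries=8))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Under this assumption the cap only matters once `max_retries` is raised well above the default: with the defaults, a batch waits at most 7 seconds in total before being dropped.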
Example with Custom Configuration
```python
logger = LLMLogger(
    endpoint="https://your-api.com/logs",
    api_key="your-api-key",
    batch_size=20,          # Send in batches of 20
    flush_interval=10.0,    # Or every 10 seconds
    max_queue_size=50000,   # Large queue for high-volume apps
    max_retries=5,          # More retries for flaky networks
    debug=True,             # See what's happening
)
```
API Reference
log(request, response, metadata=None)
Log an LLM request/response pair.
Parameters:
- `request` (dict): The LLM request data (prompt, model, parameters, etc.)
- `response` (dict): The LLM response data (completion, tokens, etc.)
- `metadata` (dict, optional): Additional metadata (user_id, session_id, etc.)
Returns:
- `bool`: `True` if the log was queued successfully, `False` if the queue is full
Example:
```python
success = logger.log(
    request={"model": "gpt-4", "prompt": "..."},
    response={"completion": "...", "tokens": 150},
    metadata={"user_id": "123", "environment": "production"},
)
```
flush(timeout=None)
Flush all queued logs and wait for them to be sent.
Parameters:
- `timeout` (float, optional): Maximum time to wait, in seconds. `None` means wait indefinitely.
Example:
```python
logger.flush(timeout=5.0)  # Wait up to 5 seconds
```
shutdown(timeout=10.0)
Gracefully shutdown the logger, flushing all remaining logs.
Parameters:
- `timeout` (float): Maximum time to wait for shutdown, in seconds
Example:
```python
logger.shutdown(timeout=10.0)
```
get_stats()
Get statistics about logger performance.
Returns:
- `dict`: Counters for queued, sent, and failed logs and batches
Example:
```python
stats = logger.get_stats()
print(stats)
# {
#     'logs_queued': 150,
#     'logs_sent': 145,
#     'logs_failed': 5,
#     'batches_sent': 15,
#     'batches_failed': 1
# }
```
How It Works
- Queueing: When you call `log()`, the entry is immediately added to a thread-safe queue and the method returns instantly (non-blocking).
- Batching: A background worker thread monitors the queue and sends a batch as soon as either limit is reached:
  - Batch size (e.g., 10 logs)
  - Time interval (e.g., every 5 seconds)
- Sending: Batches are sent to your backend API via HTTP POST with authentication headers.
- Retry Logic: If sending fails:
  - 5xx errors: retried with exponential backoff
  - 429 (rate limit): retried with exponential backoff
  - Other 4xx errors: not retried (client error)
  - Network errors: retried with exponential backoff
- Shutdown: On program exit or explicit shutdown, all remaining logs are flushed.
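The queueing, batching, and sending steps can be sketched with only the standard library. This is an illustration, not the package's implementation: `BatchingWorker` and its `send` callback are hypothetical, it assumes a single worker thread that flushes when either `batch_size` entries have accumulated or `flush_interval` seconds have elapsed, and it leaves out retries and the HTTP call.

```python
import queue
import threading
import time
from typing import Callable


class BatchingWorker:
    """Hypothetical size-or-time batcher; not the package's actual class."""

    def __init__(self, send: Callable[[list], None], batch_size: int = 10,
                 flush_interval: float = 5.0, max_queue_size: int = 10000):
        self._queue: queue.Queue = queue.Queue(maxsize=max_queue_size)
        self._send = send
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, entry: dict) -> bool:
        # Queueing: never block the caller; drop the entry if the queue is full.
        try:
            self._queue.put_nowait(entry)
            return True
        except queue.Full:
            return False

    def _run(self) -> None:
        batch: list = []
        deadline = time.monotonic() + self._flush_interval
        # Keep draining after shutdown() until the queue is empty.
        while not (self._stop.is_set() and self._queue.empty()):
            remaining = max(0.0, deadline - time.monotonic())
            try:
                # Short timeout so the loop notices shutdown promptly.
                batch.append(self._queue.get(timeout=min(remaining, 0.1)))
            except queue.Empty:
                pass
            # Batching: flush on size, or when the interval has elapsed.
            if len(batch) >= self._batch_size or time.monotonic() >= deadline:
                if batch:
                    self._send(batch)  # Sending: hand the batch to the sender
                batch = []
                deadline = time.monotonic() + self._flush_interval
        if batch:
            self._send(batch)  # Shutdown: final partial batch

    def shutdown(self, timeout: float = 10.0) -> None:
        self._stop.set()
        self._thread.join(timeout)


sent = []
worker = BatchingWorker(sent.append, batch_size=3, flush_interval=60.0)
for i in range(7):
    worker.log({"i": i})
worker.shutdown()
print([len(b) for b in sent])  # [3, 3, 1]
```

With a long `flush_interval`, only the size limit and the final shutdown flush fire, which is why seven entries arrive as batches of 3, 3, and 1.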
Backend API Expected Format
Your backend should expect POST requests with the following format:
Headers:
```
Content-Type: application/json
Authorization: Bearer <api_key>
```
Payload:
```
{
  "logs": [
    {
      "request": { /* your request data */ },
      "response": { /* your response data */ },
      "metadata": { /* optional metadata */ },
      "timestamp": 1234567890.123
    },
    ...
  ],
  "timestamp": 1234567890.456
}
```
Expected Response:
- Success: HTTP 200, 201, or 202
- Server Error: HTTP 5xx (will retry)
- Client Error: HTTP 4xx other than 429 (will not retry)
- Rate Limited: HTTP 429 (will retry)
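On the receiving side, any web framework will do. The framework-agnostic sketch below shows checks a backend might apply to one request in this format. `handle_log_batch` is a hypothetical name, and the codes it returns for bad input (400, 401, 415) are choices made for illustration; because they are 4xx rather than 429, the client will not resend a rejected batch. `hmac.compare_digest` compares the key in constant time.

```python
import hmac
import json


def handle_log_batch(headers: dict, body: bytes, api_key: str) -> tuple:
    """Validate one POST in the format above; return (status_code, logs)."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    if not h.get("content-type", "").startswith("application/json"):
        return 415, []  # Unsupported Media Type
    expected = f"Bearer {api_key}".encode()
    if not hmac.compare_digest(h.get("authorization", "").encode(), expected):
        return 401, []  # wrong or missing API key
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return 400, []
    logs = payload.get("logs") if isinstance(payload, dict) else None
    if not isinstance(logs, list):
        return 400, []
    # 202 Accepted: store `logs` asynchronously; 200 or 201 also count as success.
    return 202, logs
```

In a real service you would call this from your route handler and persist the returned entries; returning 202 before writing to storage keeps the client's request short.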
Error Handling
The logger is designed to be resilient and never crash your application:
- Queue Full: If the queue is full, `log()` returns `False` and the log is dropped.
- Network Errors: Automatic retry with exponential backoff.
- Backend Down: Retries up to `max_retries` times, then drops the batch.
- Thread Crashes: The worker thread is monitored and restarted if needed.
All errors are logged to Python's logging system. Enable debug mode to see detailed logs:
```python
logger = LLMLogger(..., debug=True)
```
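The network-error and backend-down rules can be sketched as one delivery loop. This is an illustration, not the package's code: `send_with_retries` and its `send`/`sleep` callbacks are hypothetical, `send` is assumed to return an HTTP status code (or `None` on a network error), `max_retries` is read as the number of retries after the first attempt, and the delay is assumed to double up to `max_retry_delay`.

```python
import time
from typing import Callable, Optional

SUCCESS_CODES = (200, 201, 202)


def send_with_retries(send: Callable[[list], Optional[int]], batch: list,
                      max_retries: int = 3,
                      initial_retry_delay: float = 1.0,
                      max_retry_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to deliver one batch; return False if it had to be dropped."""
    for attempt in range(max_retries + 1):  # first try plus max_retries retries
        status = send(batch)  # HTTP status code, or None on a network error
        if status in SUCCESS_CODES:
            return True
        # Retry network errors, 429, and 5xx; any other status is final.
        retryable = status is None or status == 429 or 500 <= status <= 599
        if not retryable or attempt == max_retries:
            return False  # client error or retries exhausted: drop the batch
        sleep(min(initial_retry_delay * 2 ** attempt, max_retry_delay))
    return False


replies = iter([None, 503, 202])  # network error, server error, then success
waits = []
print(send_with_retries(lambda b: next(replies), ["entry"], sleep=waits.append))  # True
print(waits)  # [1.0, 2.0]
```

Injecting `sleep` keeps the loop testable without real waiting; in production it defaults to `time.sleep`.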
Testing
Run the test suite:
```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=burt_logger --cov-report=html
```
Development
```bash
# Clone the repository
git clone https://github.com/trainburt/burt-logger-python.git
cd burt-logger-python

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black burt_logger/ tests/

# Lint
flake8 burt_logger/ tests/
```
Performance Considerations
- Non-blocking: `log()` calls take ~0.001 ms (just a queue insertion).
- Memory: Each log entry is ~1-5 KB, so a full queue at the default `max_queue_size` of 10,000 logs uses ~10-50 MB.
- Network: Batching reduces network overhead: 1,000 logs/second becomes 100 HTTP requests/second with `batch_size=10`.
- Threads: Uses a single background worker thread.
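The memory and network figures above are simple arithmetic. This snippet reproduces them so you can plug in your own entry size, traffic, and `batch_size`; the helper names are hypothetical, and the memory figure is a rough payload estimate that ignores Python object overhead.

```python
def queue_memory_mb(max_queue_size: int, entry_kb: float) -> float:
    """Rough memory of a full queue in MB (1 MB = 1000 KB)."""
    return max_queue_size * entry_kb / 1000


def batches_per_second(logs_per_second: float, batch_size: int) -> float:
    """HTTP requests per second when traffic fills every batch."""
    return logs_per_second / batch_size


print(queue_memory_mb(10_000, 1), queue_memory_mb(10_000, 5))  # 10.0 50.0
print(batches_per_second(1000, 10))  # 100.0
print(batches_per_second(1000, 50))  # 20.0
```

Raising `batch_size` from 10 to 50 cuts the request rate fivefold at the same traffic, which is the trade-off the next section recommends for high-volume apps.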
Production Recommendations
- Set an appropriate `batch_size`: Larger batches are more efficient but increase memory usage.
  ```python
  logger = LLMLogger(..., batch_size=50)  # For high-volume apps
  ```
- Monitor queue size: If logs are being dropped, increase `max_queue_size` or reduce traffic.
  ```python
  stats = logger.get_stats()
  if stats["logs_failed"] > 0:
      # Handle appropriately, e.g. alert or raise max_queue_size
      pass
  ```
- Use metadata: Add user_id, session_id, etc. for better data analysis.
  ```python
  logger.log(..., metadata={"user_id": user_id, "env": "prod"})
  ```
- Graceful shutdown: Always call `shutdown()` or use the context manager.
  ```python
  import atexit

  atexit.register(logger.shutdown)
  ```
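For ongoing monitoring, a periodic health check can be built on `get_stats()` using only the keys shown in its example output. `check_logger_health` and its 1% threshold are hypothetical; this page does not say whether entries dropped on a full queue count toward `logs_failed`, so treat the ratio as an approximate signal.

```python
import logging


def check_logger_health(stats: dict, max_failure_ratio: float = 0.01) -> bool:
    """Return False (and log a warning) if too many queued logs failed."""
    queued = stats.get("logs_queued", 0)
    failed = stats.get("logs_failed", 0)
    ratio = failed / queued if queued else 0.0
    if ratio > max_failure_ratio:
        logging.warning("burt-logger: %d of %d logs failed (%.1f%%); "
                        "consider raising max_queue_size or max_retries",
                        failed, queued, 100 * ratio)
        return False
    return True


# The numbers from the get_stats() example: 5 of 150 is about 3.3%.
example = {"logs_queued": 150, "logs_sent": 145, "logs_failed": 5,
           "batches_sent": 15, "batches_failed": 1}
print(check_logger_health(example))  # False (after logging a warning)
```

In an application you would pass `logger.get_stats()` instead of the example dict, for instance from a scheduled job or a health endpoint.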
License
MIT License - see LICENSE file for details
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Changelog
0.1.0 (Initial Release)
- ✅ Non-blocking asynchronous logging
- ✅ Intelligent batching (by size and time)
- ✅ Thread-safe operations
- ✅ Retry with exponential backoff
- ✅ Graceful shutdown and cleanup
- ✅ Comprehensive test suite
- ✅ Context manager support
- ✅ Statistics tracking
Built with ❤️ for the LLM training data collection community
Download files
File details
Details for the file burt_logger-0.1.1.tar.gz.
File metadata
- Download URL: burt_logger-0.1.1.tar.gz
- Upload date:
- Size: 32.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a855967e810951020405b8f4c60850e29d700aa536dc9fbf9958bd3b283ca2b2` |
| MD5 | `36d1be59b9ef1ba7ace0918c44da58aa` |
| BLAKE2b-256 | `f20f7d5ca883eca8f7bae4fe0fc6557aa787dc885ca5461e50c93a8709bcebe0` |
File details
Details for the file burt_logger-0.1.1-py3-none-any.whl.
File metadata
- Download URL: burt_logger-0.1.1-py3-none-any.whl
- Upload date:
- Size: 13.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `23c3c7f00379d6bd9d0c6816d039ded724852a29d6457b64525aaa4f8810badc` |
| MD5 | `97bd2c1d95ede0f5b18c99a9117fad24` |
| BLAKE2b-256 | `9155298ba7d21b4a1f939822e892af0f60e32ae478b6770d053e593ce4d95ae9` |