TraceHub
Centralized checkpoint trace collection and query service for distributed systems.
License: Apache 2.0
Overview
TraceHub collects checkpoint traces from distributed services and provides:
- Centralized Storage: All traces in one place (SQLite)
- Real-time Streaming: SSE endpoint for live trace updates
- Correlation Tracking: Query all traces for a request chain
- Auto-cleanup: Configurable retention (default: 24 hours)
Architecture
    ┌─────────────────────┐     ┌─────────────────────┐
    │  Manager (muid.io)  │     │ Worker (kiberos.ai) │
    │                     │     │                     │
    │  checkpoint_logger  │     │  checkpoint_logger  │
    │         │           │     │         │           │
    └─────────┼───────────┘     └─────────┼───────────┘
              │                           │
              └─────────┬─────────────────┘
                        │
                        ▼
                ┌────────────────┐
                │    TraceHub    │
                │   (FastAPI)    │
                │                │
                │  SQLite + SSE  │
                └───────┬────────┘
                        │
                        ▼
                ┌────────────────┐
                │ CLI / Grafana  │
                │  (Query API)   │
                └────────────────┘
Quick Start
1. Start TraceHub Server
# Using Python directly
cd tracehub
pip install -r requirements.txt
python server.py --port 8099
# Or via Docker (coming soon)
docker run -p 8099:8099 tracehub
2. Configure Services
Set environment variables on services using checkpoint_logger:
# On Manager (muid.io)
export TRACEHUB_URL=http://tracehub-host:8099
export CHECKPOINT_TRACING=1
# On Worker (kiberos.ai)
export TRACEHUB_URL=http://tracehub-host:8099
export CHECKPOINT_TRACING=1
3. Query Traces
# List recent correlation IDs
curl http://tracehub:8099/correlations
# Get traces for specific correlation ID
curl http://tracehub:8099/traces/cli-12345-abc
# Stream traces in real-time
curl http://tracehub:8099/traces/cli-12345-abc/stream
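To consume the stream from a script instead of curl, the SSE framing has to be decoded first. The sketch below is a minimal parser, assuming each event carries one JSON-encoded trace in its `data:` field (the payload format of the stream is not documented here). The function name `iter_sse_events` is hypothetical and not part of TraceHub.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE event.

    ASSUMPTION: every event's data field holds a JSON-encoded trace.
    """
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space
            data_parts.append(value)
        elif line == "" and data_parts:
            # A blank line terminates the event.
            yield json.loads("\n".join(data_parts))
            data_parts = []


# Sample input standing in for lines read from the stream endpoint.
sample = [
    ": keep-alive\n",
    'data: {"source_id": "MA", "endpoint": "/api/agents"}\n',
    "\n",
    'data: {"source_id": "WK", "endpoint": "/api/agents"}\n',
    "\n",
]
events = list(iter_sse_events(sample))
print([event["source_id"] for event in events])
```

Against a running server, the `lines` argument could be the decoded lines of the HTTP response body for `/traces/{corr_id}/stream`.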
API Endpoints
Ingest
| Endpoint | Method | Description |
|---|---|---|
| `/ingest` | POST | Batch ingest traces |
| `/ingest/single` | POST | Ingest single trace |
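The request body schema for these endpoints is not specified here. As a rough sketch, the code below assumes `/ingest/single` accepts one trace object shaped like the entry in Trace Format, and that `/ingest` accepts a JSON array of such objects. Both are assumptions to check against the server code. The helper `build_ingest_request` is hypothetical; it only builds the request and does not send it.

```python
import json
import urllib.request

BASE_URL = "http://tracehub:8099"  # host name used in the curl examples


def build_ingest_request(
    traces: list[dict], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one or more traces."""
    if len(traces) == 1:
        # ASSUMPTION: the single endpoint takes one trace object as the body.
        path, payload = "/ingest/single", traces[0]
    else:
        # ASSUMPTION: the batch endpoint takes a JSON array of trace objects.
        path, payload = "/ingest", traces
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


trace = {
    "source_id": "MA",
    "correlation_id": "cli-12345-abc",
    "timestamp": 1706803200123,
    "direction": "->",
    "operation": "REST",
    "endpoint": "/api/agents",
}
single = build_ingest_request([trace])
batch = build_ingest_request([trace, trace])
print(single.get_method(), single.full_url)
print(batch.get_method(), batch.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a live server.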
Query
| Endpoint | Method | Description |
|---|---|---|
| `/traces/{corr_id}` | GET | Get all traces for correlation ID |
| `/traces/{corr_id}/stream` | GET | SSE stream for real-time traces |
| `/correlations` | GET | List recent correlation IDs |
Admin
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/cleanup` | DELETE | Force cleanup old traces |
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| `TRACEHUB_DB` | `/tmp/tracehub.db` | SQLite database path |
| `TRACEHUB_PORT` | `8099` | Server port |
| `TRACEHUB_RETENTION_HOURS` | `24` | Trace retention period in hours |
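To illustrate how a retention period in hours could translate into a cleanup pass over SQLite, here is a minimal sketch. The table name `traces` and its columns are hypothetical, since the actual schema is not documented here, and timestamps are assumed to be milliseconds since the Unix epoch as in the Trace Format example.

```python
import os
import sqlite3
import time

# The default mirrors the Configuration table.
RETENTION_HOURS = float(os.environ.get("TRACEHUB_RETENTION_HOURS", "24"))


def cleanup(conn: sqlite3.Connection, retention_hours: float, now_ms=None) -> int:
    """Delete traces older than the retention window; return the row count."""
    if now_ms is None:
        now_ms = time.time() * 1000
    cutoff_ms = now_ms - retention_hours * 3600 * 1000
    # ASSUMPTION: a `traces` table with a millisecond `timestamp` column.
    cur = conn.execute("DELETE FROM traces WHERE timestamp < ?", (cutoff_ms,))
    conn.commit()
    return cur.rowcount


# Demo on an in-memory database rather than the TRACEHUB_DB path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traces (correlation_id TEXT, timestamp REAL)")
now_ms = 1706803200123
conn.executemany(
    "INSERT INTO traces VALUES (?, ?)",
    [
        ("old", now_ms - 25 * 3600 * 1000),   # 25 hours old, outside the window
        ("fresh", now_ms - 1 * 3600 * 1000),  # 1 hour old, inside the window
    ],
)
deleted = cleanup(conn, 24, now_ms=now_ms)
remaining = [row[0] for row in conn.execute("SELECT correlation_id FROM traces")]
print(deleted, remaining)
```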
Client Configuration
| Environment Variable | Default | Description |
|---|---|---|
| `TRACEHUB_URL` | (empty) | TraceHub server URL (empty = disabled) |
| `TRACEHUB_BATCH_SIZE` | `10` | Traces per batch |
| `TRACEHUB_FLUSH_INTERVAL` | `1.0` | Flush interval in seconds |
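The interaction between batch size and flush interval can be sketched as a small buffer that flushes when either limit is reached. This is an illustration under stated assumptions, not the client's actual implementation: the class `TraceBuffer` and its `send_batch` callback are hypothetical, and the time limit is only checked when a trace is added (a real client may well use a background timer instead).

```python
import time


class TraceBuffer:
    """Collect traces and hand them to `send_batch` when a limit is hit."""

    def __init__(self, send_batch, batch_size=10, flush_interval=1.0,
                 clock=time.monotonic):
        self._send_batch = send_batch
        self._batch_size = batch_size          # cf. TRACEHUB_BATCH_SIZE
        self._flush_interval = flush_interval  # cf. TRACEHUB_FLUSH_INTERVAL
        self._clock = clock
        self._pending = []
        self._last_flush = clock()

    def add(self, trace):
        self._pending.append(trace)
        full = len(self._pending) >= self._batch_size
        stale = self._clock() - self._last_flush >= self._flush_interval
        if full or stale:
            self.flush()

    def flush(self):
        if self._pending:
            self._send_batch(self._pending)
            self._pending = []  # start a new list; the sent one is untouched
        self._last_flush = self._clock()


sent = []  # stands in for the network call
buf = TraceBuffer(sent.append, batch_size=3, flush_interval=60.0)
for n in range(7):
    buf.add({"n": n})
buf.flush()  # push out the remainder, e.g. at shutdown
batch_sizes = [len(batch) for batch in sent]
print(batch_sizes)
```

Injecting the clock keeps the time-based path testable without sleeping.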
Integration
With checkpoint_logger (Automatic)
When TRACEHUB_URL is set, checkpoint_logger automatically sends traces:
import os

from checkpoint_logger import get_checkpoint_logger, set_correlation_id
# Enable tracing
os.environ['CHECKPOINT_TRACING'] = '1'
os.environ['TRACEHUB_URL'] = 'http://tracehub:8099'
# Use as normal - traces auto-sent to TraceHub
log = get_checkpoint_logger("MA")
set_correlation_id("req-12345")
log.checkpoint_entry("REST", "/api/agents", {"id": "123"})
log.checkpoint_exit("REST", "/api/agents", {"status": "ok"})
Manual Integration
import time

from tracehub.client import TraceHubClient, TraceEntry

client = TraceHubClient("http://tracehub:8099")
entry = TraceEntry(
    source_id="MY",
    correlation_id="req-12345",
    timestamp=int(time.time() * 1000),  # milliseconds since the Unix epoch
    suffix="abc",
    direction="->",
    operation="HTTP",
    endpoint="/api/test",
)
client.send(entry)
CLI Usage
# Via kbc
kbc tracehub start # Start TraceHub server
kbc tracehub status # Check status
kbc trace show # Show recent traces (uses TraceHub if available)
# Via kiberos CLI
kiberos trace list # List traces from TraceHub
kiberos trace current # Show current trace ID
Trace Format
Each trace entry contains:
{
  "source_id": "MA",
  "correlation_id": "cli-12345-abc",
  "timestamp": 1706803200123,
  "suffix": "x7K",
  "direction": "->",
  "operation": "REST",
  "endpoint": "/api/agents",
  "data": {"binding_id": "123"},
  "hostname": "muid.io"
}
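When working with query results in Python, an entry like the one above can be loaded into a typed structure. The sketch below uses a hypothetical `ParsedTrace` dataclass (not the `TraceEntry` class shipped with the client) and assumes `timestamp` is milliseconds since the Unix epoch, which matches the `time.time() * 1000` usage in Manual Integration.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ParsedTrace:
    """Hypothetical typed view of one trace entry."""

    source_id: str
    correlation_id: str
    timestamp: int  # ASSUMPTION: milliseconds since the Unix epoch
    suffix: str
    direction: str
    operation: str
    endpoint: str
    data: dict = field(default_factory=dict)
    hostname: str = ""

    @property
    def time_utc(self) -> datetime:
        """Timestamp converted to a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.timestamp / 1000, tz=timezone.utc)


raw = """
{
  "source_id": "MA",
  "correlation_id": "cli-12345-abc",
  "timestamp": 1706803200123,
  "suffix": "x7K",
  "direction": "->",
  "operation": "REST",
  "endpoint": "/api/agents",
  "data": {"binding_id": "123"},
  "hostname": "muid.io"
}
"""
trace = ParsedTrace(**json.loads(raw))
print(trace.source_id, trace.time_utc.isoformat())
```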
Source IDs
| ID | Component |
|---|---|
| MA | Manager API (REST) |
| WS | WebSocket handlers |
| WK | Worker client |
| VM | VM Agent |
| MB | MessageBridge |
| JW | JWT Authority |
| SP | Spawner |
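As a small illustration of how the source IDs can make a request chain readable, the sketch below sorts the traces of one correlation ID by timestamp and labels each with its component name. The function `timeline` and the sample traces are hypothetical; the mapping is copied from the table above.

```python
SOURCE_NAMES = {
    "MA": "Manager API (REST)",
    "WS": "WebSocket handlers",
    "WK": "Worker client",
    "VM": "VM Agent",
    "MB": "MessageBridge",
    "JW": "JWT Authority",
    "SP": "Spawner",
}


def timeline(traces: list[dict]) -> list[str]:
    """Return one readable line per trace, ordered by timestamp."""
    ordered = sorted(traces, key=lambda t: t["timestamp"])
    return [
        "{} {} {} {}".format(
            SOURCE_NAMES.get(t["source_id"], "unknown source"),
            t["direction"],
            t["operation"],
            t["endpoint"],
        )
        for t in ordered
    ]


# Sample traces for one correlation ID, deliberately out of order.
traces = [
    {"source_id": "WK", "timestamp": 1706803200456, "direction": "->",
     "operation": "HTTP", "endpoint": "/api/test"},
    {"source_id": "MA", "timestamp": 1706803200123, "direction": "->",
     "operation": "REST", "endpoint": "/api/agents"},
]
lines = timeline(traces)
for line in lines:
    print(line)
```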
License
Apache License 2.0 - see LICENSE