TraceML: Lightweight ML Profiler
TraceML
Real-time training observability and failure attribution tool for PyTorch — lightweight, always-on, and actionable.
The Problem TraceML Solves
Training deep learning models shouldn't feel like debugging a black box. Yet we constantly face:
- 💥 CUDA OOM errors with no insight into which layer caused the memory spike
- 🐌 Slow training without knowing if the bottleneck is data loading, forward pass, backward pass, or optimizer
- 🔍 Layer-level mysteries — which layers consume the most memory? Which are slowest?
- 📊 Heavy profilers that are impractical to keep running during actual training
TraceML changes this with continuous, low-overhead visibility while your training runs.
What TraceML Does
TraceML answers the questions you actually need answered:
| Question | TraceML Answer |
|---|---|
| Which layer caused the OOM? | Automatic detection of the failing layer during forward or backward pass |
| What's slowing down my training step? | Step-level timing: dataloader → forward → backward → optimizer |
| Where did that memory spike happen? | Step-level (batch-level) memory tracking with peak attribution |
| Which layer is eating my GPU memory? | Per-layer memory breakdown (params + forward + backward) |
| Which layer is slow? | Per-layer compute time (forward + backward) |
Three ways to view the results:
- 🖥️ Terminal — live updates in your console
- 🌐 Web UI — local browser at localhost:8765
- 📓 Jupyter notebooks — inline visualizations
Tracking Profiles (New)
TraceML supports two tracking profiles so you can choose the right trade-off between insight and overhead.
ESSENTIAL mode (lightweight, always-on)
Best for day-to-day training and long runs.
Tracks:
- Dataloader fetch time
- Training step time (GPU-aware)
- Step GPU memory (allocated + peak)
- System stats (CPU, RAM, GPU)
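To make "allocated + peak" concrete, the sketch below shows one way per-step memory records could be scanned for a spike. It is an illustration only, not TraceML's implementation: the `StepMemory` record, the `find_peak_step` and `find_spikes` helpers, the 1.5x threshold, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StepMemory:
    """Hypothetical per-step record: memory in MB at the end of a step."""
    step: int
    allocated_mb: float
    peak_mb: float


def find_peak_step(records):
    """Return the record with the highest peak memory."""
    return max(records, key=lambda r: r.peak_mb)


def find_spikes(records, ratio=1.5):
    """Return steps whose peak exceeds `ratio` times the previous step's peak."""
    spikes = []
    for prev, cur in zip(records, records[1:]):
        if cur.peak_mb > ratio * prev.peak_mb:
            spikes.append(cur.step)
    return spikes


# Made-up sample data: step 2 has an unusually large batch.
records = [
    StepMemory(step=0, allocated_mb=1200, peak_mb=1800),
    StepMemory(step=1, allocated_mb=1210, peak_mb=1820),
    StepMemory(step=2, allocated_mb=1205, peak_mb=3900),
    StepMemory(step=3, allocated_mb=1215, peak_mb=1830),
]

peak = find_peak_step(records)
spike_steps = find_spikes(records)
print(f"peak at step {peak.step}: {peak.peak_mb} MB; spikes at steps {spike_steps}")
```

Allocated memory stays flat in this sample while the peak jumps, which is the pattern that usually points at a transient activation or batch-size effect rather than a leak.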
DEEP-DIVE mode (diagnostic)
Best for debugging OOMs and performance pathologies.
Tracks everything in Essential, plus:
- Per-layer memory (parameters, activations, gradients)
- Per-layer forward and backward time
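As a rough guide to what the parameter, activation, and gradient numbers mean, the sketch below estimates them by hand for a single fully connected layer. It is back-of-the-envelope arithmetic under stated assumptions, not how TraceML measures memory: it counts only the layer's output as its activation, assumes one gradient element per parameter, and ignores allocator caching and optimizer state. The helper names are hypothetical.

```python
from math import prod

# Bytes per element for a few common dtypes
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def tensor_bytes(shape, dtype="float32"):
    """Size in bytes of a dense tensor with the given shape and dtype."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype]


def linear_layer_memory(in_features, out_features, batch_size, dtype="float32"):
    """Estimate memory for a fully connected layer with a bias term."""
    weight = tensor_bytes((out_features, in_features), dtype)
    bias = tensor_bytes((out_features,), dtype)
    parameters = weight + bias
    activations = tensor_bytes((batch_size, out_features), dtype)  # layer output only
    gradients = parameters  # one gradient element per parameter
    return {
        "parameters": parameters,
        "activations": activations,
        "gradients": gradients,
    }


mem = linear_layer_memory(in_features=1024, out_features=4096, batch_size=32)
for kind, size in mem.items():
    print(f"{kind:>11}: {size / 2**20:.2f} MiB")
```

For this layer the parameters and gradients dominate, while the activation term grows linearly with batch size, so a larger batch shifts the balance toward activations.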
Timed Regions (optional, very low overhead)
Optional instrumentation for specific code blocks.
Executed once per step per decorated function.
Tracks:
- Custom timing blocks (e.g. dataloader, forward, backward, optimizer)
- CPU or GPU time (via CUDA events)
Overhead: low (one timing measurement per step)
Installation

    pip install traceml-ai

For development:

    git clone https://github.com/traceopt-ai/traceml.git
    cd traceml
    pip install -e '.[dev]'

Requirements: Python 3.9-3.13, PyTorch 1.12+
Platform support: macOS (Intel/ARM), Linux. Single-GPU training (DDP support coming soon).
Quick Start (Important)
Step-level tracking (required for all modes)
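
    from traceml.decorators import trace_step

    # model, criterion, optimizer, and dataloader come from your training script
    for batch in dataloader:
        # Wrap each training step so its timing and memory can be recorded
        with trace_step():
            outputs = model(batch["x"])
            loss = criterion(outputs, batch["y"])
            loss.backward()
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
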
Without trace_step(...):
- Step timing is not computed
- Step memory is not recorded
- Live dashboards will not update
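For intuition about what a step-level timing breakdown contains, here is a minimal stand-alone sketch that accumulates wall-clock time per phase. It does not use TraceML and is not how `trace_step` works internally: the `phase` context manager, the `phase_seconds` dictionary, and the `fake_work` stand-in (which just sleeps) are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase (hypothetical structure)
phase_seconds = defaultdict(float)


@contextmanager
def phase(name):
    """Add the wall-clock duration of the enclosed block to phase_seconds[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_seconds[name] += time.perf_counter() - start


def fake_work(seconds):
    """Stand-in for real work such as a forward pass."""
    time.sleep(seconds)


for _ in range(3):  # three simulated training steps
    with phase("dataloader"):
        fake_work(0.03)  # deliberately the slowest phase
    with phase("forward"):
        fake_work(0.002)
    with phase("backward"):
        fake_work(0.002)
    with phase("optimizer"):
        fake_work(0.002)

bottleneck = max(phase_seconds, key=phase_seconds.get)
print(f"bottleneck: {bottleneck}")
for name, seconds in phase_seconds.items():
    print(f"{name:>10}: {seconds * 1000:.1f} ms")
```

Wall-clock timing like this is only reliable for CPU-side work; GPU kernels run asynchronously, which is why GPU-aware timing needs CUDA events or explicit synchronization.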
Optional: Timing Specific Code Regions
Use @trace_time to time specific functions.
This works in all modes and has low overhead.

    from traceml.decorators import trace_time

    @trace_time("backward", use_gpu=True)
    def backward_pass(loss, scaler=None):
        loss.backward()

Notes:
- use_gpu=True uses CUDA events (correct for async GPU work)
- use_gpu=False uses CPU wall-clock time
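The distinction matters because GPU kernels are launched asynchronously: a plain wall-clock timer around a GPU call may measure only the launch, not the work. The sketch below shows the general shape of a timing decorator that handles this. It is not TraceML's `trace_time`, and it swaps in a simpler technique than CUDA events: an optional blocking `sync` callback (for example `torch.cuda.synchronize`) called before each clock read. The names `timed` and `region_seconds` are hypothetical.

```python
import functools
import time
from collections import defaultdict

# Region name -> list of measured durations in seconds (hypothetical structure)
region_seconds = defaultdict(list)


def timed(name, sync=None):
    """Time each call of the decorated function.

    `sync` is an optional zero-argument callable that blocks until pending
    asynchronous work has finished, e.g. torch.cuda.synchronize.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if sync is not None:
                sync()  # drain earlier async work so it is not billed to this region
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                if sync is not None:
                    sync()  # wait for work launched inside fn
                region_seconds[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("optimizer")
def optimizer_step(weights, grads, lr=0.5):
    """Toy SGD update on plain Python lists."""
    return [w - lr * g for w, g in zip(weights, grads)]


new_weights = optimizer_step([1.0, 2.0], [0.5, 1.0])
print(new_weights, region_seconds["optimizer"])
```

Synchronizing stalls the CPU until the GPU is idle, so it adds overhead on every call. CUDA events avoid that stall by recording timestamps on the GPU stream itself.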
Deprecation (⚠️ Breaking change)
@trace_timestep is deprecated; use @trace_time instead.
Deep-Dive: Model Registration
Required only for Deep-Dive mode.

    from traceml.decorators import trace_model_instance

    trace_model_instance(model)

Notes:
- Enables forward/backward hooks
- Required for per-layer memory and timing
- Required for OOM layer attribution
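To illustrate the hook mechanism these notes refer to, the sketch below attaches standard PyTorch forward hooks to each leaf layer and records CPU wall-clock forward time. It shows the general technique only and makes no claim about what `trace_model_instance` does internally: `attach_forward_timers` is a hypothetical helper, and the timings are only meaningful on CPU because GPU work is asynchronous.

```python
import time
from collections import defaultdict

import torch
import torch.nn as nn


def attach_forward_timers(model):
    """Attach hooks that accumulate forward wall-clock time per leaf layer."""
    timings = defaultdict(float)  # layer name -> accumulated seconds
    starts = {}                   # layer name -> start time of the current call
    handles = []

    def make_pre_hook(name):
        def pre_hook(module, args):
            starts[name] = time.perf_counter()
        return pre_hook

    def make_post_hook(name):
        def post_hook(module, args, output):
            timings[name] += time.perf_counter() - starts.pop(name)
        return post_hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf layers only
            handles.append(module.register_forward_pre_hook(make_pre_hook(name)))
            handles.append(module.register_forward_hook(make_post_hook(name)))
    return timings, handles


model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
timings, handles = attach_forward_timers(model)

with torch.no_grad():
    model(torch.randn(32, 8))

for handle in handles:
    handle.remove()  # detach the hooks once measurement is done

for name, seconds in timings.items():
    print(f"layer {name}: {seconds * 1e6:.1f} us")
```

Because the pre-hook records which layer is currently executing, the same pattern can in principle report the last layer entered when an exception such as an out-of-memory error is raised.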
Running TraceML

    traceml run train.py

You'll immediately see a live terminal dashboard tracking:
- System resources (CPU, RAM, GPU)
- Dataloader fetch time, training step time and training step GPU memory
- (Deep-Dive only) Per-layer memory and compute time
🌐 Web Dashboard

    traceml run train.py --mode=dashboard

Opens http://localhost:8765 with interactive charts and real-time updates.
📓 Jupyter Notebooks
Please see the notebook example for inline visualizations.
Roadmap
- Performance & Stability Improvements: Continuous reduction of tracing overhead, improved robustness for long-running training jobs, and better defaults for production-scale workloads.
- Distributed Training Support: Support for multi-GPU training (DDP / FSDP) and, over time, multi-node distributed setups with clear failure and performance attribution.
- Framework Integrations: Native integrations with popular training frameworks such as PyTorch Lightning and Hugging Face Accelerate.
- Advanced Diagnostics: Memory leak detection, clearer attribution of performance regressions, and richer debugging signals for complex training runs.
- Actionable Insights & Automation: Smarter summaries and recommendations to help users identify bottlenecks and optimize training configurations.
Contributing
We welcome contributions! Here's how to help:
- ⭐ Star the repo to show support
- 🐛 Report bugs via GitHub Issues
- 💡 Request features we should prioritize
- 🔧 Submit PRs for improvements
Community & Support
- 📧 Email: abhinav@traceopt.ai
- 🐙 LinkedIn: Abhinav Srivastav
- 📋 User Survey: Help shape the roadmap (2 minutes)
License
TraceML is released under the MIT License with Commons Clause.
What this means:
- ✅ Free for personal use
- ✅ Free for research and academic use
- ✅ Free for internal company use
- ❌ Not allowed for resale or SaaS products
For commercial licensing inquiries, contact abhinav@traceopt.ai.
See LICENSE for full details.
Citation
If TraceML helps your research, please cite:
@software{traceml2024,
author = {TraceOpt AI},
title = {TraceML: Real-time Training Observability for PyTorch},
year = {2024},
url = {https://github.com/traceopt-ai/traceml}
}
TraceML — Stop guessing. Start profiling.
Made with ❤️ by TraceOpt AI
File details
Details for the file traceml_ai-0.1.9.tar.gz.
File metadata
- Download URL: traceml_ai-0.1.9.tar.gz
- Upload date:
- Size: 70.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d695859507010593c2a54b365045e2f54d7760d790220d08dff30cb459dfc21 |
| MD5 | f1ebfe2e6e412afda4cafd94149b5e91 |
| BLAKE2b-256 | 28d39325578e66452476a734447b454a52078fcbfd24a4e6a6e54fc1b4b5174b |
File details
Details for the file traceml_ai-0.1.9-py3-none-any.whl.
File metadata
- Download URL: traceml_ai-0.1.9-py3-none-any.whl
- Upload date:
- Size: 103.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0af8b99d1f6fdd9bfb1d598029935757e91b9ea31512e9e6c8a131777b8ca45 |
| MD5 | 1a168e94aeeceed8a2d4816bbba6dad7 |
| BLAKE2b-256 | bd213378922698270247330846d367426ca8324c0f3e373e82bde4577c66d452 |