TraceML: Lightweight ML Profiler

TraceML

If you find TraceML useful, consider giving it a ⭐ on GitHub; it helps others discover the project!

A lightweight library that makes PyTorch training memory and timing visible in real time, both in the terminal (CLI) and in Jupyter notebooks.

The Problem

Training large machine learning models often feels like a black box. One minute everything's running and the next, you're staring at a cryptic "CUDA out of memory" error or wondering why a single step is so slow.

Pinpointing which part of the model is consuming too much memory or slowing things down is frustrating and time-consuming. Traditional profiling tools can be overly complex or lack the granularity deep learning developers need.

💡 Why TraceML?

TraceML is designed to give you real-time, granular observability of both memory usage and timing without heavy overhead. It works in the terminal (CLI) and inside Jupyter notebooks, so you can pick the workflow that fits you best:

✅ System + process-level usage (CPU, RAM, GPU)

✅ PyTorch layer-level memory allocation (parameters, activations, gradients)

✅ Step-level timing (forward, backward, optimizer, etc.)

✅ Lightweight — minimal overhead

No config, no setup, just plug-and-trace.

📦 Installation

From PyPI:

pip install traceml-ai

From the root of a source checkout:

pip install .

For developer mode:

pip install '.[dev]'

🚀 Usage

Registering your model for tracing

To capture memory usage, you first need to register your model with TraceML. There are two simple ways:

1. With a class decorator (recommended)

import torch.nn as nn
from traceml.decorator import trace_model

@trace_model()
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(100, 10)

    def forward(self, x):
        return self.fc(x)

✅ Any instance of TinyNet will now be automatically traced.

2. With an explicit model instance

import torch.nn as nn
from traceml.decorator import trace_model_instance

model = nn.Sequential(
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Linear(50, 10)
).to("cuda")

# Attach hooks so TraceML can see memory events
trace_model_instance(model)

✅ Best when you build models dynamically or don't want to decorate the class.
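As a rough sanity check for what module-level parameter memory should look like on the `nn.Sequential` model above, the sizes can be worked out by hand. The sketch below assumes float32 parameters (4 bytes each), which is PyTorch's default dtype; `linear_param_bytes` is a helper introduced here for illustration, not part of TraceML.

```python
# Layer shapes are copied from the nn.Sequential example; float32 is an assumption.
BYTES_PER_FLOAT32 = 4


def linear_param_bytes(in_features, out_features, bytes_per_element=BYTES_PER_FLOAT32):
    """Bytes used by a linear layer: an (out x in) weight matrix plus one bias per output."""
    n_params = in_features * out_features + out_features
    return n_params * bytes_per_element


layers = {
    "0 (Linear 100->50)": linear_param_bytes(100, 50),
    "1 (ReLU)": 0,  # ReLU has no learnable parameters
    "2 (Linear 50->10)": linear_param_bytes(50, 10),
}
total = sum(layers.values())

for name, size in layers.items():
    print(f"{name}: {size} bytes")
print(f"total: {total} bytes")  # total: 22240 bytes
```

Activations and gradients come on top of this and depend on the batch size, which is why they are tracked separately.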

Once the model is registered, run TraceML in a notebook or from the terminal, whichever fits your workflow.

📓 Notebook

Run TraceML directly in Jupyter/Colab:

from traceml.decorator import trace_model_instance
from traceml.manager.tracker_manager import TrackerManager

# Attach TraceML hooks (`model` is the nn.Module you built earlier)
trace_model_instance(model)

# Start live tracker
tracker = TrackerManager(interval_sec=1.0, mode="notebook")
tracker.start()

# 🔄 Train as usual (train_model and its arguments stand in for your own training loop)
train_model(model, train_loader, val_loader, optimizer, scheduler, scaler, device, dtype)

# Stop and show summaries
tracker.stop()
tracker.log_summaries()
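If training raises (for example on a CUDA out-of-memory error), the `stop()` and `log_summaries()` calls above are never reached. One way to guard against that is a small context manager. This is a sketch under assumptions: `StandInTracker` is a hypothetical stand-in that only mimics the three method names used above so the example runs without TraceML, and `tracking` is not part of the library.

```python
from contextlib import contextmanager


class StandInTracker:
    """Hypothetical stand-in exposing the start/stop/log_summaries calls used above."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")

    def log_summaries(self):
        self.calls.append("log_summaries")


@contextmanager
def tracking(tracker):
    """Start the tracker, then always stop it and log summaries, even on errors."""
    tracker.start()
    try:
        yield tracker
    finally:
        tracker.stop()
        tracker.log_summaries()


tracker = StandInTracker()
try:
    with tracking(tracker):
        raise RuntimeError("simulated training failure")
except RuntimeError:
    pass

print(tracker.calls)  # ['start', 'stop', 'log_summaries']
```

With a real tracker object in place of the stand-in, the training call would go inside the `with` block.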

Step Timing and Performance Tracing

TraceML now supports fine-grained step timing, letting you measure CPU/GPU latency for every major operation including data loading, device transfer, forward pass, backward pass, and optimizer steps. Simply decorate any function with @trace_timestep:

import torch
from traceml.decorator import trace_timestep

# Each decorated function is timed under the step name passed to the decorator.
@trace_timestep("forward", use_gpu=True)
def forward_pass(model, batch, dtype):
    # Mixed-precision forward pass
    with torch.cuda.amp.autocast():
        return model(**batch)

@trace_timestep("backward", use_gpu=True)
def backward_pass(loss, scaler):
    # Scale the loss before backpropagation to avoid fp16 underflow
    scaler.scale(loss).backward()

@trace_timestep("optimizer_step", use_gpu=True)
def optimizer_step(scaler, optimizer, scheduler):
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()

Top timing data appears automatically in your live dashboard and notebook summary. 🟢 It works with the activation and gradient dashboards, so memory and timing are visible together in real time.
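For intuition about what a step-timing decorator does, here is a minimal CPU-only sketch built on `time.perf_counter`. It is not how `@trace_timestep` is implemented: `time_step_sketch` and `STEP_DURATIONS` are hypothetical names. Note also that CUDA kernels launch asynchronously, so accurate GPU timing needs synchronization or CUDA events rather than wall-clock timing alone.

```python
import functools
import time
from collections import defaultdict

# Hypothetical store: step name -> list of durations in seconds.
STEP_DURATIONS = defaultdict(list)


def time_step_sketch(name):
    """Decorator factory that records the wall-clock duration of each call under `name`."""

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Record the duration even if the step raised.
                STEP_DURATIONS[name].append(time.perf_counter() - start)

        return wrapper

    return decorate


@time_step_sketch("forward")
def forward_pass(x):
    time.sleep(0.01)  # stand-in for real work
    return x * 2


result = forward_pass(21)
print(result)  # 42
print(f"forward took {STEP_DURATIONS['forward'][0] * 1000:.1f} ms")
```

Keeping a list per step name makes it straightforward to report averages or the slowest steps later.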

Terminal/CLI

Wrap your training script to see live dashboards in your terminal:

traceml run <your_training_script.py>
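Conceptually, a `run`-style launcher executes your script inside a process that the tool controls, so monitoring can start before training and stop after it. The sketch below illustrates that idea with the standard library's `runpy`. How `traceml run` actually launches scripts is not documented here, so treat `run_script_with_wrapper` as a hypothetical illustration only.

```python
import runpy
import sys
import tempfile
import time
from pathlib import Path


def run_script_with_wrapper(script_path, script_args=()):
    """Run a script as __main__ and return its wall-clock duration in seconds."""
    old_argv = sys.argv
    # Make the script see its own name and arguments, as if launched directly.
    sys.argv = [str(script_path), *script_args]
    start = time.perf_counter()
    try:
        runpy.run_path(str(script_path), run_name="__main__")
    finally:
        sys.argv = old_argv
    return time.perf_counter() - start


# Demo with a throwaway script standing in for a training script.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "train.py"
    script.write_text("print('training step done')\n")
    elapsed = run_script_with_wrapper(script)

print(f"script finished in {elapsed:.3f}s")
```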

Examples

# Trace an explicitly defined model instance
traceml run src/examples/tracing_with_model_instance

# Trace a model using a class decorator (recommended)
traceml run src/examples/tracing_with_class_decorator

[Screenshot: TraceML live dashboard]

📓 Notebook Example

You can also run TraceML inside Jupyter/Colab. See the full example notebook for a working demo.

Notebook output will refresh live per interval, similar to the terminal dashboard.

🔎 How the Samplers Work

TraceML uses samplers that collect metrics at fixed intervals instead of relying only on layer-by-layer traces:

SystemSampler → CPU, RAM, and GPU usage, sampled at a fixed frequency.
LayerMemorySampler → Parameter allocation (per module, not per parameter).
ActivationMemorySampler → Tracks per-layer forward activations. Maintains current and global peak values, and estimates total activation memory for a forward pass.
GradientMemorySampler → Tracks per-layer backward gradients. Maintains current and global peak values, and estimates total gradient memory during backpropagation.
StepTimeSampler → CPU/GPU event durations (forward, backward, optimizer, etc.).
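The "current and global peak" bookkeeping described for the activation and gradient samplers can be sketched in a few lines. This is an illustration under assumptions, not TraceML's data structure: `PeakTracker` and the layer names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PeakTracker:
    """Keeps the latest and the largest byte count seen for each layer."""

    current: dict = field(default_factory=dict)
    peak: dict = field(default_factory=dict)

    def record(self, layer, n_bytes):
        self.current[layer] = n_bytes
        self.peak[layer] = max(self.peak.get(layer, 0), n_bytes)

    def total_current(self):
        """Estimated total for the most recent pass: sum of the latest per-layer values."""
        return sum(self.current.values())


stats = PeakTracker()
stats.record("fc1", 4096)
stats.record("fc2", 1024)
stats.record("fc1", 2048)  # a later, smaller batch

print(stats.current["fc1"])   # 2048
print(stats.peak["fc1"])      # 4096
print(stats.total_current())  # 3072
```

Keeping the peak separate from the current value is what lets a dashboard show the worst case seen so far even after memory use drops.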

Because TraceML samples asynchronously, it stays lightweight while providing practical observability.
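Interval-based, asynchronous sampling is typically done with a background thread that wakes up, reads a metric, and goes back to sleep. The sketch below shows that pattern with the standard library only. `IntervalSampler` is a hypothetical class, and `time.process_time` merely stands in for a real metric reader such as a memory query.

```python
import threading
import time


class IntervalSampler:
    """Calls `read_metric` every `interval_sec` seconds on a background thread."""

    def __init__(self, read_metric, interval_sec=1.0):
        self._read_metric = read_metric
        self._interval_sec = interval_sec
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.samples = []

    def _run(self):
        # Event.wait doubles as an interruptible sleep: it returns True once stop() is called.
        while not self._stop_event.wait(self._interval_sec):
            self.samples.append(self._read_metric())

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()


sampler = IntervalSampler(read_metric=time.process_time, interval_sec=0.01)
sampler.start()
time.sleep(0.2)  # stand-in for the training loop running on the main thread
sampler.stop()

print(f"collected {len(sampler.samples)} samples")
```

Because the main thread never waits on the sampler, the overhead is bounded by the cost of one metric read per interval.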

📊 Current Features

Live CPU, RAM, and GPU usage (System + Current Process)
PyTorch module-level memory tracking
Live activation memory tracking (per layer, plus totals)
Live gradient memory tracking (per layer, plus totals)
Real-time terminal dashboards via Rich
Notebook support
Step & operation timers (forward, backward, optimizer)

Coming Soon

Export logs as JSON / CSV
More visual dashboards

🙌 Contribute & Feedback

TraceML is early-stage and evolving quickly. Contributions, feedback, and ideas are welcome!

Found it useful? Please ⭐ the repo to support development.
Issues / feature requests → open a GitHub issue.
Want to contribute? See CONTRIBUTING.md (coming soon).

📧 Contact: traceml.ai@gmail.com

TraceML - Making PyTorch memory usage visible, one trace at a time.

traceml-ai 0.1.3 (uploaded Oct 8, 2025)
