TraceML: Lightweight ML Profiler


TraceML

If you find it useful, consider giving it a ⭐ on GitHub; it helps others discover the project!

A lightweight, always-on profiler for PyTorch that makes memory, timing, and system usage visible in real time via:

Terminal dashboards
Jupyter notebooks
A lightweight local web dashboard/server
JSON logging for offline analysis

Minimal configuration. Minimal overhead. Plug-and-trace.

📊 Quick User Survey (2 min)

Using TraceML? Help shape the roadmap: https://forms.gle/vaDQao8L81oAoAkv9

🚨 The Problem

Training deep learning models often feels like debugging a black box:

CUDA OOM errors appear without warning
Step times are slow, with no visibility into why
Existing profilers are heavy, complicated, or lack activation/gradient memory details

TraceML provides continuous, lightweight observability with minimal impact on training speed.

💡 Why TraceML?

TraceML is designed to stay lightweight, always-on, and practical:

Module-level memory tracking (params, activations, gradients)
Step timing (forward, backward, optimizer, dataloader)
Terminal + Notebook + Local Web Dashboard (port 8765)
Minimal overhead (sampling-based — NOT full graph tracing)

A tool you can safely keep on in every training loop.

⭐ Quick Start

1. Installation

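From the root of a cloned repository:

pip install .

The package is also published on PyPI as traceml-ai:

pip install traceml-ai

For development (installs the dev extras):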

pip install '.[dev]'

🔧 2. Model Registration (Required)

TraceML attaches hooks to your model, so the model must be registered first. There are two ways:

A. Decorator (recommended)

from traceml.decorators import trace_model
import torch.nn as nn

@trace_model()
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(100, 10)

    def forward(self, x):
        return self.fc(x)

B. Register a model instance

from traceml.decorators import trace_model_instance
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Linear(50, 10)
)

trace_model_instance(model)

This is all you need to enable memory + timing tracing across all workflows.
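A class decorator can give the same result as per-instance registration by wrapping `__init__`, so every new instance is registered as soon as it is built. The sketch below is one plausible way to do that, not TraceML's implementation: `register_class` and `register_instance` are hypothetical stand-ins for `@trace_model()` and `trace_model_instance`, and a plain class replaces `nn.Module` so the sketch runs without PyTorch. A real tracer would attach hooks at the marked point, for example with PyTorch's `nn.Module.register_forward_hook`.

```python
import functools

# Hypothetical registry standing in for whatever TraceML keeps internally.
registered_models = []


def register_instance(obj):
    """Stand-in for trace_model_instance: record the object.

    A real tracer would attach hooks here, e.g. via module.register_forward_hook(...).
    """
    registered_models.append(obj)
    return obj


def register_class():
    """Stand-in for @trace_model(): register every instance right after __init__ runs."""
    def decorator(cls):
        original_init = cls.__init__

        @functools.wraps(original_init)
        def __init__(self, *args, **kwargs):
            original_init(self, *args, **kwargs)  # build the object as usual
            register_instance(self)               # then register it

        cls.__init__ = __init__
        return cls
    return decorator


@register_class()
class TinyNet:  # plain class so the sketch runs without PyTorch
    def __init__(self, width):
        self.width = width


a, b = TinyNet(10), TinyNet(20)
print([m.width for m in registered_models])  # [10, 20]
```

The decorator form is convenient when you own the model class; the instance form covers models you only receive as objects, such as an `nn.Sequential` or a pretrained model.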

🚀 3. Running TraceML

You can run TraceML in three modes:

✅ A. CLI Mode (Terminal Dashboard — default)

traceml run your_script.py

This launches a live terminal dashboard showing:

System metrics (CPU, RAM, GPU)
Layer memory
Activation + gradient memory
Step timings

[Screenshot: TraceML CLI live view]
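The memory panels report byte counts, so a rough hand calculation is a useful sanity check. The sketch below estimates float32 parameter, gradient, and activation memory for the `nn.Sequential` model from the registration example. The batch size of 32 is an assumption, and TraceML's own accounting may differ (for example with in-place ops or tensors saved by autograd).

```python
FLOAT32_BYTES = 4  # bytes per float32 element


def linear_param_count(in_features: int, out_features: int) -> int:
    """Weight (out x in) plus bias (out), as in torch.nn.Linear with bias=True."""
    return in_features * out_features + out_features


# Linear layers of nn.Sequential(Linear(100, 50), ReLU(), Linear(50, 10)); ReLU has no parameters.
layers = [(100, 50), (50, 10)]
batch_size = 32  # assumption: illustrative batch size

param_counts = [linear_param_count(i, o) for i, o in layers]
param_bytes = sum(param_counts) * FLOAT32_BYTES

# After backward(), each parameter holds a gradient of the same shape and dtype.
grad_bytes = param_bytes

# Forward outputs: Linear(100, 50), ReLU (same shape, not in-place), Linear(50, 10).
activation_elems = [batch_size * 50, batch_size * 50, batch_size * 10]
activation_bytes = sum(activation_elems) * FLOAT32_BYTES

print(param_counts)      # [5050, 510]
print(param_bytes)       # 22240
print(grad_bytes)        # 22240
print(activation_bytes)  # 14080
```

Parameter and gradient memory are fixed by the architecture, while activation memory grows linearly with batch size, which is why activations are usually the first thing to check when a larger batch triggers an OOM.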

✅ B. Dashboard Mode (Local Web UI)

Run your training script with:

traceml run your_script.py --mode=dashboard

Opens a live dashboard at:

http://localhost:8765

Includes:

Real-time charts
Per-layer memory
Peaks and summaries

[Screenshot: TraceML dashboard live view]

✅ C. Notebook Mode

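from traceml.decorators import trace_model_instance
from traceml.manager.tracker_manager import TrackerManager

# `model` is your nn.Module; `train` is your own training loop.
trace_model_instance(model)

# Sample metrics every second and render them in the notebook.
tracker = TrackerManager(interval_sec=1.0, mode="notebook")
tracker.start()

train(model)

# Stop sampling and report the collected summaries.
tracker.stop()
tracker.log_summaries()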

Notebook UI updates automatically.
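If the training call raises (for example on a CUDA OOM), the lines after it in the cell never run and the tracker is left running. A small context manager built only on the `start()`, `stop()`, and `log_summaries()` calls used above guarantees cleanup. `tracking` and `FakeTracker` are hypothetical names for this sketch; the fake tracker just records calls so the pattern runs without TraceML installed.

```python
from contextlib import contextmanager


@contextmanager
def tracking(tracker):
    """Start `tracker`; always stop it and log summaries, even if the body raises."""
    tracker.start()
    try:
        yield tracker
    finally:
        tracker.stop()
        tracker.log_summaries()


# With TraceML this would read:
#     with tracking(TrackerManager(interval_sec=1.0, mode="notebook")):
#         train(model)


class FakeTracker:
    """Stand-in exposing the same three methods, recording each call."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")

    def log_summaries(self):
        self.calls.append("log_summaries")


fake = FakeTracker()
try:
    with tracking(fake):
        raise RuntimeError("simulated CUDA OOM")
except RuntimeError:
    pass

print(fake.calls)  # ['start', 'stop', 'log_summaries']
```

The exception still propagates after cleanup, so the notebook shows the original error while the summaries collected up to the failure are preserved.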

⏱ Step Timing Example

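from traceml.decorators import trace_timestep

@trace_timestep("forward", use_gpu=True)
def forward_pass(model, batch):
    # `batch` is a mapping of keyword inputs (e.g. a Hugging Face-style batch)
    return model(**batch)

@trace_timestep("backward", use_gpu=True)
def backward_pass(loss, scaler):
    # `scaler` is a torch GradScaler used for mixed-precision training
    scaler.scale(loss).backward()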

Timings then appear automatically in the CLI, dashboard, and notebook summaries.
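A decorator like `trace_timestep` can be understood as wrapping a function and recording how long each call takes under a named step. The sketch below shows that general pattern with `time.perf_counter`; `timed_step` and `step_times` are hypothetical names, not TraceML's API. It measures CPU wall-clock time only and does not synchronize CUDA, so it would under-count asynchronous GPU work; we assume that case is what the real decorator's `use_gpu=True` option is meant to handle.

```python
import functools
import time
from collections import defaultdict

# Hypothetical registry: step name -> list of durations in seconds.
step_times = defaultdict(list)


def timed_step(name):
    """Record the wall-clock duration of every call under `name`.

    CPU wall-clock only: no CUDA synchronization, so GPU kernels that are
    still running when the function returns are not counted.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Recorded even if fn raises, so failed steps still show up.
                step_times[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed_step("forward")
def forward_pass(x):
    return sum(v * v for v in x)  # stand-in for real model work


for _ in range(3):
    forward_pass(range(1000))

durations = step_times["forward"]
mean_ms = 1000 * sum(durations) / len(durations)
print(f"forward: {len(durations)} calls, mean {mean_ms:.3f} ms")
```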

📤 Exporting Logs as JSON

Enable JSON logging:

traceml run your_script.py --enable-logging

Logs are stored in:

./logs/

Useful for plotting, analytics, or offline dashboards.
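For offline analysis, the logs can be loaded with the standard library. The exact file names and schema under ./logs/ are not documented here, so this sketch assumes each file is either JSON Lines (one object per line) or a single JSON document; `load_log_records` and the sample field names (`step`, `forward_ms`) are hypothetical. Inspect the directory and adapt the parsing before relying on it.

```python
import json
import tempfile
from pathlib import Path


def load_log_records(log_dir="./logs"):
    """Collect JSON records from every .json / .jsonl file under `log_dir`.

    Assumption: files are JSON Lines, or a single JSON object or list of objects.
    """
    records = []
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix == ".jsonl":
            with path.open() as f:
                records.extend(json.loads(line) for line in f if line.strip())
        elif path.suffix == ".json":
            data = json.loads(path.read_text())
            records.extend(data if isinstance(data, list) else [data])
    return records


# Usage with a throwaway directory and made-up records:
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "steps.jsonl").write_text(
        '{"step": 1, "forward_ms": 12.5}\n{"step": 2, "forward_ms": 11.9}\n'
    )
    rows = load_log_records(tmp)

print(len(rows))  # 2
```

From here the records can be passed to a plotting or dataframe library of your choice.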

📊 How TraceML Works (Lightweight Samplers)

TraceML uses asynchronous samplers (NOT full tracing):

SystemSampler — CPU, RAM, GPU
LayerMemorySampler — Params
ActivationMemorySampler — Forward activations
GradientMemorySampler — Backward gradients
StepTimeSampler — Forward/backward/optimizer timings

Because the samplers read metrics at a fixed interval instead of recording every operation, overhead stays low.
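To illustrate the asynchronous sampling idea, here is a minimal sketch of a background sampler: a daemon thread calls a probe function every `interval_sec` seconds while the main loop keeps running. `IntervalSampler` is a hypothetical class, not one of TraceML's samplers, and the probe here only reads a step counter; a real system sampler would query CPU, RAM, or GPU usage instead.

```python
import threading
import time


class IntervalSampler:
    """Call `probe()` every `interval_sec` seconds on a daemon thread."""

    def __init__(self, probe, interval_sec=1.0):
        self.probe = probe
        self.interval_sec = interval_sec
        self.samples = []                # (timestamp, value) pairs
        self._lock = threading.Lock()    # guards `samples` across threads
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            value = self.probe()
            with self._lock:
                self.samples.append((time.time(), value))
            # wait() doubles as an interruptible sleep; it returns True once stop() is called.
            if self._stop.wait(self.interval_sec):
                break

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def snapshot(self):
        with self._lock:
            return list(self.samples)


# Usage: sample a step counter while a stand-in "training loop" runs.
counter = {"steps": 0}
sampler = IntervalSampler(lambda: counter["steps"], interval_sec=0.01)
sampler.start()
for _ in range(50):
    counter["steps"] += 1  # stand-in for one training step
    time.sleep(0.001)
sampler.stop()

print(len(sampler.snapshot()))  # number of samples taken (timing-dependent)
```

The training loop never waits on the sampler; the cost per sample is one probe call, and the interval sets the trade-off between resolution and overhead.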

📦 Current Features

Live system usage (CPU, RAM, GPU)
Per-layer memory tracking
Activation & gradient memory
Step timing
Terminal UI
Notebook display
Local web dashboard
JSON logging

🛠 Coming Soon

Multi-node distributed tracing
PyTorch Lightning / Accelerate integration

🤝 Contribute

Star ⭐ the repo to support development
Open issues for improvements or bugs
Contributions welcome

📧 Contact: abhinavsriva@gmail.com

🧾 License

TraceML is released under the MIT License with the Commons Clause:

Free for personal, research, academic, and internal use
Not allowed for resale, SaaS, or commercial redistribution

For commercial licensing, contact abhinavsriva@gmail.com.

TraceML — Lightweight, real-time visibility for PyTorch training.

Release history

0.3.0 (May 26, 2026)
0.2.15 (May 19, 2026)
0.2.14 (May 7, 2026)
0.2.13 (Apr 30, 2026)
0.2.12 (Apr 27, 2026)
0.2.11 (Apr 23, 2026)
0.2.10 (Apr 22, 2026)
0.2.9 (Apr 17, 2026)
0.2.8 (Apr 13, 2026)
0.2.7 (Apr 7, 2026)
0.2.6 (Apr 4, 2026)
0.2.5 (Mar 20, 2026)
0.2.4 (Mar 15, 2026)
0.2.3 (Mar 7, 2026)
0.2.2 (Feb 28, 2026)
0.2.1 (Feb 26, 2026)
0.2.0 (Feb 9, 2026)
0.2.0a0, pre-release (Jan 27, 2026)
0.1.9 (Jan 3, 2026)
0.1.8 (Dec 25, 2025)
0.1.6 (Dec 11, 2025)
0.1.5 (Dec 10, 2025; this version)
0.1.3 (Oct 8, 2025)
0.1.1 (Oct 2, 2025)
0.1.0 (Oct 2, 2025)

Download files


Source Distribution

traceml_ai-0.1.5.tar.gz (52.6 kB)

Uploaded Dec 10, 2025 Source

Built Distribution


traceml_ai-0.1.5-py3-none-any.whl (75.3 kB)

Uploaded Dec 10, 2025 Python 3

File details

Details for the file traceml_ai-0.1.5.tar.gz.

File metadata

Download URL: traceml_ai-0.1.5.tar.gz
Upload date: Dec 10, 2025
Size: 52.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for traceml_ai-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`13a2a572f565e99958a351ffd117f811058bf8729f5ab1c6be2a9a5bc8d5da07`
MD5	`6e3b2cead14fec375cf07650bfbd0c2e`
BLAKE2b-256	`e86e1842a1ed9ccc158cc200aafdad281c033c329eb85e871564d6c071172be6`


File details

Details for the file traceml_ai-0.1.5-py3-none-any.whl.

File metadata

Download URL: traceml_ai-0.1.5-py3-none-any.whl
Upload date: Dec 10, 2025
Size: 75.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for traceml_ai-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6f04c5bd6d40e13e8f60d9bc932b34c6cad65cc83832bbd57693a0ca5bee62a8`
MD5	`010de6d57057f3d1b6ec44e57b1f772a`
BLAKE2b-256	`0fd5016603981656091807126a77a6ba2077f0f0ef6c7c931ec95ebb00a38a11`


traceml-ai 0.1.5
