TraceML: Lightweight runtime bottleneck diagnostics for PyTorch training.

These details have not been verified by PyPI

Project description

TraceML

Catch PyTorch training slowdowns early, while the job is still running.

TraceML is an open-source tool for catching PyTorch training slowdowns early, so bad runs do not quietly waste costly compute.

It gives you lightweight step-level signals while the job is still running, so you can quickly tell whether the slowdown looks input-bound, compute-bound, wait-heavy, imbalanced across ranks, or memory-related.

Use TraceML when you want a fast answer before reaching for a heavyweight profiler.

⭐ If TraceML helps you, please consider starring the repo.

Upcoming rename: TraceML will transition to TraceOpt in a future release. For now, the active package remains traceml-ai and Python imports remain traceml. The future PyPI package name traceopt-ai is now in place as we prepare the migration.

The fastest way to try it

Install:

pip install traceml-ai

Initialize TraceML and wrap your training step:

import traceml

traceml.init()

for batch in dataloader:
    with traceml.trace_step(model):
        optimizer.zero_grad(set_to_none=True)
        outputs = model(batch["x"])
        loss = criterion(outputs, batch["y"])
        loss.backward()
        optimizer.step()

Run:

traceml run train.py

During training, TraceML opens a live terminal view alongside your logs.

[Image: TraceML terminal dashboard]

At the end of the run, it prints a compact summary you can review or share.

[Image: TraceML summary]

Start with traceml run train.py. Most users do not need watch or deep to begin with.

For custom training loops, manual and selective instrumentation are available in the Quickstart.

Core workflows

1. Live diagnosis

Use the default workflow when you want live step-aware diagnosis during training plus the end-of-run summary.

traceml run train.py

2. Low-noise summary runs

Use summary mode when you mainly want the structured final summary for logging into W&B or MLflow.

traceml run train.py --mode=summary

Then call traceml.final_summary() near the end of your script.

TraceML also writes canonical summary artifacts for the run, including final_summary.json, which is the intended machine-readable output for downstream logging and later run comparison.
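As a rough illustration of downstream logging, the sketch below loads a summary file and flattens it into dotted key/value pairs, which is the shape most experiment trackers accept for metrics. The schema of final_summary.json is not described here, so the sample keys (step_time_ms, phases, note) are hypothetical, and flatten_numeric and load_summary_metrics are illustrative helpers, not part of TraceML.

```python
import json
import tempfile
from pathlib import Path


def flatten_numeric(obj, prefix=""):
    """Flatten nested dicts into {"a.b": number}, skipping non-numeric leaves."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_numeric(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = value
    return flat


def load_summary_metrics(path):
    """Read a summary JSON file and return its numeric fields as flat metrics."""
    with open(path, encoding="utf-8") as fh:
        return flatten_numeric(json.load(fh))


# Hypothetical summary content, written to a temp file for demonstration only.
sample = {
    "step_time_ms": 120.5,
    "phases": {"forward": 40.0, "backward": 60.0},
    "note": "ok",
}
with tempfile.TemporaryDirectory() as tmp:
    summary_path = Path(tmp) / "final_summary.json"
    summary_path.write_text(json.dumps(sample), encoding="utf-8")
    metrics = load_summary_metrics(summary_path)

print(metrics)
```

The resulting dict can then be handed to whatever metric-logging call your tracker provides. Non-numeric fields are dropped on purpose, since trackers usually expect numbers for metrics.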

3. Compare two runs

If you have final_summary.json from two runs, compare them directly:

traceml compare run_a.json run_b.json

TraceML writes both a structured compare JSON and a compact text report.

See docs/user_guide/compare.md.
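To make the idea of a run-to-run comparison concrete, here is a minimal sketch that diffs the numeric fields of two summary dicts. It is not how traceml compare is implemented, and the field names (step_time_ms, phases, dataloader, forward) are hypothetical stand-ins for whatever the real summaries contain.

```python
def flatten_numeric(obj, prefix=""):
    """Flatten nested dicts into {"a.b": number}, skipping non-numeric leaves."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_numeric(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = value
    return flat


def diff_summaries(a, b):
    """Return {key: (a_value, b_value, pct_change)} for numeric keys in both runs."""
    flat_a, flat_b = flatten_numeric(a), flatten_numeric(b)
    report = {}
    for key in sorted(flat_a.keys() & flat_b.keys()):
        old, new = flat_a[key], flat_b[key]
        # Percent change is undefined when the baseline is zero.
        pct = None if old == 0 else (new - old) / old * 100.0
        report[key] = (old, new, pct)
    return report


# Hypothetical summaries from two runs.
run_a = {"step_time_ms": 100.0, "phases": {"dataloader": 10.0, "forward": 30.0}}
run_b = {"step_time_ms": 125.0, "phases": {"dataloader": 35.0, "forward": 30.0}}

report = diff_summaries(run_a, run_b)
for key, (old, new, pct) in report.items():
    change = "n/a" if pct is None else f"{pct:+.1f}%"
    print(f"{key}: {old} -> {new} ({change})")
```

In this made-up example the step time regression is explained almost entirely by the dataloader phase, which is the kind of signal a comparison report is meant to surface.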

What TraceML helps you see

TraceML helps answer questions like:

Is the run input-bound, compute-bound, wait-heavy, or memory-constrained?
Are some distributed ranks slower than others?
Is memory usage drifting upward over time?
Where is time showing up across dataloader, forward, backward, and optimizer phases?

It is designed to help you decide quickly whether a run looks healthy or whether it is worth digging deeper.
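For intuition about phase-level timing, the sketch below accumulates wall-clock time per named phase and reports each phase's share of the total. PhaseTimer is a hypothetical helper written for this example, not TraceML's instrumentation, and the sleep calls are stand-ins for real data loading and compute.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            self.totals[name] += self._clock() - start

    def shares(self):
        """Return each phase's fraction of the total measured time."""
        total = sum(self.totals.values())
        if not total:
            return {}
        return {name: t / total for name, t in self.totals.items()}


timer = PhaseTimer()
for _ in range(3):
    with timer.phase("dataloader"):
        time.sleep(0.02)   # stand-in for fetching a batch
    with timer.phase("compute"):
        time.sleep(0.005)  # stand-in for forward/backward/optimizer

shares = timer.shares()
dominant = max(shares, key=shares.get)
print(dominant, {name: round(share, 2) for name, share in shares.items()})
```

A run where the dataloader share dominates looks input-bound; one where compute dominates looks compute-bound. Note that CUDA kernels launch asynchronously, so naive wall-clock timing like this is only meaningful on GPU if the device is synchronized at phase boundaries.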

Overhead

TraceML adds fixed per-step instrumentation overhead, so the relative cost is highest when training steps are very short. In larger or distributed workloads, that fixed cost is amortized over a longer end-to-end step. In our early DDP benchmarks, TraceML did not produce a measurable slowdown beyond normal run-to-run variation.
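The amortization argument is simple arithmetic, sketched below. The 0.5 ms overhead figure is a made-up number chosen for illustration, not a measured TraceML cost.

```python
def relative_overhead(step_ms, overhead_ms):
    """Fraction by which a fixed per-step cost lengthens a step."""
    return overhead_ms / step_ms


FIXED_OVERHEAD_MS = 0.5  # hypothetical value, not a measured TraceML number

for step_ms in (5.0, 50.0, 500.0):
    pct = relative_overhead(step_ms, FIXED_OVERHEAD_MS) * 100
    print(f"{step_ms:>6.1f} ms step -> {pct:.1f}% slower")
```

The same fixed cost that is clearly visible on a 5 ms step shrinks by two orders of magnitude on a 500 ms step, which is why relative overhead matters most for very short steps.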

When to use TraceML

Use TraceML when training feels:

slower than expected
unstable from step to step
imbalanced across distributed ranks
fine in dashboards but still underperforming
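As one way to picture rank imbalance, the sketch below takes per-rank mean step times and reports how far the slowest rank sits above the median. rank_skew and the sample timings are hypothetical, and this is not the metric TraceML itself reports.

```python
from statistics import median


def rank_skew(step_ms_by_rank):
    """Return (slowest_rank, relative gap between the slowest rank and the median)."""
    slowest = max(step_ms_by_rank, key=step_ms_by_rank.get)
    mid = median(step_ms_by_rank.values())
    return slowest, (step_ms_by_rank[slowest] - mid) / mid


# Hypothetical mean step times in milliseconds, keyed by rank.
times = {0: 100.0, 1: 102.0, 2: 98.0, 3: 150.0}
slowest, skew = rank_skew(times)
print(f"rank {slowest} is {skew:.0%} slower than the median rank")
```

In synchronous data-parallel training, gradient synchronization makes every rank wait for the slowest one, so a single straggler like this can set the pace for the whole job.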

Start with TraceML when you need a fast answer in the terminal. Reach for torch.profiler once you know where to dig deeper.

How it fits with your stack

TraceML is designed to work alongside tools like W&B, MLflow, and TensorBoard, not replace them.

Use experiment trackers for dashboards, artifacts, and team reporting. Use TraceML for live bottleneck diagnosis, structured final summaries, and simple run-to-run comparison from saved TraceML summary JSON files.

See "Use TraceML with W&B / MLflow".

Current support

Works today:

single GPU
single-node DDP/FSDP

Next:

multi-node training support

Learn more

Need a lighter zero-code first look or a deeper follow-up run? See the Quickstart and FAQ for watch and deep.

Feedback

If TraceML helped you catch a slowdown, please open an issue and include:

hardware / CUDA / PyTorch versions
single GPU or multi-GPU
whether you used run, watch, or deep
the end-of-run summary
a minimal repro if possible

GitHub issues: https://github.com/traceopt-ai/traceml/issues

Email: support@traceopt.ai

Contributing

Contributions are welcome, especially:

reproducible slowdown cases
bug reports
docs improvements
integrations
examples

License

Apache 2.0. See LICENSE.

TraceOpt is a trademark of OptAI UG (haftungsbeschränkt).

Project details

Release history

Version	Release date
0.3.0	May 26, 2026
0.2.15	May 19, 2026
0.2.14 (this version)	May 7, 2026
0.2.13	Apr 30, 2026
0.2.12	Apr 27, 2026
0.2.11	Apr 23, 2026
0.2.10	Apr 22, 2026
0.2.9	Apr 17, 2026
0.2.8	Apr 13, 2026
0.2.7	Apr 7, 2026
0.2.6	Apr 4, 2026
0.2.5	Mar 20, 2026
0.2.4	Mar 15, 2026
0.2.3	Mar 7, 2026
0.2.2	Feb 28, 2026
0.2.1	Feb 26, 2026
0.2.0	Feb 9, 2026
0.2.0a0 (pre-release)	Jan 27, 2026
0.1.9	Jan 3, 2026
0.1.8	Dec 25, 2025
0.1.6	Dec 11, 2025
0.1.5	Dec 10, 2025
0.1.3	Oct 8, 2025
0.1.1	Oct 2, 2025
0.1.0	Oct 2, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

traceml_ai-0.2.14.tar.gz (268.4 kB)

Uploaded May 7, 2026 Source

Built Distribution

traceml_ai-0.2.14-py3-none-any.whl (388.1 kB)

Uploaded May 7, 2026 Python 3

File details

Details for the file traceml_ai-0.2.14.tar.gz.

File metadata

Download URL: traceml_ai-0.2.14.tar.gz
Upload date: May 7, 2026
Size: 268.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for traceml_ai-0.2.14.tar.gz
Algorithm	Hash digest
SHA256	`32031208b2c02f876966e2850cac1a359a32bcb1ecc04c1658f7d991f1a7739d`
MD5	`359e743a8465676ea509c17e97f572c6`
BLAKE2b-256	`87788e81780f40daf114a49da11190b9d86e7ddaa6224c11140a134ca42c1568`

File details

Details for the file traceml_ai-0.2.14-py3-none-any.whl.

File metadata

Download URL: traceml_ai-0.2.14-py3-none-any.whl
Upload date: May 7, 2026
Size: 388.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for traceml_ai-0.2.14-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c32b7899dd602623d9e1ccd02f9927ec685d6e722dc36e5b21fade81dc4f191c`
MD5	`c85997916671d472a27a9e8ab7b1cb06`
BLAKE2b-256	`d1d6f2ffbb8817bacd4dc087414d1f6d839b100628fa3604a93f7b4bda1cdf8a`

traceml-ai 0.2.14
