This tool analyzes performance traces from TT-Metal operations, providing insights into throughput, bottlenecks, and optimization opportunities.

Owner: Tenstorrent

Performance Report Analysis Tool

Installation

This tool can be installed from PyPI:

pipx install tt-perf-report

Installing with pipx will automatically create a virtual environment and make the tt-perf-report command available.

Generating Performance Traces

Build TT-Metal (performance tracing is enabled in the default build):

./build_metal

Run your test in TT-Metal with the tracy module to capture traces:

python -m tracy -r -p -v -m pytest path/to/test.py

This generates a CSV file containing operation timing data.

Using Tracy Signposts

Tracy signposts mark specific sections of code for analysis. Add signposts to your Python code:

import tracy

# Mark different sections of your code
tracy.signpost("Compilation pass")
model(input_data)

tracy.signpost("Performance pass")
for _ in range(10):
    model(input_data)

By default, the tool analyzes only the ops after the last signpost, which is typically the most relevant section for a performance test (e.g., the final iteration after compilation / warmup).

Common signpost usage:

--start-signpost NAME: Analyze ops after the specified signpost
--end-signpost NAME: Analyze ops before the specified signpost
--ignore-signposts: Analyze the entire trace
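
The signpost options above can be pictured as choosing a window over the trace. The following is a minimal sketch of that idea, not the tool's actual implementation: the row layout (dicts with `kind` and `name` keys) and the function name `select_ops` are hypothetical, and it assumes that passing only an end signpost keeps everything from the start of the trace.

```python
def select_ops(rows, start_signpost=None, end_signpost=None, ignore_signposts=False):
    """Return the op rows that fall inside the selected signpost window.

    `rows` is a hypothetical in-memory trace: dicts with a "kind" key
    ("signpost" or "op") and a "name" key, listed in execution order.
    """
    def ops_only(items):
        return [row for row in items if row["kind"] == "op"]

    def index_of(name):
        for i, row in enumerate(rows):
            if row["kind"] == "signpost" and row["name"] == name:
                return i
        raise ValueError(f"signpost not found: {name!r}")

    if ignore_signposts:
        return ops_only(rows)

    start, end = 0, len(rows)
    if start_signpost is not None:
        start = index_of(start_signpost) + 1  # ops after the named signpost
    if end_signpost is not None:
        end = index_of(end_signpost)  # ops before the named signpost
    if start_signpost is None and end_signpost is None:
        # Default described above: keep only the ops after the last signpost.
        positions = [i for i, row in enumerate(rows) if row["kind"] == "signpost"]
        if positions:
            start = positions[-1] + 1
    return ops_only(rows[start:end])
```

With the two signposts from the earlier snippet, the default call keeps only the ops recorded after "Performance pass".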

Filtering Operations

The output of the performance report is a table of operations. Each operation is assigned a unique ID starting from 1. You can re-run the tool with different IDs to focus on specific sections of the trace.

Use --id-range to analyze specific sections:

# Analyze ops 5 through 10
tt-perf-report trace.csv --id-range 5-10

# Analyze from op 31 onwards
tt-perf-report trace.csv --id-range 31-

# Analyze up to op 12
tt-perf-report trace.csv --id-range -12

This is particularly useful for:

Isolating the decode pass in prefill+decode LLM inference
Analyzing single transformer layers without embeddings/projections
Focusing on specific model components
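
To make the three `--id-range` forms concrete, here is a small sketch of how such a range string could be parsed and applied. The helper names `parse_id_range` and `in_range` are hypothetical, and treating both bounds as inclusive is an assumption based on the "5 through 10" wording above; the tool's own parsing may differ.

```python
def parse_id_range(spec):
    """Parse "5-10", "31-" or "-12" into (start, end); None means open-ended."""
    start_text, sep, end_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {spec!r}")
    start = int(start_text) if start_text else None
    end = int(end_text) if end_text else None
    if start is None and end is None:
        raise ValueError("at least one bound is required")
    return start, end


def in_range(op_id, bounds):
    """True if op_id lies within the (assumed inclusive) bounds."""
    start, end = bounds
    return (start is None or op_id >= start) and (end is None or op_id <= end)
```

For example, `parse_id_range("31-")` yields a lower bound of 31 with no upper bound, so every op from 31 onwards passes `in_range`.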

Output Options

--min-percentage VALUE: Hide ops below specified % of total time (default: 0.5)
--color/--no-color: Force colored/plain output
--csv FILENAME: Output the table to CSV format for further analysis or inclusion into automated reporting pipelines
--no-advice: Show only performance table, skip optimization advice
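
As an illustration of what `--min-percentage` does, the sketch below hides ops whose share of total time falls under a threshold. It is not taken from the tool's source: the `time_us` field and the function name are hypothetical, and exactly which times the tool counts towards the total is not specified here.

```python
def filter_by_share(ops, min_percentage=0.5):
    """Keep ops whose share of the summed time is at least min_percentage."""
    total = sum(op["time_us"] for op in ops)
    if total == 0:
        return list(ops)  # nothing to compare against, so keep everything
    return [op for op in ops if 100.0 * op["time_us"] / total >= min_percentage]
```

An op taking 2 µs in a 1502 µs trace accounts for roughly 0.13% of the total, so it would be hidden at the default threshold of 0.5.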

Understanding the Performance Report

The performance report provides several key metrics for analyzing operation performance:

Core Metrics

Device Time: Time spent executing the operation on device (in microseconds)
Op-to-op Gap: Time between operations, including host overhead and kernel dispatch (in microseconds)
Total %: Percentage of total execution time spent on this operation
Cores: Number of cores used by the operation (max 64 on Wormhole)

Performance Metrics

DRAM: Memory bandwidth achieved (in GB/s)
DRAM %: Percentage of theoretical peak DRAM bandwidth (288 GB/s on Wormhole)
FLOPs: Compute throughput achieved (in TFLOPs)
FLOPs %: Percentage of theoretical peak compute for the given math fidelity
Bound: Performance classification of the operation:
- DRAM: Memory bandwidth bound (>65% of peak DRAM)
- FLOP: Compute bound (>65% of peak FLOPs)
- BOTH: Both memory and compute bound
- SLOW: Neither memory nor compute bound
- HOST: Operation running on host CPU
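
The Bound column follows directly from the two utilization percentages. The sketch below restates the rules listed above as code, using the 288 GB/s Wormhole peak and the 65% threshold from this section; the function names are hypothetical and the tool's internals may handle edge cases differently.

```python
WORMHOLE_PEAK_DRAM_GBPS = 288.0  # peak DRAM bandwidth quoted above
BOUND_THRESHOLD_PCT = 65.0       # threshold quoted above


def dram_percent(achieved_gbps, peak_gbps=WORMHOLE_PEAK_DRAM_GBPS):
    """Achieved DRAM bandwidth as a percentage of the theoretical peak."""
    return 100.0 * achieved_gbps / peak_gbps


def classify_bound(dram_pct, flops_pct, on_host=False):
    """Map utilization percentages to the Bound labels described above."""
    if on_host:
        return "HOST"
    dram_bound = dram_pct > BOUND_THRESHOLD_PCT
    flop_bound = flops_pct > BOUND_THRESHOLD_PCT
    if dram_bound and flop_bound:
        return "BOTH"
    if dram_bound:
        return "DRAM"
    if flop_bound:
        return "FLOP"
    return "SLOW"
```

For instance, an op achieving 216 GB/s sits at 75% of peak DRAM bandwidth and, with low FLOPs utilization, would be labelled DRAM.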

Additional Fields

Math Fidelity: Precision configuration used for matrix operations:
- HiFi4: Highest precision (74 TFLOPs peak on Wormhole)
- HiFi2: Medium precision (148 TFLOPs peak on Wormhole)
- LoFi: Lowest precision (262 TFLOPs peak on Wormhole)
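
Because the peak depends on math fidelity, the same achieved throughput gives a different FLOPs % under each setting. The following sketch shows that relationship using the peak figures listed above. It is an assumption-laden illustration: the names are hypothetical, and whether the tool scales the peak by the number of cores an op uses is not stated here, so this version does not.

```python
# Peak figures as listed above, in TFLOPs.
PEAK_TFLOPS = {"HiFi4": 74.0, "HiFi2": 148.0, "LoFi": 262.0}


def flops_percent(achieved_tflops, math_fidelity):
    """Achieved compute throughput as a percentage of the fidelity's peak."""
    try:
        peak = PEAK_TFLOPS[math_fidelity]
    except KeyError:
        raise ValueError(f"unknown math fidelity: {math_fidelity!r}") from None
    return 100.0 * achieved_tflops / peak
```

Under these assumptions, 37 TFLOPs is 50% of peak at HiFi4 but only 25% at HiFi2.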

The tool automatically highlights potential optimization opportunities:

Red op-to-op times indicate high host or kernel launch overhead (>6.5μs)
Red core counts indicate underutilization (<10 cores)
Green metrics indicate good utilization of available resources
Yellow metrics indicate room for optimization
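
The two red-highlight rules above have explicit thresholds, so they can be applied to exported data as well. This is a rough sketch with a hypothetical function name; the green and yellow thresholds are not given in this section, so they are left out.

```python
HIGH_GAP_US = 6.5  # op-to-op gap threshold quoted above
LOW_CORES = 10     # core-count threshold quoted above


def red_flags(op_to_op_gap_us, cores):
    """Return the red-highlight reasons that apply to a single op."""
    flags = []
    if op_to_op_gap_us > HIGH_GAP_US:
        flags.append("high op-to-op gap")
    if cores < LOW_CORES:
        flags.append("low core count")
    return flags
```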

Examples

Note: trace.csv in the examples below refers to your input CSV file (the performance trace you want to analyze).

Typical use:

tt-perf-report trace.csv

Merge traces captured on multiple machines from the same workload run:

tt-perf-report trace_host0.csv trace_host1.csv trace_host2.csv

Build a table of all ops with no advice:

tt-perf-report trace.csv --no-advice

View ops 100-200 with advice:

tt-perf-report trace.csv --id-range 100-200

Export the table of ops and columns as a CSV file:

tt-perf-report trace.csv --csv my_report.csv
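
For automated reporting pipelines, the same invocations can be driven from Python. The wrapper below is a minimal sketch that uses only the flags shown in this README; the helper names `build_command` and `run_report` are hypothetical, and it assumes `tt-perf-report` is on your PATH.

```python
import shlex
import subprocess


def build_command(traces, id_range=None, csv_path=None, no_advice=False):
    """Assemble a tt-perf-report command line from the options shown above."""
    cmd = ["tt-perf-report", *traces]
    if id_range:
        cmd += ["--id-range", id_range]
    if csv_path:
        cmd += ["--csv", csv_path]
    if no_advice:
        cmd.append("--no-advice")
    return cmd


def run_report(traces, **options):
    """Run the tool and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(traces, **options), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so this works without the tool.
    print(shlex.join(build_command(["trace.csv"], id_range="100-200", csv_path="my_report.csv")))
```

Call `run_report(["trace.csv"], csv_path="my_report.csv")` on a machine where the tool is installed to produce the CSV export.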

tt-perf-report 1.1.12
