
General compute framework for Tenstorrent devices

Project description


Install | Buy Hardware | Bounty $ | Join Us | Discord


TT-NN is a Python & C++ Neural Network OP library.

API Reference | Model Demos

Latest Releases

Release | Release Date
0.62.0 | ETA Aug 13, 2025
0.61.0 | Skipped
0.60.1 | Jul 22, 2025
0.59.0 | Jun 18, 2025
0.58.0 | May 13, 2025
0.57.0 | Apr 15, 2025
0.56.0 | Mar 7, 2025

LLMs

Model | Batch | Hardware | ttft (ms) | t/s/u | Target t/s/u | t/s | TT-Metalium Release | vLLM Tenstorrent Repo Release
Qwen 3 32B (TP=8) | 32 | QuietBox (Wormhole) | 109 | 22.1 | 30 | 707.2 | v0.59.0-rc52 | f028da1
QwQ 32B (TP=8) | 32 | QuietBox (Wormhole) | 133 | 25.2 | 30 | 806.4 | v0.56.0-rc51 | e2e0002
DeepSeek R1 Distill Llama 3.3 70B (TP=8) | 32 | QuietBox (Wormhole) | 159 | 15.9 | 20 | 508.8 | v0.59.0-rc53 | f028da1
Llama 3.1 70B (TP=32) | 32 | Galaxy | 68 | 66.7 | 80 | 2134.4 | v0.60.0-rc20 | 5cbc982
Llama 3.1 70B (TP=8) | 32 | QuietBox (Wormhole) | 159 | 15.9 | 20 | 508.8 | v0.59.0-rc53 | f028da1
Llama 3.1 70B (TP=4) | 32 | QuietBox (Blackhole) | 195* | 14.9* | | 476.5* | v0.59.0-rc53 | f028da1
Llama 3.2 11B Vision (TP=2) | 16 | n300 | 2550 | 15.8 | 17 | 252.8 | v0.56.0-rc6 | e2e0002
Qwen 2.5 7B (TP=2) | 32 | n300 | 126 | 32.5 | 38 | 1040.0 | v0.56.0-rc33 | e2e0002
Qwen 2.5 72B (TP=8) | 32 | QuietBox (Wormhole) | 319 | 14.6 | 20 | 467.2 | v0.59.0-rc52 | f028da1
Falcon 7B | 32 | n150 | 70 | 18.5 | 26 | 592.0 | v0.60.0-rc20 |
Falcon 7B (DP=8) | 256 | QuietBox (Wormhole) | 87 | 15.9 | 26 | 4070.4 | v0.60.0-rc20 |
Falcon 7B (DP=32) | 1024 | Galaxy | 121 | 13.2 | 26 | 13516.8 | v0.60.0-rc20 |
Falcon 40B (TP=8) | 32 | QuietBox (Wormhole) | | 11.9 | 36 | 380.8 | v0.59.0-rc38 |
Llama 3.1 8B | 32 | p100 | 87* | 26.5* | | 848.0* | v0.59.0-rc3 | 739dcaa
Llama 3.1 8B | 32 | p150 | 69* | 29.1* | | 931.2* | v0.59.0-rc3 | 739dcaa
Llama 3.1 8B (DP=2) | 64 | 2 x p150 | 64* | 18.6* | | 1190.4* | v0.59.0-rc3 | 739dcaa
Llama 3.1 8B | 32 | n150 | 104 | 24.8 | 23 | 793.6 | v0.59.0-rc52 | f028da1
Llama 3.2 1B | 32 | n150 | 23 | 72.6 | 160 | 2323.2 | v0.59.0-rc52 | f028da1
Llama 3.2 3B | 32 | n150 | 53 | 43.5 | 60 | 1392.0 | v0.59.0-rc52 | f028da1
Mamba 2.8B | 32 | n150 | 35 | 14.1 | 41 | 451.2 | v0.59.0-rc38 |
Mistral 7B | 32 | n150 | 101 | 28.3 | 23 | 905.6 | v0.59.0-rc52 | f028da1
Mixtral 8x7B (TP=8) | 32 | QuietBox (Wormhole) | 207 | 16.6 | 33 | 531.2 | v0.59.0-rc53 |

Last Update: July 21, 2025

Notes:

  • ttft = time to first token | t/s/u = tokens/second/user | t/s = tokens/second; where t/s = t/s/u * batch.
  • TP = Tensor Parallel, DP = Data Parallel; these define the parallelization factors across multiple devices.
  • The reported LLM performance is for an input sequence length (number of rows filled in the KV cache) of 128 for all models except Mamba (which can accept any sequence length).
  • The t/s/u reported is the throughput of the first token generated after prefill, i.e. 1 / inter-token latency.
  • Performance numbers were collected using the tt-metal model demos (accessible via the model links). If running with a vLLM inference server, performance may be different.
  • * Blackhole software optimization is under active development. Please join us in shaping the future of open source AI!
    [Discord] [Developer Hub]
  • For more information regarding vLLM installation and environment creation visit the Tenstorrent vLLM repository.
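The relationships stated in the notes above (t/s = t/s/u * batch, and t/s/u = 1 / inter-token latency) can be checked against any row of the table. The sketch below is illustrative only: the function names are hypothetical and not part of TT-NN, and the sample values are taken from the Llama 3.1 8B (n150) row.

```python
def aggregate_throughput(tokens_per_sec_per_user: float, batch: int) -> float:
    """t/s = t/s/u * batch, as defined in the notes above."""
    return tokens_per_sec_per_user * batch


def inter_token_latency_ms(tokens_per_sec_per_user: float) -> float:
    """Invert t/s/u = 1 / inter-token latency and convert seconds to ms."""
    return 1000.0 / tokens_per_sec_per_user


# Values from the Llama 3.1 8B (n150) row: t/s/u = 24.8, batch = 32.
t_s_u, batch = 24.8, 32
print(round(aggregate_throughput(t_s_u, batch), 1))  # 793.6, matching the t/s column
print(round(inter_token_latency_ms(t_s_u), 1))       # 40.3 ms between tokens per user
```

Rows marked with * may differ from this product by rounding, since the published t/s/u is itself rounded.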

Speech-to-Text

Model | Batch | Hardware | ttft (ms) | t/s/u | Target t/s/u | t/s | TT-Metalium Release
Whisper (distil-large-v3) | 1 | n150 | 232 | 58.1 | 45 | 58.1 | v0.59.0-rc52

Diffusion Models

Model | Batch | Hardware | Sec/Image | Target Sec/Image | Release
Stable Diffusion 1.4 (512x512) | 1 | n150 | 6.25 | 3 |
Stable Diffusion 3.5 Medium (512x512) | 1 | n150 | 16 | 10 |

Notes:

  • Stable Diffusion sec/image is based on the time elapsed from submitting the input prompt to receiving the image from the VAE decoder.

CNNs and Vision Transformers

Classification models

Model | Batch | Hardware | Image/sec | Target Image/sec | Release
ResNet-50 (224x224) | 16 | n150 | 4,700 | 7,000 | v0.59.0
ResNet-50 (224x224) (DP=2) | 32 | n300 | 9,200 | 14,000 | v0.59.0
ResNet-50 (224x224) (DP=8) | 128 | QuietBox (Wormhole) | 35,800 | 56,000 | v0.59.0
ResNet-50 (224x224) (DP=32) | 512 | Galaxy | 96,800 | 224,000 | v0.59.0
ViT-base (224x224) | 8 | n150 | 1,370 | 1,600 | v0.60.0-rc4
ViT-base (224x224) (DP=2) | 16 | n300 | 1,900 | 3,200 | v0.60.0-rc4
ViT-base (224x224) (DP=8) | 64 | QuietBox (Wormhole) | 7,700 | 12,800 | v0.60.0-rc4
MobileNet-v2 (224x224) | 10 | n150 | 2,808 | 3,500 |

Object Detection

Model | Batch | Hardware | Frame/sec (FPS) | Target FPS | Release
YOLOv4 (320x320) | 1 | n150 | 120 | 320 |
YOLOv4 (640x640) | 1 | n150 | 50 | 180 |
YOLOv8x (640x640) | 1 | n150 | 45 | 100 |
YOLOv8s (640x640) | 1 | n150 | 175 | 320 |
YOLOv8s_world (640x640) | 1 | n150 | 57 | 200 |
YOLOv9c (640x640) | 1 | n150 | 55 | 320 |
YOLOv10x (640x640) | 1 | n150 | 26 | 200 |

Segmentation

Model | Batch | Hardware | Frame/sec (FPS) | Target FPS | Release
UNet - VGG19 (256x256) | 1 | n150 | 77 | 150 |
SegFormer Semantic Segmentation (512x512) | 1 | n150 | 84 | 300 |
YOLOv9c (640x640) | 1 | n150 | 40 | 240 |
UFLD - v2 (320x800) | 1 | n150 | 255 | 2000 |

NLPs

Model | Batch | Hardware | Sentence/sec | Target Sentence/sec | Release
BERT-Large | 8 | n150 | 270 | 400 |
Sentence-Bert (backbone: bert-base) | 8 | n150 | 403 | 550 |
Sentence-Bert (backbone: bert-base) | 64 | QuietBox | 2961 | 4400 |

Model Updates

For the latest model updates and features, please see MODEL_UPDATES.md.

Model Bring-Up and Testing

For information on initial model bring-up procedures, please see Model Bring-Up and Testing.

TT-NN Tech Reports

Benchmarks



TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.

Programming Guide | API Reference

Getting started

Get started with simple kernels.

TT-Metalium Tech Reports

TT-Metalium Programming Examples

Hello World

Add Integers

Simple Tensor Manipulation

DRAM Data Movement

Eltwise

Matmul

Tools and Instruments

TT-NN Visualizer

A comprehensive tool for visualizing and analyzing model execution, offering interactive graphs, memory plots, tensor details, buffer overviews, operation flow graphs, and multi-instance support with file or SSH-based report loading. Install via pip or build from source:

pip install ttnn-visualizer

Tenstorrent Bounty Program Terms and Conditions

This repo is part of Tenstorrent’s bounty program. If you are interested in helping to improve tt-metal, please read the Tenstorrent Bounty Program Terms and Conditions before heading to the issues tab. Look for issues that are tagged with both “bounty” and a difficulty level!

License

TT-Metalium and TTNN are licensed under the Apache 2.0 License, as detailed in LICENSE and LICENSE_understanding.txt.

Some distributable forms of this project—such as manylinux-compliant wheels—may need to bundle additional libraries beyond the standard Linux system libraries. For example:

  • libnuma
  • libhwloc
  • openmpi (when built with multihost support)
  • libevent (when built with multihost support)

These libraries are bound by their own license terms.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ttnn-0.62.0rc29-cp310-cp310-manylinux_2_34_x86_64.whl (26.4 MB view details)

Uploaded: CPython 3.10 | manylinux: glibc 2.34+ x86-64

File details

Details for the file ttnn-0.62.0rc29-cp310-cp310-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ttnn-0.62.0rc29-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm | Hash digest
SHA256 | 0c08851d677bf40ed8bcc3057bf303b44fa641f617016167d15ce4a6f90cbc70
MD5 | 1908b2b81f52081bf5e637dc73158047
BLAKE2b-256 | fcacee2f65ce62dd9565d02e3082f489c82925388681de87ead7058474160d89

See more details on using hashes here.
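A downloaded wheel can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the local file path assumes the wheel was saved to the current directory under its published name.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Digest copied from the hash table above; the path is an assumption.
EXPECTED_SHA256 = "0c08851d677bf40ed8bcc3057bf303b44fa641f617016167d15ce4a6f90cbc70"
WHEEL = Path("ttnn-0.62.0rc29-cp310-cp310-manylinux_2_34_x86_64.whl")

if WHEEL.exists():
    print("hash OK" if matches_expected(WHEEL, EXPECTED_SHA256) else "hash MISMATCH")
```

Reading in chunks keeps memory use flat, which matters little for a 26.4 MB wheel but costs nothing.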

Provenance

The following attestation bundles were made for ttnn-0.62.0rc29-cp310-cp310-manylinux_2_34_x86_64.whl:

Publisher: package-and-release.yaml on tenstorrent/tt-metal

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
