Lightning Thunder is a source-to-source compiler for PyTorch, enabling PyTorch programs to run on different hardware accelerators and graph compilers.

Give your PyTorch models superpowers ⚡

Source-to-source compiler for PyTorch. Understandable. Inspectable. Extensible.

✅ Run PyTorch 40% faster   ✅ Quantization                ✅ Kernel fusion        
✅ Training recipes         ✅ FP4/FP6/FP8 precision       ✅ Distributed TP/PP/DP 
✅ Inference recipes        ✅ Ready for NVIDIA Blackwell  ✅ CUDA Graphs          
✅ LLMs, non-LLMs and more  ✅ Custom Triton kernels       ✅ Compose all the above

Thunder is a source-to-source deep learning compiler for PyTorch that focuses on making it simple to optimize models for training and inference.

It provides:

- a simple, Pythonic IR capturing the entire computation
- a rich system of transforms that simultaneously operate on the computation IR, the model, and the weights
- an extensible dispatch mechanism to fusers and optimized kernel libraries

With Thunder you can:

- profile deep learning programs easily, map individual ops to kernels and inspect programs interactively
- programmatically replace sequences of operations with optimized ones and see the effect on performance
- acquire full computation graphs without graph breaks by flexibly extending the interpreter
- modify programs to fully utilize bleeding edge kernel libraries on specific hardware
- write models for single GPU and transform them to run distributed
- quickly iterate on mixed precision and quantization strategies to search for combinations that minimally affect quality
- bundle all optimizations in composable recipes, so they can be ported across model families
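
As a toy illustration of replacing a sequence of operations with an optimized one, the sketch below treats a program as a plain list of op names. This is not Thunder's transform API: the function `replace_sequence` and the op name `fused_linear_relu` are made up for this example.

```python
# Toy illustration only: a "program" is a list of op names, and a rewrite
# replaces a known sequence with a single fused op. Thunder's own IR and
# transforms are richer; every name below is hypothetical.
def replace_sequence(program, pattern, replacement):
    """Return a copy of `program` with each occurrence of `pattern` replaced."""
    if not pattern:
        raise ValueError("pattern must contain at least one op")
    result = []
    i = 0
    while i < len(program):
        if program[i : i + len(pattern)] == pattern:
            # Matched the whole pattern: emit the replacement and skip past it.
            result.append(replacement)
            i += len(pattern)
        else:
            result.append(program[i])
            i += 1
    return result


program = ["linear", "relu", "linear", "relu", "softmax"]
optimized = replace_sequence(program, ["linear", "relu"], "fused_linear_relu")
print(optimized)
```

A real transform would then re-run the program to measure the effect of the replacement on performance.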

Ultimately, you should think of Thunder as a highly efficient tool for going from “unoptimized” to “optimized”.

If that interests you, read on to install Thunder and get started quickly.

Quick start • Examples • Performance • Docs

Quick start

Install Thunder via pip (more options):

pip install lightning-thunder

pip install -U torch torchvision
pip install nvfuser-cu128-torch28 nvidia-cudnn-frontend  # if NVIDIA GPU is present

For older versions of torch

torch==2.7 + CUDA 12.8

pip install lightning-thunder

pip install torch==2.7.0 torchvision==0.22
pip install nvfuser-cu128-torch27 nvidia-cudnn-frontend  # if NVIDIA GPU is present

torch==2.6 + CUDA 12.6

pip install lightning-thunder

pip install torch==2.6.0 torchvision==0.21
pip install nvfuser-cu126-torch26 nvidia-cudnn-frontend  # if NVIDIA GPU is present

torch==2.5 + CUDA 12.4

pip install lightning-thunder

pip install torch==2.5.0 torchvision==0.20
pip install nvfuser-cu124-torch25 nvidia-cudnn-frontend  # if NVIDIA GPU is present
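
The pairings for older torch versions can be kept in a small lookup table. The sketch below is only a convenience for printing the commands listed above: the helper `install_commands` and the `PAIRINGS` table are illustrative names, not part of Thunder, and only the combinations documented here are included.

```python
# Illustrative helper, not part of Thunder: maps a torch release to the
# install commands listed above. Only the documented pairings are included.
PAIRINGS = {
    # torch release: (torch/torchvision pins, nvFuser wheel)
    "2.7": ("torch==2.7.0 torchvision==0.22", "nvfuser-cu128-torch27"),
    "2.6": ("torch==2.6.0 torchvision==0.21", "nvfuser-cu126-torch26"),
    "2.5": ("torch==2.5.0 torchvision==0.20", "nvfuser-cu124-torch25"),
}


def install_commands(torch_version: str, nvidia_gpu: bool = True) -> list[str]:
    """Return the pip commands for one of the documented torch versions."""
    if torch_version not in PAIRINGS:
        raise ValueError(f"no documented pairing for torch {torch_version}")
    torch_pins, nvfuser_wheel = PAIRINGS[torch_version]
    commands = ["pip install lightning-thunder", f"pip install {torch_pins}"]
    if nvidia_gpu:
        # The nvFuser and cuDNN frontend wheels only apply when an NVIDIA GPU is present.
        commands.append(f"pip install {nvfuser_wheel} nvidia-cudnn-frontend")
    return commands


for command in install_commands("2.6"):
    print(command)
```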

Advanced install options

Install optional executors

# Float8 support (this will compile from source, be patient)
pip install "transformer_engine[pytorch]"

Install Thunder bleeding edge

pip install git+https://github.com/Lightning-AI/lightning-thunder.git@main

Install Thunder for development

git clone https://github.com/Lightning-AI/lightning-thunder.git
cd lightning-thunder
pip install -e .

Hello world

Define a function or a torch module:

import torch.nn as nn

model = nn.Sequential(nn.Linear(2048, 4096), nn.ReLU(), nn.Linear(4096, 64))

Optimize it with Thunder:

import thunder
import torch

thunder_model = thunder.compile(model)

x = torch.randn(64, 2048)

y = thunder_model(x)

torch.testing.assert_close(y, model(x))

Examples

LLM training

Install LitGPT (without updating other dependencies)

pip install --no-deps 'litgpt[all]'

and run

import thunder
import torch
import litgpt

with torch.device("cuda"):
    model = litgpt.GPT.from_name("Llama-3.2-1B").to(torch.bfloat16)

thunder_model = thunder.compile(model)

inp = torch.ones((1, 2048), device="cuda", dtype=torch.int64)

out = thunder_model(inp)
out.sum().backward()

HuggingFace BERT inference

Install Hugging Face Transformers (recommended version is 4.50.2 and above)

pip install -U transformers

and run

import thunder
import torch
import transformers

model_name = "bert-large-uncased"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

with torch.device("cuda"):
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16
    )
    model.requires_grad_(False)
    model.eval()

    inp = tokenizer(["Hello world!"], return_tensors="pt")

thunder_model = thunder.compile(model)

out = thunder_model(**inp)
print(out)

HuggingFace DeepSeek R1 distill inference

Install Hugging Face Transformers (recommended version is 4.50.2 and above)

pip install -U transformers

and run

import torch
import transformers
import thunder

model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

with torch.device("cuda"):
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16
    )
    model.requires_grad_(False)
    model.eval()

    inp = tokenizer(["Hello world! Here's a long story"], return_tensors="pt")

thunder_model = thunder.compile(model)

out = thunder_model.generate(
    **inp, do_sample=False, cache_implementation="static", max_new_tokens=100
)
print(out)

Vision Transformer inference

import thunder
import torch
import torchvision as tv

with torch.device("cuda"):
    model = tv.models.vit_b_16()
    model.requires_grad_(False)
    model.eval()

    inp = torch.randn(128, 3, 224, 224)

out = model(inp)

thunder_model = thunder.compile(model)

out = thunder_model(inp)

Benchmarks

Although Thunder is a tool for optimizing models, rather than an opaque compiler that gets you speedups out of the box, here is a set of benchmarks.

Perf-wise, out of the box Thunder is in the ballpark of torch.compile, especially when using CUDAGraphs. Note however that Thunder is not a competitor to torch.compile! It can actually use torch.compile as one of its fusion executors.

The script examples/quickstart/hf_llm.py demonstrates how to benchmark a model for text generation, forward pass, forward pass with loss, and a full forward + backward computation.

On an H100 with torch==2.8.0, nvfuser-cu128-torch28, and Transformers 4.55.4, running Llama 3.2 1B, we see the following timings:

Transformers with torch.compile and CUDAGraphs (reduce-overhead mode):  521ms
Transformers with torch.compile but no CUDAGraphs (default mode):       814ms
Transformers without torch.compile:                                    1493ms
Thunder with CUDAGraphs:                                                542ms
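
To put these numbers in relative terms, the short calculation below divides the eager timing by each of the others. The dictionary keys are shorthand labels chosen for this sketch; the values are the timings reported above.

```python
# Timings reported above, in milliseconds (Llama 3.2 1B on an H100).
timings_ms = {
    "torch.compile + CUDAGraphs": 521,
    "torch.compile (default mode)": 814,
    "eager (no torch.compile)": 1493,
    "Thunder + CUDAGraphs": 542,
}

# Speedup of each configuration relative to plain eager execution.
baseline = timings_ms["eager (no torch.compile)"]
speedups = {name: baseline / ms for name, ms in timings_ms.items()}

for name, speedup in speedups.items():
    print(f"{name:30s} {speedup:.2f}x vs eager")

# Gap between Thunder and torch.compile when both use CUDAGraphs.
gap = timings_ms["Thunder + CUDAGraphs"] / timings_ms["torch.compile + CUDAGraphs"]
print(f"Thunder / torch.compile (CUDAGraphs): {gap:.3f}")
```

By this arithmetic, Thunder with CUDAGraphs is roughly 2.75x faster than eager and within about 4% of torch.compile in reduce-overhead mode.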

Plugins

Plugins are a way to apply optimizations to a model, such as parallelism and quantization.

Thunder comes with a few plugins included out of the box, but it's easy to write new ones.

- scale up with distributed strategies such as DDP, FSDP, and TP
- optimize numerical precision with FP8, MXFP8
- save memory with quantization
- reduce latency with CUDAGraphs
- debugging and profiling

For example, in order to reduce CPU overheads via CUDAGraphs you can add "reduce-overhead" to the plugins= argument of thunder.compile:

thunder_model = thunder.compile(model, plugins="reduce-overhead")

This may or may not make a big difference. The point of Thunder is that you can easily swap optimizations in and out and explore the best combination for your setup.
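
One way to explore combinations is a small harness that compiles the model once per candidate configuration and times each variant. The sketch below is an assumption about how you might structure this, not a Thunder utility: `time_call`, `explore`, and `fake_compile` are hypothetical names, and the stand-in compile function only exists so the example runs without a GPU. The only plugin name taken from this README is "reduce-overhead".

```python
import time


def time_call(fn, *args, warmup=1, repeats=3):
    """Return the best wall-clock time in seconds of `fn(*args)` over `repeats` runs."""
    for _ in range(warmup):  # warm-up runs absorb one-time compilation cost
        fn(*args)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def explore(compile_fn, model, example_input, candidates):
    """Compile `model` once per candidate and time each compiled variant."""
    results = {}
    for name, kwargs in candidates.items():
        compiled = compile_fn(model, **kwargs)
        results[name] = time_call(compiled, example_input)
    return results


candidates = {
    "default": {},
    "cudagraphs": {"plugins": "reduce-overhead"},
}


# Stand-in so the sketch runs anywhere; with Thunder installed you would
# pass thunder.compile and a real model instead.
def fake_compile(model, **kwargs):
    return model


results = explore(fake_compile, lambda x: x * 2, 21, candidates)
print(sorted(results))
```

When timing real GPU work, remember that CUDA kernels launch asynchronously, so the timed function should synchronize (for example with `torch.cuda.synchronize()`) before the clock is read.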

How it works

Thunder works in three stages:

⚡️ It acquires your model by interpreting Python bytecode and producing a straight-line Python program
⚡️ It transforms the model and computation trace, for example to make it distributed or change precision
⚡️ It routes parts of the trace for execution to:
- fusion (NVFuser, torch.compile)
- specialized libraries (e.g. cuDNN SDPA, TransformerEngine)
- custom Triton and CUDA kernels
- PyTorch eager operations
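
The routing stage can be pictured as a priority list of executors, each claiming the operations it supports, with eager PyTorch as the fallback. The toy model below only conveys that idea. The executor names echo the list above, but the data structures, the sets of claimed ops, and the `dispatch` function are invented for illustration and do not reflect Thunder's internals.

```python
# Toy model of trace dispatch. Executors are tried in priority order; the
# op sets each one claims are invented for this sketch.
EXECUTORS = [
    ("cudnn", {"sdpa"}),                  # specialized library
    ("nvfuser", {"add", "mul", "relu"}),  # fusion executor
    ("torch_eager", None),                # fallback: accepts any op
]


def dispatch(trace):
    """Assign each op to the first executor in priority order that claims it."""
    assignment = []
    for op in trace:
        for name, supported in EXECUTORS:
            if supported is None or op in supported:
                assignment.append((op, name))
                break
    return assignment


trace = ["linear", "relu", "sdpa", "add"]
for op, executor in dispatch(trace):
    print(f"{op:8s} -> {executor}")
```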

This is what the trace looks like for a simple MLP:

import thunder
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 2048), nn.ReLU(), nn.Linear(2048, 256))

thunder_model = thunder.compile(model)
y = thunder_model(torch.randn(4, 1024))

print(thunder.last_traces(thunder_model)[-1])

This is the acquired trace, ready to be transformed and executed:

def computation(input, t_0_bias, t_0_weight, t_2_bias, t_2_weight):
  # input: "cuda:0 f32[4, 1024]"
  # t_0_bias: "cuda:0 f32[2048]"
  # t_0_weight: "cuda:0 f32[2048, 1024]"
  # t_2_bias: "cuda:0 f32[256]"
  # t_2_weight: "cuda:0 f32[256, 2048]"
  t3 = ltorch.linear(input, t_0_weight, t_0_bias)  # t3: "cuda:0 f32[4, 2048]"
  t6 = ltorch.relu(t3, False)  # t6: "cuda:0 f32[4, 2048]"
  t10 = ltorch.linear(t6, t_2_weight, t_2_bias)  # t10: "cuda:0 f32[4, 256]"
  return (t10,)

Note how Thunder's intermediate representation is just (a subset of) Python!
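
Because the trace is valid Python syntax, ordinary Python tooling can read it. As a small sketch, the standard-library `ast` module can parse the trace text shown above and list which operation produces each intermediate. The trace is pasted in as a string here (with comments dropped), so this runs without Thunder installed.

```python
import ast

# The trace printed above, reproduced as a string with its comments dropped.
trace_source = '''
def computation(input, t_0_bias, t_0_weight, t_2_bias, t_2_weight):
  t3 = ltorch.linear(input, t_0_weight, t_0_bias)
  t6 = ltorch.relu(t3, False)
  t10 = ltorch.linear(t6, t_2_weight, t_2_bias)
  return (t10,)
'''

tree = ast.parse(trace_source)
func = tree.body[0]  # the single function definition in the trace

# Collect (output name, called op) for every assignment whose value is a call.
ops = [
    (stmt.targets[0].id, ast.unparse(stmt.value.func))
    for stmt in func.body
    if isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Call)
]

for output, op in ops:
    print(f"{output} = {op}(...)")
```

Parsing only inspects the text. Names such as `ltorch` are never resolved, so nothing is executed.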

Performance

Thunder is fast. Here are the speed-ups obtained on a pre-training task using LitGPT on H100 and B200 hardware, relative to PyTorch eager.

Community

Thunder is an open source project, developed in collaboration with the community with significant contributions from NVIDIA.

💬 Get help on Discord 📋 License: Apache 2.0

Release history

0.2.7.dev20260201 pre-release

Feb 1, 2026

0.2.7.dev20260125 pre-release

Jan 25, 2026

0.2.7.dev20260118 pre-release

Jan 18, 2026

0.2.7.dev20260111 pre-release

Jan 11, 2026

0.2.7.dev20260104 pre-release

Jan 4, 2026

0.2.7.dev20251228 pre-release

Dec 28, 2025

0.2.7.dev20251221 pre-release

Dec 21, 2025

0.2.7.dev20251214 pre-release

Dec 14, 2025

0.2.7.dev20251207 pre-release

Dec 7, 2025

0.2.7.dev20251130 pre-release

Nov 30, 2025

0.2.7.dev20251123 pre-release

Nov 23, 2025

0.2.7.dev20251116 pre-release

Nov 16, 2025

0.2.7.dev20251109 pre-release

Nov 9, 2025

This version

0.2.7.dev20251102 pre-release

Nov 2, 2025

0.2.7.dev20251026 pre-release

Oct 26, 2025

0.2.6

Oct 22, 2025

0.2.6.dev20251019 pre-release

Oct 19, 2025

0.2.6.dev20251012 pre-release

Oct 12, 2025

0.2.6.dev20251005 pre-release

Oct 5, 2025

0.2.6.dev20250928 pre-release

Sep 28, 2025

0.2.6.dev20250921 pre-release

Sep 21, 2025

0.2.6.dev20250914 pre-release

Sep 14, 2025

0.2.5

Sep 10, 2025

0.2.5.dev20250907 pre-release

Sep 7, 2025

0.2.5.dev20250831 pre-release

Aug 31, 2025

0.2.5.dev20250824 pre-release

Aug 24, 2025

0.2.5.dev20250817 pre-release

Aug 17, 2025

0.2.5.dev20250810 pre-release

Aug 10, 2025

0.2.5.dev20250803 pre-release

Aug 3, 2025

0.2.5.dev20250727 pre-release

Jul 27, 2025

0.2.5.dev20250720 pre-release

Jul 20, 2025

0.2.5.dev20250713 pre-release

Jul 13, 2025

0.2.5.dev20250706 pre-release

Jul 6, 2025

0.2.5.dev20250629 pre-release

Jun 29, 2025

0.2.4

Jun 24, 2025

0.2.4.dev20250622 pre-release

Jun 22, 2025

0.2.4.dev20250615 pre-release

Jun 15, 2025

0.2.4.dev20250608 pre-release

Jun 8, 2025

0.2.4.dev20250601 pre-release

Jun 1, 2025

0.2.4.dev20250525 pre-release

May 25, 2025

0.2.3

May 23, 2025

0.2.3.dev20250518 pre-release

May 18, 2025

0.2.3.dev20250511 pre-release

May 11, 2025

0.2.3.dev20250504 pre-release

May 4, 2025

0.2.3.dev20250420 pre-release

Apr 20, 2025

0.2.3.dev20250413 pre-release

Apr 13, 2025

0.2.3.dev20250406 pre-release

Apr 6, 2025

0.2.3.dev20250330 pre-release

Mar 30, 2025

0.2.3.dev20250323 pre-release

Mar 23, 2025

0.2.2

Mar 20, 2025

0.2.2.dev20250316 pre-release

Mar 16, 2025

0.2.2.dev20250312 pre-release

Mar 12, 2025

0.2.2.dev20250309 pre-release

Mar 9, 2025

0.2.2.dev20250302 pre-release

Mar 2, 2025

0.2.2.dev20250223 pre-release

Feb 23, 2025

0.2.2.dev20250216 pre-release

Feb 16, 2025

0.2.2.dev20250209 pre-release

Feb 9, 2025

0.2.2.dev0 pre-release

Mar 20, 2025

0.2.1

Feb 4, 2025

0.2.1.dev20250202 pre-release

Feb 2, 2025

0.2.0.dev20250126 pre-release

Jan 26, 2025

0.2.0.dev20250124 pre-release

Jan 24, 2025

0.2.0.dev20250119 pre-release

Jan 19, 2025

0.2.0.dev20250112 pre-release

Jan 12, 2025

0.2.0.dev20250105 pre-release

Jan 5, 2025

0.2.0.dev20241229 pre-release

Dec 29, 2024

0.2.0.dev20241222 pre-release

Dec 22, 2024

0.2.0.dev20241215 pre-release

Dec 15, 2024

0.2.0.dev20241208 pre-release

Dec 8, 2024

0.2.0.dev20241201 pre-release

Dec 1, 2024

0.2.0.dev20241124 pre-release

Nov 24, 2024

0.2.0.dev20241117 pre-release

Nov 17, 2024

0.2.0.dev20241110 pre-release

Nov 10, 2024

0.2.0.dev20241103 pre-release

Nov 3, 2024

0.2.0.dev20241027 pre-release

Oct 27, 2024

0.2.0.dev20241020 pre-release

Oct 20, 2024

0.2.0.dev20241013 pre-release

Oct 13, 2024

0.2.0.dev20241006 pre-release

Oct 6, 2024

0.2.0.dev20240929 pre-release

Sep 29, 2024

0.2.0.dev20240922 pre-release

Sep 22, 2024

0.2.0.dev20240915 pre-release

Sep 15, 2024

0.2.0.dev20240908 pre-release

Sep 8, 2024

0.2.0.dev20240901 pre-release

Sep 1, 2024

0.2.0.dev20240825 pre-release

Aug 25, 2024

0.2.0.dev20240818 pre-release

Aug 18, 2024

0.2.0.dev20240811 pre-release

Aug 11, 2024

0.2.0.dev20240804 pre-release

Aug 4, 2024

0.2.0.dev20240728 pre-release

Jul 28, 2024

0.2.0.dev20240721 pre-release

Jul 21, 2024

0.2.0.dev20240714 pre-release

Jul 14, 2024

0.2.0.dev20240707 pre-release

Jul 7, 2024

0.2.0.dev20240630 pre-release

Jun 30, 2024

0.2.0.dev20240623 pre-release

Jun 23, 2024

0.2.0.dev20240616 pre-release

Jun 16, 2024

0.2.0.dev20240609 pre-release

Jun 9, 2024

0.2.0.dev20240602 pre-release

Jun 2, 2024

0.2.0.dev20240526 pre-release

May 26, 2024

0.2.0.dev20240519 pre-release

May 19, 2024

0.2.0.dev20240513 pre-release

May 13, 2024

0.2.0.dev20240505 pre-release

May 5, 2024

0.2.0.dev20240428 pre-release

Apr 28, 2024

0.2.0.dev20240421 pre-release

Apr 21, 2024

0.2.0.dev20240414 pre-release

Apr 14, 2024

0.2.0.dev20240407 pre-release

Apr 7, 2024

0.2.0.dev20240404 pre-release

Apr 4, 2024

0.1.0

Mar 20, 2024

Download files

Source Distribution

lightning_thunder-0.2.7.dev20251102.tar.gz (639.7 kB)

Uploaded Nov 2, 2025 Source

Built Distribution

lightning_thunder-0.2.7.dev20251102-py3-none-any.whl (1.0 MB)

Uploaded Nov 2, 2025 Python 3

File details

Details for the file lightning_thunder-0.2.7.dev20251102.tar.gz.

File metadata

Download URL: lightning_thunder-0.2.7.dev20251102.tar.gz
Upload date: Nov 2, 2025
Size: 639.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lightning_thunder-0.2.7.dev20251102.tar.gz
Algorithm	Hash digest
SHA256	`ca6e8830d9d0492541d4a28d993d4117131788fdf6464ba037ef2f5463a44e89`
MD5	`2940ac4a1299710b1ad9ee406c86186b`
BLAKE2b-256	`ff8d158a3f086c1d4e3d34522ce0db99058f48b129a046ab78b07086880beb3d`

File details

Details for the file lightning_thunder-0.2.7.dev20251102-py3-none-any.whl.

File metadata

Download URL: lightning_thunder-0.2.7.dev20251102-py3-none-any.whl
Upload date: Nov 2, 2025
Size: 1.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lightning_thunder-0.2.7.dev20251102-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aa09dc74952a48b681d7dc030a8cadff8f13f07e6d130c9540220326bc4012fa`
MD5	`93a368c114d3db14a682d2727936ba94`
BLAKE2b-256	`644f1d2c2f12d9684d1c4be578b4fe0f548dc46de35ab522bb4a1c36a68d10b0`
