Torch-TensorRT is a package that allows users to automatically compile PyTorch and TorchScript modules to TensorRT while remaining in PyTorch
Torch-TensorRT
Easily achieve the best inference performance for any PyTorch model on the NVIDIA platform.
Torch-TensorRT brings the power of TensorRT to PyTorch. Accelerate inference latency by up to 5x compared to eager execution in just one line of code.
Installation
Stable versions of Torch-TensorRT are published on PyPI
pip install torch-tensorrt
Nightly versions of Torch-TensorRT are published on the PyTorch package index
pip install --pre torch-tensorrt --index-url https://download.pytorch.org/whl/nightly/cu124
Torch-TensorRT is also distributed in the ready-to-run NVIDIA NGC PyTorch Container which has all dependencies with the proper versions and example notebooks included.
For more advanced installation methods, please see here
Quickstart
Option 1: torch.compile
You can use Torch-TensorRT anywhere you use torch.compile:
import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
x = torch.randn((1, 3, 224, 224)).cuda() # define what the inputs to the model will look like
optimized_model = torch.compile(model, backend="tensorrt")
optimized_model(x) # compiled on first run
optimized_model(x) # this will be fast!
Option 2: Export
If you want to optimize your model ahead-of-time and/or deploy in a C++ environment, Torch-TensorRT provides an export-style workflow that serializes an optimized module. This module can be deployed in PyTorch or with libtorch (i.e. without a Python dependency).
Step 1: Optimize + serialize
import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
inputs = [torch.randn((1, 3, 224, 224)).cuda()] # define a list of representative inputs here
trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
torch_tensorrt.save(trt_gm, "trt.ep", inputs=inputs) # PyTorch only supports Python runtime for an ExportedProgram. For C++ deployment, use a TorchScript file
torch_tensorrt.save(trt_gm, "trt.ts", output_format="torchscript", inputs=inputs)
Step 2: Deploy
Deployment in PyTorch:
import torch
import torch_tensorrt
inputs = [torch.randn((1, 3, 224, 224)).cuda()] # your inputs go here
# You can run this in a new python session!
model = torch.export.load("trt.ep").module()
# model = torch_tensorrt.load("trt.ep").module() # this also works
model(*inputs)
Deployment in C++:
#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"
auto trt_mod = torch::jit::load("trt.ts");
auto input_tensor = [...]; // fill this with your inputs
auto results = trt_mod.forward({input_tensor});
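Building the C++ snippet above requires linking against both libtorch and the Torch-TensorRT runtime library (libtorchtrt_runtime). A minimal CMake sketch under assumed paths; `TORCHTRT_DIR` is a hypothetical variable pointing at your Torch-TensorRT installation, and the target name is illustrative:

```cmake
# Hypothetical CMakeLists.txt for the C++ deployment example above.
# Library locations depend on how Torch-TensorRT was installed
# (e.g. inside the Python site-packages directory); adjust as needed.
cmake_minimum_required(VERSION 3.18)
project(trt_deploy)

find_package(Torch REQUIRED)            # provides TORCH_LIBRARIES

# torchtrt_runtime is the runtime-only library that lets plain libtorch
# programs execute serialized Torch-TensorRT modules.
find_library(TORCHTRT_RUNTIME torchtrt_runtime
             HINTS ${TORCHTRT_DIR}/lib)

add_executable(trt_deploy main.cpp)
target_link_libraries(trt_deploy ${TORCH_LIBRARIES} ${TORCHTRT_RUNTIME})
set_property(TARGET trt_deploy PROPERTY CXX_STANDARD 17)
```

Run the resulting binary with the TensorRT and CUDA libraries on your loader path.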
Further resources
- Up to 50% faster Stable Diffusion inference with one line of code
- Optimize LLMs from Hugging Face with Torch-TensorRT [coming soon]
- Run your model in FP8 with Torch-TensorRT
- Tools to resolve graph breaks and boost performance [coming soon]
- Tech Talk (GTC '23)
- Documentation
Platform Support
Platform | Support
---|---
Linux AMD64 / GPU | Supported
Windows / GPU | Supported (Dynamo only)
Linux aarch64 / GPU | Native compilation supported on JetPack 4.4+ (use v1.0.0 for the time being)
Linux aarch64 / DLA | Native compilation supported on JetPack 4.4+ (use v1.0.0 for the time being)
Linux ppc64le / GPU | Not supported
Note: Refer to the NVIDIA L4T PyTorch NGC container for PyTorch libraries on JetPack.
Dependencies
The following dependencies were used to verify the test cases. Torch-TensorRT may work with other versions, but the tests are not guaranteed to pass.
- Bazel 6.3.2
- Libtorch 2.5.0.dev (latest nightly) (built with CUDA 12.4)
- CUDA 12.4
- TensorRT 10.3.0.26
Deprecation Policy
Deprecation is used to inform developers that some APIs and tools are no longer recommended for use. Beginning with version 2.3, Torch-TensorRT has the following deprecation policy:
- Deprecation notices are communicated in the Release Notes.
- Deprecated API functions have a statement in the source documenting when they were deprecated.
- Deprecated methods and classes issue deprecation warnings at runtime, if they are used.
- Torch-TensorRT provides a 6-month migration period after the deprecation, during which APIs and tools continue to work.
- After the migration period ends, APIs and tools are removed in a manner consistent with semantic versioning.
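As a sketch of what the runtime side of this policy looks like, a deprecated entry point typically emits a DeprecationWarning when called. The names below (`deprecated`, `old_compile_api`) are hypothetical, not Torch-TensorRT APIs:

```python
import functools
import warnings

def deprecated(since: str, use_instead: str):
    """Illustrative decorator marking a function as deprecated."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            warnings.warn(
                f"{fn.__name__} is deprecated since {since}; "
                f"use {use_instead} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return fn(*args, **kwargs)
        return inner
    return wrap

@deprecated(since="2.3", use_instead="torch_tensorrt.compile")
def old_compile_api(model):  # hypothetical deprecated entry point
    return model

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_compile_api(object())

print(caught[0].category.__name__)  # DeprecationWarning
```

During the migration period the call still returns its result; only after removal would it raise.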
Contributing
Take a look at the CONTRIBUTING.md
License
The Torch-TensorRT license can be found in the LICENSE file. It is a BSD-style license.