Torch-TensorRT is a package that allows users to automatically compile PyTorch and TorchScript modules to TensorRT while remaining in PyTorch.

Project description

torch_tensorrt

Ahead of Time (AOT) compiling for PyTorch JIT

Torch-TensorRT is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into a module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extension and compiles modules that integrate into the JIT runtime seamlessly. After compilation, using the optimized graph should feel no different from running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.

Example Usage

import torch
import torch_tensorrt

...

trt_ts_module = torch_tensorrt.compile(torch_script_module,
    inputs = [example_tensor, # Provide example tensor for input shape or...
        torch_tensorrt.Input( # Specify input object with shape and dtype
            min_shape=[1, 3, 224, 224],
            opt_shape=[1, 3, 512, 512],
            max_shape=[1, 3, 1024, 1024],
            # For a static size, use shape=[1, 3, 224, 224]
            dtype=torch.half) # Datatype of input tensor. Allowed options: torch.(float|half|int8|int32|bool)
    ],
    enabled_precisions = {torch.half}) # Run with FP16

result = trt_ts_module(input_data) # run inference
torch.jit.save(trt_ts_module, "trt_torchscript_module.ts") # save the TRT embedded Torchscript
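Once saved, the module can be reloaded and run in a fresh process like any other TorchScript module. A minimal sketch (importing torch_tensorrt first so the TensorRT runtime ops are registered with the JIT; the input shape and dtype are assumed to match the compile settings above):

import torch
import torch_tensorrt # registers the TensorRT execution ops with the JIT runtime

trt_ts_module = torch.jit.load("trt_torchscript_module.ts")
input_data = torch.randn(1, 3, 224, 224).half().cuda() # must fall within the compiled shape range
result = trt_ts_module(input_data)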

Installation

ABI / Platform                            Installation command
Pre CXX11 ABI (Linux x86_64)              python3 setup.py install
CXX11 ABI (Linux x86_64)                  python3 setup.py install --use-cxx11-abi
Pre CXX11 ABI (Jetson platform aarch64)   python3 setup.py install --jetpack-version 4.6
CXX11 ABI (Jetson platform aarch64)       python3 setup.py install --jetpack-version 4.6 --use-cxx11-abi

On the Linux x86_64 platform, PyTorch libraries default to the pre-CXX11 ABI, so use python3 setup.py install.

On Jetson platforms, NVIDIA hosts pre-built PyTorch wheel files. These wheels are built with the CXX11 ABI, so on Jetson platforms use python3 setup.py install --jetpack-version 4.6 --use-cxx11-abi.
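A quick way to sanity-check the installation from Python:

import torch_tensorrt
print(torch_tensorrt.__version__) # should print the installed version without errors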

Under the Hood

When a traced module is provided to Torch-TensorRT, the compiler takes the internal representation and transforms it into one like this:

graph(%input.2 : Tensor):
    %2 : Float(84, 10) = prim::Constant[value=<Tensor>]()
    %3 : Float(120, 84) = prim::Constant[value=<Tensor>]()
    %4 : Float(576, 120) = prim::Constant[value=<Tensor>]()
    %5 : int = prim::Constant[value=-1]() # x.py:25:0
    %6 : int[] = prim::Constant[value=annotate(List[int], [])]()
    %7 : int[] = prim::Constant[value=[2, 2]]()
    %8 : int[] = prim::Constant[value=[0, 0]]()
    %9 : int[] = prim::Constant[value=[1, 1]]()
    %10 : bool = prim::Constant[value=1]() # ~/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py:346:0
    %11 : int = prim::Constant[value=1]() # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:539:0
    %12 : bool = prim::Constant[value=0]() # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:539:0
    %self.classifer.fc3.bias : Float(10) = prim::Constant[value= 0.0464  0.0383  0.0678  0.0932  0.1045 -0.0805 -0.0435 -0.0818  0.0208 -0.0358 [ CUDAFloatType{10} ]]()
    %self.classifer.fc2.bias : Float(84) = prim::Constant[value=<Tensor>]()
    %self.classifer.fc1.bias : Float(120) = prim::Constant[value=<Tensor>]()
    %self.feat.conv2.weight : Float(16, 6, 3, 3) = prim::Constant[value=<Tensor>]()
    %self.feat.conv2.bias : Float(16) = prim::Constant[value=<Tensor>]()
    %self.feat.conv1.weight : Float(6, 1, 3, 3) = prim::Constant[value=<Tensor>]()
    %self.feat.conv1.bias : Float(6) = prim::Constant[value= 0.0530 -0.1691  0.2802  0.1502  0.1056 -0.1549 [ CUDAFloatType{6} ]]()
    %input0.4 : Tensor = aten::_convolution(%input.2, %self.feat.conv1.weight, %self.feat.conv1.bias, %9, %8, %9, %12, %8, %11, %12, %12, %10) # ~/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py:346:0
    %input0.5 : Tensor = aten::relu(%input0.4) # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:1063:0
    %input1.2 : Tensor = aten::max_pool2d(%input0.5, %7, %6, %8, %9, %12) # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:539:0
    %input0.6 : Tensor = aten::_convolution(%input1.2, %self.feat.conv2.weight, %self.feat.conv2.bias, %9, %8, %9, %12, %8, %11, %12, %12, %10) # ~/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py:346:0
    %input2.1 : Tensor = aten::relu(%input0.6) # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:1063:0
    %x.1 : Tensor = aten::max_pool2d(%input2.1, %7, %6, %8, %9, %12) # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:539:0
    %input.1 : Tensor = aten::flatten(%x.1, %11, %5) # x.py:25:0
    %27 : Tensor = aten::matmul(%input.1, %4)
    %28 : Tensor = trt::const(%self.classifer.fc1.bias)
    %29 : Tensor = aten::add_(%28, %27, %11)
    %input0.2 : Tensor = aten::relu(%29) # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:1063:0
    %31 : Tensor = aten::matmul(%input0.2, %3)
    %32 : Tensor = trt::const(%self.classifer.fc2.bias)
    %33 : Tensor = aten::add_(%32, %31, %11)
    %input1.1 : Tensor = aten::relu(%33) # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:1063:0
    %35 : Tensor = aten::matmul(%input1.1, %2)
    %36 : Tensor = trt::const(%self.classifer.fc3.bias)
    %37 : Tensor = aten::add_(%36, %35, %11)
    return (%37)
(CompileGraph)

The graph has now been transformed from a collection of modules, each managing its own parameters (much like how your PyTorch modules are collections of modules), into a single graph with the parameters inlined into the graph and all of the operations laid out. Torch-TensorRT has also executed a number of optimizations and mappings to make the graph easier to translate to TensorRT. From here the compiler can assemble the TensorRT engine by following the dataflow through the graph.
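For reference, a traced module like the one in the dump above could be produced as follows. This is a sketch: the architecture is inferred from the weight shapes in the dump (including the classifer attribute spelling) and may differ from the original source.

import torch
import torch.nn as nn

class LeNet(nn.Module):
    # Assumed architecture, reconstructed from the weight shapes in the dump
    def __init__(self):
        super().__init__()
        self.feat = nn.Sequential(
            nn.Conv2d(1, 6, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, 3), nn.ReLU(), nn.MaxPool2d(2))
        self.classifer = nn.Sequential(
            nn.Flatten(), nn.Linear(576, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(), nn.Linear(84, 10))

    def forward(self, x):
        return self.classifer(self.feat(x))

# Tracing records the operations for a 1x1x32x32 input into a TorchScript graph
traced_module = torch.jit.trace(LeNet().eval().cuda(), torch.rand(1, 1, 32, 32).cuda())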

When the graph construction phase is complete, Torch-TensorRT produces a serialized TensorRT engine. From here, depending on the API, this engine is either returned to the user or passed on to the module construction phase, where Torch-TensorRT creates a JIT module to execute the TensorRT engine, which will be instantiated and managed by the Torch-TensorRT runtime.

Here is the graph that you get back after compilation is complete:

graph(%self.1 : __torch__.___torch_mangle_10.LeNet_trt,
    %2 : Tensor):
    %1 : int = prim::Constant[value=94106001690080]()
    %3 : Tensor = trt::execute_engine(%1, %2)
    return (%3)
(AddEngineToGraph)

You can see the call where the engine is executed: a constant holding the ID of the engine tells the JIT how to find the engine, and the input tensor is fed to TensorRT. The engine represents the exact same calculations as running the original PyTorch module, but optimized to run on your GPU.
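You can print this graph for your own compiled module to see the same structure (using trt_ts_module from the Example Usage section):

print(trt_ts_module.graph) # should show a trt::execute_engine call like the one above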

Torch-TensorRT converts from TorchScript by generating layers or subgraphs in correspondence with instructions seen in the graph. Converters are small modules of code used to map one specific operation to a layer or subgraph in TensorRT. Not all operations are supported, but if you need to implement one, you can in C++.

Registering Custom Converters

Operations are mapped to TensorRT through the use of modular converters: a converter is a function that takes a node from the JIT graph and produces an equivalent layer or subgraph in TensorRT. Torch-TensorRT ships with a library of these converters stored in a registry, and the appropriate converter is executed depending on the node being parsed. For instance, an aten::relu(%input0.4) instruction will trigger the ReLU converter to be run on it, producing an activation layer in the TensorRT graph. But since this library is not exhaustive, you may need to write your own converter to get Torch-TensorRT to support your module.

Shipped with the Torch-TensorRT distribution are the internal core API headers. You can therefore access the converter registry and add a converter for the op you need.

For example, if you try to compile a graph with a build of Torch-TensorRT that doesn't support the flatten operation (aten::flatten), you may see this error:

terminate called after throwing an instance of 'torch_tensorrt::Error'
what():  [enforce fail at core/conversion/conversion.cpp:109] Expected converter to be true but got false
Unable to convert node: %input.1 : Tensor = aten::flatten(%x.1, %11, %5) # x.py:25:0 (conversion.AddLayer)
Schema: aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
Converter for aten::flatten requested, but no such converter was found.
If you need a converter for this operator, you can try implementing one yourself
or request a converter: https://www.github.com/NVIDIA/Torch-TensorRT/issues

We can register a converter for this operator in our application. All of the tools required to build a converter can be imported by including Torch-TensorRT/core/conversion/converters/converters.h. We start by creating an instance of the self-registering class torch_tensorrt::core::conversion::converters::RegisterNodeConversionPatterns(), which registers converters in the global converter registry. It associates a function schema like aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor) with a lambda that takes the state of the conversion, the node/operation in question to convert, and all of the inputs to the node, and produces, as a side effect, a new layer in the TensorRT network. Arguments are passed as a vector of inspectable unions of TensorRT ITensors and Torch IValues, in the order the arguments are listed in the schema.

Below is an implementation of an aten::flatten converter that we can use in our application. You have full access to the Torch and TensorRT libraries in the converter implementation, so, for example, we can quickly get the output size by just running the operation in PyTorch instead of implementing the full calculation ourselves, as we do below for this flatten converter.

#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"
#include "torch_tensorrt/core/conversion/converters/converters.h"

static auto flatten_converter = torch_tensorrt::core::conversion::converters::RegisterNodeConversionPatterns()
    .pattern({
        "aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)",
        [](torch_tensorrt::core::conversion::ConversionCtx* ctx,
           const torch::jit::Node* n,
           torch_tensorrt::core::conversion::converters::args& args) -> bool {
            auto in = args[0].ITensor();
            auto start_dim = args[1].unwrapToInt();
            auto end_dim = args[2].unwrapToInt();
            auto in_shape = torch_tensorrt::core::util::toVec(in->getDimensions());
            auto out_shape = torch::flatten(torch::rand(in_shape), start_dim, end_dim).sizes();

            auto shuffle = ctx->net->addShuffle(*in);
            shuffle->setReshapeDimensions(torch_tensorrt::core::util::toDims(out_shape));
            shuffle->setName(torch_tensorrt::core::util::node_info(n).c_str());

            auto out_tensor = ctx->AssociateValueAndTensor(n->outputs()[0], shuffle->getOutput(0));
            return true;
        }
    });

To use this converter in Python, it is recommended to use PyTorch's C++ / CUDA Extension template to wrap your library of converters into a .so that you can load with ctypes.CDLL() in your Python application.
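A minimal sketch of the Python side, assuming the converter above has been built into a shared library (the library path is hypothetical and depends on how you built the extension):

import ctypes
import torch
import torch_tensorrt

# Hypothetical path; point this at the .so produced by your extension build
ctypes.CDLL("./libflatten_converter.so")

# With the converter registered, compilation of modules that use aten::flatten
# proceeds as usual (torch_script_module is the scripted/traced module from earlier)
trt_ts_module = torch_tensorrt.compile(torch_script_module,
    inputs=[torch_tensorrt.Input(shape=[1, 1, 32, 32])],
    enabled_precisions={torch.float})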

You can find more information on all the details of writing converters in the contributors documentation (Writing Converters). If you find yourself with a large library of converter implementations, do consider upstreaming them; PRs are welcome, and it would be great for the community to benefit as well.

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions


torch_tensorrt-2.3.0-cp311-cp311-win_amd64.whl (324.3 kB)
    CPython 3.11, Windows x86-64

torch_tensorrt-2.3.0-cp311-cp311-manylinux_2_31_x86_64.manylinux_2_34_x86_64.whl (18.3 MB)
    CPython 3.11, manylinux: glibc 2.31+ x86-64, glibc 2.34+ x86-64

torch_tensorrt-2.3.0-cp310-cp310-win_amd64.whl (324.3 kB)
    CPython 3.10, Windows x86-64

torch_tensorrt-2.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl (18.3 MB)
    CPython 3.10, manylinux: glibc 2.17+ x86-64, glibc 2.34+ x86-64

torch_tensorrt-2.3.0-cp39-cp39-win_amd64.whl (324.3 kB)
    CPython 3.9, Windows x86-64

torch_tensorrt-2.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl (18.4 MB)
    CPython 3.9, manylinux: glibc 2.17+ x86-64, glibc 2.34+ x86-64

torch_tensorrt-2.3.0-cp38-cp38-win_amd64.whl (324.3 kB)
    CPython 3.8, Windows x86-64

torch_tensorrt-2.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl (18.3 MB)
    CPython 3.8, manylinux: glibc 2.17+ x86-64, glibc 2.34+ x86-64

File hashes

torch_tensorrt-2.3.0-cp311-cp311-win_amd64.whl
    SHA256: 6f635cb7cae864168f12ea4fd8b101195fbe06bbd9d110b7214252e99be4c1aa
    MD5: 1849557c8224b0c0f4a71505dc76a912
    BLAKE2b-256: 0ec6562224e4ff79bfa90035c0ffa038842bca5a8ef45e9a86e18c070c7b3dc8

torch_tensorrt-2.3.0-cp311-cp311-manylinux_2_31_x86_64.manylinux_2_34_x86_64.whl
    SHA256: a2c5afc5d8dc93efb1b08e3b8cbbdab161ee76df2b27141c1ad97ee09f6f67b6
    MD5: 843ed11594e0dfc59124e143ce6d375e
    BLAKE2b-256: a312c4b14d317aeb24db83f1b00f72e998b2785a826e43909f74ce40c26527ba

torch_tensorrt-2.3.0-cp310-cp310-win_amd64.whl
    SHA256: 85a1dc179938705bfe59b7e7acedf63c25856ab6ace57b73679ec3f636fe2efe
    MD5: 06e0be6ce363d4fe8b365d1f9656fe49
    BLAKE2b-256: d46751b075ab8a67d28695b46dd49df48ab385ccca666055f1ac096e5b21004e

torch_tensorrt-2.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl
    SHA256: a3aa62a83ea949e91d1e71c764db683100d135f65488eea485cfa7dec44cb114
    MD5: f16294d595fb86b788dd75570caab68b
    BLAKE2b-256: c74c5a5ae92f6a17151c3791eecdd89920bb80f15c4e6734cbebbf84d4b48292

torch_tensorrt-2.3.0-cp39-cp39-win_amd64.whl
    SHA256: 056a4e7b2f35ed904b5bdfef5d353063f35ade1f63302c7a2f5ffdb952169a8f
    MD5: 85e13789d1363ee17019ca8ddef6f7e1
    BLAKE2b-256: 1d7af0ff6675dc95286ac695d2a2369a692f0914af9719ff841d57295c8529cd

torch_tensorrt-2.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl
    SHA256: 048aebf943fa40271d83c4a64ec96ce682922454b3fac5c7eaa1183b5241a051
    MD5: 01620e26a85d79adca6ed2bf1f9ca5f0
    BLAKE2b-256: 2fe2fd063cc348abdd7d7575eda5c1fdf29b04eec7545d4d39a084ebe72bc3f8

torch_tensorrt-2.3.0-cp38-cp38-win_amd64.whl
    SHA256: f96aef20d8458258fdd3525dcabd0c5844cb16aa463e8272dac330714ed991e4
    MD5: d5655d825d0c8b53921097047de0dbf8
    BLAKE2b-256: 65cecd30e72506bcdde60c7f0bba9031ad58e6651c06f01c96734bf80a60d7fb

torch_tensorrt-2.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl
    SHA256: 1041626c0ee15cb7c4fc0cfcf9de21a6ed476a4868a0640b5af060e133062567
    MD5: 6bc172ddb916403d055c5a3bf86e7d13
    BLAKE2b-256: 2ff81b2a8187316b9d66c6cc3e9a413881327f59470a5f7d4f1b16f738b218c7
