TensorRT-based GPU inference plugin for VapourSynth
VapourSynth-MLRT-TRT
This package contains the TensorRT backend implementation of the vs-mlrt plugin.
Installation
pip install vapoursynth-mlrt-trt --extra-index-url https://jaded-encoding-thaumaturgy.github.io/vs-wheels/simple
Building from source
Requirements
- C++ Compiler: C++20 compatible (e.g. MSVC 2019+, GCC, Clang)
- Dependencies:
  - CUDA Toolkit
  - TensorRT SDK (including runtime and plugins library)
- Environment Variables:
  - TENSORRT_HOME: Path to the TensorRT installation directory (must contain include, lib, and bin).
Compilation
Set the TENSORRT_HOME environment variable before running the build.
On Windows (PowerShell):
$env:TENSORRT_HOME="C:\Path\To\TensorRT"
uv build --package vapoursynth-mlrt-trt
On Linux:
export TENSORRT_HOME="/path/to/TensorRT"
uv build --package vapoursynth-mlrt-trt
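Before starting a build, it can help to confirm that TENSORRT_HOME points at a complete installation. The following is a minimal sketch and not part of the project: the helper name `missing_tensorrt_dirs` is hypothetical, and it checks only for the three subdirectories listed under Requirements.

```python
import os
from pathlib import Path
from typing import Optional

# Subdirectories that the Requirements section says TENSORRT_HOME must contain.
REQUIRED_SUBDIRS = ("include", "lib", "bin")


def missing_tensorrt_dirs(home: Optional[str] = None) -> list[str]:
    """Return the required subdirectories that are absent from TENSORRT_HOME.

    Hypothetical helper for illustration; the build itself does not call it.
    """
    if home is None:
        home = os.environ.get("TENSORRT_HOME")
    if not home:
        raise RuntimeError("TENSORRT_HOME is not set")
    root = Path(home)
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]
```

An empty list means all three directories were found; any returned names are the ones to fix before running `uv build`.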
Detailed parameter information from the parent project follows.
VapourSynth TensorRT & TensorRT-RTX
The vs-tensorrt plugin provides an optimized CUDA runtime for some popular AI filters.
Usage
Prototype: core.{trt, trt_rtx}.Model(clip[] clips, string engine_path[, int[] overlap, int[] tilesize, int device_id=0, bint use_cuda_graph=False, int num_streams=1, int verbosity=2, string flexible_output_prop=""])
Arguments:
- clip[] clips: The input clips. Only 32-bit floating point RGB or GRAY clips are supported. For model-specific input requirements, please consult our wiki.
- string engine_path: The path to the prebuilt engine (see below).
- int[] overlap: Some networks (e.g. CNNs) accept arbitrary input shapes, while others only accept a fixed input shape, in which case the input clip must be processed in tiles. The overlap argument specifies the overlap in pixels between adjacent tiles (horizontal and vertical, or a single value for both) to minimize boundary issues. Please refer to the network-specific docs for the recommended overlap size.
- int[] tilesize: Even for CNNs that accept arbitrary input sizes, the network sometimes does not work well over the entire range of input dimensions, and the size of each tile has to be limited. This parameter specifies the tile size (horizontal and vertical, or a single value for both, including the overlap). Please refer to the network-specific docs for the recommended tile size.
- int device_id: The GPU device id to use. Default 0. Requires NVIDIA GPUs from the second-generation Kepler architecture onwards.
- bint use_cuda_graph: Whether to use CUDA Graphs to improve performance and reduce CPU overhead. Default False.
- int num_streams: The number of concurrent CUDA streams to use. Default 1. Increase it if the GPU is not saturated.
- int verbosity: The verbosity level of the TensorRT runtime. Messages are written to stderr. Default 2.
  - 0: Internal error.
  - 1: Application error.
  - 2: Warning.
  - 3: Informational messages with instructional information.
  - 4: Verbose messages with debugging information.
- string flexible_output_prop: Used to support ONNX models with an arbitrary number of output planes.

      from typing import TypedDict

      class Output(TypedDict):
          clip: vs.VideoNode
          num_planes: int

      prop = "planes"  # arbitrary non-empty string
      output = core.trt.Model(src, engine_path, flexible_output_prop=prop)  # type: Output
      clip = output["clip"]
      num_planes = output["num_planes"]
      output_planes = [
          clip.std.PropToClip(prop=f"{prop}{i}")
          for i in range(num_planes)
      ]  # type: list[vs.VideoNode]
When overlap and tilesize are not specified, the filter will internally try to resize the network to fit the input clips. This might not always work (for example, the network might require the width to be divisible by 8), and the filter will error out in this case.
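To illustrate the divisibility constraint mentioned above, the sketch below computes how much padding a frame would need before its dimensions satisfy such a requirement. The multiple of 8 is only the example from the text; the real requirement depends on the model, and `padding_for_multiple` is a hypothetical helper, not part of the plugin.

```python
def padding_for_multiple(width: int, height: int, multiple: int = 8) -> tuple[int, int]:
    """Return (pad_right, pad_bottom) needed to reach the next multiple.

    Hypothetical helper; `multiple` is an assumption that varies per network.
    """
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    # (-n) % m is the distance from n up to the next multiple of m (0 if already aligned).
    pad_w = (-width) % multiple
    pad_h = (-height) % multiple
    return pad_w, pad_h
```

A result of `(0, 0)` means the frame already meets the assumed constraint. Otherwise the frame would need padding before inference and cropping afterwards, or the tiled mode described below.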
The general rule is to either:
- leave out overlap and tilesize entirely and process the input frame in one tile, or
- set both so that the frame is processed in tilesize[0] x tilesize[1] tiles, with adjacent tiles overlapping by overlap[0] x overlap[1] pixels in each direction. The overlapped region is discarded so that only interior output pixels are used.
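The tiling rule can be pictured along one axis with the sketch below. It is an illustration under stated assumptions, not the plugin's actual implementation: it assumes tiles advance by `tile - 2 * overlap` pixels and that the last tile is clamped to the frame edge. The function name `tile_starts` is hypothetical.

```python
def tile_starts(frame: int, tile: int, overlap: int) -> list[int]:
    """Return the start offsets of tiles along one axis.

    Assumed scheme (for illustration only): each tile contributes its interior,
    i.e. the tile minus `overlap` pixels on each shared side, and the final tile
    is aligned to the end of the frame.
    """
    if tile > frame:
        raise ValueError("tile must not exceed the frame size")
    step = tile - 2 * overlap
    if step <= 0:
        raise ValueError("overlap is too large for this tile size")
    starts = []
    pos = 0
    # Add tiles while a full tile still ends before the frame edge.
    while pos + tile < frame:
        starts.append(pos)
        pos += step
    # The last tile is clamped so that it ends exactly at the frame edge.
    starts.append(frame - tile)
    return starts
```

Under these assumptions, a 1920-pixel-wide frame with `tilesize[0] = 640` and `overlap[0] = 16` is covered by four tile columns, while a frame exactly as wide as the tile needs a single one.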
Instructions for TensorRT
Build engine with dynamic shape support
- Requires models with built-in dynamic shape support, e.g. waifu2x_v3.7z and dpir_v3.7z.
- Build engine
trtexec --onnx=drunet_gray.onnx --minShapes=input:1x2x8x8 --optShapes=input:1x2x64x64 --maxShapes=input:1x2x1080x1920 --saveEngine=dpir_gray_1080p_dynamic.engine
The engine will be optimized for 64x64 input and can be applied to eligible inputs with shapes from 8x8 to 1920x1080 by specifying the tilesize parameter in the trt plugin.
Also check "trtexec useful arguments" below.
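- Run model
In vpy script:
import vapoursynth as vs
from vapoursynth import core

# DPIR
# 640x360 GRAYS placeholder input; substitute your own clip here
src = core.std.BlankClip(width=640, height=360, format=vs.GRAYS)
sigma = 10.0
# The second input is a constant plane carrying the noise level
flt = core.trt.Model([src, core.std.BlankClip(src, color=sigma/255.0)], engine_path="dpir_gray_1080p_dynamic.engine", tilesize=[640, 360])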
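Note that the trtexec shapes above are written as NxCxHxW (height before width), whereas tilesize is given as horizontal then vertical. The sketch below checks whether a tile size falls inside an engine's shape range. Both helper names are hypothetical, and the shape strings are the part after the `input:` prefix.

```python
def parse_shape(spec: str) -> tuple[int, ...]:
    """Parse a trtexec-style shape such as '1x2x1080x1920' into integers."""
    return tuple(int(dim) for dim in spec.split("x"))


def tilesize_fits(tilesize: list[int], min_shape: str, max_shape: str) -> bool:
    """Check a [width, height] tile size against NxCxHxW min/max shapes.

    Hypothetical helper for illustration; the plugin performs its own checks.
    """
    tile_w, tile_h = tilesize
    _, _, min_h, min_w = parse_shape(min_shape)
    _, _, max_h, max_w = parse_shape(max_shape)
    return min_w <= tile_w <= max_w and min_h <= tile_h <= max_h
```

With the engine built above, `tilesize=[640, 360]` lies inside the range, while a transposed `[1080, 1920]` would not, because its height exceeds the 1080-pixel maximum.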
trtexec useful arguments
- --workspace=N: Set workspace size in megabytes (default = 16)
- --fp16: Enable fp16 precision, in addition to fp32 (default = disabled)
- --noTF32: Disable tf32 precision (default is to enable tf32, in addition to fp32, Ampere only)
- --device=N: Select cuda device N (default = 0)
- --timingCacheFile=<file>: Save/load the serialized global timing cache
- --verbose: Use verbose logging (default = false)
- --profilingVerbosity=mode: Specify profiling verbosity. mode ::= layer_names_only|detailed|none (default = layer_names_only)
- --tacticSources=tactics: Specify the tactics to be used by adding (+) or removing (-) tactics from the default tactic sources (default = all available tactics).
  Note: Currently only cuDNN, cuBLAS and cuBLAS-LT are listed as optional tactics.
  Tactic Sources:
      tactics ::= [","tactic]
      tactic  ::= (+|-)lib
      lib     ::= "CUBLAS"|"CUBLAS_LT"|"CUDNN"
  For example, to disable cudnn and enable cublas: --tacticSources=-CUDNN,+CUBLAS
- --useCudaGraph: Use CUDA graph to capture engine execution and then launch inference (default = disabled). This flag may be ignored if the graph capture fails.
- --noDataTransfers: Disable DMA transfers to and from device (default = enabled).
- --saveEngine=<file>: Save the serialized engine
- --loadEngine=<file>: Load a serialized engine
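When engines are built from a script, the arguments above can be assembled programmatically. The following is a minimal sketch that uses only flags listed in this document. The function name and its parameters are hypothetical, and the default tensor name `input` is taken from the DPIR example rather than being universal.

```python
def build_trtexec_command(
    onnx: str,
    engine: str,
    min_shape: str,
    opt_shape: str,
    max_shape: str,
    input_name: str = "input",  # assumption: tensor name as in the DPIR example
    fp16: bool = False,
    extra: tuple[str, ...] = (),
) -> list[str]:
    """Assemble a trtexec argument list for a dynamic-shape engine build."""
    cmd = [
        "trtexec",
        f"--onnx={onnx}",
        f"--minShapes={input_name}:{min_shape}",
        f"--optShapes={input_name}:{opt_shape}",
        f"--maxShapes={input_name}:{max_shape}",
        f"--saveEngine={engine}",
    ]
    if fp16:
        cmd.append("--fp16")
    # Any further flags (e.g. "--verbose") are passed through unchanged.
    cmd.extend(extra)
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, assuming trtexec is on the PATH. Building the list instead of a single string avoids shell quoting problems with paths that contain spaces.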
Instructions for TensorRT-RTX
Replace the trtexec executable with the tensorrt_rtx executable. Some options may not be supported, e.g. --fp16.
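The substitution can be sketched as a small transformation on an argument list. This is an illustration only: `to_tensorrt_rtx` is a hypothetical helper, and its default list of unsupported flags contains just `--fp16` because that is the only one named here. Other flags may need to be added after checking the tensorrt_rtx help output.

```python
def to_tensorrt_rtx(
    cmd: list[str],
    unsupported: tuple[str, ...] = ("--fp16",),  # assumption: extend as needed
) -> list[str]:
    """Swap the executable for tensorrt_rtx and drop flags assumed unsupported."""
    if not cmd:
        raise ValueError("empty command")
    kept = [
        arg for arg in cmd[1:]
        # Compare only the flag name, so "--flag=value" forms are matched too.
        if arg.split("=", 1)[0] not in unsupported
    ]
    return ["tensorrt_rtx"] + kept
```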