TensorRT for RTX GPU inference plugin for VapourSynth

VapourSynth-MLRT-TRT-RTX

This package contains the TensorRT-RTX (TensorRT for RTX) GPU inference backend of the vs-mlrt plugin.

Installation

pip install vapoursynth-mlrt-trt-rtx

Building from source

Requirements

  • C++ Compiler: C++20 compatible (e.g. MSVC 2019+, GCC, Clang)
  • Dependencies:
    • CUDA Toolkit
    • TensorRT-RTX SDK
  • Environment Variables:
    • TENSORRT_RTX_HOME: Path to the TensorRT-RTX installation directory (must contain include, lib, and bin).

Compilation

Set the TENSORRT_RTX_HOME environment variable before running the build. On Windows (PowerShell):

$env:TENSORRT_RTX_HOME="C:\Path\To\TensorRT-RTX"

uv build --package vapoursynth-mlrt-trt-rtx

On Linux:

export TENSORRT_RTX_HOME="/path/to/TensorRT-RTX"

uv build --package vapoursynth-mlrt-trt-rtx

Detailed parameter information from the parent project follows.


VapourSynth TensorRT & TensorRT-RTX

The vs-tensorrt plugin provides an optimized CUDA runtime for some popular AI filters.

Usage

Prototype: core.{trt, trt_rtx}.Model(clip[] clips, string engine_path[, int[] overlap, int[] tilesize, int device_id=0, bint use_cuda_graph=False, int num_streams=1, int verbosity=2, string flexible_output_prop=""])

Arguments:

  • clip[] clips: the input clips. Only 32-bit floating-point RGB or GRAY clips are supported. For model-specific input requirements, please consult our wiki.

  • string engine_path: the path to the prebuilt engine (see below)

  • int[] overlap: some networks (e.g. CNNs) support arbitrary input shapes, whereas other networks might only support a fixed input shape, in which case the input clip must be processed in tiles. The overlap argument specifies the overlap (horizontal and vertical, or both, in pixels) between adjacent tiles to minimize boundary issues. Please refer to the network-specific docs for the recommended overlap size.

  • int[] tilesize: Even for CNNs, where arbitrary input sizes could be supported, the network sometimes does not work well across the entire range of input dimensions, and you have to limit the size of each tile. This parameter specifies the tile size (horizontal and vertical, or both, including the overlap). Please refer to the network-specific docs for the recommended tile size.

  • int device_id: Specifies the GPU device id to use. Default 0. Requires NVIDIA GPUs from the second-generation Kepler architecture onwards.

  • bint use_cuda_graph: whether to use CUDA Graphs to improve performance and reduce CPU overhead.

  • int num_streams: number of concurrent CUDA streams to use. Default 1. Increase it if the GPU is not saturated.

  • int verbosity: The verbosity level of the TensorRT runtime. Default 2. Messages are written to stderr. 0: Internal error. 1: Application error. 2: Warning. 3: Informational messages with instructional information. 4: Verbose messages with debugging information.

  • string flexible_output_prop: used to support ONNX models with an arbitrary number of output planes.

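    import vapoursynth as vs
    from vapoursynth import core
    from typing import TypedDict
    
    class Output(TypedDict):
        clip: vs.VideoNode
        num_planes: int
    
    # `src` (the input clip or clips) and `engine_path` are assumed to be defined earlier
    prop = "planes"  # arbitrary non-empty string
    output: Output = core.trt.Model(src, engine_path, flexible_output_prop=prop)
    
    clip = output["clip"]
    num_planes = output["num_planes"]
    
    # extract each output plane from the frame property named f"{prop}{i}"
    output_planes: list[vs.VideoNode] = [
        clip.std.PropToClip(prop=f"{prop}{i}")
        for i in range(num_planes)
    ]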
    

When overlap and tilesize are not specified, the filter will internally try to resize the network to fit the input clips. This might not always work (for example, the network might require the width to be divisible by 8), and the filter will error out in this case.

The general rule is to either:

  1. leave out overlap and tilesize entirely and process the input frame in one tile, or
  2. set both so that the frame is processed in tilesize[0] x tilesize[1] tiles, and adjacent tiles overlap by overlap[0] x overlap[1] pixels in each direction. The overlapped region is thrown out so that only interior output pixels are used.
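To get a feel for how tilesize and overlap interact, the following is a minimal sketch that estimates how many tiles a frame is split into. It is not taken from the plugin source: it assumes that consecutive tiles advance by the tile size minus twice the overlap, and the helper names `tiles_along_axis` and `tile_grid` are hypothetical.

```python
import math


def tiles_along_axis(length: int, tile: int, overlap: int) -> int:
    """Estimate how many tiles are needed to cover one axis of the frame.

    Assumption (not confirmed by the plugin source): each tile discards
    `overlap` pixels on its interior edges, so consecutive tiles advance
    by `tile - 2 * overlap` pixels.
    """
    if length <= tile:
        return 1  # the whole axis fits into a single tile
    stride = tile - 2 * overlap
    if stride <= 0:
        raise ValueError("tile must be larger than twice the overlap")
    return math.ceil((length - 2 * overlap) / stride)


def tile_grid(width, height, tilesize, overlap):
    """Return (columns, rows) for tilesize=[w, h] and overlap=[w, h]."""
    return (
        tiles_along_axis(width, tilesize[0], overlap[0]),
        tiles_along_axis(height, tilesize[1], overlap[1]),
    )


print(tile_grid(1920, 1080, tilesize=[640, 360], overlap=[16, 16]))
```

Under this assumption, a larger overlap shrinks the usable interior of each tile, so the same frame needs more tiles and more inference work.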

Instructions for TensorRT

Build engine with dynamic shape support

  • Requires models with built-in dynamic shape support, e.g. waifu2x_v3.7z and dpir_v3.7z.
  1. Build engine

    trtexec --onnx=drunet_gray.onnx --minShapes=input:1x2x8x8 --optShapes=input:1x2x64x64 --maxShapes=input:1x2x1080x1920 --saveEngine=dpir_gray_1080p_dynamic.engine
    

    The engine will be optimized for 64x64 input and can be applied to eligible inputs with shapes from 8x8 to 1920x1080 by specifying the tilesize parameter of the trt plugin.

    Also check the "trtexec useful arguments" section.
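When several engines have to be built, the command above can be assembled programmatically. The sketch below only formats the argument list and uses the flags shown in this document; the helper names `shape_arg` and `trtexec_command` are hypothetical, and the shapes are assumed to be in NxCxHxW order as in the example command.

```python
def shape_arg(name, dims):
    """Format a shape as `name:NxCxHxW`, the form used by the shape flags."""
    return f"{name}:" + "x".join(str(d) for d in dims)


def trtexec_command(onnx, engine, input_name, channels,
                    min_hw, opt_hw, max_hw, fp16=False):
    """Assemble the trtexec argument list for a dynamic-shape engine build.

    min_hw, opt_hw and max_hw are (height, width) pairs.
    """
    cmd = [
        "trtexec",
        f"--onnx={onnx}",
        f"--minShapes={shape_arg(input_name, (1, channels, *min_hw))}",
        f"--optShapes={shape_arg(input_name, (1, channels, *opt_hw))}",
        f"--maxShapes={shape_arg(input_name, (1, channels, *max_hw))}",
        f"--saveEngine={engine}",
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd


cmd = trtexec_command(
    "drunet_gray.onnx", "dpir_gray_1080p_dynamic.engine",
    input_name="input", channels=2,
    min_hw=(8, 8), opt_hw=(64, 64), max_hw=(1080, 1920),
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` if trtexec is on the PATH.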

Run model

In vpy script:

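import vapoursynth as vs
from vapoursynth import core

# DPIR: the second input clip is a constant plane holding the noise level
src = core.std.BlankClip(width=640, height=360, format=vs.GRAYS)
sigma = 10.0
flt = core.trt.Model([src, core.std.BlankClip(src, color=sigma/255.0)], engine_path="dpir_gray_1080p_dynamic.engine", tilesize=[640, 360])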
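Note that tilesize is given as horizontal then vertical size, while the shapes passed to trtexec are NxCxHxW. The sketch below checks whether a tile size falls inside the shape range of the engine built above. It is only an illustration: the helper names `tilesize_to_shape` and `within_profile` are hypothetical, and the two channels correspond to the two GRAYS input clips of the DPIR example.

```python
def tilesize_to_shape(tilesize, channels=2, batch=1):
    """Map tilesize=[width, height] to the NCHW shape the engine receives."""
    width, height = tilesize
    return (batch, channels, height, width)


def within_profile(shape, min_shape, max_shape):
    """Check every dimension against the --minShapes/--maxShapes bounds."""
    return all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape))


# Bounds taken from the trtexec example: 1x2x8x8 up to 1x2x1080x1920.
MIN_SHAPE = (1, 2, 8, 8)
MAX_SHAPE = (1, 2, 1080, 1920)

shape = tilesize_to_shape([640, 360])
print(shape, within_profile(shape, MIN_SHAPE, MAX_SHAPE))
```

A tile size of [1080, 1920] would fail this check, because its height of 1920 exceeds the maximum height of 1080.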

trtexec useful arguments

  • --workspace=N: Set workspace size in megabytes (default = 16)

  • --fp16: Enable fp16 precision, in addition to fp32 (default = disabled)

  • --noTF32: Disable tf32 precision (default is to enable tf32, in addition to fp32, Ampere only)

  • --device=N: Select cuda device N (default = 0)

  • --timingCacheFile=<file>: Save/load the serialized global timing cache

  • --verbose: Use verbose logging (default = false)

  • --profilingVerbosity=mode: Specify profiling verbosity.

    mode ::= layer_names_only|detailed|none
    

    (default = layer_names_only)

  • --tacticSources=tactics: Specify the tactics to be used by adding (+) or removing (-) tactics from the default tactic sources (default = all available tactics).

    Note: Currently only cuDNN, cuBLAS and cuBLAS-LT are listed as optional tactics.

    Tactic Sources:

    tactics ::= [","tactic]
    tactic  ::= (+|-)lib
    lib     ::= "CUBLAS"|"CUBLAS_LT"|"CUDNN"
    

    For example, to disable cudnn and enable cublas: --tacticSources=-CUDNN,+CUBLAS

  • --useCudaGraph: Use CUDA graph to capture engine execution and then launch inference (default = disabled). This flag may be ignored if the graph capture fails.

  • --noDataTransfers: Disable DMA transfers to and from device (default = enabled).

  • --saveEngine=<file>: Save the serialized engine

  • --loadEngine=<file>: Load a serialized engine
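The --tacticSources grammar listed above is easy to get wrong by hand. As a small sketch, the hypothetical helper `tactic_sources` below formats the flag from keyword arguments and rejects library names that are not in the grammar.

```python
KNOWN_LIBS = {"CUBLAS", "CUBLAS_LT", "CUDNN"}


def tactic_sources(**libs: bool) -> str:
    """Build a --tacticSources flag; True adds (+) a library, False removes (-) it."""
    parts = []
    for lib, enabled in libs.items():
        name = lib.upper()
        if name not in KNOWN_LIBS:
            raise ValueError(f"unknown tactic source: {lib}")
        parts.append(("+" if enabled else "-") + name)
    return "--tacticSources=" + ",".join(parts)


print(tactic_sources(cudnn=False, cublas=True))
```

This reproduces the example given for disabling cuDNN and enabling cuBLAS.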

Instructions for TensorRT-RTX

Replace the trtexec executable with the tensorrt_rtx executable. Some options, e.g. --fp16, may not be supported.
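If build commands are kept as argument lists in a script, the substitution can be sketched as follows. The helper `to_tensorrt_rtx` is hypothetical, and its `unsupported` default contains only --fp16 because that is the one option named here; extend it after checking which options your tensorrt_rtx version accepts.

```python
def to_tensorrt_rtx(cmd, unsupported=("--fp16",)):
    """Swap trtexec for tensorrt_rtx and drop flags listed in `unsupported`.

    `unsupported` is an assumption to be adjusted by the user; only --fp16
    is named in this document as possibly unsupported.
    """
    if not cmd or cmd[0] != "trtexec":
        raise ValueError("expected a trtexec argument list")
    # Compare only the flag name, so `--flag=value` forms are matched too.
    kept = [arg for arg in cmd[1:] if arg.split("=", 1)[0] not in unsupported]
    return ["tensorrt_rtx", *kept]


trt_cmd = ["trtexec", "--onnx=drunet_gray.onnx", "--fp16",
           "--saveEngine=dpir_gray.engine"]
print(" ".join(to_tensorrt_rtx(trt_cmd)))
```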
