aitune

NVIDIA AITune

These details have been verified by PyPI

Maintainers

dynamo-ops nv-anants nvidia nv-mesharma saturley-hall-nv


Project description

NVIDIA AITune is an inference toolkit designed for tuning and deploying Deep Learning models with a focus on NVIDIA GPUs. It provides model tuning capabilities through compilation and conversion paths that can significantly improve inference speed and efficiency across various AI workloads including Computer Vision, Natural Language Processing, Speech Recognition, and Generative AI.

The toolkit enables seamless tuning of PyTorch models and pipelines using various backends such as TensorRT, Torch-TensorRT, TorchAO, and Torch Inductor through a single Python API. The resulting tuned models are ready for deployment in production environments.

NVIDIA AITune adapts to your environment: it relies first on the software versions you already have installed, selects the best-performing backend for your software and hardware setup, and guides you toward supported technologies.

Note: This is the first release. The API may change in future versions.

Features at a Glance

The distinct capabilities of NVIDIA AITune are summarized in the feature matrix:

Feature	Description
Ease-of-use	Single line of code to run all possible tuning paths directly from your source code
Wide Backend Support	Compatible with various tuning backends including TensorRT, Torch-TensorRT, TorchAO, and Torch Inductor
Model Tuning	Enhance the performance of models such as ResNet and BERT for efficient inference deployment
Pipeline Tuning	Streamline Python code pipelines for models such as Stable Diffusion and Flux using seamless model wrapping and tuning
Model Export and Conversion	Automate the process of exporting and converting models between various formats with focus on TensorRT and Torch-TensorRT
Correctness Testing	Ensures tuned models produce correct outputs by validating on provided data samples
Performance Profiling	Profiles models to select the optimal backend based on performance metrics such as latency and throughput
Model Persistence	Save and load tuned models for production deployment with flexible storage options
JIT tuning	Just-in-time tuning of a model or a pipeline without any code changes required
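Correctness testing boils down to comparing the original and tuned outputs on the same data samples within a tolerance. The sketch below is a minimal, framework-free illustration assuming an allclose-style rule, |tuned - reference| <= atol + rtol * |reference| (the same form torch.allclose uses). The function name and the default tolerances are hypothetical and are not AITune's API or its actual thresholds.

```python
import math
from typing import Sequence


def outputs_match(
    reference: Sequence[float],
    tuned: Sequence[float],
    rtol: float = 1e-3,  # hypothetical relative tolerance
    atol: float = 1e-5,  # hypothetical absolute tolerance
) -> bool:
    """Return True if every tuned value lies within atol + rtol * |reference| of its reference value."""
    if len(reference) != len(tuned):
        return False  # a shape mismatch is always a failure
    return all(
        math.isfinite(t) and abs(t - r) <= atol + rtol * abs(r)
        for r, t in zip(reference, tuned)
    )


# Usage: compare original and tuned outputs for one data sample.
reference_out = [0.1234, -1.5, 3.0]
tuned_out = [0.1235, -1.5004, 3.0011]  # small drift, e.g. from reduced precision
print(outputs_match(reference_out, tuned_out))  # True
```

A combined relative and absolute tolerance is used because reduced-precision backends (for example FP16) shift large values by more in absolute terms, while values near zero need an absolute floor.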

When to Use AITune

AITune provides compute graph optimizations for PyTorch models at the nn.Module level. Use AITune when you want automated inference optimization with minimal code changes.

If your model is supported by a dedicated serving framework and benefits from runtime optimizations (e.g. continuous batching, speculative decoding), use frameworks like TensorRT-LLM, vLLM, or SGLang for best performance. Use AITune for general PyTorch models and pipelines that lack such specialized tooling.

Prerequisites

Before proceeding with the installation of NVIDIA AITune, ensure your system meets the following criteria:

Operating System: Linux (Ubuntu 22.04+ recommended)
Python: Version 3.10 or newer
PyTorch: Version 2.7 or newer
TensorRT: Version 10.5.0 or higher (for TensorRT backend)
NVIDIA GPU: Required for GPU-accelerated tuning
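To confirm the Python and package requirements before installing, you can query the active interpreter and the installed package metadata. The script below is a minimal sketch using only the standard library; it is not part of AITune. It assumes PyTorch and TensorRT are installed under the distribution names torch and tensorrt, and it does not check the operating system or the GPU.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the prerequisites list.
MIN_PYTHON = (3, 10)
MIN_PACKAGES = {
    "torch": (2, 7),         # PyTorch
    "tensorrt": (10, 5, 0),  # only needed for the TensorRT backend
}


def parse_release(ver: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. '2.7.1+cu126' -> (2, 7, 1)."""
    parts = []
    for piece in ver.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # stop at suffixes such as '0a0' or '0rc1'
            break
    return tuple(parts)


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version_info.major}.{sys.version_info.minor} is older than 3.10")
    for name, minimum in MIN_PACKAGES.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if parse_release(installed) < minimum:
            problems.append(f"{name} {installed} is older than {'.'.join(map(str, minimum))}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "All checks passed")
```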

You can use NGC Containers for PyTorch which contain all necessary dependencies:

PyTorch NGC Container

Install

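NVIDIA AITune is published on PyPI (pypi.org). The --extra-index-url option used below additionally lets pip resolve packages hosted on NVIDIA's package index (https://pypi.nvidia.com).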

Installing from PyPI (Recommended)

pip install --extra-index-url https://pypi.nvidia.com aitune

Installing from Source

# Clone the repository
git clone https://github.com/ai-dynamo/aitune
cd aitune
pip install --extra-index-url https://pypi.nvidia.com .

# Or use editable mode for development
pip install --extra-index-url https://pypi.nvidia.com -e .

Quick Start

This quick start provides examples of tuning and deployment paths available in NVIDIA AITune.

NVIDIA AITune enables seamless tuning of models for deployment (for example, converting them to TensorRT) without requiring changes to your original Python pipelines.

The code below demonstrates Stable Diffusion pipeline tuning. First, install the required third-party dependencies:

pip install transformers diffusers torch

Then initialize the pipeline:

# Hugging Face dependencies
import torch
from diffusers import DiffusionPipeline

# Import AITune
import aitune.torch as ait

# Initialize pipeline
pipe = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.to("cuda")

Next, inspect the pipeline components and display the summary:

# Prepare input data
input_data = [{"prompt": "A beautiful landscape with mountains and a lake"}]

# Inspect pipeline to get modules
modules_info = ait.inspect(pipe, input_data)

# Display modules info
modules_info.describe()

Finally, wrap the selected modules and tune them within the pipeline:

# Wrap modules for tuning
modules = modules_info.get_modules()
pipe = ait.wrap(pipe, modules)

# Tune pipeline
ait.tune(pipe, input_data)

At this point, you can use the pipeline to generate predictions with the tuned models directly in Python:

# Run inference on tuned pipeline
images = pipe(["A beautiful landscape with mountains and a lake"])
image = images[0][0]

# Save image for preview
image.save("landscape.png")

Once the pipeline has been tuned, you can save the best-performing version of the modules for later deployment:

ait.save(pipe, "tuned_pipe.ait")

To reuse the tuned pipeline later, recreate it and load the saved tuned modules into it:

pipe = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.to("cuda")
ait.load(pipe, "tuned_pipe.ait")

Core Functionalities

Inspect

The inspect function allows you to analyze PyTorch models and pipelines to understand their structure, parameters, and execution flow. It provides detailed insights into model architecture and helps identify tuning opportunities.

import aitune.torch as ait
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(100, 10)

    def forward(self, x):
        return self.linear(x)

model = SimpleModel()
dataset = torch.randn(1, 100)

# Inspect the model and print a summary of its modules
modules_info = ait.inspect(model, dataset)
modules_info.describe()

Tune

The tune function is the core functionality that automatically tunes your PyTorch models and pipelines for optimal inference performance. It supports various backends and automatically selects the best performing configuration.

import aitune.torch as ait
import torch

# Define your model (SimpleModel from the Inspect example above)
model = SimpleModel()

# Wrap the model
model = ait.Module(model)

# Define inference function
def inference_fn(x):
    return model(x)

# Tune the model
ait.tune(
    func=inference_fn,
    dataset=torch.randn(1, 100),
)

Save

The save function persists tuned models for later use. It stores the tuned and original module weights together in a single checkpoint file with a .ait extension and, alongside it, writes a SHA-256 checksum file.

# Save the tuned model
import aitune.torch as ait
ait.save(model, "tuned_model.ait")

Example output:

checkpoints/
├── tuned_model
├── tuned_model.ait
└── tuned_model_sha256_sums.txt

To run inference on another host or in another folder, copy the checkpoint file (tuned_model.ait) together with the SHA sums file to the target location.
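After copying, it is worth confirming that the checkpoint arrived intact by recomputing its SHA-256 digest. The exact layout of tuned_model_sha256_sums.txt is not documented here; this sketch assumes the common sha256sum layout of one "<hex digest>  <file name>" entry per line, with file names relative to the sums file. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large checkpoints never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file: Path) -> dict[str, bool]:
    """Check each '<hex digest>  <file name>' entry (ASSUMED sha256sum-style format)."""
    results = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # sha256sum marks binary-mode entries with '*'
        target = sums_file.parent / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results


if __name__ == "__main__":
    sums_path = Path("checkpoints/tuned_model_sha256_sums.txt")  # adjust to your deployment folder
    if sums_path.is_file():
        for name, ok in verify_sums(sums_path).items():
            print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```

If the sums file does follow the sha256sum format, running sha256sum -c tuned_model_sha256_sums.txt in the checkpoint folder performs the same check from the shell.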

Note: For functional and performance compatibility, we recommend deploying *.ait packages on the same hardware on which tuning was performed.

Load

The load function enables you to load previously tuned models from a checkpoint file.

# Load the tuned model
import aitune.torch as ait
tuned_model = ait.load(model, "tuned_model.ait")

On first load, the checkpoint file is decompressed and the tuned and original module weights are loaded. Subsequent loads will use the decompressed weights from the same folder.
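The "decompress once, reuse afterwards" behavior is a simple caching pattern. The sketch below illustrates it under a loud assumption: AITune's real .ait format is not documented here, so a zip archive stands in for it, and the helper name extract_once is hypothetical. The extraction folder name mirrors the tuned_model/ directory shown next to tuned_model.ait in the example output.

```python
import zipfile
from pathlib import Path


def extract_once(checkpoint: Path) -> Path:
    """Unpack `checkpoint` next to itself on the first call; later calls reuse the folder."""
    target = checkpoint.with_suffix("")  # checkpoints/tuned_model.ait -> checkpoints/tuned_model
    marker = target / ".extracted"       # written last, so an interrupted extraction is redone
    if not marker.exists():
        target.mkdir(parents=True, exist_ok=True)
        # ASSUMPTION: zip stands in for the real .ait format; only extract trusted archives.
        with zipfile.ZipFile(checkpoint) as archive:
            archive.extractall(target)
        marker.touch()
    return target
```

The marker file is created only after extraction finishes, so a crash mid-extraction leaves no marker and the next load starts over instead of reading a half-written folder.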

Backends

NVIDIA AITune supports multiple tuning backends, each with different characteristics and use cases. All backends share a common interface for the build and inference processes.

TensorRT Backend

The TensorRT backend provides highly optimized inference using NVIDIA’s TensorRT engine. It offers the best performance for production deployments. The backend integrates TensorRT Model Optimizer in a seamless flow.

from aitune.torch.backend import TensorRTBackend, TensorRTBackendConfig, ONNXAutoCastConfig

config = TensorRTBackendConfig(quantization_config=ONNXAutoCastConfig())
backend = TensorRTBackend(config)

CUDA Graphs Support

The TensorRT backend supports CUDA Graphs for reduced CPU overhead and improved inference performance. This feature is disabled by default.

from aitune.torch.backend import TensorRTBackend, TensorRTBackendConfig

config = TensorRTBackendConfig(use_cuda_graphs=True)
backend = TensorRTBackend(config)

Torch-TensorRT Backend (JIT)

The Torch-TensorRT JIT backend integrates TensorRT tuning directly into PyTorch through torch.compile, so no separate model conversion step is required.

import torch
from aitune.torch.backend import TorchTensorRTJitBackend, TorchTensorRTJitBackendConfig, TorchTensorRTConfig

config = TorchTensorRTJitBackendConfig(compile_config=TorchTensorRTConfig(enabled_precisions={torch.float16}))
backend = TorchTensorRTJitBackend(config)

Torch-TensorRT Backend (AOT)

The Torch-TensorRT AOT backend also integrates TensorRT tuning into PyTorch, but compiles the model ahead of time through torch_tensorrt.compile rather than lazily on first call.

import torch
from aitune.torch.backend import TorchTensorRTAotBackend, TorchTensorRTAotBackendConfig, TorchTensorRTConfig

config = TorchTensorRTAotBackendConfig(compile_config=TorchTensorRTConfig(enabled_precisions={torch.float16}))
backend = TorchTensorRTAotBackend(config)

TorchAO Backend

The TorchAO backend uses torchao, PyTorch's Architecture Optimization library for quantization and sparsity, for model tuning.

from aitune.torch.backend import TorchAOBackend

backend = TorchAOBackend()

Torch Inductor Backend

The Torch Inductor backend uses PyTorch's Inductor compiler, the default torch.compile backend, for model tuning.

from aitune.torch.backend import TorchInductorBackend

backend = TorchInductorBackend()

Tune Strategies

NVIDIA AITune provides different strategies for selecting the optimal backend configuration. All strategies share a common interface for the tuning process.

Not every backend can tune every model — each relies on different compilation technology with its own limitations (e.g., ONNX export for TensorRT, graph breaks in Torch Inductor, unsupported layers in TorchAO). Strategies control how AITune handles this.

FirstWinsStrategy

Tries backends in priority order and returns the first one that succeeds. If a backend fails, the strategy moves on to the next candidate instead of aborting.

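from aitune.torch.backend import TensorRTBackend, TorchInductorBackend
from aitune.torch.tune_strategy import FirstWinsStrategy

# Try TensorRT first and fall back to Torch Inductor if the TensorRT build fails
strategy = FirstWinsStrategy(backends=[TensorRTBackend(), TorchInductorBackend()])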
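The fallback logic can be sketched in plain Python. Here each backend is modeled as a hypothetical zero-argument build function, and first_wins and NoBackendSucceeded are illustrative names, not AITune classes; the real strategy works with backend objects and models.

```python
from typing import Callable, Sequence


class NoBackendSucceeded(RuntimeError):
    """Raised when every candidate backend failed to build (hypothetical)."""


def first_wins(builders: Sequence[tuple[str, Callable[[], object]]]) -> tuple[str, object]:
    """Try each (name, build) pair in priority order and return the first successful build."""
    errors = {}
    for name, build in builders:
        try:
            return name, build()
        except Exception as exc:  # a failing backend is recorded, not fatal
            errors[name] = exc
    raise NoBackendSucceeded(f"all backends failed: {errors}")


def build_tensorrt():
    raise RuntimeError("ONNX export failed")  # simulated TensorRT failure


def build_inductor():
    return "inductor-compiled model"  # simulated successful build


print(first_wins([("TensorRT", build_tensorrt), ("TorchInductor", build_inductor)]))
# ('TorchInductor', 'inductor-compiled model')
```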

OneBackendStrategy

Uses exactly one backend, failing immediately with the original error if it cannot build. Use this when you have already validated that a backend works and want deterministic behavior. Unlike FirstWinsStrategy with a single backend, OneBackendStrategy surfaces the original exception rather than catching it.

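from aitune.torch.backend import TensorRTBackend
from aitune.torch.tune_strategy import OneBackendStrategy

# Use TensorRT only; any build error is raised unchanged
strategy = OneBackendStrategy(backend=TensorRTBackend())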
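The practical difference is in how a failure reaches you. The sketch below contrasts the two behaviors with hypothetical functions: a fallback-style strategy that catches the backend error and raises its own, and a single-backend strategy with no try/except. How FirstWinsStrategy actually reports failure is an assumption here; only the "original exception versus caught exception" distinction comes from the description above.

```python
from typing import Callable


class NoBackendSucceeded(RuntimeError):
    """Generic error a fallback strategy might raise after catching failures (hypothetical)."""


def first_wins_single(build: Callable[[], object]) -> object:
    """Fallback-style strategy with one candidate: the backend's error is caught and wrapped."""
    try:
        return build()
    except Exception as exc:
        raise NoBackendSucceeded("no backend succeeded") from exc


def one_backend(build: Callable[[], object]) -> object:
    """Single-backend strategy: no try/except, so the original exception propagates unchanged."""
    return build()


def failing_tensorrt_build():
    raise ValueError("unsupported layer: MyCustomOp")  # simulated build failure


for strategy in (first_wins_single, one_backend):
    try:
        strategy(failing_tensorrt_build)
    except Exception as exc:
        print(f"{strategy.__name__}: {type(exc).__name__}: {exc}")
# first_wins_single: NoBackendSucceeded: no backend succeeded
# one_backend: ValueError: unsupported layer: MyCustomOp
```

Seeing the original ValueError, with its message and traceback pointing at the failing layer, is usually what you want once a backend has been validated and a failure signals a real regression.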

HighestThroughputStrategy

Profiles all compatible backends and selects the fastest. Use this when maximum throughput matters and you can afford longer tuning time.

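from aitune.torch.backend import TensorRTBackend, TorchInductorBackend, TorchEagerBackend
from aitune.torch.tune_strategy import HighestThroughputStrategy

# Profile all three backends and keep the fastest one that builds successfully
strategy = HighestThroughputStrategy(backends=[TensorRTBackend(), TorchInductorBackend(), TorchEagerBackend()])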
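Conceptually, this strategy builds every candidate, discards the ones that fail, times the rest on the same samples, and keeps the fastest. The sketch below shows that loop with wall-clock timing. The helper names, the single warm-up pass, and runs=20 are assumptions for illustration, not AITune's profiler. Real GPU profiling would also need device synchronization (for example torch.cuda.synchronize()) before reading the clock, because CUDA kernels run asynchronously.

```python
import time
from typing import Callable, Sequence

Infer = Callable[[object], object]


def measure_throughput(infer: Infer, samples: Sequence[object], runs: int = 20) -> float:
    """Return samples processed per second over `runs` passes, after one warm-up pass."""
    for sample in samples:  # warm-up: excludes one-time costs such as lazy compilation
        infer(sample)
    start = time.perf_counter()
    for _ in range(runs):
        for sample in samples:
            infer(sample)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero-length interval
    return runs * len(samples) / elapsed


def highest_throughput(
    candidates: Sequence[tuple[str, Callable[[], Infer]]],
    samples: Sequence[object],
) -> tuple[str, Infer, dict[str, float]]:
    """Build each candidate, skip those that fail, and return the fastest plus all scores."""
    scores: dict[str, float] = {}
    built: dict[str, Infer] = {}
    for name, build in candidates:
        try:
            built[name] = build()
        except Exception:
            continue  # incompatible backend: excluded from profiling
        scores[name] = measure_throughput(built[name], samples)
    if not scores:
        raise RuntimeError("no compatible backend")
    best = max(scores, key=scores.get)
    return best, built[best], scores
```

Profiling every candidate on identical samples is what makes the comparison fair, and it is also why this strategy takes longer than FirstWinsStrategy: every backend that builds must be run, not just the first one.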

Examples

We provide examples that show how to use NVIDIA AITune's features, covering model tuning, profiling, testing, and deployment.

For detailed examples and step-by-step guides, please visit our Examples Catalog. The catalog includes practical implementations for various AI workloads including computer vision, natural language processing, speech recognition, and generative AI models.


Release history

0.3.0 (this version): Mar 16, 2026

0.0.0.dev0 (pre-release): Jun 11, 2025

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


aitune-0.3.0-py3-none-any.whl (183.7 kB)

Uploaded Mar 16, 2026 Python 3

File details

Details for the file aitune-0.3.0-py3-none-any.whl.

File metadata

Download URL: aitune-0.3.0-py3-none-any.whl
Upload date: Mar 16, 2026
Size: 183.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for aitune-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6fad03c73a211a58da5a1654dd142fdb4b501aeaf22e665362f3c6eca5b81b68`
MD5	`15b441bb5c051a588e71489889a2b1be`
BLAKE2b-256	`8c495b247d1ede5a195aab52bff2dd9a742dbd620670d725a952713637bdfb3a`

