Inference Stack Auto-Tuner -- automatically find the fastest inference configuration for any ONNX model on any GPU

These details have not been verified by PyPI

Project links

Project description

ISAT -- Inference Stack Auto-Tuner

One command to find the fastest inference configuration for any ONNX model on any GPU.

ISAT jointly searches across 6 dimensions -- memory strategy, kernel backend, precision, graph transforms, batch size, and thread tuning -- then benchmarks each combination with thermal-aware cooldowns, statistical rigor, and Bayesian optimization.

pip install isat-tuner
isat tune model.onnx --profile cloud

The Problem

Deploying an ONNX model today means manually tweaking dozens of settings:

Setting	Choices	Impact
`HSA_XNACK`	0 or 1	Up to 30% on APUs
`MIGRAPHX_DISABLE_MLIR`	0 or 1	5-15% GEMM performance
`MIGRAPHX_SET_GEMM_PROVIDER`	default, rocblas, hipblaslt	10-25% on GEMM-heavy models
Precision	FP32, FP16, INT8	2-4x throughput
Batch size	1 to 256	Linear throughput scaling
Graph optimization level	0-99	5-20% latency reduction
Inter/intra op threads	1 to N	CPU-side parallelism

A single wrong choice can leave 40%+ performance on the table. With 6 dimensions and 4+ choices each, there are thousands of combinations. Nobody has time to test them all manually.

ISAT does it automatically.

Installation

# From PyPI (works globally for anyone)
pip install isat-tuner

# From GitHub (latest)
pip install git+https://github.com/SID-Devu/isat-tuner.git

# With all optional features
pip install "isat-tuner[all]"

# Platform-specific
pip install "isat-tuner[rocm]"      # ROCm GPU support
pip install "isat-tuner[cuda]"      # NVIDIA CUDA support
pip install "isat-tuner[server]"    # REST API server
pip install "isat-tuner[bayesian]"  # Bayesian optimization (scipy)

# Development
git clone https://github.com/SID-Devu/isat-tuner.git
cd isat && pip install -e ".[dev,all]"

Quick Start

# One-command auto-tune
isat tune model.onnx --warmup 3 --runs 5 --cooldown 60

# Use a deployment profile
isat tune model.onnx --profile edge
isat tune model.onnx --profile cloud
isat tune model.onnx --profile latency

# Bayesian optimization (smarter than grid search)
isat tune model.onnx --bayesian --max-configs 20

# Pareto analysis (latency vs memory vs power)
isat tune model.onnx --pareto latency_ms memory_mb power_w

# CI/CD performance gate
isat tune model.onnx --gate-latency 50 --gate-throughput 100

# Generate Triton server config
isat tune model.onnx --triton-output model_repository/

# Export Prometheus metrics
isat tune model.onnx --prometheus /var/lib/prometheus/isat.prom

# Dry run (see the plan without benchmarking)
isat tune model.onnx --dry-run

# Inspect model without benchmarking
isat inspect model.onnx

# Check your hardware
isat hwinfo

# View past results
isat history --model my_model --top 10

# Launch REST API server
isat serve --port 8000

# List available profiles
isat profiles

Search Dimensions

1. Memory Strategy

Config	Environment	When to use
`xnack0_default`	`HSA_XNACK=0`	Discrete GPUs, no demand paging
`xnack1_default`	`HSA_XNACK=1`	APUs, unified memory
`xnack1_coarse_grain`	XNACK=1 + coarse-grain	Large models on APU
`xnack1_oversubscribe`	XNACK=1 + queue limit	Models exceeding VRAM

2. Kernel Backend

Config	Environment	When to use
`mlir_default`	(default)	General-purpose, fused kernels
`rocblas_explicit`	`MIGRAPHX_DISABLE_MLIR=1`	GEMM-heavy models
`hipblaslt_explicit`	`MIGRAPHX_SET_GEMM_PROVIDER=hipblaslt`	Latest GEMM tuning
`mlir_parallel_N`	`MIGRAPHX_GPU_COMPILE_PARALLEL=N`	Large models, faster compile

3. Precision

Config	Method	Typical speedup
`fp32_native`	Original	Baseline
`fp16_migraphx`	MIGraphX built-in	1.5-2x
`int8_qdq`	ORT static quantization	2-4x

4. Graph Transforms

Config	Transform	Effect
`raw_opt99`	None + full ORT opt	Default
`sim_opt99`	onnxsim + full ORT opt	Remove dead ops
`pinned_opt99`	Freeze dynamic dims	Better kernel selection
`raw_opt1`	Minimal ORT opt	Debugging

5. Batch Size

Auto-explores powers of 2 up to GPU memory limit.

6. Thread Tuning

Explores inter/intra thread counts and sequential vs parallel execution modes.

7. Execution Provider (Multi-Platform)

Auto-detects available providers: MIGraphX, CUDA, TensorRT, OpenVINO, ROCm, DirectML, CPU.

Bayesian Optimization

Instead of brute-force grid search, ISAT can use Bayesian optimization to intelligently explore the most promising regions first:

isat tune model.onnx --bayesian --max-configs 20

Gaussian Process surrogate with Expected Improvement acquisition
Tree-Parzen Estimator fallback when scipy is unavailable
Early stopping when no improvement is found
Explores thousands of combinations by testing only 10-20

Deployment Profiles

Profile	Warmup	Runs	Cooldown	Priority	Use case
`edge`	3	10	30s	Latency	IoT, mobile, embedded
`cloud`	5	20	120s	Throughput	Serving, batch processing
`latency`	5	30	60s	P99	Real-time inference
`throughput`	3	15	120s	FPS	Max batch throughput
`power`	3	10	60s	Perf/watt	Battery, thermal-constrained
`quick`	1	3	15s	Latency	Fast exploration
`exhaustive`	5	50	180s	Latency	Leave no stone unturned
`apu`	3	10	60s	Latency	APU-specific optimization

isat tune model.onnx --profile edge

Output & Reports

File	Description
`isat_report.html`	Interactive HTML dashboard
`isat_report.json`	Machine-readable results for automation
`best_config.sh`	Shell script -- `source` it to apply best env vars
`isat_results.db`	SQLite database of all historical results
`config.pbtxt`	Triton Inference Server config (with `--triton-output`)
`isat.prom`	Prometheus metrics (with `--prometheus`)

REST API Server

isat serve --port 8000

Endpoint	Method	Description
`/api/v1/tune`	POST	Submit a tuning job
`/api/v1/jobs`	GET	List all jobs
`/api/v1/jobs/{id}`	GET	Get job status + results
`/api/v1/jobs/{id}/report`	GET	Get JSON report
`/api/v1/jobs/{id}/report/html`	GET	Get HTML dashboard
`/api/v1/inspect`	POST	Fingerprint a model
`/api/v1/hardware`	GET	Get hardware fingerprint
`/api/v1/history`	GET	Query historical results
`/health`	GET	Health check

# Submit a tuning job
curl -X POST http://localhost:8000/api/v1/tune \
  -H "Content-Type: application/json" \
  -d '{"model_path": "/models/model.onnx", "warmup": 3, "runs": 5}'

# Check job status
curl http://localhost:8000/api/v1/jobs/abc123

Docker

# Build and run
docker-compose up -d

# Or standalone
docker build -t isat .
docker run --device /dev/kfd --device /dev/dri --group-add video \
  -v ./models:/models isat tune /models/model.onnx

CI/CD Integration

Performance Gates

# Fail CI if latency > 50ms or throughput < 100 fps
isat tune model.onnx --gate-latency 50 --gate-throughput 100
echo $?  # 0 = pass, 1 = fail

GitHub Actions

A pre-built workflow is included at .github/workflows/isat-tune.yml. It runs tests on every push and auto-tunes on workflow dispatch.

Regression Detection

ISAT automatically compares current results against historical baselines and flags regressions caused by driver updates, kernel changes, or model modifications.

Statistical Analysis

Outlier Detection

from isat.analysis import detect_outliers, remove_outliers

cleaned, report = remove_outliers(latencies, method="mad", threshold=3.5)
print(f"Removed {report.n_outliers} outliers")

Significance Testing

from isat.analysis import compare_configs

result = compare_configs(latencies_a, latencies_b, confidence=0.95)
print(result.summary)
# "Config B is 12.3% faster than A (p=0.0023, SIGNIFICANT at 95% confidence)"

Pareto Frontier

from isat.analysis import ParetoFrontier

pareto = ParetoFrontier(results, objectives=["latency_ms", "memory_mb", "power_w"])
for point in pareto.frontier:
    print(f"{point.result.config.label}: {point.objectives}")

Using as a Library

from isat.fingerprint import fingerprint_hardware, fingerprint_model
from isat.search import SearchEngine
from isat.benchmark import BenchmarkRunner
from isat.report import ReportGenerator
from isat.database import ResultsDB
from isat.analysis import ParetoFrontier

hw = fingerprint_hardware()
model = fingerprint_model("model.onnx")

engine = SearchEngine(hw, model, warmup=3, runs=5, cooldown=60)
candidates = engine.generate_candidates()

runner = BenchmarkRunner(hw, model, "model.onnx", warmup=3, runs=5, cooldown=60)
results = runner.run_all(candidates)

# Pareto analysis: best tradeoff between latency and memory
pareto = ParetoFrontier(results, objectives=["latency_ms", "memory_mb"])
best = pareto.recommend(priority="latency_ms")

# Save and report
db = ResultsDB("isat_results.db")
db.save_batch(results, hw.fingerprint_hash, model.fingerprint_hash, model.name)

reporter = ReportGenerator(hw, model, results)
reporter.generate_all()

Architecture

isat/
├── cli.py                     # 9 subcommands: tune, inspect, hwinfo, history,
│                               #   export, compare, serve, triton, profiles
├── fingerprint/
│   ├── hardware.py            # GPU detection, memory topology, XNACK
│   └── model.py               # ONNX analysis, op counting, classification
├── search/
│   ├── memory.py              # XNACK, coarse-grain, oversubscription
│   ├── kernel.py              # MLIR, rocBLAS, hipBLASlt, parallel compile
│   ├── precision.py           # FP32, FP16, INT8 quantization
│   ├── graph.py               # onnxsim, shape pinning, ORT opt levels
│   ├── batch.py               # Batch size auto-exploration
│   ├── threading.py           # Inter/intra threads, execution mode
│   ├── provider.py            # Multi-provider (CUDA, TensorRT, OpenVINO, etc.)
│   ├── bayesian.py            # Bayesian optimization (GP + TPE)
│   └── engine.py              # Cartesian product + pruning + orchestration
├── benchmark/
│   ├── runner.py              # ORT session lifecycle + latency measurement
│   ├── stats.py               # P50/P95/P99, mean, std, CV
│   ├── thermal.py             # Temp/power monitoring + cooldown enforcement
│   └── multi_gpu.py           # Multi-GPU discovery + workload distribution
├── analysis/
│   ├── outliers.py            # MAD + IQR outlier detection
│   ├── significance.py        # Welch's t-test for config comparison
│   ├── pareto.py              # Multi-objective Pareto frontier
│   └── regression.py          # Perf regression detection vs baselines
├── report/
│   └── generator.py           # JSON, HTML, console, best_config.sh
├── database/
│   └── store.py               # SQLite results DB + indexed queries
├── server/
│   └── app.py                 # FastAPI REST API with job management
├── integrations/
│   ├── triton.py              # Triton config.pbtxt generator
│   ├── metrics.py             # Prometheus exposition format
│   └── ci.py                  # Performance gates + GitHub Actions workflow
├── profiles/
│   └── presets.py             # 8 deployment profiles
└── utils/
    ├── sysfs.py               # /sys/class/drm, /proc readers
    ├── rocm.py                # rocminfo parsing, rocm-smi wrappers
    └── onnx_utils.py          # ONNX model deep analysis

Requirements

Python >= 3.9
onnxruntime (CPU), onnxruntime-rocm (ROCm), or onnxruntime-gpu (CUDA)
onnx, numpy

Optional:

scipy -- Bayesian optimization
fastapi + uvicorn -- REST API server
onnxsim -- graph simplification
prometheus-client -- metrics export

License

Apache 2.0 -- see LICENSE

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.8.5

Apr 27, 2026

0.8.4

Apr 27, 2026

0.8.3

Apr 27, 2026

0.8.2

Apr 27, 2026

0.8.1

Apr 27, 2026

0.8.0

Apr 27, 2026

0.7.8

Apr 24, 2026

0.7.7

Apr 24, 2026

0.7.6

Apr 24, 2026

0.7.5

Apr 24, 2026

0.7.4

Apr 24, 2026

0.7.3

Apr 24, 2026

0.7.2

Apr 24, 2026

0.7.1

Apr 24, 2026

0.7.0

Apr 24, 2026

0.6.0

Apr 24, 2026

0.5.0

Apr 24, 2026

This version

0.4.0

Apr 24, 2026

0.3.0

Apr 24, 2026

0.2.0

Apr 24, 2026

0.1.1

Apr 24, 2026

0.1.0

Apr 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

isat_tuner-0.4.0.tar.gz (115.3 kB view details)

Uploaded Apr 24, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

isat_tuner-0.4.0-py3-none-any.whl (126.3 kB view details)

Uploaded Apr 24, 2026 Python 3

File details

Details for the file isat_tuner-0.4.0.tar.gz.

File metadata

Download URL: isat_tuner-0.4.0.tar.gz
Upload date: Apr 24, 2026
Size: 115.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for isat_tuner-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`b24001142f39cd7fe0527acb0dc1adf901469b9ed7b239aced14a12158a1d93a`
MD5	`388e098ada5a14e36ff4dd5c185c723e`
BLAKE2b-256	`a899f42f0d474af0452e9ca732ffd35af9764f0a744b270bcd29b79506cf734e`

See more details on using hashes here.

File details

Details for the file isat_tuner-0.4.0-py3-none-any.whl.

File metadata

Download URL: isat_tuner-0.4.0-py3-none-any.whl
Upload date: Apr 24, 2026
Size: 126.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for isat_tuner-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2716fbea7f51383ff2a5af368f8c2c13070b51d0d98d7125f99757a9462ace5c`
MD5	`78370a350c659d9555430b1920726698`
BLAKE2b-256	`0d82922714b160cf816f7042363103fe42c0cec9b71c065532e3e3815d727c08`

See more details on using hashes here.

isat-tuner 0.4.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

ISAT -- Inference Stack Auto-Tuner

The Problem

Installation

Quick Start

Search Dimensions

1. Memory Strategy

2. Kernel Backend

3. Precision

4. Graph Transforms

5. Batch Size

6. Thread Tuning

7. Execution Provider (Multi-Platform)

Bayesian Optimization

Deployment Profiles

Output & Reports

REST API Server

Docker

CI/CD Integration

Performance Gates

GitHub Actions

Regression Detection

Statistical Analysis

Outlier Detection

Significance Testing

Pareto Frontier

Using as a Library

Architecture

Requirements

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes