ONNX export + benchmark pipeline for micro-models. 186x speedup over PyTorch CPU.

micro-onnx

We measured a 186× speedup (58,648 qps vs. 314 qps) running a SplineLinear layer on ONNX Runtime CPU instead of PyTorch CPU. The reason: the exported ONNX graph has weight materialization baked in, so inference no longer pays the Python overhead that dominates tiny forward passes.

Install

pip install "micro-onnx[all]"      # everything
pip install micro-onnx             # numpy only (validate without torch/ort)
pip install "micro-onnx[export]"   # torch + onnx for exporting
pip install "micro-onnx[runtime]"  # onnxruntime for inference

Quick Start

import torch
import torch.nn as nn
from micro_onnx import export_model, validate_export, benchmark_model

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
sample = torch.randn(1, 64)

# 1. Export
result = export_model(model, sample_input=sample, output_path="model.onnx")
print(f"Exported: {result.file_size:,} bytes, opset {result.opset_version}")

# 2. Validate
validation = validate_export(model, "model.onnx", sample_input=sample, tolerance=1e-6)
print(f"Max diff: {validation.max_diff:.2e}, Pass: {validation.passed}")

# 3. Benchmark
bench = benchmark_model("model.onnx", sample_input=sample, n_runs=1000)
print(f"PyTorch: {bench.pytorch_qps:.0f} qps")
print(f"ONNX:    {bench.onnx_qps:.0f} qps")
print(f"Speedup: {bench.speedup:.1f}×")

Why ONNX for Micro-Models?

Micro-models (small MLPs, embeddings, spline layers) spend most of their time in Python/PyTorch dispatch overhead, not actual math. ONNX Runtime eliminates that overhead:

Model                  PyTorch CPU   ONNX Runtime CPU   Speedup
SplineLinear (64→32)   314 qps       58,648 qps         186×
nn.Linear (64→32)      ~1,200 qps    ~80,000 qps        67×
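
To make the table easier to read, the sketch below converts the SplineLinear throughput figures into per-query latency. It rests on one assumption that is not a measurement from this project: the ONNX Runtime latency is treated as an upper bound on the time the math itself needs, so the rest of the PyTorch latency counts as overhead.

```python
# Throughput figures quoted in the table above (queries per second).
PYTORCH_QPS = 314
ONNX_QPS = 58_648


def latency_us(qps: float) -> float:
    """Mean time per query in microseconds, assuming one query at a time."""
    return 1_000_000 / qps


pytorch_us = latency_us(PYTORCH_QPS)
onnx_us = latency_us(ONNX_QPS)
speedup = ONNX_QPS / PYTORCH_QPS

# Assumption: the ONNX Runtime latency bounds the cost of the actual math,
# so whatever PyTorch spends beyond that is dispatch/Python overhead.
overhead_fraction = 1 - onnx_us / pytorch_us

print(f"PyTorch: {pytorch_us:,.0f} µs/query")
print(f"ONNX:    {onnx_us:,.1f} µs/query")
print(f"Speedup: {speedup:.1f}×")
print(f"Overhead share of PyTorch latency: at least {overhead_fraction:.1%}")
```

Under that assumption, roughly 3.2 ms of every PyTorch call goes to something other than arithmetic, which is why removing overhead matters more here than speeding up the math.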

FP32 vs INT8: When Smaller Isn't Faster

Counterintuitively, FP32 beats INT8 for micro-models. INT8 quantization inserts quantize/dequantize steps around the integer kernels, and when the model's actual FLOPs are tiny, that added work exceeds the savings from smaller weights.

Rule of thumb: if your model has fewer than 1M parameters and runs on CPU, try FP32 first. Quantize only if profiling shows that inference is compute-bound rather than overhead-bound.
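
The quantize/dequantize overhead described above can be illustrated with a pure-Python sketch of a single dot product. It assumes symmetric per-tensor INT8 quantization with hand-picked scales; the helper names are hypothetical, and this is not how ONNX Runtime or micro-onnx implements quantized kernels.

```python
def quantize(x: float, scale: float, zero_point: int = 0) -> int:
    """Map a float to int8 with affine quantization, saturating at the range limits."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q: int, scale: float, zero_point: int = 0) -> float:
    """Map an int8 value back to an approximate float."""
    return (q - zero_point) * scale


def dot_fp32(x, w):
    """FP32 path: a single multiply-accumulate pass."""
    return sum(xi * wi for xi, wi in zip(x, w))


def dot_int8(x, w_q, x_scale, w_scale):
    """INT8 path: the same multiply-accumulate plus two extra steps."""
    x_q = [quantize(xi, x_scale) for xi in x]    # extra step 1: quantize activations
    acc = sum(a * b for a, b in zip(x_q, w_q))   # integer multiply-accumulate
    return acc * x_scale * w_scale               # extra step 2: rescale to float


x = [0.5, -1.0, 0.25]
w = [0.1, 0.2, -0.3]
x_scale = 1.0 / 127   # assumed activation range of [-1, 1]
w_scale = 0.3 / 127   # assumed weight range of [-0.3, 0.3]
w_q = [quantize(wi, w_scale) for wi in w]  # weights are quantized once, offline

fp32_out = dot_fp32(x, w)
int8_out = dot_int8(x, w_q, x_scale, w_scale)
print(f"FP32: {fp32_out:.4f}  INT8: {int8_out:.4f}")
```

For three elements, the two extra steps cost about as much as the multiply-accumulate itself. That fixed cost only pays off once the integer arithmetic dominates.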

opset 17: The Sweet Spot

We default to opset 17 because it's the highest version with broad device support — CPUs, GPUs, mobile NPUs, and embedded accelerators all handle it. Newer opsets add ops that many runtimes don't support yet.

API Reference

export_model(model, sample_input, opset=17, output_path=None, ...)

Export any PyTorch nn.Module to ONNX. Returns an ExportResult with the file path, file size, and opset version.

validate_export(model, onnx_path, sample_input, tolerance=1e-6)

Run the same input through PyTorch and ONNX Runtime and compare the outputs. Returns a ValidationResult with max_diff, mean_diff, cosine_similarity, and passed.
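
For readers who want to see what these fields mean, here is a minimal standard-library sketch of the comparison. The compare_outputs helper is hypothetical, and the pass rule (max_diff <= tolerance) is an assumption, not the confirmed behavior of validate_export. NumPy would be the usual choice for real tensors.

```python
import math


def compare_outputs(reference, candidate, tolerance=1e-6):
    """Compare two flat sequences of floats (hypothetical helper, not micro-onnx API)."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("outputs must be non-empty and the same length")

    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    max_diff = max(diffs)
    mean_diff = sum(diffs) / len(diffs)

    # Cosine similarity: dot product divided by the product of the L2 norms.
    dot = sum(r * c for r, c in zip(reference, candidate))
    norms = math.sqrt(sum(r * r for r in reference)) * math.sqrt(
        sum(c * c for c in candidate)
    )
    cosine = dot / norms if norms else float("nan")

    return {
        "max_diff": max_diff,
        "mean_diff": mean_diff,
        "cosine_similarity": cosine,
        "passed": max_diff <= tolerance,  # assumed pass rule
    }


close = compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.0000005])
far = compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.1])
print(close["passed"], far["passed"])
```

Max absolute difference is the strictest of the three metrics. Cosine similarity can stay near 1.0 even when a single element is clearly wrong, as in the second call.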

benchmark_model(onnx_path, sample_input, n_runs=1000, providers=None)

Benchmark ONNX Runtime vs PyTorch on the same input. Returns a BenchmarkResult with QPS for both, the speedup ratio, and timing details.
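
A QPS benchmark of this kind can be sketched in a few lines. The measure_qps helper below is hypothetical and is not the implementation behind benchmark_model. The two stand-in callables take the place of a PyTorch forward pass and an ONNX Runtime session call so the example runs without either library.

```python
import time


def measure_qps(fn, n_runs=1000, warmup=50):
    """Call fn repeatedly and return calls per second (hypothetical helper)."""
    for _ in range(warmup):      # warm caches and lazy initialization first
        fn()
    start = time.perf_counter()  # monotonic, high-resolution clock
    for _ in range(n_runs):
        fn()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed


N = 2000


def slow_path():
    """Stand-in for a call dominated by interpreter overhead."""
    return sum(i * i for i in range(N))


def fast_path():
    """Stand-in for the same result computed without the Python loop."""
    return N * (N - 1) * (2 * N - 1) // 6


slow_qps = measure_qps(slow_path)
fast_qps = measure_qps(fast_path)
print(f"slow: {slow_qps:,.0f} qps")
print(f"fast: {fast_qps:,.0f} qps")
print(f"speedup: {fast_qps / slow_qps:.1f}×")
```

The warmup loop matters for micro-models in particular: first-call costs such as allocation and lazy initialization can be larger than the steady-state latency being measured.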

optimize_model(onnx_path, output_path=None, level="all")

Apply ONNX graph optimizations (constant folding, node fusion, dead code elimination). Returns OptimizeResult.
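
As an illustration of constant folding, one of the optimizations named above, here is a toy version over a hand-rolled graph format. The node tuples, the OPS table, and both functions are hypothetical teaching devices and have nothing to do with the ONNX file format or optimize_model's internals.

```python
import operator

# Toy operator set; a node is (output_name, op_name, input_names).
OPS = {"Add": operator.add, "Sub": operator.sub, "Mul": operator.mul}


def fold_constants(nodes, constants):
    """Precompute every node whose inputs are all constants.

    nodes must be in topological order. Returns (remaining_nodes, constants),
    keeping only the constants the remaining nodes still reference.
    """
    known = dict(constants)
    remaining = []
    for output, op, inputs in nodes:
        if all(name in known for name in inputs):
            known[output] = OPS[op](*(known[name] for name in inputs))
        else:
            remaining.append((output, op, inputs))
    used = {name for _, _, inputs in remaining for name in inputs}
    return remaining, {k: v for k, v in known.items() if k in used}


def run(nodes, constants, feeds):
    """Evaluate the graph and return the value of the last node's output."""
    env = {**constants, **feeds}
    for output, op, inputs in nodes:
        env[output] = OPS[op](*(env[name] for name in inputs))
    return env[nodes[-1][0]]


# y = x * (scale * gain) + (bias - offset); only x is known at run time.
nodes = [
    ("w", "Mul", ("scale", "gain")),
    ("b", "Sub", ("bias", "offset")),
    ("xw", "Mul", ("x", "w")),
    ("y", "Add", ("xw", "b")),
]
constants = {"scale": 2.0, "gain": 3.0, "bias": 1.0, "offset": 0.5}

folded_nodes, folded_constants = fold_constants(nodes, constants)
print(len(nodes), "nodes before,", len(folded_nodes), "after")
print(run(nodes, constants, {"x": 4.0}), run(folded_nodes, folded_constants, {"x": 4.0}))
```

The folded graph does half the work per call and gives the same result. The same idea applies when a layer derives its weights from other parameters: that derivation can run once at export time rather than on every forward pass.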

Hardware Profiles

from micro_onnx.profiles import PROFILES
# CPU, GPU, iGPU, NPU profiles with recommended settings per target

Works With Any PyTorch Model

micro-onnx is model-agnostic: transformers, CNNs, GNNs, and custom architectures all work. The only requirement is that the model's forward pass can be exported with torch.onnx.export.

License

MIT
