ONNX export + benchmark pipeline for micro-models. 186x speedup over PyTorch CPU.

micro-onnx

ONNX export + benchmark pipeline for micro-models.

We measured a 186× speedup (58,648 qps vs. 314 qps) for ONNX Runtime on CPU over PyTorch on CPU with a SplineLinear layer. The reason: the exported ONNX graph has weight materialization baked in, which removes the Python overhead that dominates tiny forward passes.

Install

pip install "micro-onnx[all]"      # everything (quotes keep shells like zsh from expanding the brackets)
pip install micro-onnx             # numpy only (validate without torch/ort)
pip install "micro-onnx[export]"   # torch + onnx for exporting
pip install "micro-onnx[runtime]"  # onnxruntime for inference

Quick Start

import torch
import torch.nn as nn
from micro_onnx import export_model, validate_export, benchmark_model

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
model.eval()  # inference mode, so export and validation see the same behavior
sample = torch.randn(1, 64)

# 1. Export
result = export_model(model, sample_input=sample, output_path="model.onnx")
print(f"Exported: {result.file_size:,} bytes, opset {result.opset_version}")

# 2. Validate
validation = validate_export(model, "model.onnx", sample_input=sample, tolerance=1e-6)
print(f"Max diff: {validation.max_diff:.2e}, Pass: {validation.passed}")

# 3. Benchmark
bench = benchmark_model("model.onnx", sample_input=sample, n_runs=1000)
print(f"PyTorch: {bench.pytorch_qps:.0f} qps")
print(f"ONNX:    {bench.onnx_qps:.0f} qps")
print(f"Speedup: {bench.speedup:.1f}×")

Why ONNX for Micro-Models?

Micro-models (small MLPs, embeddings, spline layers) spend most of their time in Python/PyTorch dispatch overhead, not in the actual math. ONNX Runtime removes most of that overhead:

Model	PyTorch CPU	ONNX Runtime CPU	Speedup
SplineLinear (64→32)	314 qps	58,648 qps	186×
nn.Linear (64→32)	~1,200 qps	~80,000 qps	~67×
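
To put the table in per-call terms, the throughput figures can be inverted into latencies. The short calculation below uses only the SplineLinear numbers reported above; the helper name `latency_us` is ours for illustration and is not part of micro-onnx.

```python
def latency_us(qps: float) -> float:
    """Convert queries per second into mean microseconds per query."""
    return 1_000_000 / qps

pytorch_qps = 314      # SplineLinear, PyTorch CPU (from the table)
onnx_qps = 58_648      # SplineLinear, ONNX Runtime CPU (from the table)

pytorch_us = latency_us(pytorch_qps)
onnx_us = latency_us(onnx_qps)
speedup = onnx_qps / pytorch_qps

print(f"PyTorch: {pytorch_us:,.0f} us per call")
print(f"ONNX:    {onnx_us:,.1f} us per call")
print(f"Speedup: {speedup:.1f}x")
```

That works out to roughly 3.2 ms per call for PyTorch against about 17 µs for ONNX Runtime. The unrounded ratio is about 186.8, which the table reports as 186×.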

FP32 vs INT8: When Smaller Isn't Faster

Counter-intuitively, FP32 beats INT8 for micro-models. Why? INT8 quantization adds dequantize/quantize overhead. For models where the actual FLOPs are tiny, that overhead exceeds the savings from smaller weights.

Rule of thumb: If your model has <1M parameters and runs on CPU, try FP32 first. Only quantize if profiling shows that inference is compute-bound rather than overhead-bound.
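
The trade-off can be illustrated with a toy cost model: per-call time is a fixed overhead plus compute time, and INT8 lowers the compute term while adding quantize/dequantize overhead. Every constant below is a made-up illustrative value, not a measurement from micro-onnx or ONNX Runtime, and the function names are hypothetical.

```python
def call_time_us(flops: float, flops_per_us: float, fixed_overhead_us: float) -> float:
    """Toy model: time per call = fixed overhead + compute time."""
    return fixed_overhead_us + flops / flops_per_us

# Illustrative assumptions only (NOT measured values):
FP32_RATE = 10_000.0     # FLOPs per microsecond for FP32 kernels
INT8_RATE = 30_000.0     # assume INT8 kernels are 3x faster on raw math
BASE_OVERHEAD_US = 5.0   # dispatch cost paid by both variants
QDQ_OVERHEAD_US = 3.0    # extra quantize/dequantize cost paid only by INT8

def compare(flops: float) -> tuple:
    """Return (fp32_time_us, int8_time_us) under the toy model."""
    fp32 = call_time_us(flops, FP32_RATE, BASE_OVERHEAD_US)
    int8 = call_time_us(flops, INT8_RATE, BASE_OVERHEAD_US + QDQ_OVERHEAD_US)
    return fp32, int8

# A 64->32 linear layer needs about 2 * 64 * 32 = 4,096 FLOPs per sample.
tiny_fp32, tiny_int8 = compare(2 * 64 * 32)
# A much larger model, where compute dominates the fixed costs.
big_fp32, big_int8 = compare(50_000_000)

print(f"tiny model: FP32 {tiny_fp32:.2f} us, INT8 {tiny_int8:.2f} us")
print(f"big model:  FP32 {big_fp32:.2f} us, INT8 {big_int8:.2f} us")
```

Under these assumed constants FP32 wins for the tiny layer and INT8 wins for the large model, with a break-even at 45,000 FLOPs per call. Real break-even points depend on the hardware and runtime, which is why the rule of thumb says to profile before quantizing.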

opset 17: The Sweet Spot

We default to opset 17 because it's the highest version with broad device support: CPUs, GPUs, mobile NPUs, and embedded accelerators all handle it. Newer opsets add ops that many runtimes don't support yet.

API Reference

`export_model(model, sample_input, opset=17, output_path=None, ...)`

Export any PyTorch nn.Module to ONNX. Returns an ExportResult with the file path, size, and opset version.

`validate_export(model, onnx_path, sample_input, tolerance=1e-6)`

Run the same input through PyTorch and ONNX Runtime and compare the outputs. Returns a ValidationResult with max_diff, mean_diff, cosine_similarity, and passed.
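
For readers who want to see what such a comparison involves, here is a minimal pure-Python sketch of the four reported metrics. It is not the library's implementation: the function name `compare_outputs`, the dict return type, and the pass rule (maximum absolute difference at or below `tolerance`) are assumptions, and it works on flat lists of floats rather than tensors.

```python
import math

def compare_outputs(reference, candidate, tolerance=1e-6):
    """Compare two equal-length sequences of floats element-wise."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("outputs must be non-empty and the same length")

    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    max_diff = max(diffs)
    mean_diff = sum(diffs) / len(diffs)

    dot = sum(r * c for r, c in zip(reference, candidate))
    norm_ref = math.sqrt(sum(r * r for r in reference))
    norm_cand = math.sqrt(sum(c * c for c in candidate))
    # Cosine similarity is undefined for a zero vector; report NaN in that case.
    denom = norm_ref * norm_cand
    cosine = dot / denom if denom > 0 else float("nan")

    return {
        "max_diff": max_diff,
        "mean_diff": mean_diff,
        "cosine_similarity": cosine,
        "passed": max_diff <= tolerance,  # assumed pass rule
    }

pytorch_out = [0.12, -0.50, 0.33]        # hypothetical reference output
onnx_out = [0.12, -0.50, 0.3300004]      # hypothetical ONNX Runtime output
report = compare_outputs(pytorch_out, onnx_out)
print(report)
```

Reporting cosine similarity alongside the absolute differences is useful because it stays meaningful when output magnitudes are very large or very small, where a fixed absolute tolerance can be too strict or too loose.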

`benchmark_model(onnx_path, sample_input, n_runs=1000, providers=None)`

Benchmark ONNX Runtime vs PyTorch on the same input. Returns BenchmarkResult with QPS for both, speedup ratio, and timing details.
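
Throughput numbers like these typically come from timing repeated calls after a warm-up. The sketch below shows one way to do that with the standard library; `measure_qps`, the warm-up count, and the two stand-in callables are assumptions for illustration, not micro-onnx internals.

```python
import time

def measure_qps(fn, n_runs=1000, warmup=50):
    """Return calls per second for a zero-argument callable."""
    for _ in range(warmup):      # untimed calls so one-time setup costs are excluded
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed

# Stand-ins for "run the PyTorch forward pass" and "run the ONNX Runtime session".
def slow_path():
    return sum(i * i for i in range(2000))

def fast_path():
    return sum(i * i for i in range(20))

slow_qps = measure_qps(slow_path)
fast_qps = measure_qps(fast_path)
print(f"slow: {slow_qps:,.0f} qps")
print(f"fast: {fast_qps:,.0f} qps")
print(f"speedup: {fast_qps / slow_qps:.1f}x")
```

In a real comparison the two callables would wrap the PyTorch forward pass (under `torch.no_grad()`) and an ONNX Runtime `session.run(...)` call on the same input. Warm-up matters for micro-models because first-call setup costs can be large relative to a forward pass that takes microseconds.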

`optimize_model(onnx_path, output_path=None, level="all")`

Apply ONNX graph optimizations (constant folding, node fusion, dead code elimination). Returns an OptimizeResult.

Hardware Profiles

from micro_onnx.profiles import PROFILES
# CPU, GPU, iGPU, NPU profiles with recommended settings per target

Works With Any PyTorch Model

micro-onnx is model-agnostic: transformers, CNNs, GNNs, and custom architectures all work. The only requirement is that the model's forward pass is compatible with torch.onnx.export.

License

MIT

Project details

Release history

This version

0.1.0

May 20, 2026

Download files

Source Distribution

micro_onnx-0.1.0.tar.gz (13.3 kB)

Uploaded May 20, 2026 Source

Built Distribution

micro_onnx-0.1.0-py3-none-any.whl (11.5 kB)

Uploaded May 20, 2026 Python 3

File details

Details for the file micro_onnx-0.1.0.tar.gz.

File metadata

Download URL: micro_onnx-0.1.0.tar.gz
Upload date: May 20, 2026
Size: 13.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for micro_onnx-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`96c9c28e762a755e24f52e933d7b1d9c048861770548cb3ba32bcb0d2ba18bb1`
MD5	`6500d250ab040a6881ca91109f5df4ea`
BLAKE2b-256	`b7aaef3842264f00841ac2b9b7e4fc697070c871b1058d37446481c1ed4f88d1`

File details

Details for the file micro_onnx-0.1.0-py3-none-any.whl.

File metadata

Download URL: micro_onnx-0.1.0-py3-none-any.whl
Upload date: May 20, 2026
Size: 11.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for micro_onnx-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a4b6058fdf8f1de8926a1a30d041245cba75650e54eae9f7cf6e418c3a1fd9ec`
MD5	`0b8defa68821120bdda884a63847346b`
BLAKE2b-256	`258b559c8bf34ddecac7b69fca85a0ce9189d4d809bd6cc9341651f4e386d79e`
