micro-onnx
ONNX export + benchmark pipeline for micro-models.
We measured a 186× speedup (58,648 qps vs. 314 qps) for ONNX Runtime on CPU over PyTorch on CPU, on a SplineLinear layer. The reason: ONNX bakes weight materialization into the computation graph, which removes the Python overhead that dominates tiny forward passes.
Install
```shell
# Quote the extras so shells such as zsh do not expand the brackets.
pip install "micro-onnx[all]"      # everything
pip install micro-onnx             # numpy only (validate without torch/ort)
pip install "micro-onnx[export]"   # torch + onnx for exporting
pip install "micro-onnx[runtime]"  # onnxruntime for inference
```
Quick Start
```python
import torch
import torch.nn as nn
from micro_onnx import export_model, validate_export, benchmark_model

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
sample = torch.randn(1, 64)

# 1. Export
result = export_model(model, sample_input=sample, output_path="model.onnx")
print(f"Exported: {result.file_size:,} bytes, opset {result.opset_version}")

# 2. Validate
validation = validate_export(model, "model.onnx", sample_input=sample, tolerance=1e-6)
print(f"Max diff: {validation.max_diff:.2e}, Pass: {validation.passed}")

# 3. Benchmark
bench = benchmark_model("model.onnx", sample_input=sample, n_runs=1000)
print(f"PyTorch: {bench.pytorch_qps:.0f} qps")
print(f"ONNX: {bench.onnx_qps:.0f} qps")
print(f"Speedup: {bench.speedup:.1f}×")
```
Why ONNX for Micro-Models?
Micro-models (small MLPs, embeddings, spline layers) spend most of their time in Python and PyTorch dispatch overhead rather than in the math itself. ONNX Runtime removes most of that overhead:
| Model | PyTorch CPU | ONNX Runtime CPU | Speedup |
|---|---|---|---|
| SplineLinear (64→32) | 314 qps | 58,648 qps | 186× |
| nn.Linear (64→32) | ~1,200 qps | ~80,000 qps | ~67× |
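
To make the SplineLinear row concrete, the throughput figures can be converted into per-query latency. The sketch below uses only the numbers reported in the table and assumes queries run one at a time, so latency is simply the inverse of throughput. The helper name `per_query_latency_us` is ours for illustration and is not part of micro-onnx.

```python
# Throughput figures copied from the SplineLinear row of the table (queries per second).
PYTORCH_QPS = 314
ONNX_QPS = 58_648


def per_query_latency_us(qps: float) -> float:
    """Convert throughput (queries/second) to latency (microseconds per query).

    Assumes strictly sequential, one-query-at-a-time execution.
    """
    return 1_000_000 / qps


pytorch_us = per_query_latency_us(PYTORCH_QPS)
onnx_us = per_query_latency_us(ONNX_QPS)
speedup = ONNX_QPS / PYTORCH_QPS

print(f"PyTorch:      {pytorch_us:,.0f} µs per query")
print(f"ONNX Runtime: {onnx_us:,.1f} µs per query")
print(f"Speedup:      {speedup:.1f}×")
```

Under that assumption, a SplineLinear forward pass costs roughly 3.2 ms in PyTorch and about 17 µs in ONNX Runtime, which is consistent with fixed per-call overhead, not arithmetic, dominating the PyTorch figure.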
FP32 vs INT8: When Smaller Isn't Faster
Counter-intuitively, FP32 beats INT8 for micro-models. INT8 quantization adds quantize/dequantize steps, and when the actual FLOP count is tiny, that overhead exceeds the savings from smaller weights.
Rule of thumb: if your model has fewer than 1M parameters and runs on CPU, try FP32 first. Quantize only if profiling shows that inference is compute-bound rather than overhead-bound.
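
The rule of thumb can be written down as a small decision helper. This is a sketch only: `count_linear_params` and `precision_advice` are hypothetical names that do not exist in micro-onnx, and the advice for models outside the rule's scope (large models, or non-CPU targets) is our assumption, since the rule above does not cover them.

```python
from typing import Iterable, Tuple

PARAM_THRESHOLD = 1_000_000  # the "fewer than 1M parameters" cut-off from the rule of thumb


def count_linear_params(layers: Iterable[Tuple[int, int]]) -> int:
    """Count weights plus biases for (in_features, out_features) linear layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in layers)


def precision_advice(n_params: int, on_cpu: bool, compute_bound: bool) -> str:
    """Return a short recommendation following the rule of thumb."""
    if on_cpu and n_params < PARAM_THRESHOLD:
        # Micro-model on CPU: quantize only when profiling says compute-bound.
        return "try int8" if compute_bound else "fp32"
    # Assumption: outside the rule's scope, measure both instead of guessing.
    return "benchmark both"


# The two Linear layers from the Quick Start model: 64 -> 32 -> 10.
n_params = count_linear_params([(64, 32), (32, 10)])
print(n_params, precision_advice(n_params, on_cpu=True, compute_bound=False))
```

The Quick Start model has 2,410 parameters, far below the threshold, so the helper recommends staying with FP32 unless profiling says otherwise.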
opset 17: The Sweet Spot
We default to opset 17 because it is the highest version with broad device support: CPUs, GPUs, mobile NPUs, and embedded accelerators all handle it. Newer opsets add ops that many runtimes don't support yet.
API Reference
export_model(model, sample_input, opset=17, output_path=None, ...)
Export any PyTorch nn.Module to ONNX. Returns an ExportResult with the file path, file size, and opset version.
validate_export(model, onnx_path, sample_input, tolerance=1e-6)
Run the same input through PyTorch and ONNX Runtime and compare the outputs. Returns a ValidationResult with max_diff, mean_diff, cosine_similarity, and passed.
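
For readers unfamiliar with the reported metrics, here is a minimal pure-Python sketch of how such a comparison could be computed on two flattened output vectors. It is not micro-onnx's implementation: the function name `compare_outputs` is hypothetical, and treating `passed` as `max_diff <= tolerance` is our assumption about how the tolerance is applied.

```python
import math
from typing import Dict, Sequence, Union


def compare_outputs(
    reference: Sequence[float],
    candidate: Sequence[float],
    tolerance: float = 1e-6,
) -> Dict[str, Union[float, bool]]:
    """Compare two equally sized output vectors element by element."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("outputs must be non-empty and have the same length")

    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    max_diff = max(diffs)
    mean_diff = sum(diffs) / len(diffs)

    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(r * c for r, c in zip(reference, candidate))
    norm_ref = math.sqrt(sum(r * r for r in reference))
    norm_cand = math.sqrt(sum(c * c for c in candidate))
    if norm_ref > 0 and norm_cand > 0:
        cosine = dot / (norm_ref * norm_cand)
    else:
        cosine = float("nan")  # undefined for an all-zero vector

    return {
        "max_diff": max_diff,
        "mean_diff": mean_diff,
        "cosine_similarity": cosine,
        # Assumption: the check passes when the worst element is within tolerance.
        "passed": max_diff <= tolerance,
    }


close = compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.0000005])
far = compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.1])
print(close["passed"], far["passed"])
```

Max absolute difference is the strictest of the three metrics, which is why the sketch uses it for the pass/fail decision; cosine similarity alone can stay near 1 even when individual elements drift.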
benchmark_model(onnx_path, sample_input, n_runs=1000, providers=None)
Benchmark ONNX Runtime vs PyTorch on the same input. Returns BenchmarkResult with QPS for both, speedup ratio, and timing details.
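
The QPS and speedup figures follow from a simple timing loop. The sketch below shows one common way to measure them with the standard library; it is an illustration under our own assumptions (a fixed warm-up count, wall-clock timing with `time.perf_counter`), not a description of how `benchmark_model` is implemented. `measure_qps` and `speedup_ratio` are hypothetical names.

```python
import time
from typing import Callable


def measure_qps(fn: Callable[[], object], n_runs: int = 1000, n_warmup: int = 10) -> float:
    """Call fn repeatedly and return the achieved calls per second."""
    # Warm-up calls are excluded from timing so one-time setup cost is not counted.
    for _ in range(n_warmup):
        fn()

    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed


def speedup_ratio(baseline_qps: float, candidate_qps: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return candidate_qps / baseline_qps


# Stand-in workload; in practice fn would wrap a single model forward pass.
qps = measure_qps(lambda: sum(range(100)), n_runs=200)
print(f"{qps:,.0f} calls per second")
```

For very fast calls the loop overhead itself becomes part of the measurement, so both runtimes should be timed with the same harness and the same input.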
optimize_model(onnx_path, output_path=None, level="all")
Apply ONNX graph optimizations (constant folding, node fusion, dead code elimination). Returns OptimizeResult.
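
As an illustration of what constant folding means, here is a toy version on a tiny expression tree. It is deliberately simplified: it does not use ONNX or ONNX Runtime, the tuple-based graph representation is invented for this example, and it says nothing about how `optimize_model` works internally.

```python
# An expression is a number (constant), a string (runtime input),
# or a tuple (op, left, right) with op in {"add", "mul"}.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def fold_constants(expr):
    """Replace every sub-expression whose operands are all constants with its value."""
    if not isinstance(expr, tuple):
        return expr  # a constant or a named input: nothing to fold

    op, left, right = expr
    left = fold_constants(left)
    right = fold_constants(right)

    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)  # both operands known ahead of time
    return (op, left, right)


# x * (2 + 3) + (4 * 0.5): only "x" is unknown until inference time.
graph = ("add", ("mul", "x", ("add", 2.0, 3.0)), ("mul", 4.0, 0.5))
folded = fold_constants(graph)
print(folded)
```

Two of the four operations disappear from the graph, so they are computed once at optimization time and never again per query.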
Hardware Profiles
```python
from micro_onnx.profiles import PROFILES

# CPU, GPU, iGPU, NPU profiles with recommended settings per target
```
Works With Any PyTorch Model
micro-onnx is model-agnostic: transformers, CNNs, GNNs, and custom architectures all work. The only requirement is that the model is an nn.Module whose forward pass is compatible with torch.onnx.export.
License
MIT