
A tool for parsing, editing, optimizing, and profiling ONNX models.


📄 Simplified Chinese README | ✨ New Project: AI-Enhancement-Filter (powered by onnx-tool)



onnx-tool

A comprehensive toolkit for analyzing, optimizing, and transforming ONNX models with advanced capabilities for LLMs, diffusion models, and computer vision architectures.

  • LLM Optimization: Build and profile large language models with KV cache analysis (example)
  • Graph Transformation:
    • Constant folding (docs)
    • Operator fusion (docs)
  • Advanced Profiling:
    • Rapid shape inference
    • MACs/parameter statistics with sparsity awareness
  • Compute Graph Engine: Runtime shape computation with minimal overhead (details)
  • Memory Compression:
    • Activation memory optimization (up to 95% reduction)
    • Weight quantization (FP16, INT8/INT4 with per-tensor/channel/block schemes)
  • Quantization & Sparsity: Full support for quantized and sparse model analysis

🤖 Supported Model Architectures

| Domain | Models |
|---|---|
| NLP | BERT, T5, GPT, LLaMa, MPT (TransformerModel) |
| Diffusion | Stable Diffusion (TextEncoder, VAE, UNet) |
| CV | Detic, BEVFormer, SSD300_VGG16, ConvNeXt, Mask R-CNN, Silero VAD |
| Audio | Sovits, LPCNet |

⚡ Build & Profile LLMs in Seconds

Profile 10 Hugging Face models in under one second. Export ONNX models with llama.cpp-like simplicity (code).

Model Statistics (1k token input)

| Model | MACs(G) | Parameters(G) | KV Cache(G) |
|---|---|---|---|
| gpt-j-6b | 6277 | 6.05049 | 0.234881 |
| yi-1.5-34B | 35862 | 34.3889 | 0.125829 |
| microsoft/phi-2 | 2948 | 2.77944 | 0.167772 |
| Phi-3-mini-4k | 4083 | 3.82108 | 0.201327 |
| Phi-3-small-8k-instruct | 7912 | 7.80167 | 0.0671089 |
| Phi-3-medium-4k-instruct | 14665 | 13.9602 | 0.104858 |
| Llama3-8B | 8029 | 8.03026 | 0.0671089 |
| Llama-3.1-70B-Japanese-Instruct-2407 | 72888 | 70.5537 | 0.167772 |
| QWen-7B | 7509 | 7.61562 | 0.0293601 |
| Qwen2_72B_Instruct | 74895 | 72.7062 | 0.167772 |
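The KV Cache column counts elements and can be reproduced from a model's config as 2 (keys and values) × layers × KV heads × head dim × sequence length. A minimal sketch using Llama3-8B's published configuration (32 layers, 8 KV heads via grouped-query attention, head dim 128; these values come from the public model card, not from onnx-tool):

```python
def kv_cache_elements(n_layers: int, n_kv_heads: int, head_dim: int, seq_len: int) -> int:
    """Total K+V cache elements: one K tensor and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len

# Llama3-8B at a 1k-token prompt
elems = kv_cache_elements(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=1024)
print(elems / 1e9)  # ≈ 0.0671 G elements, matching the Llama3-8B row above
```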

Latency Estimation (4-bit weights, 16-bit KV cache; latencies in seconds)

| Model | Memory Size(GB) | Ultra-155H TTFT | Ultra-155H TPOT | Arc-A770 TTFT | Arc-A770 TPOT | H100-PCIe TTFT | H100-PCIe TPOT |
|---|---|---|---|---|---|---|---|
| gpt-j-6b | 3.75678 | 1.0947 | 0.041742 | 0.0916882 | 0.00670853 | 0.0164015 | 0.00187839 |
| yi-1.5-34B | 19.3369 | 5.77095 | 0.214854 | 0.45344 | 0.0345302 | 0.0747854 | 0.00966844 |
| microsoft/phi-2 | 1.82485 | 0.58361 | 0.0202761 | 0.0529628 | 0.00325866 | 0.010338 | 0.000912425 |
| Phi-3-mini-4k | 2.49649 | 0.811173 | 0.0277388 | 0.0745356 | 0.00445802 | 0.0147274 | 0.00124825 |
| Phi-3-small-8k-instruct | 4.2913 | 1.38985 | 0.0476811 | 0.117512 | 0.00766303 | 0.0212535 | 0.00214565 |
| Phi-3-medium-4k-instruct | 7.96977 | 2.4463 | 0.088553 | 0.198249 | 0.0142317 | 0.0340576 | 0.00398489 |
| Llama3-8B | 4.35559 | 1.4354 | 0.0483954 | 0.123333 | 0.00777784 | 0.0227182 | 0.00217779 |
| Llama-3.1-70B-Japanese-Instruct-2407 | 39.4303 | 11.3541 | 0.438114 | 0.868475 | 0.0704112 | 0.137901 | 0.0197151 |
| QWen-7B | 4.03576 | 1.34983 | 0.0448417 | 0.11722 | 0.00720671 | 0.0218461 | 0.00201788 |
| Qwen2_72B_Instruct | 40.5309 | 11.6534 | 0.450343 | 0.890816 | 0.0723766 | 0.14132 | 0.0202654 |

💡 Latencies computed from hardware specs – no actual inference required
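For the decode phase, per-token latency (TPOT) is roughly memory-bound: every weight and cache byte must stream through memory once per generated token, so TPOT ≈ model bytes ÷ peak memory bandwidth. A hedged sketch of that arithmetic (the ~2000 GB/s figure is H100-PCIe's published peak bandwidth, an assumption on my part rather than a value read from onnx-tool):

```python
def tpot_seconds(memory_size_gb: float, bandwidth_gb_s: float) -> float:
    """Memory-bound decode estimate: one full pass over weights + KV cache per token."""
    return memory_size_gb / bandwidth_gb_s

# gpt-j-6b at 4-bit weights / 16-bit KV occupies 3.75678 GB (from the table above);
# H100 PCIe peak memory bandwidth is roughly 2000 GB/s
print(tpot_seconds(3.75678, 2000.0))  # ≈ 0.00188 s, in line with the H100-PCIe TPOT column
```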


🔧 Basic Parsing & Editing

Intuitive API for model manipulation:

from onnx_tool import Model

model = Model('model.onnx')          # Load any ONNX file
graph = model.graph                  # Access the computation graph
node = graph.nodemap['Conv_0']       # Look up a node to edit its operator attributes
tensor = graph.tensormap['weight']   # Look up a tensor to edit its data/type
model.save_model('modified.onnx')    # Persist changes

See comprehensive examples in benchmark/examples.py.


📊 Shape Inference & Profiling

All profiling relies on precise shape inference:

(Figure: shape inference visualization)

Profiling Capabilities

  • Standard profiling: MACs, parameters, memory footprint
  • Sparse-aware profiling: Quantify sparsity impact on compute

(Figures: MACs profiling table; sparse model profiling)
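To make the MACs statistic concrete: profilers of this kind count one multiply-accumulate per kernel element per output element. A self-contained sketch of that arithmetic for a 2-D convolution (an illustration of the counting rule, not onnx-tool's own code):

```python
def conv2d_macs(out_c, out_h, out_w, in_c, k_h, k_w, groups=1):
    """MACs for a Conv2d: each output element costs (in_c/groups) * k_h * k_w multiply-adds."""
    return out_c * out_h * out_w * (in_c // groups) * k_h * k_w

# First conv of ResNet-50: 3 -> 64 channels, 7x7 kernel, stride 2 on a 224x224 input
macs = conv2d_macs(out_c=64, out_h=112, out_w=112, in_c=3, k_h=7, k_w=7)
print(macs / 1e6)  # ≈ 118 MMACs for this single layer
```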



⚙️ Compute Graph & Shape Engine

Transform exported ONNX graphs into efficient Compute Graphs by removing shape-calculation overhead:

Compute graph transformation

  • Compute Graph: Minimal graph containing only compute operations
  • Shape Engine: Runtime shape resolver for dynamic models

Use Cases:

  • Integration with custom inference engines (guide)
  • Shape regression testing (example)
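Conceptually, a shape engine replaces in-graph shape-calculation operators with precomputed symbolic formulas that are evaluated once per concrete input shape at runtime. A toy illustration of the idea (hypothetical tensor names and dims; not onnx-tool's actual ShapeEngine API):

```python
# Each tensor's shape is expressed as a function of a few symbolic input dims.
SHAPE_FORMULAS = {
    "input_ids": lambda v: (v["batch"], v["seq"]),
    "hidden":    lambda v: (v["batch"], v["seq"], 4096),
    "logits":    lambda v: (v["batch"], v["seq"], 32000),
}

def resolve_shapes(variables):
    """Evaluate all formulas for one concrete assignment of the symbolic dims."""
    return {name: formula(variables) for name, formula in SHAPE_FORMULAS.items()}

print(resolve_shapes({"batch": 1, "seq": 1024}))
# {'input_ids': (1, 1024), 'hidden': (1, 1024, 4096), 'logits': (1, 1024, 32000)}
```

The compute graph then contains only the arithmetic operators, and the resolved shapes are fed to it before each inference with dynamic inputs.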

💾 Memory Compression

Activation Memory Compression

Reuses temporary buffers to minimize peak memory usage – critical for LLMs and high-res CV models.

| Model | Native Memory Size(MB) | Compressed Memory Size(MB) | Compression Ratio(%) |
|---|---|---|---|
| StableDiffusion(VAE_encoder) | 14,245 | 540 | 3.7 |
| StableDiffusion(VAE_decoder) | 25,417 | 1,140 | 4.48 |
| StableDiffusion(Text_encoder) | 215 | 5 | 2.5 |
| StableDiffusion(UNet) | 36,135 | 2,232 | 6.2 |
| GPT2 | 40 | 2 | 6.9 |
| BERT | 2,170 | 27 | 1.25 |

✅ Typical models achieve >90% activation memory reduction
📌 Implementation: benchmark/compression.py
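The idea behind activation compression is that two tensors whose lifetimes never overlap can share one buffer, so peak memory is driven by the largest set of simultaneously live tensors rather than by the sum of all of them. A toy illustration of that accounting (a simplified sketch, not the algorithm in benchmark/compression.py):

```python
def peak_with_reuse(tensors):
    """tensors: list of (size_bytes, first_use_step, last_use_step).
    Returns (naive_total, peak), where peak sums only tensors live at the worst step."""
    naive_total = sum(size for size, _, _ in tensors)
    last_step = max(end for _, _, end in tensors)
    peak = max(
        sum(size for size, start, end in tensors if start <= step <= end)
        for step in range(last_step + 1)
    )
    return naive_total, peak

# A chain of activations, each dying right after the next op consumes it
acts = [(100, 0, 1), (80, 1, 2), (120, 2, 3), (60, 3, 4)]
naive, peak = peak_with_reuse(acts)
print(naive, peak)  # 360 vs 200: with reuse, memory is capped at the busiest step
```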

Weight Compression

Essential for deploying large models on memory-constrained devices:

| Quantization Scheme | Size vs FP32 | Example (7B model) |
|---|---|---|
| FP32 (baseline) | 1.00× | 28 GB |
| FP16 | 0.50× | 14 GB |
| INT8 (per-channel) | 0.25× | 7 GB |
| INT4 (block=32, symmetric) – llama.cpp | 0.156× | 4.4 GB |

Supported schemes:

  • ✅ FP16
  • ✅ INT8: symmetric/asymmetric × per-tensor/channel/block
  • ✅ INT4: symmetric/asymmetric × per-tensor/channel/block

📌 See benchmark/examples.py for implementation examples.
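The 0.156× figure for block-32 INT4 follows from the storage layout: each 32-weight block stores 32 four-bit values plus shared metadata, giving an effective bits-per-weight above 4. A quick arithmetic check (assuming two 16-bit metadata values per block, in the style of llama.cpp's block formats; this layout detail is an assumption, not read from onnx-tool):

```python
def bits_per_weight(weight_bits, block_size, metadata_bits):
    """Effective bits per weight for block quantization with shared per-block metadata."""
    return weight_bits + metadata_bits / block_size

# INT4, block=32, with 16-bit scale + 16-bit offset per block -> 5.0 bits/weight
bpw = bits_per_weight(weight_bits=4, block_size=32, metadata_bits=16 + 16)
print(bpw / 32)              # ratio vs 32-bit FP32: 0.15625, the table's 0.156x
print(7e9 * bpw / 8 / 1e9)   # a 7B model: 4.375 GB, the table's ~4.4 GB
```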


🚀 Installation

# PyPI (recommended)
pip install onnx-tool

# Latest development version
pip install --upgrade git+https://github.com/ThanatosShinji/onnx-tool.git

Requirements: Python ≥ 3.6

⚠️ Troubleshooting: If ONNX installation fails, try:

pip install onnx==1.8.1 && pip install onnx-tool

Known Issues

  • Loop op is not supported
  • Sequence type is not supported

📈 Model Zoo Results

Comprehensive profiling of ONNX Model Zoo and SOTA models. Input shapes defined in data/public/config.py.

📥 Download pre-profiled models (with full tensor shapes):

| Model | Params(M) | MACs(M) |
|---|---|---|
| GPT-J 1 layer | 464 | 173,398 |
| MPT 1 layer | 261 | 79,894 |
| LLaMa 1 layer | 618 | 211,801 |
| text_encoder | 123.13 | 6,782 |
| UNet2DCondition | 859.52 | 888,870 |
| VAE_encoder | 34.16 | 566,371 |
| VAE_decoder | 49.49 | 1,271,959 |
| SqueezeNet 1.0 | 1.23 | 351 |
| AlexNet | 60.96 | 665 |
| GoogleNet | 6.99 | 1,606 |
| googlenet_age | 5.98 | 1,605 |
| LResNet100E-IR | 65.22 | 12,102 |
| BERT-Squad | 113.61 | 22,767 |
| BiDAF | 18.08 | 9.87 |
| EfficientNet-Lite4 | 12.96 | 1,361 |
| Emotion | 12.95 | 877 |
| Mask R-CNN | 46.77 | 92,077 |
| BEVFormer Tiny | 33.7 | 210,838 |
| rvm_mobilenetv3 | 3.73 | 4,289 |
| yolov4 | 64.33 | 3,319 |
| ConvNeXt-L | 229.79 | 34,872 |
| edgenext_small | 5.58 | 1,357 |
| SSD | 19.98 | 216,598 |
| RealESRGAN | 16.69 | 73,551 |
| ShuffleNet | 2.29 | 146 |
| GPT-2 | 137.02 | 1,103 |
| T5-encoder | 109.62 | 686 |
| T5-decoder | 162.62 | 1,113 |
| RoBERTa-BASE | 124.64 | 688 |
| Faster R-CNN | 44.10 | 46,018 |
| FCN ResNet-50 | 35.29 | 37,056 |
| ResNet50 | 25 | 3,868 |

🤝 Contributing

Contributions are welcome! Please open an issue or PR for:

  • Bug reports
  • Feature requests
  • Documentation improvements
  • New model support
