A tool for parsing, editing, optimizing, and profiling ONNX models.
📄 Simplified Chinese | ✨ New Project: AI-Enhancement-Filter (powered by onnx-tool)
# onnx-tool
A comprehensive toolkit for analyzing, optimizing, and transforming ONNX models with advanced capabilities for LLMs, diffusion models, and computer vision architectures.
- LLM Optimization: Build and profile large language models with KV cache analysis (example)
- Graph Transformation: Edit nodes and tensors in place and serialize the changes back to ONNX
- Advanced Profiling:
  - Rapid shape inference
  - MACs/parameter statistics with sparsity awareness
- Compute Graph Engine: Runtime shape computation with minimal overhead (details)
- Memory Compression:
  - Activation memory optimization (up to 95% reduction)
  - Weight quantization (FP16, INT8/INT4 with per-tensor/channel/block schemes)
- Quantization & Sparsity: Full support for quantized and sparse model analysis
## 🤖 Supported Model Architectures
| Domain | Models |
|---|---|
| NLP | BERT, T5, GPT, LLaMa, MPT (TransformerModel) |
| Diffusion | Stable Diffusion (TextEncoder, VAE, UNet) |
| CV | Detic, BEVFormer, SSD300_VGG16, ConvNeXt, Mask R-CNN, Silero VAD |
| Audio | Sovits, LPCNet |
## ⚡ Build & Profile LLMs in Seconds

Profile 10 Hugging Face models in under one second. Export ONNX models with llama.cpp-like simplicity (code).

### Model Statistics (1k token input)
| Model | MACs (G) | Parameters (G) | KV Cache (G) |
|---|---|---|---|
| gpt-j-6b | 6277 | 6.05049 | 0.234881 |
| yi-1.5-34B | 35862 | 34.3889 | 0.125829 |
| microsoft/phi-2 | 2948 | 2.77944 | 0.167772 |
| Phi-3-mini-4k | 4083 | 3.82108 | 0.201327 |
| Phi-3-small-8k-instruct | 7912 | 7.80167 | 0.0671089 |
| Phi-3-medium-4k-instruct | 14665 | 13.9602 | 0.104858 |
| Llama3-8B | 8029 | 8.03026 | 0.0671089 |
| Llama-3.1-70B-Japanese-Instruct-2407 | 72888 | 70.5537 | 0.167772 |
| QWen-7B | 7509 | 7.61562 | 0.0293601 |
| Qwen2_72B_Instruct | 74895 | 72.7062 | 0.167772 |
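The KV Cache column can be reproduced from each model's published config: per token, every layer stores one key and one value vector of width kv_dim (number of KV heads × head dimension), so the cache holds 2 × layers × kv_dim × seq_len elements. A minimal sketch (my reconstruction of the arithmetic, not onnx-tool's code), using the known configs of GPT-J-6B (28 layers, MHA, hidden size 4096) and Llama3-8B (32 layers, GQA with 8 KV heads of head_dim 128):

```python
def kv_cache_elements(layers: int, kv_dim: int, seq_len: int) -> int:
    """Elements in the KV cache: one key and one value vector per layer per token."""
    return 2 * layers * kv_dim * seq_len

# GPT-J-6B: 28 layers, MHA, so kv_dim equals the hidden size (4096)
print(kv_cache_elements(28, 4096, 1024) / 1e9)     # matches the table's 0.234881
# Llama3-8B: 32 layers, GQA with 8 KV heads of head_dim 128 (kv_dim 1024)
print(kv_cache_elements(32, 8 * 128, 1024) / 1e9)  # matches the table's 0.0671089
```

Multiply by bytes per element (e.g. 2 for FP16) to get the cache size in bytes.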
### Latency Estimation (4-bit weights, 16-bit KV cache)
| Model | Memory (GB) | Ultra-155H TTFT (s) | Ultra-155H TPOT (s) | Arc-A770 TTFT (s) | Arc-A770 TPOT (s) | H100-PCIe TTFT (s) | H100-PCIe TPOT (s) |
|---|---|---|---|---|---|---|---|
| gpt-j-6b | 3.75678 | 1.0947 | 0.041742 | 0.0916882 | 0.00670853 | 0.0164015 | 0.00187839 |
| yi-1.5-34B | 19.3369 | 5.77095 | 0.214854 | 0.45344 | 0.0345302 | 0.0747854 | 0.00966844 |
| microsoft/phi-2 | 1.82485 | 0.58361 | 0.0202761 | 0.0529628 | 0.00325866 | 0.010338 | 0.000912425 |
| Phi-3-mini-4k | 2.49649 | 0.811173 | 0.0277388 | 0.0745356 | 0.00445802 | 0.0147274 | 0.00124825 |
| Phi-3-small-8k-instruct | 4.2913 | 1.38985 | 0.0476811 | 0.117512 | 0.00766303 | 0.0212535 | 0.00214565 |
| Phi-3-medium-4k-instruct | 7.96977 | 2.4463 | 0.088553 | 0.198249 | 0.0142317 | 0.0340576 | 0.00398489 |
| Llama3-8B | 4.35559 | 1.4354 | 0.0483954 | 0.123333 | 0.00777784 | 0.0227182 | 0.00217779 |
| Llama-3.1-70B-Japanese-Instruct-2407 | 39.4303 | 11.3541 | 0.438114 | 0.868475 | 0.0704112 | 0.137901 | 0.0197151 |
| QWen-7B | 4.03576 | 1.34983 | 0.0448417 | 0.11722 | 0.00720671 | 0.0218461 | 0.00201788 |
| Qwen2_72B_Instruct | 40.5309 | 11.6534 | 0.450343 | 0.890816 | 0.0723766 | 0.14132 | 0.0202654 |
💡 Latencies computed from hardware specs – no actual inference required
## 🔧 Basic Parsing & Editing
Intuitive API for model manipulation:
```python
from onnx_tool import Model

model = Model('model.onnx')            # Load any ONNX file
graph = model.graph                    # Access the computation graph
node = graph.nodemap['Conv_0']         # Modify operator attributes
tensor = graph.tensormap['weight']     # Edit tensor data/types
model.save_model('modified.onnx')      # Persist changes
```
See comprehensive examples in benchmark/examples.py.
## 📊 Shape Inference & Profiling

All profiling relies on precise shape inference.

### Profiling Capabilities
- Standard profiling: MACs, parameters, memory footprint
- Sparse-aware profiling: Quantify sparsity impact on compute
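To make the two bullets concrete: a Conv2d performs N·Cout·Hout·Wout·Cin·Kh·Kw multiply-accumulates (one per input channel and kernel position for every output element), and sparse-aware profiling scales that by the fraction of nonzero weights. A hand-rolled sketch, not onnx-tool's internals:

```python
def conv2d_macs(n, c_in, h_out, w_out, c_out, kh, kw, weight_density=1.0):
    """Multiply-accumulates for a Conv2d; weight_density scales for sparse weights."""
    return n * c_out * h_out * w_out * c_in * kh * kw * weight_density

# ResNet-50's first conv: 3->64 channels, 7x7 kernel, 112x112 output
print(conv2d_macs(1, 3, 112, 112, 64, 7, 7) / 1e6)       # ~118.0 MMACs dense
print(conv2d_macs(1, 3, 112, 112, 64, 7, 7, 0.5) / 1e6)  # half with 50% nonzero weights
```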
## ⚙️ Compute Graph & Shape Engine
Transform exported ONNX graphs into efficient Compute Graphs by removing shape-calculation overhead:
- Compute Graph: Minimal graph containing only compute operations
- Shape Engine: Runtime shape resolver for dynamic models
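A toy illustration of the Shape Engine idea (an assumed design, not the actual API): symbolic dimensions stay as names in the exported shapes, and at inference time only those variables are bound and substituted, instead of re-executing ONNX shape-calculation ops:

```python
def resolve(sym_shape, bindings):
    """Substitute bound integers for symbolic dimension names."""
    return [bindings[d] if isinstance(d, str) else d for d in sym_shape]

# Symbolic shapes recorded once at export time, with free variables 'batch' and 'seq'
symbolic = {'input_ids': ['batch', 'seq'], 'logits': ['batch', 'seq', 32000]}

# Per inference, only the variables are bound; no ONNX shape ops are executed
bound = {name: resolve(s, {'batch': 4, 'seq': 128}) for name, s in symbolic.items()}
print(bound['logits'])  # [4, 128, 32000]
```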
## 💾 Memory Compression

### Activation Memory Compression

Reuses temporary buffers to minimize peak memory usage – critical for LLMs and high-resolution CV models.
| Model | Native Memory (MB) | Compressed Memory (MB) | Compressed/Native (%) |
|---|---|---|---|
| StableDiffusion(VAE_encoder) | 14,245 | 540 | 3.7 |
| StableDiffusion(VAE_decoder) | 25,417 | 1,140 | 4.48 |
| StableDiffusion(Text_encoder) | 215 | 5 | 2.5 |
| StableDiffusion(UNet) | 36,135 | 2,232 | 6.2 |
| GPT2 | 40 | 2 | 6.9 |
| BERT | 2,170 | 27 | 1.25 |
✅ Typical models achieve >90% activation memory reduction
📌 Implementation: benchmark/compression.py
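The mechanism behind these ratios is liveness-based buffer reuse: an activation only needs memory between the step that produces it and the last step that consumes it, so buffers with disjoint lifetimes can share one allocation. A simplified sketch of the idea (illustrative, not the compression.py algorithm):

```python
def activation_memory(lifetimes):
    """lifetimes: (first_use_step, last_use_step, size_bytes) per activation tensor.
    Returns (naive_total, peak_live): with perfect buffer reuse, the peak of
    concurrently-live bytes is the compressed footprint, since freed buffers
    can back tensors that come alive later."""
    naive = sum(size for _, _, size in lifetimes)
    events = []
    for start, end, size in lifetimes:
        events.append((start, size))      # buffer comes alive
        events.append((end + 1, -size))   # buffer can be recycled after last use
    live = peak = 0
    for _, delta in sorted(events):       # frees at a step sort before allocs
        live += delta
        peak = max(peak, live)
    return naive, peak

# A 10-layer chain: each activation is consumed only by the next layer
chain = [(i, i + 1, 100) for i in range(10)]
print(activation_memory(chain))  # (1000, 200): only two buffers are ever live at once
```

Long chains with short-lived activations, as in LLMs and UNets, are exactly where this wins the most, which matches the table above.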
### Weight Compression

Essential for deploying large models on memory-constrained devices:
| Quantization Scheme | Size vs FP32 | Example (7B model) |
|---|---|---|
| FP32 (baseline) | 1.00× | 28 GB |
| FP16 | 0.50× | 14 GB |
| INT8 (per-channel) | 0.25× | 7 GB |
| INT4 (block=32, symmetric) – llama.cpp | 0.156× | 4.4 GB |
Supported schemes:
- ✅ FP16
- ✅ INT8: symmetric/asymmetric × per-tensor/channel/block
- ✅ INT4: symmetric/asymmetric × per-tensor/channel/block
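The table's ratios follow from bits per weight: each block stores its quantized values plus one scale (and, for asymmetric schemes, a zero-point). Symmetric INT4 with block=32 and an FP32 scale costs 4 + 32/32 = 5 bits per weight, i.e. 5/32 ≈ 0.156 of FP32. A sketch under that assumption (the FP32 scale width is my inference from the 0.156× figure; implementations vary, e.g. llama.cpp's Q4_0 packs an FP16 scale per 32 weights for 4.5 bits):

```python
def bytes_per_weight(bits, block_size=None, scale_bits=32, asymmetric=False):
    """Average storage per weight for block-wise quantization."""
    total_bits = bits
    if block_size:
        total_bits += scale_bits / block_size        # one scale per block
        if asymmetric:
            total_bits += scale_bits / block_size    # plus one zero-point per block
    return total_bits / 8

fp32 = bytes_per_weight(32)
int4 = bytes_per_weight(4, block_size=32)            # symmetric INT4, block=32
print(int4 / fp32)           # 0.15625, the table's 0.156x
print(7e9 * int4 / 1e9)      # 4.375 GB for a 7B model, the table's ~4.4 GB
```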
📌 See benchmark/examples.py for implementation examples.
## 🚀 Installation

```shell
# PyPI (recommended)
pip install onnx-tool

# Latest development version
pip install --upgrade git+https://github.com/ThanatosShinji/onnx-tool.git
```

Requirements: Python ≥ 3.6

⚠️ Troubleshooting: if ONNX installation fails, try:

```shell
pip install onnx==1.8.1 && pip install onnx-tool
```
## Known Issues

- The `Loop` op is not supported
- The `Sequence` type is not supported
## 📈 Model Zoo Results

Comprehensive profiling of the ONNX Model Zoo and SOTA models. Input shapes are defined in data/public/config.py.

📥 Download pre-profiled models (with full tensor shapes):

- Baidu Drive (code: p91k)
- Google Drive
## 🤝 Contributing
Contributions are welcome! Please open an issue or PR for:
- Bug reports
- Feature requests
- Documentation improvements
- New model support