onnx-tool
A tool for ONNX model:
- Rapid shape inference.
- Profile model.
- Constant Folding.
- Compute Graph and Shape Engine.
- OPs fusion.
- Activation memory compression.
- Quantized models and sparse models are supported.
Supported Models:
- NLP: BERT, T5, GPT, LLaMa, MPT (TransformerModel)
- Diffusion: Stable Diffusion (TextEncoder, VAE, UNET)
- CV: BEVFormer, MobileNet, YOLO, ...
- Audio: sovits, LPCNet
Shape inference
how to use: data/Profile.md.
pytorch usage: data/PytorchUsage.md.
tensorflow usage: data/TensorflowUsage.md.
samples: benchmark/samples.py.
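The idea behind rapid shape inference can be sketched without ONNX itself: walk the nodes in topological order and compute each output shape from its input shapes using per-op rules. A minimal illustration with hypothetical helper names (this is not onnx-tool's actual implementation; see benchmark/samples.py for the real API):

```python
# Minimal shape-propagation sketch: walk nodes in topological order and
# derive each output shape from its input shapes. Hypothetical example,
# not onnx-tool's real code.

def conv2d_shape(x, weight, stride=1, pad=0):
    # x: (N, Cin, H, W), weight: (Cout, Cin, kh, kw)
    n, _, h, w = x
    cout, _, kh, kw = weight
    oh = (h + 2 * pad - kh) // stride + 1
    ow = (w + 2 * pad - kw) // stride + 1
    return (n, cout, oh, ow)

def matmul_shape(a, b):
    # (..., M, K) x (K, N) -> (..., M, N)
    assert a[-1] == b[0]
    return a[:-1] + (b[1],)

shapes = {"input": (1, 3, 224, 224), "conv_w": (64, 3, 7, 7), "fc_w": (64, 10)}
# each node: (op, input tensor names, output tensor name)
nodes = [
    ("Conv", ("input", "conv_w"), "feat"),
    ("GlobalAveragePool", ("feat",), "pooled"),
    ("Flatten", ("pooled",), "flat"),
    ("MatMul", ("flat", "fc_w"), "logits"),
]
for op, ins, out in nodes:
    if op == "Conv":
        shapes[out] = conv2d_shape(shapes[ins[0]], shapes[ins[1]], stride=2, pad=3)
    elif op == "GlobalAveragePool":
        n, c, _, _ = shapes[ins[0]]
        shapes[out] = (n, c, 1, 1)
    elif op == "Flatten":
        s = shapes[ins[0]]
        shapes[out] = (s[0], s[1] * s[2] * s[3])
    elif op == "MatMul":
        shapes[out] = matmul_shape(shapes[ins[0]], shapes[ins[1]])

print(shapes["logits"])  # -> (1, 10)
```

Because every rule is pure shape arithmetic, inference over thousands of nodes takes milliseconds, with no tensor data touched.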
Profile Model
Float multiply-add count (1 MAC = 2 FLOPs), memory usage (in bytes), and parameter count (number of elements)
Sparse pattern, sparse block ratio, and sparse element ratio
how to use: data/Profile.md.
pytorch usage: data/PytorchUsage.md.
tensorflow usage: data/TensorflowUsage.md.
samples: benchmark/samples.py.
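The MAC accounting behind profiling is straightforward per-op arithmetic; here is a hedged sketch for two common layers (hypothetical helper names, not onnx-tool's real profiler):

```python
# Sketch of MAC counting for two common layers (1 MAC = 2 FLOPs).
# Hypothetical helpers; the real profiler is described in data/Profile.md.

def conv2d_macs(out_shape, cin, kh, kw):
    # every output element needs cin*kh*kw multiply-adds
    n, cout, oh, ow = out_shape
    return n * cout * oh * ow * cin * kh * kw

def matmul_macs(m, k, n):
    # (M, K) x (K, N) -> M*K*N multiply-adds
    return m * k * n

macs = conv2d_macs((1, 64, 112, 112), cin=3, kh=7, kw=7)   # ResNet-style stem conv
macs += matmul_macs(1, 64, 10)                              # final classifier
print(f"MACs: {macs:,}  FLOPs: {2 * macs:,}")
```

Summing these per-node counts over the whole graph gives the model totals reported by the profiler.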
Compute Graph with Shape Engine
Remove shape-calculation layers (created by the ONNX exporter) to get a Compute Graph. Use the Shape Engine to update tensor shapes at runtime.
Samples: benchmark/shape_regress.py, benchmark/samples.py.
Integrate the Compute Graph and Shape Engine into a cpp inference engine: data/inference_engine.md
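Conceptually, a shape engine records each tensor's shape once as a formula over the dynamic input dimensions, so a runtime only re-evaluates cheap arithmetic instead of re-running full shape inference. A minimal sketch of that design (hypothetical structure, not onnx-tool's actual engine):

```python
# Shape-engine sketch: shapes are stored as formulas over symbolic input
# dims and re-evaluated cheaply whenever the dynamic dims change.
# Hypothetical design, not onnx-tool's actual engine.

shape_formulas = {
    "input":  lambda d: (d["batch"], 3, d["h"], d["w"]),
    "conv1":  lambda d: (d["batch"], 64, d["h"] // 2, d["w"] // 2),
    "pooled": lambda d: (d["batch"], 64, 1, 1),
}

def update_shapes(dyn_dims):
    # one dict comprehension replaces a full shape-inference pass
    return {name: f(dyn_dims) for name, f in shape_formulas.items()}

print(update_shapes({"batch": 1, "h": 224, "w": 224})["conv1"])  # (1, 64, 112, 112)
print(update_shapes({"batch": 8, "h": 640, "w": 640})["conv1"])  # (8, 64, 320, 320)
```

This is why the engine ports naturally to a cpp runtime: the formulas compile down to plain integer arithmetic with no graph traversal.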
Inplace op fusion
MHA and Layernorm Fusion for Transformers
ResNet18 fusion
how to use: data/Subgraph.md.
BERT samples: benchmark/samples.py.
Pattern fusion: benchmark/do_fusion.py.
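Pattern fusion boils down to scanning the graph for a known node sequence and replacing it with one fused node. A toy sketch for a Conv followed by BatchNormalization (hypothetical node structure; the real fusion passes are in benchmark/do_fusion.py):

```python
# Pattern-fusion sketch: find a Conv -> BatchNormalization pair where the
# Conv output feeds the BatchNorm, and merge them into one fused node.
# Hypothetical structure, not onnx-tool's real fusion pass.

def fuse_conv_bn(nodes):
    fused, i = [], 0
    while i < len(nodes):
        cur = nodes[i]
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if (cur["op"] == "Conv" and nxt is not None
                and nxt["op"] == "BatchNormalization"
                and nxt["inputs"][0] == cur["output"]):
            fused.append({"op": "ConvBN", "inputs": cur["inputs"],
                          "output": nxt["output"]})
            i += 2  # consume both nodes of the matched pattern
        else:
            fused.append(cur)
            i += 1
    return fused

nodes = [
    {"op": "Conv", "inputs": ["x", "w"], "output": "c1"},
    {"op": "BatchNormalization", "inputs": ["c1", "scale", "bias"], "output": "b1"},
    {"op": "Relu", "inputs": ["b1"], "output": "r1"},
]
print([n["op"] for n in fuse_conv_bn(nodes)])  # ['ConvBN', 'Relu']
```

MHA and LayerNorm fusion follow the same recipe with longer patterns: match the subgraph the exporter emits, then substitute a single fused op.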
Extract subgraph from ONNX model
Help implement model parallelism.
how to use: data/Subgraph.md.
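Subgraph extraction can be understood as a backward walk from the requested output tensors, keeping every producer node until the chosen input tensors are reached. A hedged sketch (hypothetical node structure; the real API is documented in data/Subgraph.md):

```python
# Subgraph-extraction sketch: from the requested outputs, walk producers
# backwards and keep every node needed to compute them, stopping at the
# chosen cut inputs. Hypothetical code, not onnx-tool's real extractor.

def extract_subgraph(nodes, inputs, outputs):
    producer = {n["output"]: n for n in nodes}
    keep, stack, seen = [], list(outputs), set()
    while stack:
        t = stack.pop()
        if t in seen or t in inputs or t not in producer:
            continue  # already visited, a cut point, or a graph input/weight
        seen.add(t)
        node = producer[t]
        keep.append(node)
        stack.extend(node["inputs"])
    keep.reverse()  # restore producer-before-consumer order
    return keep

nodes = [
    {"op": "Conv", "inputs": ["x", "w1"], "output": "a"},
    {"op": "Relu", "inputs": ["a"], "output": "b"},
    {"op": "Conv", "inputs": ["b", "w2"], "output": "c"},
    {"op": "Softmax", "inputs": ["c"], "output": "y"},
]
sub = extract_subgraph(nodes, inputs={"b"}, outputs=["c"])
print([n["op"] for n in sub])  # ['Conv']
```

Cutting a model at chosen tensors like this is exactly what lets each partition run on a different device for model parallelism.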
Memory Compression
For large language models and high-resolution CV models, activation memory compression is key to saving memory.
The compression method shrinks activation memory to roughly 5% of its original size on most models.
For example:
model | Native Memory Size (MB) | Compressed Memory Size (MB) | Compression Ratio (%)
---|---|---|---
StableDiffusion(VAE_encoder) | 14,245 | 540 | 3.7
StableDiffusion(VAE_decoder) | 25,417 | 1,140 | 4.48
StableDiffusion(Text_encoder) | 215 | 5 | 2.5
StableDiffusion(UNet) | 36,135 | 2,232 | 6.2
GPT2 | 40 | 2 | 6.9
BERT | 2,170 | 27 | 1.25
code sample: benchmark/compression.py
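The gain comes from the observation that most activations are dead long before inference finishes, so their buffers can be reused. A simplified liveness-based sketch of the idea (hypothetical greedy planner; the real method is in benchmark/compression.py):

```python
# Activation-memory sketch: instead of giving every activation its own
# buffer ("native"), reuse a buffer once its tensor's last consumer has
# run (greedy liveness-based reuse). Hypothetical simplification of the
# idea behind activation memory compression.

def plan_memory(tensors, order):
    # tensors: name -> size in bytes
    # order: execution order as (output_name, [input_names]) pairs
    last_use = {}
    for step, (out, ins) in enumerate(order):
        for t in ins:
            last_use[t] = step
    native = sum(tensors.values())
    free, assigned, pool = [], {}, 0
    for step, (out, ins) in enumerate(order):
        size = tensors[out]
        # reuse a freed buffer that is big enough, else grow the pool
        fit = next((b for b in free if b >= size), None)
        if fit is not None:
            free.remove(fit)
            assigned[out] = fit
        else:
            assigned[out] = size
            pool += size
        # release tensors whose last consumer is this step
        for t in ins:
            if last_use[t] == step and t in assigned:
                free.append(assigned.pop(t))
    return native, pool

# a 4-layer chain: each activation is dead right after the next layer reads it
tensors = {"a": 100, "b": 100, "c": 100, "d": 100}
order = [("a", []), ("b", ["a"]), ("c", ["b"]), ("d", ["c"])]
native, compressed = plan_memory(tensors, order)
print(native, compressed)  # -> 400 200
```

On a straight chain only two buffers need to be live at once, which is why deep sequential models compress so well; branchy graphs with long skip connections (like UNet) keep more tensors alive and compress less, matching the table above.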
Tensor operations
- Export weight tensors to files
- Simplify tensor and node names, converting long exported names to short ones
- Remove unused tensors; models like vgg19-7.onnx list their static weight tensors as input tensors
- Set custom names and dimensions for input and output tensors; change the model from fixed input to dynamic input
how to use: data/Tensors.md.
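The name-simplification operation amounts to building one consistent mapping from long exported names to short ones and rewriting every node reference through it. A hedged sketch (hypothetical node structure; the real operations are described in data/Tensors.md):

```python
# Name-simplification sketch: map long exported ONNX names to short ones,
# rewriting every node's inputs/outputs through one consistent mapping.
# Hypothetical code, not onnx-tool's real implementation.

def simplify_names(nodes):
    mapping = {}
    def short(name):
        if name not in mapping:
            mapping[name] = f"t{len(mapping)}"  # t0, t1, t2, ...
        return mapping[name]
    for n in nodes:
        n["inputs"] = [short(t) for t in n["inputs"]]
        n["output"] = short(n["output"])
    return mapping  # keep it to translate names back when debugging

nodes = [
    {"op": "Conv",
     "inputs": ["/backbone/stage1/conv/Conv_output_0", "w"],
     "output": "/backbone/stage1/relu/Relu_input"},
]
mapping = simplify_names(nodes)
print(nodes[0]["inputs"], nodes[0]["output"])  # ['t0', 't1'] t2
```

Because the same mapping is applied everywhere, producer-consumer links stay intact while the serialized model shrinks noticeably for graphs with thousands of long scoped names.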
How to install
pip install onnx-tool
OR
pip install --upgrade git+https://github.com/ThanatosShinji/onnx-tool.git
python>=3.6
If pip install onnx-tool fails because of onnx's installation, try installing a lower onnx version first (for example, pip install onnx==1.8.1), then run pip install onnx-tool again.
Known Issues
- Loop op is not supported
- Activation Compression is not optimal
Results of ONNX Model Zoo and SOTA models
Some models have dynamic input shapes, so their MAC counts vary with the input shape. The input shapes used in these results are written in data/public/config.py. The ONNX models with every tensor's shape can be downloaded from: baidu drive (code: p91k), google drive.