
A parser, editor, and profiler tool for ONNX models.

Project description


onnx-tool

A parser, editor, and profiler tool for ONNX models.

Supported Models:

  • NLP: BERT, T5, GPT, LLaMa, MPT (TransformerModel)
  • Diffusion: Stable Diffusion (TextEncoder, VAE, UNet)
  • CV: BEVFormer, MobileNet, YOLO, ...
  • Audio: sovits, LPCNet

Basic Parse and Edit

You can load any ONNX file with onnx_tool.Model;
change the graph structure with onnx_tool.Graph;
change op attributes and I/O tensors with onnx_tool.Node;
change tensor data or type with onnx_tool.Tensor.
To apply your changes, call the save_model method of onnx_tool.Model or onnx_tool.Graph.

Please refer to benchmark/examples.py.
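A minimal sketch of this load-edit-save workflow (the file paths are placeholders, and the `nodemap`/`tensormap` attributes are assumptions about the Graph internals; see benchmark/examples.py for authoritative usage):

```python
def edit_model(path_in: str, path_out: str) -> None:
    """Sketch: load an ONNX file, inspect it, and save the result.

    Requires `pip install onnx-tool`; paths are placeholders.
    """
    import onnx_tool  # lazy import so the sketch stays self-contained

    m = onnx_tool.Model(path_in)   # onnx_tool.Model loads any ONNX file
    g = m.graph                    # onnx_tool.Graph: the graph structure
    # Nodes (onnx_tool.Node) carry op attributes and I/O tensors;
    # tensors (onnx_tool.Tensor) carry tensor data and dtype.
    print(len(g.nodemap), 'nodes;', len(g.tensormap), 'tensors')  # assumed attrs
    m.save_model(path_out)         # apply your changes to a new file
```

Call `edit_model('model.onnx', 'model_edited.onnx')` with real paths to try it.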


Shape Inference & Profile Model

All profiling data is built on the shape-inference results.
ONNX graph with tensor shapes:

Regular model profiling table:



Sparse profiling table:



  • Introduction: data/Profile.md
  • PyTorch usage: data/PytorchUsage.md
  • TensorFlow usage: data/TensorflowUsage.md
  • Examples: benchmark/examples.py
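As a sketch, the infer-then-profile flow could look like this (the input name 'data' and its shape are placeholders for your model's actual inputs; check benchmark/examples.py for the authoritative API):

```python
def profile_model(path: str) -> None:
    """Sketch: infer tensor shapes, then profile MACs/params per node.

    Requires `pip install onnx-tool numpy`; input name and shape
    are placeholders.
    """
    import numpy
    import onnx_tool

    m = onnx_tool.Model(path)
    # Profiling builds on shape inference, so resolve all shapes first:
    m.graph.shape_infer({'data': numpy.zeros((1, 3, 224, 224))})
    m.graph.profile()          # count MACs and parameters per node
    m.graph.print_node_map()   # print a profiling table like the one above
```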


Compute Graph with Shape Engine

From a raw graph to a compute graph:

Remove the shape-calculation layers (created by ONNX export) to get a compute graph, then use the Shape Engine to update tensor shapes at runtime.
Examples: benchmark/shape_regress.py, benchmark/examples.py.
Integrate the compute graph and Shape Engine into a C++ inference engine: data/inference_engine.md
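The idea can be illustrated with a toy shape engine in pure Python (a conceptual sketch, not onnx-tool's actual data structure): each tensor's shape becomes a precomputed expression over the input's symbolic dimensions, so resizing the input means re-evaluating a few expressions instead of running Shape/Gather/Concat layers:

```python
# Toy shape engine: each tensor's shape is an expression over symbolic
# input dimensions ('h', 'w'), precomputed once from the raw graph.
shape_exprs = {
    'input': lambda d: (1, 3, d['h'], d['w']),
    'conv1': lambda d: (1, 64, d['h'] // 2, d['w'] // 2),  # stride-2 conv
    'pool1': lambda d: (1, 64, d['h'] // 4, d['w'] // 4),  # stride-2 pool
}

def update_shapes(h, w):
    """Re-evaluate all tensor shapes for a new input size; no shape ops run."""
    dims = {'h': h, 'w': w}
    return {name: expr(dims) for name, expr in shape_exprs.items()}

print(update_shapes(224, 224)['pool1'])  # (1, 64, 56, 56)
print(update_shapes(640, 480)['pool1'])  # (1, 64, 160, 120)
```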


Memory Compression

Activation Compression

Activation memory, also called temporary memory, is created for each op's output. Only the activations marked as model outputs have to be kept, so you don't need to reserve a separate buffer for every activation tensor; intermediate activations can share one reused, optimized pool of memory.

For large language models and high-resolution CV models, activation memory compression is key to saving memory.
On most models, the compression method shrinks activation memory to roughly 5% of its native size.
For example:

| Model | Native Memory Size (MB) | Compressed Memory Size (MB) | Compression Ratio (%) |
|---|---|---|---|
| StableDiffusion (VAE_encoder) | 14,245 | 540 | 3.7 |
| StableDiffusion (VAE_decoder) | 25,417 | 1,140 | 4.48 |
| StableDiffusion (Text_encoder) | 215 | 5 | 2.5 |
| StableDiffusion (UNet) | 36,135 | 2,232 | 6.2 |
| GPT2 | 40 | 2 | 6.9 |
| BERT | 2,170 | 27 | 1.25 |

code example: benchmark/compression.py
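A toy illustration of why this reuse works (a simplified liveness model, not onnx-tool's actual algorithm): an activation's buffer can be recycled once its last consumer has run, so peak live memory is far smaller than the sum of all activation sizes:

```python
def plan_memory(tensors):
    """tensors: list of (size_bytes, first_use_step, last_use_step).

    Returns (naive total, peak concurrently-live bytes with reuse).
    """
    naive = sum(size for size, _, _ in tensors)
    events = []
    for size, start, end in tensors:
        events.append((start, size))      # allocate at first use
        events.append((end + 1, -size))   # free right after last use
    live = peak = 0
    for _, delta in sorted(events):       # frees sort before allocs per step
        live += delta
        peak = max(peak, live)
    return naive, peak

# Chain of ops: each activation is dead right after the next op reads it,
# so only two buffers are ever live at once.
naive, peak = plan_memory([(100, 0, 1), (100, 1, 2), (100, 2, 3), (100, 3, 4)])
print(naive, peak)  # 400 200
```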

Weight Compression

An fp32 model with 7B parameters takes 28 GB of disk and memory space; you cannot run the model at all if your device doesn't have that much memory, so weight compression is critical for running large language models. As a reference, a 7B model with int4 symmetric per-block(32) quantization (llama.cpp's q4_0 quantization method) is only ~0.156x the size of the fp32 model.

Current support:

  • [fp16]
  • [int8]x[symmetric/asymmetric]x[per tensor/per channel/per block]
  • [int4]x[symmetric/asymmetric]x[per tensor/per channel/per block]

Code examples: benchmark/examples.py.
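The arithmetic behind those numbers, as a quick check (the 0.156 ratio is the figure quoted above; note that raw 4-bit weights plus one per-block-32 fp16 scale alone come to 4.5 bits per weight, ~0.14x, with the remainder presumably covering unquantized tensors and metadata):

```python
params = 7_000_000_000                # 7B parameters

fp32_gb = params * 4 / 1e9            # 4 bytes per fp32 weight
int4_gb = fp32_gb * 0.156             # ~0.156x size ratio quoted above

# q4_0-style block storage: 32 4-bit weights + one fp16 scale per block
bits_per_weight = (32 * 4 + 16) / 32  # = 4.5 bits

print(fp32_gb, round(int4_gb, 1), bits_per_weight)  # 28.0 4.4 4.5
```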


How to install

pip install onnx-tool

or

pip install --upgrade git+https://github.com/ThanatosShinji/onnx-tool.git

Requires python>=3.6.

If pip install onnx-tool fails because of onnx's installation, try pip install onnx==1.8.1 (or another lower version) first,
then run pip install onnx-tool again.


Known Issues

  • Loop op is not supported
  • Sequence type is not supported

Results of ONNX Model Zoo and SOTA models

Some models have dynamic input shapes, and the MACs vary with the input shape. The input shapes used for these results are written in data/public/config.py. These ONNX models, with all tensor shapes included, can be downloaded from baidu drive (code: p91k) or google drive.

| Model | Params(M) | MACs(M) |
|---|---|---|
| GPT-J 1 layer | 464 | 173,398 |
| MPT 1 layer | 261 | 79,894 |
| text_encoder | 123.13 | 6,782 |
| UNet2DCondition | 859.52 | 888,870 |
| VAE_encoder | 34.16 | 566,371 |
| VAE_decoder | 49.49 | 1,271,959 |
| SqueezeNet 1.0 | 1.23 | 351 |
| AlexNet | 60.96 | 665 |
| GoogleNet | 6.99 | 1,606 |
| googlenet_age | 5.98 | 1,605 |
| LResNet100E-IR | 65.22 | 12,102 |
| BERT-Squad | 113.61 | 22,767 |
| BiDAF | 18.08 | 9.87 |
| EfficientNet-Lite4 | 12.96 | 1,361 |
| Emotion | 12.95 | 877 |
| Mask R-CNN | 46.77 | 92,077 |
| LLaMa 1 layer | 618 | 211,801 |
| BEVFormer Tiny | 33.7 | 210,838 |
| rvm_mobilenetv3 | 3.73 | 4,289 |
| yolov4 | 64.33 | 3,319 |
| ConvNeXt-L | 229.79 | 34,872 |
| edgenext_small | 5.58 | 1,357 |
| SSD | 19.98 | 216,598 |
| RealESRGAN | 16.69 | 73,551 |
| ShuffleNet | 2.29 | 146 |
| GPT-2 | 137.02 | 1,103 |
| T5-encoder | 109.62 | 686 |
| T5-decoder | 162.62 | 1,113 |
| RoBERTa-BASE | 124.64 | 688 |
| Faster R-CNN | 44.10 | 46,018 |
| FCN ResNet-50 | 35.29 | 37,056 |
| ResNet50 | 25 | 3,868 |

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

onnx-tool-0.9.0.tar.gz (45.0 kB)

Uploaded Source

Built Distribution

onnx_tool-0.9.0-py3-none-any.whl (44.5 kB)

Uploaded Python 3

File details

Details for the file onnx-tool-0.9.0.tar.gz.

File metadata

  • Download URL: onnx-tool-0.9.0.tar.gz
  • Upload date:
  • Size: 45.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for onnx-tool-0.9.0.tar.gz
Algorithm Hash digest
SHA256 83ef90ca877564b4f4fdfdb245e543aef51c4adb796f3371e04ff1172df68075
MD5 62821ca45fa3b2db68ffbcd59b5b5ffd
BLAKE2b-256 0f48e17a24b2d37c4f71bb4ac4297d15b7f63f45cb9e052a30dd6835ffeaa498


File details

Details for the file onnx_tool-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: onnx_tool-0.9.0-py3-none-any.whl
  • Upload date:
  • Size: 44.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for onnx_tool-0.9.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1a5bb59f4f4d78614c4fe6627fa3cd02a10eef1d7d1d653dfbe9e4db7b395abd
MD5 4fea772b70e68978d0195703ded349cd
BLAKE2b-256 f43116fa211a12696ebe3dc41b934c5ee81bb9abc8b75482d796b170428051a2

