
A tool for parsing, editing, optimizing, and profiling ONNX models.


📄 Simplified Chinese README | ✨ New Project: AI-Enhancement-Filter (powered by onnx-tool)



onnx-tool

A comprehensive toolkit for analyzing, optimizing, and transforming ONNX models with advanced capabilities for LLMs, diffusion models, and computer vision architectures.

  • LLM Optimization: Build and profile large language models with KV cache analysis (example)
  • Graph Transformation:
    • Constant folding (docs)
    • Operator fusion (docs)
  • Advanced Profiling:
    • Rapid shape inference
    • MACs/parameter statistics with sparsity awareness
  • Compute Graph Engine: Runtime shape computation with minimal overhead (details)
  • Memory Compression:
    • Activation memory optimization (up to 95% reduction)
    • Weight quantization (FP16, INT8/INT4 with per-tensor/channel/block schemes)
  • Quantization & Sparsity: Full support for quantized and sparse model analysis

🤖 Supported Model Architectures

| Domain | Models |
|---|---|
| NLP | BERT, T5, GPT, LLaMa, MPT (TransformerModel) |
| Diffusion | Stable Diffusion (TextEncoder, VAE, UNet) |
| CV | Detic, BEVFormer, SSD300_VGG16, ConvNeXt, Mask R-CNN, Silero VAD |
| Audio | Sovits, LPCNet |

⚡ Build & Profile LLMs in Seconds

Profile 10 Hugging Face models in under one second. Export ONNX models with llama.cpp-like simplicity (code).

Model Statistics (1k token input)

| Model | MACs(G) | Parameters(G) | KV Cache(G) |
|---|---|---|---|
| gpt-j-6b | 6277 | 6.05049 | 0.234881 |
| yi-1.5-34B | 35862 | 34.3889 | 0.125829 |
| microsoft/phi-2 | 2948 | 2.77944 | 0.167772 |
| Phi-3-mini-4k | 4083 | 3.82108 | 0.201327 |
| Phi-3-small-8k-instruct | 7912 | 7.80167 | 0.0671089 |
| Phi-3-medium-4k-instruct | 14665 | 13.9602 | 0.104858 |
| Llama3-8B | 8029 | 8.03026 | 0.0671089 |
| Llama-3.1-70B-Japanese-Instruct-2407 | 72888 | 70.5537 | 0.167772 |
| QWen-7B | 7509 | 7.61562 | 0.0293601 |
| Qwen2_72B_Instruct | 74895 | 72.7062 | 0.167772 |
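The KV Cache column counts elements and can be reproduced from a model's config as 2 (keys and values) × layers × KV heads × head dim × sequence length. A minimal sketch using Llama3-8B's published configuration (32 layers, 8 KV heads via grouped-query attention, head dim 128; these values come from the public model card, not from onnx-tool):

```python
def kv_cache_elements(n_layers: int, n_kv_heads: int, head_dim: int, seq_len: int) -> int:
    """Total K+V cache elements: one K tensor and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len

# Llama3-8B at a 1k-token prompt
elems = kv_cache_elements(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=1024)
print(elems / 1e9)  # ≈ 0.0671 G elements, matching the Llama3-8B row above
```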

Latency Estimation (4-bit weights, 16-bit KV cache; latencies in seconds)

| Model | Memory Size(GB) | Ultra-155H TTFT | Ultra-155H TPOT | Arc-A770 TTFT | Arc-A770 TPOT | H100-PCIe TTFT | H100-PCIe TPOT |
|---|---|---|---|---|---|---|---|
| gpt-j-6b | 3.75678 | 1.0947 | 0.041742 | 0.0916882 | 0.00670853 | 0.0164015 | 0.00187839 |
| yi-1.5-34B | 19.3369 | 5.77095 | 0.214854 | 0.45344 | 0.0345302 | 0.0747854 | 0.00966844 |
| microsoft/phi-2 | 1.82485 | 0.58361 | 0.0202761 | 0.0529628 | 0.00325866 | 0.010338 | 0.000912425 |
| Phi-3-mini-4k | 2.49649 | 0.811173 | 0.0277388 | 0.0745356 | 0.00445802 | 0.0147274 | 0.00124825 |
| Phi-3-small-8k-instruct | 4.2913 | 1.38985 | 0.0476811 | 0.117512 | 0.00766303 | 0.0212535 | 0.00214565 |
| Phi-3-medium-4k-instruct | 7.96977 | 2.4463 | 0.088553 | 0.198249 | 0.0142317 | 0.0340576 | 0.00398489 |
| Llama3-8B | 4.35559 | 1.4354 | 0.0483954 | 0.123333 | 0.00777784 | 0.0227182 | 0.00217779 |
| Llama-3.1-70B-Japanese-Instruct-2407 | 39.4303 | 11.3541 | 0.438114 | 0.868475 | 0.0704112 | 0.137901 | 0.0197151 |
| QWen-7B | 4.03576 | 1.34983 | 0.0448417 | 0.11722 | 0.00720671 | 0.0218461 | 0.00201788 |
| Qwen2_72B_Instruct | 40.5309 | 11.6534 | 0.450343 | 0.890816 | 0.0723766 | 0.14132 | 0.0202654 |

💡 Latencies computed from hardware specs – no actual inference required
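For the decode phase, per-token latency (TPOT) is roughly memory-bound: every weight and cache byte must stream through memory once per generated token, so TPOT ≈ model bytes ÷ peak memory bandwidth. A hedged sketch of that arithmetic (the ~2000 GB/s figure is H100-PCIe's published peak bandwidth, an assumption on my part rather than a value read from onnx-tool):

```python
def tpot_seconds(memory_size_gb: float, bandwidth_gb_s: float) -> float:
    """Memory-bound decode estimate: one full pass over weights + KV cache per token."""
    return memory_size_gb / bandwidth_gb_s

# gpt-j-6b at 4-bit weights / 16-bit KV occupies 3.75678 GB (from the table above);
# H100 PCIe peak memory bandwidth is roughly 2000 GB/s
print(tpot_seconds(3.75678, 2000.0))  # ≈ 0.00188 s, in line with the H100-PCIe TPOT column
```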


🔧 Basic Parsing & Editing

Intuitive API for model manipulation:

from onnx_tool import Model

model = Model('model.onnx')          # Load any ONNX file
graph = model.graph                  # Access the computation graph
node = graph.nodemap['Conv_0']       # Look up a node to edit its operator attributes
tensor = graph.tensormap['weight']   # Look up a tensor to edit its data/type
model.save_model('modified.onnx')    # Persist changes

See comprehensive examples in benchmark/examples.py.


📊 Shape Inference & Profiling

All profiling relies on precise shape inference:

(Figure: shape inference visualization)

Profiling Capabilities

  • Standard profiling: MACs, parameters, memory footprint
  • Sparse-aware profiling: Quantify sparsity impact on compute

(Figures: MACs profiling table; sparse model profiling)
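To make the MACs statistic concrete: profilers of this kind count one multiply-accumulate per kernel element per output element. A self-contained sketch of that arithmetic for a 2-D convolution (an illustration of the counting rule, not onnx-tool's own code):

```python
def conv2d_macs(out_c, out_h, out_w, in_c, k_h, k_w, groups=1):
    """MACs for a Conv2d: each output element costs (in_c/groups) * k_h * k_w multiply-adds."""
    return out_c * out_h * out_w * (in_c // groups) * k_h * k_w

# First conv of ResNet-50: 3 -> 64 channels, 7x7 kernel, stride 2 on a 224x224 input
macs = conv2d_macs(out_c=64, out_h=112, out_w=112, in_c=3, k_h=7, k_w=7)
print(macs / 1e6)  # ≈ 118 MMACs for this single layer
```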



⚙️ Compute Graph & Shape Engine

Transform exported ONNX graphs into efficient Compute Graphs by removing shape-calculation overhead:

Compute graph transformation

  • Compute Graph: Minimal graph containing only compute operations
  • Shape Engine: Runtime shape resolver for dynamic models

Use Cases:

  • Integration with custom inference engines (guide)
  • Shape regression testing (example)
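Conceptually, a shape engine replaces in-graph shape-calculation operators with precomputed symbolic formulas that are evaluated once per concrete input shape at runtime. A toy illustration of the idea (hypothetical tensor names and dims; not onnx-tool's actual ShapeEngine API):

```python
# Each tensor's shape is expressed as a function of a few symbolic input dims.
SHAPE_FORMULAS = {
    "input_ids": lambda v: (v["batch"], v["seq"]),
    "hidden":    lambda v: (v["batch"], v["seq"], 4096),
    "logits":    lambda v: (v["batch"], v["seq"], 32000),
}

def resolve_shapes(variables):
    """Evaluate all formulas for one concrete assignment of the symbolic dims."""
    return {name: formula(variables) for name, formula in SHAPE_FORMULAS.items()}

print(resolve_shapes({"batch": 1, "seq": 1024}))
# {'input_ids': (1, 1024), 'hidden': (1, 1024, 4096), 'logits': (1, 1024, 32000)}
```

The compute graph then contains only the arithmetic operators, and the resolved shapes are fed to it before each inference with dynamic inputs.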

💾 Memory Compression

Activation Memory Compression

Reuses temporary buffers to minimize peak memory usage – critical for LLMs and high-res CV models.

| Model | Native Memory Size(MB) | Compressed Memory Size(MB) | Compression Ratio(%) |
|---|---|---|---|
| StableDiffusion(VAE_encoder) | 14,245 | 540 | 3.7 |
| StableDiffusion(VAE_decoder) | 25,417 | 1,140 | 4.48 |
| StableDiffusion(Text_encoder) | 215 | 5 | 2.5 |
| StableDiffusion(UNet) | 36,135 | 2,232 | 6.2 |
| GPT2 | 40 | 2 | 6.9 |
| BERT | 2,170 | 27 | 1.25 |

✅ Typical models achieve >90% activation memory reduction
📌 Implementation: benchmark/compression.py
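The idea behind activation compression is that two tensors whose lifetimes never overlap can share one buffer, so peak memory is driven by the largest set of simultaneously live tensors rather than by the sum of all of them. A toy illustration of that accounting (a simplified sketch, not the algorithm in benchmark/compression.py):

```python
def peak_with_reuse(tensors):
    """tensors: list of (size_bytes, first_use_step, last_use_step).
    Returns (naive_total, peak), where peak sums only tensors live at the worst step."""
    naive_total = sum(size for size, _, _ in tensors)
    last_step = max(end for _, _, end in tensors)
    peak = max(
        sum(size for size, start, end in tensors if start <= step <= end)
        for step in range(last_step + 1)
    )
    return naive_total, peak

# A chain of activations, each dying right after the next op consumes it
acts = [(100, 0, 1), (80, 1, 2), (120, 2, 3), (60, 3, 4)]
naive, peak = peak_with_reuse(acts)
print(naive, peak)  # 360 vs 200: with reuse, memory is capped at the busiest step
```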

Weight Compression

Essential for deploying large models on memory-constrained devices:

| Quantization Scheme | Size vs FP32 | Example (7B model) |
|---|---|---|
| FP32 (baseline) | 1.00× | 28 GB |
| FP16 | 0.50× | 14 GB |
| INT8 (per-channel) | 0.25× | 7 GB |
| INT4 (block=32, symmetric) – llama.cpp | 0.156× | 4.4 GB |

Supported schemes:

  • ✅ FP16
  • ✅ INT8: symmetric/asymmetric × per-tensor/channel/block
  • ✅ INT4: symmetric/asymmetric × per-tensor/channel/block

📌 See benchmark/examples.py for implementation examples.
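The 0.156× figure for block-32 INT4 follows from the storage layout: each 32-weight block stores 32 four-bit values plus shared metadata, giving an effective bits-per-weight above 4. A quick arithmetic check (assuming two 16-bit metadata values per block, in the style of llama.cpp's block formats; this layout detail is an assumption, not read from onnx-tool):

```python
def bits_per_weight(weight_bits, block_size, metadata_bits):
    """Effective bits per weight for block quantization with shared per-block metadata."""
    return weight_bits + metadata_bits / block_size

# INT4, block=32, with 16-bit scale + 16-bit offset per block -> 5.0 bits/weight
bpw = bits_per_weight(weight_bits=4, block_size=32, metadata_bits=16 + 16)
print(bpw / 32)              # ratio vs 32-bit FP32: 0.15625, the table's 0.156x
print(7e9 * bpw / 8 / 1e9)   # a 7B model: 4.375 GB, the table's ~4.4 GB
```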


🚀 Installation

# PyPI (recommended)
pip install onnx-tool

# Latest development version
pip install --upgrade git+https://github.com/ThanatosShinji/onnx-tool.git

Requirements: Python ≥ 3.6

⚠️ Troubleshooting: If ONNX installation fails, try:

pip install onnx==1.8.1 && pip install onnx-tool

Known Issues

  • Loop op is not supported
  • Sequence type is not supported

📈 Model Zoo Results

Comprehensive profiling of ONNX Model Zoo and SOTA models. Input shapes defined in data/public/config.py.

📥 Download pre-profiled models (with full tensor shapes):

| Model | Params(M) | MACs(M) |
|---|---|---|
| GPT-J 1 layer | 464 | 173,398 |
| MPT 1 layer | 261 | 79,894 |
| LLaMa 1 layer | 618 | 211,801 |
| text_encoder | 123.13 | 6,782 |
| UNet2DCondition | 859.52 | 888,870 |
| VAE_encoder | 34.16 | 566,371 |
| VAE_decoder | 49.49 | 1,271,959 |
| SqueezeNet 1.0 | 1.23 | 351 |
| AlexNet | 60.96 | 665 |
| GoogleNet | 6.99 | 1,606 |
| googlenet_age | 5.98 | 1,605 |
| LResNet100E-IR | 65.22 | 12,102 |
| BERT-Squad | 113.61 | 22,767 |
| BiDAF | 18.08 | 9.87 |
| EfficientNet-Lite4 | 12.96 | 1,361 |
| Emotion | 12.95 | 877 |
| Mask R-CNN | 46.77 | 92,077 |
| BEVFormer Tiny | 33.7 | 210,838 |
| rvm_mobilenetv3 | 3.73 | 4,289 |
| yolov4 | 64.33 | 3,319 |
| ConvNeXt-L | 229.79 | 34,872 |
| edgenext_small | 5.58 | 1,357 |
| SSD | 19.98 | 216,598 |
| RealESRGAN | 16.69 | 73,551 |
| ShuffleNet | 2.29 | 146 |
| GPT-2 | 137.02 | 1,103 |
| T5-encoder | 109.62 | 686 |
| T5-decoder | 162.62 | 1,113 |
| RoBERTa-BASE | 124.64 | 688 |
| Faster R-CNN | 44.10 | 46,018 |
| FCN ResNet-50 | 35.29 | 37,056 |
| ResNet50 | 25 | 3,868 |

🤝 Contributing

Contributions are welcome! Please open an issue or PR for:

  • Bug reports
  • Feature requests
  • Documentation improvements
  • New model support
