A simple package for converting TensorFlow EfficientNet models to ONNX and TensorRT formats.
quantizeeffinet
A Python package for seamlessly converting EfficientNet TensorFlow models to ONNX and TensorRT formats, enabling optimized inference on NVIDIA GPUs.
Features
- TensorFlow to ONNX Conversion: Convert SavedModel or model weights to ONNX format with FP32 or FP16 precision
- ONNX to TensorRT Conversion: Build optimized TensorRT engines with dynamic batch sizes
- Direct TF to TRT Pipeline: One-step conversion from TensorFlow to TensorRT
- INT8 Quantization Support: Improve inference speed with INT8 calibration
- Comprehensive Logging: Detailed conversion process tracking and validation
Installation
Prerequisites
Before installing the package, ensure you have the following:
- Python 3.8 or higher (check with `python --version`)
- pip, the Python package manager (check with `pip --version`)
- NVIDIA GPU with CUDA support
- CUDA Toolkit 11.7+ (download from NVIDIA)
- TensorRT 8.5+ (download from NVIDIA)
Install from Wheel File
- Download the wheel file provided to you.
- Open a terminal and navigate to the directory containing the `.whl` file.
- Install the package using pip: `pip install quantizeeffinet-0.0.1-py3-none-any.whl` (substitute the actual wheel filename).
- If dependencies are not automatically installed with the wheel file, you can install them using the provided `requirements.txt` file: `pip install -r requirements.txt`
Quick Start
- Install the wheel and dependencies, then import the converter: `from quantizeeffinet import ModelConverter`
- Create a converter instance: `converter = ModelConverter()`
- Convert TensorFlow to ONNX, or straight to TensorRT, with optional FP16/INT8 settings and dynamic batching.
Usage Examples
ModelConverter
ModelConverter provides three utilities, each exposed as a separate method: `tf_to_onnx` converts TensorFlow models to ONNX, `onnx_to_trt` builds TensorRT engines from ONNX, and `tf_to_trt` runs a one-call TensorFlow-to-TensorRT pipeline for deployment-ready inference engines.
Keep In Mind
- The original input shape (NHWC) will be changed to (NCHW).
- Every function's `output_model_path` can be `None`; in that case, the function returns the created model object. However, it is not recommended to run inference with these returned models right away. The best practice is to restart the Python kernel, thus freeing up all allocated memory; after that, in a new run, you can run inference without any errors.
- FP16 TensorRT engines are the best choice for now, as they provide the fastest inference without any accuracy loss.
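The NHWC-to-NCHW change above is just a permutation of the input axes. A minimal illustration of the shape change (plain Python, independent of the package):

```python
# The converter moves the channel axis: NHWC (batch, height, width, channels)
# becomes NCHW (batch, channels, height, width).
nhwc_shape = (None, 224, 224, 3)   # Keras input shape expected by tf_to_onnx
perm = (0, 3, 1, 2)                # axis permutation applied during conversion
nchw_shape = tuple(nhwc_shape[i] for i in perm)
print(nchw_shape)                  # (None, 3, 224, 224)
```

With the dynamic batch dimension written as `-1`, this is exactly the `(-1, 3, 224, 224)` input shape that `onnx_to_trt` expects.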
tf_to_onnx
- Converts a `tf.keras` SavedModel or a `.h5` weights file to an ONNX model.
- If you pass a `.h5` file that only contains weights, you should also specify the base model, like EfficientNetB3, B5, or B6.
- The Keras model input shape must be `(None, 224, 224, 3)`.
- Supports exporting to an FP32 or FP16 ONNX graph.
- You do not have to change the `opset` argument in the function call; the default `opset=13` is sufficient.
from quantizeeffinet import ModelConverter
converter = ModelConverter()
onnx_model = converter.tf_to_onnx(
    input_model="path/to/efficientnet_b3.h5",
    output_path="path/to/output/model.onnx",
    precision="fp16",
    only_weigths_of_model='EfficientNetB3'
)
onnx_to_trt
- Parses an ONNX model and builds a TensorRT engine.
- The input model can be passed as a path to a `model.onnx` file or as an `onnx.ModelProto` object.
- The ONNX model input shape must be `(-1, 3, 224, 224)`.
- If no path to save the engine is given and `auto_generate_engine_path=True`, the path is auto-generated.
- Supports exporting to FP32, FP16, or INT8 TensorRT engines.
- TensorRT uses dynamic batching. This means that you have to specify three arguments:
  - `min_batch: int = 1`: the minimum number of images that a batch could ever contain (should stay 1).
  - `opt_batch: int = 32`: the number of images that most of the inference batches will contain (should be the same as `max_batch`).
  - `max_batch: int = 32`: the maximum number of images that a batch could ever contain (should be the same as `opt_batch`).
from quantizeeffinet import ModelConverter
converter = ModelConverter()
engine = converter.onnx_to_trt(
    input_model="path/to/model.onnx",
    engine_file_path="path/to/output/model.engine",
    precision="fp16",
    min_batch=1,
    opt_batch=16,
    max_batch=32
)
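The three batch arguments form a TensorRT optimization profile and must satisfy `1 <= min_batch <= opt_batch <= max_batch`. A quick sanity check you might run before building an engine (a hypothetical helper, not part of the package):

```python
def check_batch_profile(min_batch: int, opt_batch: int, max_batch: int) -> None:
    """Validate a dynamic-batching profile before an engine build."""
    if not (1 <= min_batch <= opt_batch <= max_batch):
        raise ValueError(
            f"invalid profile: need 1 <= {min_batch} <= {opt_batch} <= {max_batch}"
        )

check_batch_profile(1, 16, 32)  # matches the example above: no error raised
```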
- INT8 requires a calibration step or an existing cache.
- INT8 mode supports using a directory of images, a single image, a list of images, or an existing calibration cache to avoid recalibration.
- If no cache is given, the images are used for calibration and no cache is saved; if no images are given, the cache is used. If both are passed to the function, the cache takes precedence for a faster conversion.
from quantizeeffinet import ModelConverter
converter = ModelConverter()
model = converter.onnx_to_trt(
    input_model="path/to/model.onnx",
    engine_file_path="path/to/int8_efficientnet.engine",
    min_batch=1,
    max_batch=32,
    opt_batch=32,
    precision="int8",
    calibration_images="path/to/calibration/images/dir",
    calibration_cache="path/to/calibration.cache",
)
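The calibration-source precedence described above (the cache wins over images, and at least one source is required) can be sketched in plain Python; the helper name is hypothetical and not part of the package:

```python
def pick_calibration_source(calibration_images=None, calibration_cache=None):
    """Decide how INT8 calibration data is obtained, per the rules above."""
    if calibration_cache is not None:
        return "cache"    # cache is preferred: skips recalibration entirely
    if calibration_images is not None:
        return "images"   # calibrate from images; no cache file is written
    raise ValueError("INT8 needs calibration images or an existing cache")

pick_calibration_source(calibration_cache="path/to/calibration.cache")  # 'cache'
```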
tf_to_trt
Runs the end-to-end pipeline in one call. Exports a TensorFlow model to ONNX, then builds a TensorRT engine with the selected precision and batch profiles, streamlining deployment.
- If you pass a `.h5` file that only contains weights, you should also specify the base model, like EfficientNetB3, B5, or B6.
- When using INT8, provide calibration images or a calibration cache.
from quantizeeffinet import ModelConverter
converter = ModelConverter()
model = converter.tf_to_trt(
    input_model="path/to/efficientnet_b3.h5",
    engine_file_path="path/to/int8_efficientnet.engine",
    only_weigths_of_model='EfficientNetB3',
    min_batch=1,
    opt_batch=32,
    max_batch=32,
    precision="int8",
    calibration_images="path/to/calibration/images/dir",
    calibration_cache="path/to/calibration.cache",
)