Optimum ONNX is an interface between the Hugging Face libraries and ONNX / ONNX Runtime
Installation
Before you begin, make sure you install all necessary libraries by running:
pip install "optimum-onnx[onnxruntime]"
If you want to use the GPU version of ONNX Runtime, make sure the CUDA and cuDNN requirements are satisfied, and install the additional dependencies by running:
pip install "optimum-onnx[onnxruntime-gpu]"
To avoid conflicts between onnxruntime and onnxruntime-gpu, make sure the package onnxruntime is not installed by running pip uninstall onnxruntime prior to installing Optimum.
ONNX export
You can export 🤗 Transformers, Diffusers, Timm, and Sentence Transformers models to the ONNX format, and easily apply graph optimization and quantization:
optimum-cli export onnx --model meta-llama/Llama-3.2-1B onnx_llama/
The model can also be optimized and quantized with onnxruntime.
For more information on the ONNX export, please check the documentation.
Inference
Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model seamlessly, using ONNX Runtime as the backend:
from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM
- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = pipe("He never went out without a book under his arm")
More details on running ONNX models with the ORTModelForXXX classes are available in the documentation.
Examples
Check out the examples folder for more usage examples including optimization, quantization, and model-specific demonstrations.
File details
Details for the file optimum_onnx-0.1.0.tar.gz.
File metadata
- Download URL: optimum_onnx-0.1.0.tar.gz
- Upload date:
- Size: 165.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 182c54b25eddaded1618af7b58516da34749393a987ec7111f74677f249676f9 |
| MD5 | f8e905f3bc5f419792504052c0753220 |
| BLAKE2b-256 | 08da3a0073af8f436d72c1e4d9c655c00628b857bd1d9ccc101d35301d5bb2df |
File details
Details for the file optimum_onnx-0.1.0-py3-none-any.whl.
File metadata
- Download URL: optimum_onnx-0.1.0-py3-none-any.whl
- Upload date:
- Size: 194.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0301ec7a6ec5c77a57581e9970d380a6dc104bdb8f15b282e05af40d829c2eda |
| MD5 | 727b280313421b26fc80e71f8b279807 |
| BLAKE2b-256 | 41894be9d226bc74fd0eb405d1efea62e86d6f0f31841dae9c5898ee12eb482f |