
Optimum ONNX is an interface between the Hugging Face libraries and ONNX / ONNX Runtime

Project description

🤗 Optimum ONNX

Export your Hugging Face models to ONNX

Documentation | ONNX | Hub

Installation

Before you begin, make sure you install all necessary libraries by running:

pip install "optimum-onnx[onnxruntime]"

If you want to use the GPU version of ONNX Runtime, make sure the CUDA and cuDNN requirements are satisfied, and install the additional dependencies by running:

pip install "optimum-onnx[onnxruntime-gpu]"

To avoid conflicts between onnxruntime and onnxruntime-gpu, make sure the package onnxruntime is not installed by running pip uninstall onnxruntime prior to installing Optimum.

ONNX export

It is possible to export 🤗 Transformers, Diffusers, Timm, and Sentence Transformers models to the ONNX format, and to easily apply graph optimization and quantization:

optimum-cli export onnx --model meta-llama/Llama-3.2-1B onnx_llama/

The model can also be optimized and quantized with onnxruntime.

For more information on the ONNX export, please check the documentation.

Inference

Once the model is exported to the ONNX format, we provide Python classes that let you run the exported ONNX model seamlessly, with ONNX Runtime as the backend:

  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM

- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  result = pipe("He never went out without a book under his arm")

More details on how to run ONNX models with the ORTModelForXXX classes can be found in the documentation.

Examples

Check out the examples folder for more usage examples including optimization, quantization, and model-specific demonstrations.

Download files

Download the file for your platform.

Source Distribution

optimum_onnx-0.1.0.tar.gz (165.5 kB)

Uploaded Source

Built Distribution


optimum_onnx-0.1.0-py3-none-any.whl (194.2 kB)

Uploaded Python 3

File details

Details for the file optimum_onnx-0.1.0.tar.gz.

File metadata

  • Download URL: optimum_onnx-0.1.0.tar.gz
  • Upload date:
  • Size: 165.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.16

File hashes

Hashes for optimum_onnx-0.1.0.tar.gz
  • SHA256: 182c54b25eddaded1618af7b58516da34749393a987ec7111f74677f249676f9
  • MD5: f8e905f3bc5f419792504052c0753220
  • BLAKE2b-256: 08da3a0073af8f436d72c1e4d9c655c00628b857bd1d9ccc101d35301d5bb2df

See more details on using hashes here.
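To verify a downloaded file against the hashes above, you can compute its digest locally. A minimal sketch using only the Python standard library:

```python
import hashlib

def sha256_hex(path):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archives don't need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare the result against the SHA256 value listed above, e.g.:
# sha256_hex("optimum_onnx-0.1.0.tar.gz") == "182c54b2..."
```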

File details

Details for the file optimum_onnx-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: optimum_onnx-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 194.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.16

File hashes

Hashes for optimum_onnx-0.1.0-py3-none-any.whl
  • SHA256: 0301ec7a6ec5c77a57581e9970d380a6dc104bdb8f15b282e05af40d829c2eda
  • MD5: 727b280313421b26fc80e71f8b279807
  • BLAKE2b-256: 41894be9d226bc74fd0eb405d1efea62e86d6f0f31841dae9c5898ee12eb482f

