
Optimum Library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party libraries from Hardware Partners and interface with their specific functionality.


🤗 Optimum


Optimum is an extension of Transformers 🤖, Diffusers 🧨, TIMM 🖼️, and Sentence-Transformers 🤗. It provides a set of optimization tools for training and running models with maximum efficiency on targeted hardware, while remaining easy to use.

Installation

Optimum can be installed using pip as follows:

python -m pip install optimum

If you'd like to use the accelerator-specific features of Optimum, you can check the documentation and install the required dependencies according to the table below:

Accelerator                         Installation
ONNX Runtime                        pip install --upgrade --upgrade-strategy eager optimum[onnxruntime]
Intel Neural Compressor             pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]
OpenVINO                            pip install --upgrade --upgrade-strategy eager optimum[openvino]
IPEX                                pip install --upgrade --upgrade-strategy eager optimum[ipex]
NVIDIA TensorRT-LLM                 docker run -it --gpus all --ipc host huggingface/optimum-nvidia
AMD Instinct GPUs and Ryzen AI NPU  pip install --upgrade --upgrade-strategy eager optimum[amd]
AWS Trainium & Inferentia           pip install --upgrade --upgrade-strategy eager optimum[neuronx]
Intel Gaudi Accelerators (HPU)      pip install --upgrade --upgrade-strategy eager optimum[habana]
FuriosaAI                           pip install --upgrade --upgrade-strategy eager optimum[furiosa]

The --upgrade --upgrade-strategy eager options are needed to ensure that the dependent packages are upgraded to the latest possible version.

To install from source:

python -m pip install git+https://github.com/huggingface/optimum.git

For the accelerator-specific features, put optimum[accelerator_type]@ in front of the repository URL in the above command:

python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git
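The same requirement string can be assembled programmatically, for example in a setup script. The following is a minimal sketch; the helper names source_requirement and pip_install_command are hypothetical and not part of Optimum, and the extra names are the ones listed in the table above.

```python
import sys

# Repository URL taken from the install-from-source command above.
OPTIMUM_GIT_URL = "git+https://github.com/huggingface/optimum.git"


def source_requirement(extra=None, url=OPTIMUM_GIT_URL):
    """Return the pip requirement for a source install (hypothetical helper).

    Without an extra, the bare URL is returned. With an extra such as
    "onnxruntime", the result has the form optimum[extra]@url.
    """
    if extra is None:
        return url
    return f"optimum[{extra}]@{url}"


def pip_install_command(extra=None):
    """Build the argument list for `python -m pip install ...` (not executed here)."""
    return [sys.executable, "-m", "pip", "install", source_requirement(extra)]
```

Passing the resulting list to subprocess.run avoids shell quoting problems: some shells, such as zsh, treat the square brackets as a glob pattern unless the requirement is quoted.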

Accelerated Inference

Optimum provides multiple tools to export and run optimized models on various ecosystems:

  • ONNX / ONNX Runtime, one of the most popular open formats for model export, and a high-performance inference engine for deployment.
  • OpenVINO, a toolkit for optimizing, quantizing and deploying deep learning models on Intel hardware.
  • ExecuTorch, PyTorch’s native solution for on-device inference across mobile and edge devices.
  • TensorFlow Lite, a lightweight solution for running TensorFlow models on mobile and edge.
  • Intel Gaudi Accelerators enabling optimal performance on first-gen Gaudi, Gaudi2 and Gaudi3.
  • AWS Inferentia for accelerated inference on Inf2 and Inf1 instances.
  • NVIDIA TensorRT-LLM.

The export and optimizations can be done both programmatically and from the command line.

ONNX + ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[exporters,onnxruntime]

It is possible to export Transformers and Diffusers models to the ONNX format and perform graph optimization as well as quantization easily.

For more information on the ONNX export, please check the documentation.
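As an illustration, the command-line export can be driven from Python. This is a minimal sketch that assumes the optimum-cli executable is on the PATH after the installation above; the helper names and the example model ID are illustrative choices, not something this page prescribes.

```python
import subprocess


def build_onnx_export_command(model_id, output_dir, task=None):
    """Build an `optimum-cli export onnx` command as an argument list.

    model_id is a Hub model ID or a local path, output_dir is where the
    ONNX files are written. task is optional; when omitted, the CLI
    tries to infer it from the model.
    """
    command = ["optimum-cli", "export", "onnx", "--model", model_id]
    if task is not None:
        command += ["--task", task]
    command.append(output_dir)
    return command


def export_to_onnx(model_id, output_dir, task=None):
    """Run the export and raise CalledProcessError if the CLI fails."""
    subprocess.run(build_onnx_export_command(model_id, output_dir, task), check=True)


# Example call (downloads the model, so it is left commented out):
# export_to_onnx("distilbert-base-uncased-finetuned-sst-2-english", "distilbert_onnx")
```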

Once the model is exported to the ONNX format, we provide Python classes that let you run the exported ONNX model seamlessly, using ONNX Runtime as the backend.

For more details on how to run ONNX models with the ORTModelForXXX classes, see the documentation.
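A minimal sketch of running a model through one of these classes is shown below. It assumes optimum[onnxruntime] and transformers are installed and that the model can be downloaded; the model ID is an example choice, and the helpers classify and labels_only are hypothetical names introduced here for illustration.

```python
# Example model ID (an assumption for this sketch, any compatible
# sequence-classification checkpoint should work).
DEFAULT_MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"


def classify(texts, model_id=DEFAULT_MODEL_ID):
    """Classify texts with ONNX Runtime through a Transformers pipeline."""
    # Imported lazily so this module can be loaded without the optional
    # dependencies installed.
    from optimum.onnxruntime import ORTModelForSequenceClassification
    from transformers import AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # export=True converts the PyTorch checkpoint to ONNX when loading.
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
    return classifier(texts)


def labels_only(predictions):
    """Keep only the label of each prediction dict returned by the pipeline."""
    return [prediction["label"] for prediction in predictions]


# Example call (downloads and converts the model, so it is commented out):
# print(labels_only(classify(["I love this library."])))
```

The ORT model class is used here as a drop-in replacement for the corresponding AutoModel class, so the tokenizer and pipeline code stay unchanged.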

Intel (OpenVINO + Neural Compressor + IPEX)

Before you begin, make sure you have all the necessary libraries installed (see the installation table above for the OpenVINO, Neural Compressor, and IPEX extras).

You can find more information on the different integrations in our documentation and in the examples of optimum-intel.

ExecuTorch

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum-executorch@git+https://github.com/huggingface/optimum-executorch.git

Users can export Transformers models to ExecuTorch and run inference on edge devices within PyTorch's ecosystem.

For more information about exporting Transformers models to ExecuTorch, please check the Optimum-ExecuTorch documentation.

TensorFlow Lite

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[exporters-tf]

Just as for ONNX, it is possible to export models to TensorFlow Lite and quantize them. You can find more information in our documentation.

Quanto

Quanto is a PyTorch quantization backend that lets you quantize a model using either the Python API or the optimum-cli.

You can see more details and examples in the Quanto repository.
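A minimal sketch of the Python API follows. It assumes the optimum-quanto package and PyTorch are installed; the wrapper name quantize_weights_to_int8 is hypothetical, and int8 weight-only quantization is just one possible configuration.

```python
def quantize_weights_to_int8(model):
    """Quantize the weights of a torch.nn.Module to int8 in place.

    Sketch only: activations are left unquantized, so no calibration
    step is needed.
    """
    # Imported lazily so this module can be loaded without optimum-quanto.
    from optimum.quanto import freeze, qint8, quantize

    # Replace supported layers with quantized versions (weights still float).
    quantize(model, weights=qint8)
    # Convert the float weights to their quantized representation.
    freeze(model)
    return model


# Example call, assuming `model` is an already loaded torch.nn.Module:
# model = quantize_weights_to_int8(model)
```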

Accelerated Training

Optimum provides wrappers around the original Transformers Trainer to enable training on powerful hardware easily. We support many providers:

Intel Gaudi Accelerators

Before you begin, make sure you have all the necessary libraries installed:

pip install --upgrade --upgrade-strategy eager optimum[habana]

You can find examples in the documentation and in the examples.

AWS Trainium

Before you begin, make sure you have all the necessary libraries installed:

pip install --upgrade --upgrade-strategy eager optimum[neuronx]

You can find examples in the documentation and in the tutorials.

ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[onnxruntime-training]

You can find examples in the documentation and in the examples.

