Optimum Library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party libraries from Hardware Partners and interface with their specific functionality.

Project description

Hugging Face Optimum

🤗 Optimum is an extension of 🤗 Transformers and Diffusers, providing a set of optimization tools enabling maximum efficiency to train and run models on targeted hardware, while keeping things easy to use.

Installation

🤗 Optimum can be installed using pip as follows:

python -m pip install optimum

If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:

Accelerator	Installation
ONNX Runtime	`python -m pip install optimum[onnxruntime]`
Intel Neural Compressor	`python -m pip install optimum[neural-compressor]`
OpenVINO	`python -m pip install optimum[openvino,nncf]`
Habana Gaudi Processor (HPU)	`python -m pip install optimum[habana]`

To install from source:

python -m pip install git+https://github.com/huggingface/optimum.git

For the accelerator-specific features, append #egg=optimum[accelerator_type] to the above command:

python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum[onnxruntime]
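
After installing one of the extras, it can be useful to confirm which backend packages are actually importable in the current environment. The following is a minimal sketch using only the standard library. The import names in `BACKEND_MODULES` (`onnxruntime`, `neural_compressor`, `openvino`, `optimum.habana`) are assumptions about what each extra provides and may differ between releases.

```python
import importlib.util

# Assumed import names for each accelerator extra; adjust if your versions differ.
BACKEND_MODULES = {
    "onnxruntime": "onnxruntime",
    "neural-compressor": "neural_compressor",
    "openvino": "openvino",
    "habana": "optimum.habana",
}


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located in this environment.

    For dotted names, Python imports the parent package while searching.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `optimum`) is itself missing.
        return False


def available_backends(modules=BACKEND_MODULES) -> dict:
    """Map each extra name to whether its backend module was found."""
    return {extra: is_importable(name) for extra, name in modules.items()}


if __name__ == "__main__":
    for extra, found in available_backends().items():
        print(f"{extra}: {'installed' if found else 'missing'}")
```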

Accelerated Inference

🤗 Optimum provides multiple tools to export and run optimized models on various ecosystems:

ONNX / ONNX Runtime
TensorFlow Lite
OpenVINO
Habana first-gen Gaudi / Gaudi2 (see the Optimum Habana documentation for more details)

The export and optimizations can be done both programmatically and from the command line.

Features summary

Features	ONNX Runtime	Neural Compressor	OpenVINO	TensorFlow Lite
Graph optimization	✔️	N/A	✔️	N/A
Post-training dynamic quantization	✔️	✔️	N/A	✔️
Post-training static quantization	✔️	✔️	✔️	✔️
Quantization Aware Training (QAT)	N/A	✔️	✔️	N/A
FP16 (half precision)	✔️	N/A	✔️	✔️
Pruning	N/A	✔️	✔️	N/A
Knowledge Distillation	N/A	✔️	✔️	N/A

ONNX + ONNX Runtime

It is possible to export 🤗 Transformers models to the ONNX format and perform graph optimization as well as quantization easily:

optimum-cli export onnx -m deepset/roberta-base-squad2 --optimize O2 roberta_base_qa_onnx

The model can then be quantized using onnxruntime:

optimum-cli onnxruntime quantize \
  --avx512 \
  --onnx_model roberta_base_qa_onnx \
  -o quantized_roberta_base_qa_onnx

These commands will export deepset/roberta-base-squad2 and perform O2 graph optimization on the exported model, and finally quantize it with the avx512 configuration.
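
If these two steps need to be scripted, the same commands can be assembled and launched from Python. The sketch below only builds the argument lists from the flags shown above. The helper names `export_command`, `quantize_command` and `run_pipeline` are hypothetical, and actually running the commands assumes `optimum-cli` is available on the PATH.

```python
import shlex
import subprocess


def export_command(model_id: str, output_dir: str, optimize: str = "O2") -> list:
    """Build the `optimum-cli export onnx` invocation shown above."""
    return ["optimum-cli", "export", "onnx", "-m", model_id,
            "--optimize", optimize, output_dir]


def quantize_command(onnx_dir: str, output_dir: str) -> list:
    """Build the `optimum-cli onnxruntime quantize` call with the avx512 configuration."""
    return ["optimum-cli", "onnxruntime", "quantize", "--avx512",
            "--onnx_model", onnx_dir, "-o", output_dir]


def run_pipeline(model_id: str, onnx_dir: str, quantized_dir: str,
                 dry_run: bool = True) -> list:
    """Export, then quantize. With dry_run=True the commands are only printed."""
    commands = [export_command(model_id, onnx_dir),
                quantize_command(onnx_dir, quantized_dir)]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands


if __name__ == "__main__":
    run_pipeline("deepset/roberta-base-squad2",
                 "roberta_base_qa_onnx",
                 "quantized_roberta_base_qa_onnx")
```

Keeping `dry_run=True` as the default makes it possible to inspect the exact commands before any model is downloaded or written to disk.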

For more information on the ONNX export, please check the documentation.

Run the exported model using ONNX Runtime

Once the model is exported to the ONNX format, we provide Python classes that let you run the exported ONNX model seamlessly, with ONNX Runtime as the backend:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForQuestionAnswering

model_name = "roberta_base_qa_onnx"
tokenizer = AutoTokenizer.from_pretrained(model_name)
ort_model = ORTModelForQuestionAnswering.from_pretrained(model_name)

question = "What's Optimum?"
text = "Optimum is an awesome library everyone should use!"
inputs = tokenizer(question, text, return_tensors="pt") 

# Run with ONNX Runtime.
outputs = ort_model(**inputs)

answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
answer = tokenizer.decode(predict_answer_tokens, skip_special_tokens=True)

For more details on how to run ONNX models with the ORTModelForXXX classes, see the documentation.
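
The snippet above takes the argmax of the start and end logits independently, which can occasionally yield an end index that precedes the start index. As an illustration only (this is not part of the Optimum API), the following sketch picks the highest-scoring valid span from two plain lists of logits. The function name `best_span` and the `max_answer_len` parameter are hypothetical.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) maximizing start_logits[start] + end_logits[end]
    subject to start <= end < start + max_answer_len."""
    if not start_logits or len(start_logits) != len(end_logits):
        raise ValueError("logit lists must be non-empty and of equal length")
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


if __name__ == "__main__":
    # Independent argmax would give start=3 and end=1, which is not a valid span.
    start = [0.1, 2.0, 0.3, 5.0]
    end = [0.2, 4.0, 1.0, 0.5]
    print(best_span(start, end))
```

With real model outputs, the logits of the first sequence would presumably be converted to lists first (for example with `.tolist()`), and the answer tokens would then be sliced as `input_ids[0, start : end + 1]`, as in the snippet above.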

TensorFlow Lite

Just as for ONNX, it is possible to export models to TensorFlow Lite and quantize them:

optimum-cli export tflite \
  -m deepset/roberta-base-squad2 \
  --sequence_length 384  \
  --quantize int8-dynamic roberta_tflite_model
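
To give a rough intuition for what an `int8` quantization mode does to weights, here is a conceptual sketch of symmetric per-tensor int8 quantization in plain Python. It is not the TensorFlow Lite implementation. The function names and the rounding scheme are assumptions made for illustration.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid division by zero
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]
    codes, scale = quantize_int8(weights)
    restored = dequantize_int8(codes, scale)
    print(codes, scale)
    print(restored)
```

Each restored value differs from the original by at most half the scale, which is the precision traded for the smaller 8-bit representation.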

OpenVINO

This requires installing the Optimum OpenVINO extra with pip install optimum[openvino,nncf].

To load a model and run inference with OpenVINO Runtime, you can just replace your AutoModelForXxx class with the corresponding OVModelForXxx class. To load a PyTorch checkpoint and convert it to the OpenVINO format on the fly, you can set export=True when loading your model.

- from transformers import AutoModelForSequenceClassification
+ from optimum.intel import OVModelForSequenceClassification
  from transformers import AutoTokenizer, pipeline

  model_id = "distilbert-base-uncased-finetuned-sst-2-english"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForSequenceClassification.from_pretrained(model_id)
+ model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
  model.save_pretrained("./distilbert")

  classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
  results = classifier("He's a dreadful magician.")

You can find more examples in the documentation and in the examples.
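
Once the converted model has been saved with `save_pretrained`, later runs can presumably load it directly from that directory and skip the conversion. The following is a minimal sketch under that assumption. The helper name `load_saved_classifier` is hypothetical, and the tokenizer is reloaded from the original model ID because the example above only saves the model.

```python
def load_saved_classifier(
    model_dir: str = "./distilbert",
    tokenizer_id: str = "distilbert-base-uncased-finetuned-sst-2-english",
):
    """Reload an already-exported OpenVINO model and wrap it in a pipeline."""
    # Imports live inside the function so this module can be loaded even when
    # optimum[openvino] is not installed.
    from optimum.intel import OVModelForSequenceClassification
    from transformers import AutoTokenizer, pipeline

    # No export=True here: the directory is assumed to already hold the
    # OpenVINO files written by save_pretrained.
    model = OVModelForSequenceClassification.from_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    return pipeline("text-classification", model=model, tokenizer=tokenizer)
```

Calling `load_saved_classifier()("He's a dreadful magician.")` would then run the same classification as above without repeating the export step.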

Accelerated training

🤗 Optimum provides wrappers around the original 🤗 Transformers Trainer to make training on powerful hardware easy. The following providers are supported:

Habana's Gaudi processors
ONNX Runtime (optimized for GPUs)

Habana

- from transformers import Trainer, TrainingArguments
+ from optimum.habana import GaudiTrainer, GaudiTrainingArguments

  # Download a pretrained model from the Hub
  model = AutoModelForXxx.from_pretrained("bert-base-uncased")

  # Define the training arguments
- training_args = TrainingArguments(
+ training_args = GaudiTrainingArguments(
      output_dir="path/to/save/folder/",
+     use_habana=True,
+     use_lazy_mode=True,
+     gaudi_config_name="Habana/bert-base-uncased",
      ...
  )

  # Initialize the trainer
- trainer = Trainer(
+ trainer = GaudiTrainer(
      model=model,
      args=training_args,
      train_dataset=train_dataset,
      ...
  )

  # Use Habana Gaudi processor for training!
  trainer.train()

You can find more examples in the documentation and in the examples.

ONNX Runtime

- from transformers import Trainer, TrainingArguments
+ from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments

  # Download a pretrained model from the Hub
  model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

  # Define the training arguments
- training_args = TrainingArguments(
+ training_args = ORTTrainingArguments(
      output_dir="path/to/save/folder/",
      optim="adamw_ort_fused",
      ...
  )

  # Create an ONNX Runtime Trainer
- trainer = Trainer(
+ trainer = ORTTrainer(
      model=model,
      args=training_args,
      train_dataset=train_dataset,
+     feature="sequence-classification", # The model type to export to ONNX
      ...
  )

  # Use ONNX Runtime for training!
  trainer.train()

You can find more examples in the documentation and in the examples.

Project details

Release history

Version	Release date
2.1.0	Dec 19, 2025
2.0.0	Oct 9, 2025
1.27.0	Jul 30, 2025
1.26.1	Jun 13, 2025
1.26.0	Jun 13, 2025
1.25.3	May 16, 2025
1.25.2	May 15, 2025
1.25.1	May 15, 2025
1.25.0	May 13, 2025
1.24.0	Jan 30, 2025
1.23.3	Oct 29, 2024
1.23.2	Oct 22, 2024
1.23.1	Oct 11, 2024
1.23.0	Oct 10, 2024
1.22.0	Sep 10, 2024
1.21.4	Aug 16, 2024
1.21.3	Aug 6, 2024
1.21.2	Jul 5, 2024
1.21.1	Jul 2, 2024
1.21.0	Jul 2, 2024
1.20.0	May 29, 2024
1.19.2	May 9, 2024
1.19.1	Apr 24, 2024
1.19.0	Apr 16, 2024
1.18.1	Apr 9, 2024
1.18.0	Mar 25, 2024
1.17.1	Feb 18, 2024
1.17.0	Feb 16, 2024
1.16.2	Jan 19, 2024
1.16.1	Dec 15, 2023
1.16.0	Dec 13, 2023
1.15.0	Dec 6, 2023
1.14.1	Nov 14, 2023
1.14.0	Nov 6, 2023
1.13.3	Nov 3, 2023
1.13.2	Sep 21, 2023
1.13.1	Sep 8, 2023
1.13.0	Sep 8, 2023
1.12.0	Aug 23, 2023
1.11.2	Aug 17, 2023
1.11.1	Aug 11, 2023
1.11.0	Aug 3, 2023
1.10.1	Jul 27, 2023
1.10.0	Jul 25, 2023
1.9.1	Jul 7, 2023
1.9.0	Jun 30, 2023
1.8.8	Jun 16, 2023
1.8.7	Jun 10, 2023
1.8.6	May 18, 2023
1.8.5 (this version)	May 11, 2023
1.8.4	May 7, 2023
1.8.3	Apr 28, 2023
1.8.2	Apr 17, 2023
1.8.1	Apr 17, 2023
1.8.0	Apr 17, 2023
1.7.3	Mar 23, 2023
1.7.2	Mar 23, 2023
1.7.1	Mar 3, 2023
1.7.0	Mar 2, 2023
1.6.4	Feb 13, 2023
1.6.3	Jan 25, 2023
1.6.2	Jan 25, 2023
1.6.1	Dec 23, 2022
1.6.0	Dec 23, 2022
1.5.2	Dec 19, 2022
1.5.1	Nov 24, 2022
1.5.0	Nov 17, 2022
1.4.1	Oct 25, 2022
1.4.0	Sep 8, 2022
1.3.0	Jul 12, 2022
1.2.3	Jun 13, 2022
1.2.2	Jun 2, 2022
1.2.1	May 12, 2022
1.2.0	May 10, 2022
1.1.1	Apr 26, 2022
1.1.0	Apr 1, 2022
1.0.0	Feb 23, 2022
0.1.3	Dec 23, 2021
0.1.2	Dec 8, 2021
0.1.1	Nov 5, 2021
0.1.0	Nov 5, 2021
0.0.1	Sep 14, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

optimum-1.8.5.tar.gz (245.5 kB)

Uploaded May 11, 2023 Source

Built Distribution

optimum-1.8.5-py3-none-any.whl (318.1 kB)

Uploaded May 11, 2023 Python 3

File details

Details for the file optimum-1.8.5.tar.gz.

File metadata

Download URL: optimum-1.8.5.tar.gz
Upload date: May 11, 2023
Size: 245.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for optimum-1.8.5.tar.gz
Algorithm	Hash digest
SHA256	`e7579017e3120c8ab7bd71d090bedd0fb44d971be28ed7b7dd0084b7603d34b2`
MD5	`638ef2ec2fbb67a94539d3e4f26852aa`
BLAKE2b-256	`e7131741402fb005970ef936f0b41332e06f8863588e953706a20a3df92706db`

See more details on using hashes here.
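
A downloaded file can be checked against the SHA256 digest listed above with the Python standard library. The following is a minimal sketch. The helper names `sha256_of_file` and `matches_expected` are hypothetical, and the example assumes the archive has been saved under its original file name in the current directory.

```python
import hashlib

# SHA256 digest published above for optimum-1.8.5.tar.gz
EXPECTED_SDIST_SHA256 = "e7579017e3120c8ab7bd71d090bedd0fb44d971be28ed7b7dd0084b7603d34b2"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("optimum-1.8.5.tar.gz", EXPECTED_SDIST_SHA256)` should return `True` only if the local file is byte-for-byte identical to the published archive.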

File details

Details for the file optimum-1.8.5-py3-none-any.whl.

File metadata

Download URL: optimum-1.8.5-py3-none-any.whl
Upload date: May 11, 2023
Size: 318.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for optimum-1.8.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`030c75060ed6707472517bd35c7cc1595f82eb12e591cb12ea1127be8b7dd4ac`
MD5	`bba9b2b6173f11d75d70c48ff75a0d46`
BLAKE2b-256	`0e227d8e523ff2db94a49edf77747d12c57bd9a561bad45c276abf372513e604`

See more details on using hashes here.

optimum 1.8.5
