Optimum Library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party libraries from Hardware Partners and interface with their specific functionality.
Project description
Hugging Face Optimum
🤗 Optimum is an extension of 🤗 Transformers and Diffusers, providing a set of optimization tools enabling maximum efficiency to train and run models on targeted hardware, while keeping things easy to use.
Installation
🤗 Optimum can be installed using pip
as follows:
python -m pip install optimum
If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:
Accelerator | Installation |
---|---|
ONNX Runtime | python -m pip install optimum[onnxruntime] |
Intel Neural Compressor | python -m pip install optimum[neural-compressor] |
OpenVINO | python -m pip install optimum[openvino,nncf] |
Habana Gaudi Processor (HPU) | python -m pip install optimum[habana] |
To install from source:
python -m pip install git+https://github.com/huggingface/optimum.git
For the accelerator-specific features, append #egg=optimum[accelerator_type]
to the above command:
python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum[onnxruntime]
Accelerated Inference
🤗 Optimum provides multiple tools to export and run optimized models on various ecosystems:
- ONNX / ONNX Runtime
- TensorFlow Lite
- OpenVINO
- Habana first-gen Gaudi / Gaudi2, more details here
The export and optimizations can be done both programmatically and with a command line.
Features summary
Features | ONNX Runtime | Neural Compressor | OpenVINO | TensorFlow Lite |
---|---|---|---|---|
Graph optimization | :heavy_check_mark: | N/A | :heavy_check_mark: | N/A |
Post-training dynamic quantization | :heavy_check_mark: | :heavy_check_mark: | N/A | :heavy_check_mark: |
Post-training static quantization | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
Quantization Aware Training (QAT) | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
FP16 (half precision) | :heavy_check_mark: | N/A | :heavy_check_mark: | :heavy_check_mark: |
Pruning | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
Knowledge Distillation | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
ONNX + ONNX Runtime
It is possible to export 🤗 Transformers models to the ONNX format and perform graph optimization as well as quantization easily:
optimum-cli export onnx -m deepset/roberta-base-squad2 --optimize O2 roberta_base_qa_onnx
The model can then be quantized using onnxruntime
:
optimum-cli onnxruntime quantize \
--avx512 \
--onnx_model roberta_base_qa_onnx \
-o quantized_roberta_base_qa_onnx
These commands will export deepset/roberta-base-squad2
and perform O2 graph optimization on the exported model, and finally quantize it with the avx512 configuration.
For more information on the ONNX export, please check the documentation.
Run the exported model using ONNX Runtime
Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seemless manner using ONNX Runtime in the backend:
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForQuestionAnswering
model_name = "roberta_base_qa_onnx"
tokenizer = AutoTokenizer.from_pretrained(model_name)
ort_model = ORTModelForQuestionAnswering.from_pretrained(model_name)
question = "What's Optimum?"
text = "Optimum is an awesome library everyone should use!"
inputs = tokenizer(question, text, return_tensors="pt")
# Run with ONNX Runtime.
outputs = ort_model(**inputs)
answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
answer = tokenizer.decode(predict_answer_tokens, skip_special_tokens=True)
More details on how to run ONNX models with ORTModelForXXX
classes here.
TensorFlow Lite
Just as for ONNX, it is possible to export models to TensorFlow Lite and quantize them:
optimum-cli export tflite \
-m deepset/roberta-base-squad2 \
--sequence_length 384 \
--quantize int8-dynamic roberta_tflite_model
OpenVINO
This requires to install the Optimum OpenVINO extra by doing pip install optimum[openvino,nncf]
.
To load a model and run inference with OpenVINO Runtime, you can just replace your AutoModelForXxx
class with the corresponding OVModelForXxx
class. To load a PyTorch checkpoint and convert it to the OpenVINO format on-the-fly, you can set export=True
when loading your model.
- from transformers import AutoModelForSequenceClassification
+ from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForSequenceClassification.from_pretrained(model_id)
+ model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
model.save_pretrained("./distilbert")
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
results = classifier("He's a dreadful magician.")
You can find more examples in the documentation and in the examples.
Accelerated training
🤗 Optimum provides wrappers around the original 🤗 Transformers Trainer to enable training on powerful hardware easily. We support many providers:
- Habana's Gaudi processors
- ONNX Runtime (optimized for GPUs)
Habana
- from transformers import Trainer, TrainingArguments
+ from optimum.habana import GaudiTrainer, GaudiTrainingArguments
# Download a pretrained model from the Hub
model = AutoModelForXxx.from_pretrained("bert-base-uncased")
# Define the training arguments
- training_args = TrainingArguments(
+ training_args = GaudiTrainingArguments(
output_dir="path/to/save/folder/",
+ use_habana=True,
+ use_lazy_mode=True,
+ gaudi_config_name="Habana/bert-base-uncased",
...
)
# Initialize the trainer
- trainer = Trainer(
+ trainer = GaudiTrainer(
model=model,
args=training_args,
train_dataset=train_dataset,
...
)
# Use Habana Gaudi processor for training!
trainer.train()
You can find more examples in the documentation and in the examples.
ONNX Runtime
- from transformers import Trainer, TrainingArguments
+ from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments
# Download a pretrained model from the Hub
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
# Define the training arguments
- training_args = TrainingArguments(
+ training_args = ORTTrainingArguments(
output_dir="path/to/save/folder/",
optim="adamw_ort_fused",
...
)
# Create a ONNX Runtime Trainer
- trainer = Trainer(
+ trainer = ORTTrainer(
model=model,
args=training_args,
train_dataset=train_dataset,
+ feature="sequence-classification", # The model type to export to ONNX
...
)
# Use ONNX Runtime for training!
trainer.train()
You can find more examples in the documentation and in the examples.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.