
The Optimum library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party libraries from hardware partners and interface with their specific functionality.

Project description

Optimum Intel

🤗 Optimum Intel is the interface between the 🤗 Transformers library and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.

Intel Neural Compressor (INC) is an open-source library providing the most popular compression techniques, such as quantization, pruning and knowledge distillation. It supports automatic accuracy-driven tuning strategies so that users can easily generate quantized models. Users can apply static, dynamic and quantization-aware training approaches while specifying an expected accuracy criterion. It also supports different weight pruning techniques, enabling the creation of pruned models that meet a predefined sparsity target.
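As a framework-agnostic illustration of what magnitude pruning does (this is not INC's actual implementation; the helper name is hypothetical), the idea is to zero out the smallest-magnitude weights until the requested fraction of entries is zero:

```python
def magnitude_prune(weights, target_sparsity):
    """Zero out the smallest-magnitude weights until the
    requested fraction of entries is exactly zero."""
    n_prune = int(len(weights) * target_sparsity)
    # Rank indices by |w|, smallest first
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(ranked[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.3, -0.02, 0.6, 0.1, -0.8]
pruned = magnitude_prune(weights, target_sparsity=0.3)
# The 3 smallest-magnitude weights (0.01, -0.02, -0.05) are now zero
```

INC wraps this kind of criterion in a schedule applied during training, rather than as a one-shot post-processing step.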

Install

To install the latest release of this package:

pip install optimum[intel]

Optimum Intel is a fast-moving project, and you may want to install from source.

pip install git+https://github.com/huggingface/optimum-intel.git

Running the examples

There are a number of examples provided in the examples directory.

Please install the requirements for every example:

cd <example-folder>
pip install -r requirements.txt

How to use it?

Here is an example of how to combine magnitude pruning with dynamic quantization while fine-tuning DistilBERT on the SST-2 task. Note that quantization is currently only supported for CPUs (only CPU backends are available), so we will not be using GPUs / CUDA in this example.
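"Dynamic" quantization means that activation scales are computed at runtime from each tensor's observed range, while weights are quantized ahead of time. A purely illustrative sketch of the underlying affine int8 mapping (not the INC implementation; function names are hypothetical):

```python
def quantize_dynamic(values, num_bits=8):
    """Affine (asymmetric) quantization: map floats to integer levels
    using a scale and zero-point derived from the observed min/max,
    as dynamic quantization does for activations at runtime."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

acts = [-1.0, -0.25, 0.0, 0.5, 1.5]
q, s, zp = quantize_dynamic(acts)
recovered = dequantize(q, s, zp)
# Round-trip error is bounded by scale / 2 per element
```

Because the scale adapts to each tensor, dynamic quantization needs no calibration dataset, at the cost of computing min/max at inference time.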

To apply our pruning methodology, we need to create an instance of IncTrainer, which is very similar to the 🤗 Transformers Trainer. We will fine-tune our model for 3 epochs while applying pruning.

# Initialize our IncTrainer
-from transformers import Trainer
+from optimum.intel.neural_compressor import IncTrainer

-trainer = Trainer(
+trainer = IncTrainer(
    model=model,
    args=TrainingArguments(output_dir, num_train_epochs=3.0),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
)
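The quantizer and pruner instantiated below take eval_func and train_func callables. The convention (as assumed here for illustration; the exact contract is defined by optimum-intel) is that eval_func receives a model and returns the single accuracy-like metric the tuning loop tracks, while train_func runs a fine-tuning pass. A toy, framework-free sketch of that contract:

```python
class ToyModel:
    """Stand-in for a real model object, just to show the callable contract."""
    def __init__(self):
        self.trained = False

def train_func(model):
    # In a real script this would invoke the trainer's training loop;
    # here we only mark the toy model as trained.
    model.trained = True
    return model

def eval_func(model):
    # In a real script this would run evaluation and return a
    # single float metric for the accuracy-driven tuning loop.
    return 0.91 if model.trained else 0.50

model = ToyModel()
train_func(model)
accuracy = eval_func(model)
```

In practice these callables would wrap the IncTrainer's train and evaluate methods on the actual fine-tuned model.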

To apply our quantization and pruning methodologies, we first need to create the corresponding configurations describing how we want those methodologies to be applied:

from optimum.intel.neural_compressor import IncOptimizer, IncPruner, IncQuantizer
from optimum.intel.neural_compressor.configuration import IncPruningConfig, IncQuantizationConfig

# The targeted sparsity is set to 10%
target_sparsity = 0.1
config_path = "echarlaix/distilbert-sst2-inc-dynamic-quantization-magnitude-pruning-0.1"
# Load the quantization configuration detailing the quantization we wish to apply
quantization_config = IncQuantizationConfig.from_pretrained(config_path, config_file_name="quantization.yml")
# Load the pruning configuration detailing the pruning we wish to apply
pruning_config = IncPruningConfig.from_pretrained(config_path, config_file_name="prune.yml")

# Instantiate our IncQuantizer using the desired configuration
inc_quantizer = IncQuantizer(model, quantization_config, eval_func=eval_func)
quantizer = inc_quantizer.fit()
# Instantiate our IncPruner using the desired configuration
inc_pruner = IncPruner(model, pruning_config, eval_func=eval_func, train_func=train_func)
pruner = inc_pruner.fit()
inc_optimizer = IncOptimizer(model, quantizer=quantizer, pruner=pruner)
# Apply pruning and quantization 
opt_model = inc_optimizer.fit()
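After optimization, you may want to verify that the 10% sparsity target was actually reached. A minimal, framework-agnostic way to measure sparsity (the helper below is hypothetical, not part of optimum-intel):

```python
def sparsity(weights):
    """Fraction of exactly-zero entries in a flat list of weights."""
    if not weights:
        return 0.0
    return sum(1 for w in weights if w == 0.0) / len(weights)

example = [0.3, 0.0, -0.7, 0.0, 0.5, 0.1, 0.0, -0.2, 0.9, 0.0]
# 4 zeros out of 10 entries -> sparsity of 0.4
assert sparsity(example) == 0.4
```

On a real model you would flatten each prunable weight tensor and compute this per layer or globally, depending on how the sparsity target was defined.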

To load a quantized model hosted locally or on the 🤗 hub, you can do as follows:

from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification

loaded_model_from_hub = IncQuantizedModelForSequenceClassification.from_pretrained(
    "echarlaix/distilbert-sst2-inc-dynamic-quantization-magnitude-pruning-0.1"
)

Check out the examples directory for more sophisticated usage.

Download files


Source Distribution

optimum-intel-1.2.2.tar.gz (23.6 kB)

File details

File metadata for optimum-intel-1.2.2.tar.gz:

  • Size: 23.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for optimum-intel-1.2.2.tar.gz:

  • SHA256: e4ef989b44808e512fc2e56d0d6882fc2bfba049fa7a7d3bed1a157cb3c59154
  • MD5: 1f949190d7986864542f53b8f132d163
  • BLAKE2b-256: 2a440ce4577bf8df43e8dff8f1db58caf7d933322ecda9598ee49344cd4e6e81

