
Repository of Intel® Extension for Transformers

Project description

Intel® Extension for Transformers: Accelerating Transformer-based Models on Intel Platforms

Intel® Extension for Transformers is an innovative toolkit for accelerating Transformer-based models on Intel platforms. It helps developers improve productivity through ease-of-use model compression APIs that extend the Hugging Face transformers APIs. The compression infrastructure leverages Intel® Neural Compressor, which provides a rich set of model compression techniques: quantization, pruning, distillation, and more. The toolkit also provides Transformers-accelerated Libraries and a Neural Engine to demonstrate the performance of extremely compressed models, thereby significantly improving inference efficiency on Intel platforms. Some of the key features have been published in NeurIPS 2021 and 2022.

What does Intel® Extension for Transformers offer?

This toolkit helps developers improve the productivity of inference deployment by extending Hugging Face transformers APIs for Transformer-based models in the natural language processing (NLP) domain. With extremely compressed models, the toolkit can greatly improve inference efficiency on Intel platforms.

  • Model Compression

    Framework         Quantization  Pruning/Sparsity  Distillation  Neural Architecture Search
    PyTorch           ✔             ✔                 ✔             ✔
    TensorFlow        ✔             ✔                 ✔             Stay tuned :star:
  • Data Augmentation for NLP Datasets

  • Transformers-accelerated Neural Engine

  • Transformers-accelerated Libraries

  • Domain Algorithms

    Framework  Length Adaptive Transformer
    PyTorch    ✔
  • Architecture of Intel® Extension for Transformers

    [architecture diagram]

Installation

Install Dependency

pip install -r requirements.txt

Install Intel® Extension for Transformers

git clone https://github.com/intel/intel-extension-for-transformers.git intel_extension_for_transformers
cd intel_extension_for_transformers
git submodule update --init --recursive
python setup.py install
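
Alternatively, since prebuilt wheels for this package are published on PyPI, it can presumably also be installed directly with pip (assuming a wheel is available for your Python version and platform):

pip install intel-extension-for-transformers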

Getting Started

Quantization

from intel_extension_for_transformers import QuantizationConfig, metrics, objectives
from intel_extension_for_transformers.optimization.trainer import NLPTrainer

# Replace transformers.Trainer with NLPTrainer
# trainer = transformers.Trainer(...)
trainer = NLPTrainer(...)
metric = metrics.Metric(name="eval_f1", is_relative=True, criterion=0.01)
q_config = QuantizationConfig(
    approach="PostTrainingStatic",
    metrics=[metric],
    objectives=[objectives.performance]
)
model = trainer.quantize(quant_config=q_config)

Please refer to the quantization document for more details.
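
For context, NLPTrainer is used as a drop-in replacement for transformers.Trainer and takes the same constructor arguments. A minimal sketch of the elided trainer setup is shown below; the checkpoint name, datasets, and output directory are illustrative placeholders rather than part of the original example:

from transformers import AutoModelForSequenceClassification, TrainingArguments
from intel_extension_for_transformers.optimization.trainer import NLPTrainer

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
trainer = NLPTrainer(
    model=model,
    args=TrainingArguments(output_dir="./quantized_model"),  # placeholder output directory
    train_dataset=train_dataset,  # placeholder: your tokenized calibration/training split
    eval_dataset=eval_dataset,    # placeholder: your tokenized evaluation split
)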

Pruning

from intel_extension_for_transformers import metrics, PrunerConfig, PruningConfig
from intel_extension_for_transformers.optimization.trainer import NLPTrainer

# Replace transformers.Trainer with NLPTrainer
# trainer = transformers.Trainer(...)
trainer = NLPTrainer(...)
metric = metrics.Metric(name="eval_accuracy")
pruner_config = PrunerConfig(prune_type='BasicMagnitude', target_sparsity_ratio=0.9)
p_conf = PruningConfig(pruner_config=[pruner_config], metrics=metric)
model = trainer.prune(pruning_config=p_conf)

Please refer to the pruning document for more details.
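
Since NLPTrainer inherits from transformers.Trainer, the pruned model can presumably be evaluated and saved with the standard Trainer methods; a brief sketch, assuming an evaluation dataset was supplied when the trainer was constructed (the output directory is an illustrative placeholder):

# Check that accuracy is preserved at the target sparsity, then save the result
eval_metrics = trainer.evaluate()
print(eval_metrics.get("eval_accuracy"))
trainer.save_model("./pruned_model")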

Distillation

from intel_extension_for_transformers import metrics, DistillationConfig, Criterion
from intel_extension_for_transformers.optimization.trainer import NLPTrainer

# Replace transformers.Trainer with NLPTrainer
# trainer = transformers.Trainer(...)
teacher_model = ... # existing fine-tuned model used as the teacher
trainer = NLPTrainer(...)
metric = metrics.Metric(name="eval_accuracy")
d_conf = DistillationConfig(metrics=metric)
model = trainer.distill(distillation_config=d_conf, teacher_model=teacher_model)

Please refer to the distillation document for more details.
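
The teacher is simply an existing fine-tuned model for the same task, typically larger than the student wrapped by the trainer. A minimal sketch of loading one (the checkpoint path is an illustrative placeholder):

from transformers import AutoModelForSequenceClassification

# Placeholder path: any fine-tuned model for the same task can serve as the teacher
teacher_model = AutoModelForSequenceClassification.from_pretrained("path/to/finetuned-teacher")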

Data Augmentation

Data augmentation provides facilities to generate synthesized NLP datasets for further model optimization. It supports text generation with popular fine-tuned models such as GPT and GPT-2, as well as other text synthesis approaches from nlpaug.

import os
from datasets import load_dataset
from intel_extension_for_transformers.preprocessing.data_augmentation import DataAugmentation

aug = DataAugmentation(augmenter_type="TextGenerationAug")
aug.input_dataset = "original_dataset.csv" # example: https://huggingface.co/datasets/glue/viewer/sst2/train
aug.column_names = "sentence"
aug.output_path = os.path.join(os.getcwd(), "test2.csv")
aug.augmenter_arguments = {'model_name_or_path': 'gpt2-medium'}
aug.data_augment()
raw_datasets = load_dataset("csv", data_files=aug.output_path, delimiter="\t", split="train")

Please refer to the data augmentation document for more details.

Quantized Length Adaptive Transformer

Quantized Length Adaptive Transformer leverages sequence-length reduction and low-bit representation techniques to further enhance model inference performance, enabling adaptive sequence lengths that accommodate different computational budget requirements with an optimal accuracy-efficiency tradeoff.

from intel_extension_for_transformers import QuantizationConfig, DynamicLengthConfig, metrics, objectives
from intel_extension_for_transformers.optimization.trainer import NLPTrainer

# Replace transformers.Trainer with NLPTrainer
# trainer = transformers.Trainer(...)
trainer = NLPTrainer(...)
metric = metrics.Metric(name="eval_f1", is_relative=True, criterion=0.01)
q_config = QuantizationConfig(
    approach="PostTrainingStatic",
    metrics=[metric],
    objectives=[objectives.performance]
)
# Apply the length config
dynamic_length_config = DynamicLengthConfig(length_config=length_config)
trainer.set_dynamic_config(dynamic_config=dynamic_length_config)
# Quantization
model = trainer.quantize(quant_config=q_config)

Please refer to the paper QuaLA-MiniLM and the code for more details.
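
The length_config passed to DynamicLengthConfig above is the per-layer sequence-length schedule produced by the length-adaptive search; the values and format in the sketch below are hypothetical placeholders for illustration only (consult the QuaLA-MiniLM example code for the exact format):

# Hypothetical schedule: how many tokens each of the six layers keeps
length_config = (269, 253, 252, 202, 104, 34)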

Transformers-accelerated Neural Engine

Transformers-accelerated Neural Engine is one of the reference deployments that Intel® Extension for Transformers provides. Neural Engine aims to demonstrate the optimal performance of extremely compressed NLP models by exploring optimization opportunities in both hardware and software.

from intel_extension_for_transformers.backends.neural_engine.compile import compile
# /path/to/your/model is a TensorFlow pb model or ONNX model
model = compile('/path/to/your/model')
inputs = ... # [input_ids, segment_ids, input_mask]
model.inference(inputs)

Please refer to the example in Transformers-accelerated Neural Engine and the paper Fast DistilBERT on CPUs for more details.
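
The elided inputs are typically the three BERT-style integer tensors named in the comment above; a minimal sketch of running the compiled model on dummy data (batch size, sequence length, and dtypes are illustrative assumptions, not specified here):

import numpy as np

# Dummy BERT-style inputs: batch of 1, sequence length 128
input_ids   = np.zeros((1, 128), dtype=np.int32)
segment_ids = np.zeros((1, 128), dtype=np.int32)
input_mask  = np.ones((1, 128), dtype=np.int32)

output = model.inference([input_ids, segment_ids, input_mask])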

Transformers-accelerated Libraries

Transformers-accelerated Libraries is a high-performance operator computing library implemented in assembly. It contains a JIT domain, a kernel domain, and a scheduling proxy framework.

#include "interface.hpp"
  ...
  // Describe the operator: kernel kind and property, engine kind, tensor descriptors, and attributes
  operator_desc op_desc(ker_kind, ker_prop, eng_kind, ts_descs, op_attrs);
  // Build the sparse matmul kernel from the descriptor
  sparse_matmul_desc spmm_desc(op_desc);
  sparse_matmul spmm_kern(spmm_desc);
  // Execute the kernel on the runtime data pointers
  std::vector<const void*> rt_data = {data0, data1, data2, data3, data4};
  spmm_kern.execute(rt_data);

Please refer to Transformers-accelerated Libraries for more details.

System Requirements

Validated Hardware Environment

Intel® Extension for Transformers supports systems based on Intel 64 architecture or compatible processors, and is specifically optimized for the following CPUs:

  • Intel Xeon Scalable processors (formerly code-named Cascade Lake and Ice Lake)
  • Future Intel Xeon Scalable processors (code name Sapphire Rapids)

Validated Software Environment

  • OS version: CentOS 8.4, Ubuntu 20.04
  • Python version: 3.7, 3.8, 3.9, 3.10

    Framework         Version
    TensorFlow        2.10.0, 2.9.1
    Intel TensorFlow  2.10.0, 2.9.1
    PyTorch           1.13.0+cpu, 1.12.0+cpu
    IPEX              1.13.0, 1.12.0

Selected Publications/Events



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

intel_extension_for_transformers-1.0a0.tar.gz (151.6 kB)

Uploaded Source

Built Distributions

intel_extension_for_transformers-1.0a0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37.4 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

intel_extension_for_transformers-1.0a0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37.4 MB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

intel_extension_for_transformers-1.0a0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37.7 MB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

intel_extension_for_transformers-1.0a0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37.4 MB)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64

File details

Details for the file intel_extension_for_transformers-1.0a0.tar.gz.

File metadata

File hashes

Hashes for intel_extension_for_transformers-1.0a0.tar.gz
Algorithm Hash digest
SHA256 4788e5bb3537daadfb04b8910b1fd6e5e28c8b7b919be7c63f58d354e7e7d8d1
MD5 cfef24c03f580d18e0de71c4ca748981
BLAKE2b-256 c1cc4f7d1fe237f2fc0f67719e627a6a13805d17f7a74bb6b1b573dcd96fa422

See more details on using hashes here.

File details

Details for the file intel_extension_for_transformers-1.0a0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for intel_extension_for_transformers-1.0a0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ef837615978662a907f9ed032a8019070df048ac8ce0f2ccb4f122975d4949b4
MD5 d071e37c207760f317ff50c517ce9e91
BLAKE2b-256 ce71f5afa8fb9c1f649dc2515b1bceeb2e9652afdb5141e8fce0cf320a4b8b2a

See more details on using hashes here.

File details

Details for the file intel_extension_for_transformers-1.0a0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for intel_extension_for_transformers-1.0a0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 87f6a38df6d809dcd727dbf5b0481b427b608fa1e580131e4cb929d2fd618ea7
MD5 8f7e7700aaed43ed8b8dfbf62fa50daf
BLAKE2b-256 6f12603ac40e63bbb10723d120d2627ad1c88ed068ac252b5c965981d9d421af

See more details on using hashes here.

File details

Details for the file intel_extension_for_transformers-1.0a0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for intel_extension_for_transformers-1.0a0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7c83d538e4f719a92a01a5cbd113a51cb8050de10f2cd79936ba4962918ec33f
MD5 630df57e14bef2713d1c68dc4a5b9492
BLAKE2b-256 86ea8b79326381a03c32ead4a8ff4669a193a3358a8f2fca5709b22d50a4f7e7

See more details on using hashes here.

File details

Details for the file intel_extension_for_transformers-1.0a0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for intel_extension_for_transformers-1.0a0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ac4f75446b9aefeb1829b3d57a00d040da9b73a3fcab0fb032571c94caaad738
MD5 0110eb343aa6efbdbe8fdc1e73471db9
BLAKE2b-256 55139f345f716a00fd32a03aa6a684a09a6af987833204fc91c78ed5ea382a39

See more details on using hashes here.
