Repository of Intel® Extension for Transformers

Project description

Intel® Extension for Transformers

An innovative toolkit to accelerate Transformer-based models on Intel platforms


Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms. It is particularly effective on 4th Gen Intel Xeon Scalable processors (codenamed Sapphire Rapids). The toolkit's key features and examples are listed under Documentation below.

Installation

Install from PyPI

pip install intel-extension-for-transformers

For more installation methods, please refer to the Installation Page.
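
To confirm that the package is visible to the current interpreter after installation, a small standard-library check can help. This is a minimal sketch: the helper name `installed_version` is hypothetical, and only the distribution name comes from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("intel-extension-for-transformers")
    print(found if found else "intel-extension-for-transformers is not installed")
```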

Getting Started

Sentiment Analysis with Quantization

Prepare Dataset

from datasets import load_dataset
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

# Load the SST-2 sentiment task from the GLUE benchmark
raw_datasets = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
# Tokenize each sentence, padding or truncating it to exactly 128 tokens
raw_datasets = raw_datasets.map(
    lambda e: tokenizer(e["sentence"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)
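
The `map` call above shapes every sentence to exactly 128 token ids. The following is an illustrative sketch of that padding and truncation step in plain Python, not the tokenizer's actual implementation (which also handles special tokens). The helper `pad_or_truncate`, the pad id of 0, and the sample ids are assumptions made for the example.

```python
from typing import Dict, List


def pad_or_truncate(token_ids: List[int], max_length: int = 128, pad_id: int = 0) -> Dict[str, List[int]]:
    """Shape one sequence to exactly max_length ids and build its attention mask."""
    kept = token_ids[:max_length]   # truncation: drop ids past max_length
    n_pad = max_length - len(kept)  # padding: number of slots still to fill
    return {
        "input_ids": kept + [pad_id] * n_pad,
        "attention_mask": [1] * len(kept) + [0] * n_pad,  # 1 = real token, 0 = padding
    }


# Arbitrary ids, with a small max_length so the effect is easy to see
short = pad_or_truncate([101, 1045, 2066, 102], max_length=8)
long = pad_or_truncate(list(range(1, 21)), max_length=8)
print(short["input_ids"], short["attention_mask"])
print(len(long["input_ids"]), sum(long["attention_mask"]))
```

Fixed-length inputs like these let every batch share one tensor shape, at the cost of some wasted computation on padding.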

Quantization

from intel_extension_for_transformers.optimization import QuantizationConfig, metrics
from intel_extension_for_transformers.optimization.trainer import NLPTrainer

config = AutoConfig.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", num_labels=2)
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", config=config)
model.config.label2id = {0: 0, 1: 1}
model.config.id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Replace transformers.Trainer with NLPTrainer
# trainer = transformers.Trainer(...)
trainer = NLPTrainer(
    model=model,
    train_dataset=raw_datasets["train"],
    eval_dataset=raw_datasets["validation"],
    tokenizer=tokenizer,
)
# Quantize, using evaluation loss (lower is better) as the tuning metric
q_config = QuantizationConfig(metrics=[metrics.Metric(name="eval_loss", greater_is_better=False)])
model = trainer.quantize(quant_config=q_config)

# Run the quantized model on a sample sentence; output is the predicted class index
inputs = tokenizer("I like Intel Extension for Transformers", return_tensors="pt")
output = model(**inputs).logits.argmax().item()
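
In the example above, `output` is a class index (0 or 1). A minimal sketch of turning raw logits into a readable label and a confidence score follows, using the same `id2label` mapping that was set on the model config. The logits are invented for illustration, plain Python lists stand in for the tensor returned by the model, and `decode_logits` is a hypothetical helper.

```python
import math
from typing import Dict, List, Tuple

ID2LABEL: Dict[int, str] = {0: "NEGATIVE", 1: "POSITIVE"}  # same mapping as model.config.id2label


def decode_logits(logits: List[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return the predicted label and its softmax probability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax over classes
    return id2label[best], probs[best]


label, prob = decode_logits([-2.0, 3.0], ID2LABEL)  # made-up logits
print(label, round(prob, 4))
```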

For more quick samples, please refer to the Get Started Page. For more validated examples, please refer to the Support Model Matrix.

Validated Performance

| Model | FP32 | BF16 | INT8 |
| --- | --- | --- | --- |
| EleutherAI/gpt-j-6B | 4163.67 (ms) | 1879.61 (ms) | 1612.24 (ms) |
| CompVis/stable-diffusion-v1-4 | 10.33 (s) | 3.02 (s) | N/A |

Note: For the GPT-J-6B software and hardware configuration, please refer to text-generation. For the Stable Diffusion software and hardware configuration, please refer to text-to-image.
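
The figures reported above can be turned into speedup ratios relative to FP32. This sketch only does arithmetic on those numbers; the dictionary layout and the helper name `speedup_vs_fp32` are illustrative choices.

```python
from typing import Dict, Optional

# Values copied from the table above (GPT-J in ms, Stable Diffusion in s).
LATENCY: Dict[str, Dict[str, Optional[float]]] = {
    "EleutherAI/gpt-j-6B": {"FP32": 4163.67, "BF16": 1879.61, "INT8": 1612.24},
    "CompVis/stable-diffusion-v1-4": {"FP32": 10.33, "BF16": 3.02, "INT8": None},
}


def speedup_vs_fp32(row: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Speedup = FP32 value / lower-precision value (None where no result is reported)."""
    base = row["FP32"]
    return {
        dtype: (base / value if value is not None else None)
        for dtype, value in row.items()
        if dtype != "FP32"
    }


for model_name, row in LATENCY.items():
    ratios = speedup_vs_fp32(row)
    print(model_name, {k: (round(v, 2) if v is not None else "N/A") for k, v in ratios.items()})
```

With these figures, BF16 is roughly 2.2x and INT8 roughly 2.6x faster than FP32 for GPT-J-6B, and BF16 is roughly 3.4x faster than FP32 for Stable Diffusion.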

Documentation

OVERVIEW: Model Compression, NeuralChat, Neural Engine, Kernel Libraries
MODEL COMPRESSION: Quantization, Pruning, Distillation, Orchestration, Neural Architecture Search, Export, Metrics/Objectives, Pipeline
NEURAL ENGINE: Model Compilation, Custom Pattern, Deployment, Profiling
KERNEL LIBRARIES: Sparse GEMM Kernels, Custom INT8 Kernels, Profiling, Benchmark
ALGORITHMS: Length Adaptive, Data Augmentation
TUTORIALS AND RESULTS: Tutorials, Supported Models, Model Performance, Kernel Performance

Selected Publications/Events

Additional Content

Research Collaborations

You are welcome to raise any interesting research ideas on model compression techniques, and feel free to reach out to us (the maintainers). We look forward to collaborating with you on Intel Extension for Transformers!


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

intel_extension_for_transformers-1.1.tar.gz (37.2 MB)

Uploaded Source

Built Distributions

intel_extension_for_transformers-1.1-cp310-cp310-win_amd64.whl (9.5 MB)

Uploaded CPython 3.10 Windows x86-64

intel_extension_for_transformers-1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (40.6 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

intel_extension_for_transformers-1.1-cp39-cp39-win_amd64.whl (9.5 MB)

Uploaded CPython 3.9 Windows x86-64

intel_extension_for_transformers-1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (40.6 MB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

intel_extension_for_transformers-1.1-cp38-cp38-win_amd64.whl (9.5 MB)

Uploaded CPython 3.8 Windows x86-64

intel_extension_for_transformers-1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (40.6 MB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64
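
The wheel filenames above encode which interpreter and platform each build targets. As a rough sketch, assuming the standard wheel naming scheme `{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl` with no optional build tag, the parts can be split out as follows. `parse_wheel_name` is a hypothetical helper; the `packaging` library offers a more thorough parser.

```python
from typing import Dict


def parse_wheel_name(filename: str) -> Dict[str, object]:
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dash-separated fields, got {len(parts)}")
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platforms": platform_tag.split("."),  # several platform tags may be joined by dots
    }


info = parse_wheel_name(
    "intel_extension_for_transformers-1.1-cp310-cp310-"
    "manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info["python"], info["platforms"])
```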

File details

Hashes for intel_extension_for_transformers-1.1.tar.gz
Algorithm Hash digest
SHA256 6516da459205d8a6fad6956b9e72e3368448d829bcb18f5722561b1045a1f38f
MD5 faa71cebd4139382ad113834dd27a478
BLAKE2b-256 439c8879e163f6f8da83a68c6d306403bf6517e979f03ca31f1f4b2ed3787ddc

Hashes for intel_extension_for_transformers-1.1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 eb5df1ed3131eac1e3d5d8ed1908482298502004388cb17b8a110da992597fd7
MD5 a2c032c567f36550ecd1106e1d552cf4
BLAKE2b-256 b338a8329c259401feca61507956cde9e5049cdc3e7fb335a9035b4711e8f3e9

Hashes for intel_extension_for_transformers-1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 be821dec6f2e2cb0c605aab4f374ea86f455d615832a13e51763e5f8596032a0
MD5 2946235cfce62e3ba58538435c7c85bf
BLAKE2b-256 3bfb63f43cbf43a2be8dff8514614783218b9c851e1e47eb9e94267567aa1136

Hashes for intel_extension_for_transformers-1.1-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 32785b86ab0b0e2ca8da2452da1ccf3ecf67a3465577f3afb93643f7801a5652
MD5 a3260313111cbe5fe15cac5e87ad795f
BLAKE2b-256 0a7ab49bff084b474da2366b15e382664787bfc5c5b14a84f5f5650377461405

Hashes for intel_extension_for_transformers-1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 96267c4e21a9507bda18597406e59d573e84aaae32aec87141522addb1e4b4b7
MD5 e17cc0fa103e70209562c687c5143e37
BLAKE2b-256 60b687ca03dc13f18fdaa640aad7a21bd615f6e93f903051c3c740dac44a486c

Hashes for intel_extension_for_transformers-1.1-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 e85691a4cd206a0d06d9dfcd28bf89e18add1f51c6f4cdd460cc9fe236211b49
MD5 2bbbb2ca4d860f83c99174f42ade093e
BLAKE2b-256 2de92ab402464b234883350219075afe0646223c844fc5b09ba10c17d5b4c7a0

Hashes for intel_extension_for_transformers-1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 fbeadb102a09c1db999dc1cf79be1cd85ef8e4a1f7c128942b9ffe688406c9ab
MD5 5d17525d60b2aa78b71fb7c5dad9c98c
BLAKE2b-256 fc20156c37e51949b4750f988b4a19ccd22ec52ac6e899221b3194a070b4f5c5
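
The digests listed above can be used to check a downloaded file before installing it. Here is a minimal sketch with the standard `hashlib` module; the helper names `sha256_of` and `matches` are hypothetical, and the usage comment assumes the source distribution was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the file was downloaded to the current directory:
# matches(Path("intel_extension_for_transformers-1.1.tar.gz"),
#         "6516da459205d8a6fad6956b9e72e3368448d829bcb18f5722561b1045a1f38f")
```

Reading in chunks keeps memory use flat even for the 40 MB wheels.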
