
Intel® Extension for Transformers

An innovative toolkit to accelerate Transformer-based models on Intel platforms

Architecture   |   NeuralChat   |   C++ Inference   |   Examples   |   Documentation


Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, particularly effective on 4th Gen Intel® Xeon® Scalable processors (codenamed Sapphire Rapids). The toolkit provides the key features and examples described in the sections below.

Installation

Install from PyPI

pip install intel-extension-for-transformers

For more installation methods, please refer to the Installation page.
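
To build from source instead, a typical flow looks like the following (a minimal sketch; the exact steps can vary by release, so treat these commands as an assumption and consult the Installation page):

git clone https://github.com/intel/intel-extension-for-transformers.git
cd intel-extension-for-transformers
pip install -v .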

Getting Started

Sentiment Analysis with Quantization

Prepare Dataset

from datasets import load_dataset
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

# Load the GLUE SST-2 sentiment dataset and tokenize every sentence,
# padding/truncating to a fixed length of 128 tokens.
raw_datasets = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
raw_datasets = raw_datasets.map(lambda e: tokenizer(e['sentence'], truncation=True, padding='max_length', max_length=128), batched=True)
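
To sanity-check the preprocessing, you can inspect one mapped example (a quick illustration; the field names follow the GLUE SST-2 schema plus the standard tokenizer outputs):

sample = raw_datasets["train"][0]
print(sample["sentence"])             # raw text
print(sample["label"])                # 0 = negative, 1 = positive
print(sample["input_ids"][:10])       # first few token ids (padded to length 128)
print(sample["attention_mask"][:10])  # 1 = real token, 0 = padding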

Quantization

from intel_extension_for_transformers.optimization import QuantizationConfig, metrics, objectives
from intel_extension_for_transformers.optimization.trainer import NLPTrainer

config = AutoConfig.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", num_labels=2)
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", config=config)
# Align the label mapping with the integer labels used by GLUE SST-2.
model.config.label2id = {'NEGATIVE': 0, 'POSITIVE': 1}
model.config.id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}
# Replace transformers.Trainer with NLPTrainer
# trainer = transformers.Trainer(...)
trainer = NLPTrainer(model=model,
    train_dataset=raw_datasets["train"],
    eval_dataset=raw_datasets["validation"],
    tokenizer=tokenizer
)
# Tune with eval_loss as the target metric (lower is better).
q_config = QuantizationConfig(metrics=[metrics.Metric(name="eval_loss", greater_is_better=False)])
model = trainer.quantize(quant_config=q_config)
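
Accuracy-driven tuning is also possible: rather than tracking eval_loss, the metric can target accuracy with a tolerated relative drop. The sketch below is hedged: the is_relative/criterion parameters and the "PostTrainingStatic" approach string follow the project's published examples but should be verified against your installed version, and eval_accuracy is only reported if the trainer is given a compute_metrics function.

# Assumed API: tolerate at most a 1% relative accuracy drop while tuning.
tune_metric = metrics.Metric(name="eval_accuracy", is_relative=True, criterion=0.01)
q_config = QuantizationConfig(approach="PostTrainingStatic", metrics=[tune_metric])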

inputs = tokenizer("I like Intel Extension for Transformers", return_tensors="pt")
# argmax over the sentiment logits gives the predicted class id.
pred = model(**inputs).logits.argmax().item()
print(model.config.id2label[pred])  # expected: 'POSITIVE'
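
The tuned model can then be persisted like any other Trainer-managed model (a minimal sketch, assuming NLPTrainer keeps the standard transformers Trainer.save_model interface; the directory name is illustrative):

# Save the tuned model (and tokenizer) to a local directory.
trainer.save_model("./quantized-sst2-model")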

For more quick-start samples, please refer to the Get Started page. For more validated examples, please refer to the Supported Model Matrix.

Validated Performance

Model                           FP32        BF16        INT8
EleutherAI/gpt-j-6B             4163.67 ms  1879.61 ms  1612.24 ms
CompVis/stable-diffusion-v1-4   10.33 s     3.02 s      N/A

Note: for the GPT-J-6B software/hardware configuration, please refer to text-generation; for the Stable Diffusion configuration, please refer to text-to-image.
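
For reference, these figures correspond to roughly a 2.6x INT8 speedup over FP32 for GPT-J-6B (4163.67 / 1612.24 ≈ 2.58) and roughly a 3.4x BF16 speedup over FP32 for Stable Diffusion (10.33 / 3.02 ≈ 3.42).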

Documentation

OVERVIEW: Model Compression, NeuralChat, Neural Engine, Kernel Libraries
MODEL COMPRESSION: Quantization, Pruning, Distillation, Orchestration, Neural Architecture Search, Export, Metrics/Objectives, Pipeline
NEURAL ENGINE: Model Compilation, Custom Pattern, Deployment, Profiling
KERNEL LIBRARIES: Sparse GEMM Kernels, Custom INT8 Kernels, Profiling, Benchmark
ALGORITHMS: Length Adaptive, Data Augmentation
TUTORIALS AND RESULTS: Tutorials, Supported Models, Model Performance, Kernel Performance

Selected Publications/Events

Additional Content

Research Collaborations

We welcome any interesting research ideas on model compression techniques; feel free to reach out to us (the maintainers). We look forward to collaborating with you on Intel Extension for Transformers!

Download files

Download the file for your platform.

Source Distribution

intel_extension_for_transformers-1.1.tar.gz (37.2 MB): Source

Built Distributions

intel_extension_for_transformers-1.1-cp310-cp310-win_amd64.whl (9.5 MB): CPython 3.10, Windows x86-64
intel_extension_for_transformers-1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (40.6 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64
intel_extension_for_transformers-1.1-cp39-cp39-win_amd64.whl (9.5 MB): CPython 3.9, Windows x86-64
intel_extension_for_transformers-1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (40.6 MB): CPython 3.9, manylinux (glibc 2.17+), x86-64
intel_extension_for_transformers-1.1-cp38-cp38-win_amd64.whl (9.5 MB): CPython 3.8, Windows x86-64
intel_extension_for_transformers-1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (40.6 MB): CPython 3.8, manylinux (glibc 2.17+), x86-64
