Repository of Intel® Extension for Transformers
Project description
Intel® Extension for Transformers
An innovative toolkit to accelerate Transformer-based models on Intel platforms
Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, and is particularly effective on 4th Gen Intel® Xeon® Scalable processors (codenamed Sapphire Rapids). The toolkit provides the key features and examples below:
- Seamless user experience of model compression on Transformer-based models by extending Hugging Face transformers APIs and leveraging Intel® Neural Compressor
- Advanced software optimizations and a unique compression-aware runtime (released with the NeurIPS 2022 papers Fast DistilBERT on CPUs and QuaLA-MiniLM: a Quantized Length Adaptive MiniLM, and the NeurIPS 2021 paper Prune Once for All: Sparse Pre-Trained Language Models)
- Optimized Transformer-based model packages such as Stable Diffusion, GPT-J-6B, GPT-NeoX, BLOOM-176B, T5, and Flan-T5, and end-to-end workflows such as SetFit-based text classification and document-level sentiment analysis (DLSA)
- NeuralChat, a custom chatbot trained on Intel CPUs through parameter-efficient fine-tuning (PEFT) on domain knowledge
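Parameter-efficient fine-tuning trains only a small set of extra parameters while the base model's weights stay frozen. A minimal LoRA-style sketch of the idea in NumPy (a toy illustration of the general technique, not NeuralChat's actual code; all names here are hypothetical):

```python
import numpy as np

class LoRALinear:
    """A frozen dense layer plus a trainable low-rank update: W_eff = W + (alpha/rank) * B @ A."""

    def __init__(self, weight, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = weight  # frozen base weight, shape (out_features, in_features)
        # Only these two small matrices are trained during fine-tuning.
        self.lora_a = rng.normal(scale=0.01, size=(rank, weight.shape[1]))
        self.lora_b = np.zeros((weight.shape[0], rank))  # zero-init: no change at start
        self.scaling = alpha / rank

    def __call__(self, x):
        delta = self.scaling * (self.lora_b @ self.lora_a)  # low-rank weight update
        return x @ (self.weight + delta).T
```

Because `lora_b` starts at zero, the layer initially behaves exactly like the frozen base layer; fine-tuning then updates only the `rank * (in + out)` adapter values rather than the full weight matrix.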
Installation
Install from PyPI
pip install intel-extension-for-transformers
For more installation methods, please refer to the Installation Page.
Getting Started
Sentiment Analysis with Quantization
Prepare Dataset
from datasets import load_dataset
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

# Load the SST-2 dataset and tokenize it with the model's tokenizer
raw_datasets = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
raw_datasets = raw_datasets.map(
    lambda e: tokenizer(e["sentence"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)
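The `map` call above tokenizes every sentence and fixes its length at 128 tokens. The effect of `truncation=True` with `padding='max_length'` on a token-id list can be sketched in plain Python (a toy illustration only; the real tokenizer also adds special tokens and an attention mask):

```python
def pad_or_truncate(token_ids, max_length=128, pad_id=0):
    # Truncate anything longer than max_length...
    token_ids = token_ids[:max_length]
    # ...then right-pad shorter sequences with the pad token id.
    return token_ids + [pad_id] * (max_length - len(token_ids))

pad_or_truncate([101, 2009, 102], max_length=5)  # -> [101, 2009, 102, 0, 0]
```

Fixed-length batches let every example in a batch share the same tensor shape, which is what the trainer expects downstream.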
Quantization
from intel_extension_for_transformers.optimization import QuantizationConfig, metrics, objectives
from intel_extension_for_transformers.optimization.trainer import NLPTrainer

config = AutoConfig.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", num_labels=2)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", config=config
)
model.config.label2id = {0: 0, 1: 1}
model.config.id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}

# Replace transformers.Trainer with NLPTrainer
# trainer = transformers.Trainer(...)
trainer = NLPTrainer(
    model=model,
    train_dataset=raw_datasets["train"],
    eval_dataset=raw_datasets["validation"],
    tokenizer=tokenizer,
)
q_config = QuantizationConfig(metrics=[metrics.Metric(name="eval_loss", greater_is_better=False)])
model = trainer.quantize(quant_config=q_config)

# Run inference with the quantized model
inputs = tokenizer("I like Intel Extension for Transformers", return_tensors="pt")
output = model(**inputs).logits.argmax().item()
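The final `argmax` returns a class index; mapping it back to a human-readable label uses the `id2label` dictionary configured above. A minimal sketch of that lookup without torch, assuming the logits arrive as a plain list:

```python
id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}

def predict_label(logits, id2label):
    # Pick the index with the highest logit and look up its label.
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best]

print(predict_label([-1.3, 2.7], id2label))  # -> POSITIVE
```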
For more quick samples, please refer to the Get Started Page. For more validated examples, please refer to the Support Model Matrix.
Documentation
| Section | Topics |
|---|---|
| OVERVIEW | Model Compression, NeuralChat, Neural Engine, Kernel Libraries |
| MODEL COMPRESSION | Quantization, Pruning, Distillation, Orchestration, Neural Architecture Search, Export, Metrics/Objectives, Pipeline |
| NEURAL ENGINE | Model Compilation, Custom Pattern, Deployment, Profiling |
| KERNEL LIBRARIES | Sparse GEMM Kernels, Custom INT8 Kernels, Profiling, Benchmark |
| ALGORITHMS | Length Adaptive, Data Augmentation |
| TUTORIALS AND RESULTS | Tutorials, Supported Models, Model Performance, Kernel Performance |
Selected Publications/Events
- Blog on Intel Communities (Tech Innovation, Artificial Intelligence (AI)): Intel® Xeon® Processors Are Still the Only CPU With MLPerf Results, Raising the Bar By 5x (April 2023)
- Blog published on Medium: MLefficiency — Optimizing transformer models for efficiency (Dec 2022)
- NeurIPS 2022: Fast DistilBERT on CPUs (Nov 2022)
- NeurIPS 2022: QuaLA-MiniLM: a Quantized Length Adaptive MiniLM (Nov 2022)
- Blog published by Cohere: Top NLP Papers—November 2022 (Nov 2022)
- Blog published by Alibaba: Deep learning inference optimization for Address Purification (Aug 2022)
- NeurIPS 2021: Prune Once for All: Sparse Pre-Trained Language Models (Nov 2021)
Project details
Download files
Download the file for your platform.
Source Distribution
Built Distributions
Hashes for intel_extension_for_transformers-1.0.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | a8d8bae8a2633f126304e082da0a9cdce9db1d3b719acd5e5a29f8b4f2f8f7dd |
| MD5 | c8edf6c333383c58331ae3594752b65d |
| BLAKE2b-256 | 902888eb93a441d38af442614732e1da4ebaae5cee1c77181d55002741b6ead7 |
Hashes for intel_extension_for_transformers-1.0.1-cp310-cp310-win_amd64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | b8d96890a65322de423954236496af4b6833fa25bb0d8f129ad938922a71794b |
| MD5 | 5d53f379f89dab54c4bd51ac6155ac28 |
| BLAKE2b-256 | 27b79ec349293210a3b82775128d790ccc6d006d96648a2a014d25b41c4055c3 |
Hashes for intel_extension_for_transformers-1.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 11d3337bdeda722abfd1578049bd3599363e8ab29096a79006810a86789afcd2 |
| MD5 | c868b019c38a0b1b12e09f1bae405b0a |
| BLAKE2b-256 | f94e64890e2594cc427c91eb65e352d8a22f3a19966816a52ac87becc76eaf30 |
Hashes for intel_extension_for_transformers-1.0.1-cp39-cp39-win_amd64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 447407b5b0cdec51a111211830153186098cef22c8c302f6417258a3086e5a69 |
| MD5 | 474d54e7739b383ed8cc117a562ae9b9 |
| BLAKE2b-256 | 844c6e5786237ecc93461e9ea9c4123ce704d9eead19b00e1711877bdf3d83de |
Hashes for intel_extension_for_transformers-1.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | a6accded98b928c456e0453997b4e057f40a766bb53ca66b03fa73f36da26f4e |
| MD5 | 1918c5067a2ee04afb0659f25d9cf581 |
| BLAKE2b-256 | 188ae68189a8e1c9e0c61c757e6a79b21a69f6fa1ff5f72fdf9e827f7c315662 |
Hashes for intel_extension_for_transformers-1.0.1-cp38-cp38-win_amd64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8efe2fa933b229cb1bda676310901cc0a81cae372ecad778e74eea3913fee956 |
| MD5 | bb63e2df9f032fa5bc9c4ce9de482d3b |
| BLAKE2b-256 | 2c89411271d881c4c07107d402b71f66de0d3acda977f822c538c4f75228f1cc |
Hashes for intel_extension_for_transformers-1.0.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ab736c106c3ccd41d7d4fc061e4d5c108f013726e83b559f93c33ffc36db47c |
| MD5 | 36378acdf29fa5edf81a60f153423d2e |
| BLAKE2b-256 | a450a2452d0218c5fed54e2a7a896b6c78e502ba87db822a2a8c7d2e15025bc1 |
Hashes for intel_extension_for_transformers-1.0.1-cp37-cp37m-win_amd64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 79f7a442fb9feb67613f3427b8a3bf80eacbdc9b612885766d07300ff08956cb |
| MD5 | 0a420b7def0f101c34bf4096e41a0aa0 |
| BLAKE2b-256 | 2b992582d97d2c24dce4b1e4516498d1bec9e4d9c37a8a6edbd2046b71d100f3 |
Hashes for intel_extension_for_transformers-1.0.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | dd98e40f340e31de6913cec0dee4b65add93135bc87ed246e7450b29288f5daf |
| MD5 | e4e2cd05a755a0d62c81a9793c68ecd3 |
| BLAKE2b-256 | 3330a4a8a27035adcd69d176416201bbc23cff6c7966630b1d31d99ff932c45f |