Repository of Intel® Extension for Transformers
Project description
Intel® Extension for Transformers
An innovative toolkit to accelerate Transformer-based models on Intel platforms
Architecture | NeuralChat | Inference | Examples | Documentation
🚀 Latest News
NeuralChat, a customizable chatbot framework under Intel® Extension for Transformers, is now available for you to create your own chatbot within minutes on multiple architectures.
NeuralChat offers a rich set of plugins that make your personalized chatbot smarter with knowledge retrieval, more interactive through speech, faster through query caching, and more secure with guardrails.
- [Plugins] Knowledge Retrieval, Speech Interaction, Query Caching, Security Guardrail
- [Architectures] Intel® Xeon® Scalable Processors, Habana Gaudi® Accelerator, and others
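The query-caching plugin mentioned above follows a familiar pattern: serve repeated questions from a cache instead of re-running the model. A minimal, illustrative sketch in plain Python (the class and method names here are assumptions for illustration, not NeuralChat's actual API):

```python
# Illustrative sketch of the query-caching idea (not NeuralChat's implementation):
# answers to previously seen queries are returned from an in-memory cache.
class QueryCache:
    def __init__(self):
        self._cache = {}
        self.hits = 0

    def get_or_compute(self, query, compute):
        key = query.strip().lower()        # normalize the query text
        if key in self._cache:
            self.hits += 1                 # cache hit: skip the model call
            return self._cache[key]
        answer = compute(query)            # cache miss: run the (expensive) model
        self._cache[key] = answer
        return answer

cache = QueryCache()
slow_model = lambda q: f"answer to: {q}"
first = cache.get_or_compute("What is AMX?", slow_model)
second = cache.get_or_compute("what is amx?", slow_model)  # served from cache
```

The real plugin also has to decide when two queries are "the same" (semantic similarity rather than exact string match) and when cached answers expire; the sketch above only shows the core lookup-or-compute flow.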
Check out the sample code below and give it a try!
# follow the installation instructions
from intel_extension_for_transformers.neural_chat import build_chatbot
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, particularly effective on 4th Gen Intel® Xeon® Scalable processors (codenamed Sapphire Rapids). The toolkit provides the following key features and examples:
- Seamless user experience of model compression on Transformer-based models by extending Hugging Face transformers APIs and leveraging Intel® Neural Compressor
- Advanced software optimizations and a unique compression-aware runtime (released with the NeurIPS 2022 papers Fast DistilBERT on CPUs and QuaLA-MiniLM: A Quantized Length Adaptive MiniLM, and the NeurIPS 2021 paper Prune Once for All: Sparse Pre-Trained Language Models)
- Optimized Transformer-based model packages such as Stable Diffusion, GPT-J-6B, GPT-NeoX, BLOOM-176B, T5, and Flan-T5, plus end-to-end workflows such as SetFit-based text classification and document-level sentiment analysis (DLSA)
- NeuralChat, a custom chatbot trained on Intel CPUs through parameter-efficient fine-tuning (PEFT) on domain knowledge
- Inference of Large Language Models (LLMs) in pure C/C++ with weight-only quantization kernels; GPT-NeoX, LLaMA-7B, MPT-7B, and Falcon-7B are already supported
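The core idea behind weight-only quantization can be sketched in a few lines of plain Python. This is an illustrative round trip (symmetric per-tensor INT8, an assumption for simplicity), not the toolkit's actual kernels, which operate on packed tensors with per-group scales:

```python
# Illustrative sketch of symmetric INT8 weight-only quantization:
# store weights as int8 plus one float scale, and dequantize on the fly.

def quantize_int8(weights):
    """Map float weights to int8 values in [-127, 127] plus a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.27]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
```

Storing weights this way cuts memory traffic roughly 4x versus FP32, which is why it helps LLM inference, where the bottleneck is usually loading weights rather than arithmetic.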
Installation
Install from PyPI
pip install intel-extension-for-transformers
For other installation methods, please refer to the Installation Page.
Getting Started
Sentiment Analysis with Quantization
Prepare Dataset
from datasets import load_dataset
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer
raw_datasets = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
raw_datasets = raw_datasets.map(lambda e: tokenizer(e['sentence'], truncation=True, padding='max_length', max_length=128), batched=True)
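The map call above makes every example exactly 128 tokens long: longer sentences are truncated and shorter ones are padded. In plain Python (a simplified sketch, not the real tokenizer, and using 0 as a stand-in pad id), the effect per sentence is roughly:

```python
# Simplified sketch of truncation=True, padding='max_length', max_length=128:
# clip token ids to max_length, then pad to max_length with a pad id, and
# build an attention mask that is 1 for real tokens and 0 for padding.
def pad_and_truncate(token_ids, max_length=128, pad_id=0):
    n_real = min(len(token_ids), max_length)
    ids = token_ids[:max_length] + [pad_id] * (max_length - n_real)
    mask = [1] * n_real + [0] * (max_length - n_real)
    return {"input_ids": ids, "attention_mask": mask}

example = pad_and_truncate([101, 2023, 3185, 102], max_length=8)
```

Fixed-length batches like this let the quantized model run with static shapes, at the cost of some wasted computation on padding tokens.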
Quantization
from intel_extension_for_transformers.transformers import QuantizationConfig, metrics, objectives
from intel_extension_for_transformers.transformers.trainer import NLPTrainer
config = AutoConfig.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", num_labels=2)
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", config=config)
model.config.label2id = {'NEGATIVE': 0, 'POSITIVE': 1}  # label names -> ids
model.config.id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}  # ids -> label names
# Replace transformers.Trainer with NLPTrainer
# trainer = transformers.Trainer(...)
trainer = NLPTrainer(
    model=model,
    train_dataset=raw_datasets["train"],
    eval_dataset=raw_datasets["validation"],
    tokenizer=tokenizer,
)
q_config = QuantizationConfig(metrics=[metrics.Metric(name="eval_loss", greater_is_better=False)])
model = trainer.quantize(quant_config=q_config)
inputs = tokenizer("I like Intel Extension for Transformers", return_tensors="pt")
output = model(**inputs).logits.argmax().item()
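For reference, the final step simply picks the index of the largest logit and looks it up in the id2label mapping set earlier. A plain-Python sketch of that mapping (no torch required; the logit values below are made up for illustration):

```python
# Sketch of the argmax -> label lookup performed after the model call.
def predict_label(logits, id2label):
    pred_id = max(range(len(logits)), key=lambda i: logits[i])  # argmax
    return id2label[pred_id]

id2label = {0: "NEGATIVE", 1: "POSITIVE"}
sentiment = predict_label([-1.3, 2.7], id2label)  # index 1 wins
```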
For more quick-start samples, please refer to the Get Started Page. For more validated examples, please refer to the Supported Model Matrix.
Validated Performance
| Model | FP32 | BF16 | INT8 |
|---|---|---|---|
| EleutherAI/gpt-j-6B | 4163.67 ms | 1879.61 ms | 1612.24 ms |
| CompVis/stable-diffusion-v1-4 | 10.33 s | 3.02 s | N/A |
Note: for the GPT-J-6B software/hardware configuration, please refer to text-generation; for the Stable Diffusion configuration, please refer to text-to-image.
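As a quick sanity check, the GPT-J-6B latencies in the table imply roughly 2.2x (BF16) and 2.6x (INT8) speedups over FP32:

```python
# Speedups implied by the GPT-J-6B row of the table (latencies in ms).
fp32, bf16, int8 = 4163.67, 1879.61, 1612.24
bf16_speedup = fp32 / bf16   # ~2.22x
int8_speedup = fp32 / int8   # ~2.58x
print(f"BF16 speedup: {bf16_speedup:.2f}x, INT8 speedup: {int8_speedup:.2f}x")
```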
Documentation
| Section | Topics |
|---|---|
| OVERVIEW | Model Compression, NeuralChat, Neural Engine, Kernel Libraries |
| MODEL COMPRESSION | Quantization, Pruning, Distillation, Orchestration, Neural Architecture Search, Export, Metrics/Objectives, Pipeline |
| NEURAL ENGINE | Model Compilation, Custom Pattern, Deployment, Profiling |
| KERNEL LIBRARIES | Sparse GEMM Kernels, Custom INT8 Kernels, Profiling, Benchmark |
| ALGORITHMS | Length Adaptive, Data Augmentation |
| TUTORIALS AND RESULTS | Tutorials, Supported Models, Model Performance, Kernel Performance |
Selected Publications/Events
- Blog published on Medium: Faster Stable Diffusion Inference with Intel Extension for Transformers (July 2023)
- Blog of Intel Developer News: The Moat Is Trust, Or Maybe Just Responsible AI (July 2023)
- Blog of Intel Developer News: Create Your Own Custom Chatbot (July 2023)
- Blog of Intel Developer News: Accelerate Llama 2 with Intel AI Hardware and Software Optimizations (July 2023)
- arXiv: An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs (June 2023)
- Blog published on Medium: Simplify Your Custom Chatbot Deployment (June 2023)
- Blog published on Medium: Create Your Own Custom Chatbot (April 2023)
View Full Publication List.
Additional Content
Collaborations
We welcome ideas on model compression techniques and LLM-based chatbot development! Feel free to reach out to us; we look forward to collaborating with you on Intel Extension for Transformers.
Project details
Release history
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
File details
Details for the file intel_extension_for_transformers-1.1.1.tar.gz
File metadata
- Download URL: intel_extension_for_transformers-1.1.1.tar.gz
- Upload date:
- Size: 65.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0b419c7619fd0964fb5ef4f1dc7494dab134bb8cfc2ff46f0170f1f01419bed |
| MD5 | 31b04622e1f2abac89b3c6e15aaeac7c |
| BLAKE2b-256 | 8ab320205fc9c4df5e3fb8bc00971c2d5e38d5a35a7e7f126c537afa0f0000c1 |
File details
Details for the file intel_extension_for_transformers-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
File metadata
- Download URL: intel_extension_for_transformers-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 77.6 MB
- Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15f440270d2c3d4aef28c2423ae4b6cb22863af0aba814b9b23458e1a51b26e0 |
| MD5 | bb84bd605666c59ff7abe0f0f912cee2 |
| BLAKE2b-256 | 35229f05fe548d25d4513a7fd7f2c8f12c3c65da3bf6f99905bf8f5bff46a213 |
File details
Details for the file intel_extension_for_transformers-1.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
File metadata
- Download URL: intel_extension_for_transformers-1.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 77.6 MB
- Tags: CPython 3.9, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e5b3d85edba97c051d7975a8990b3cb71f8a05574752caa46dfd6d33bc22c192 |
| MD5 | ac54dc558bb0ce3b85842ed49b54ec35 |
| BLAKE2b-256 | f4877a81b73360679bd92e360155a9f70dddb0dd240ce99f611eca3753f955c5 |
File details
Details for the file intel_extension_for_transformers-1.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
File metadata
- Download URL: intel_extension_for_transformers-1.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 77.6 MB
- Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 001046307d80aabb51878ccb30a3b83db9b7e34fdcb95c1a208e79f59ea005da |
| MD5 | e3a9b7cc9166eb13811e946f6b4e8238 |
| BLAKE2b-256 | 1d9c87137e9091db8fbd0b28507f755b269f14b1d3e7df543358e6be94984dde |