Repository of Intel® Extension for Transformers
Project description
Intel® Extension for Transformers
An innovative toolkit to accelerate Transformer-based models on Intel platforms
Architecture | NeuralChat | Inference | Examples | Documentation
🚀 Latest News
NeuralChat, a customizable chatbot framework under Intel® Extension for Transformers, is now available for you to create your own chatbot within minutes on multiple architectures.
NeuralChat offers a rich set of plugins that make your personalized chatbot smarter with knowledge retrieval, more interactive through speech, faster through query caching, and more secure with guardrails.
- [Plugins] Knowledge Retrieval, Speech Interaction, Query Caching, Security Guardrail
- [Architectures] Intel® Xeon® Scalable Processors, Habana Gaudi® Accelerator, and others
Check out the sample code below and give it a try!
# follow the installation instructions
from intel_extension_for_transformers.neural_chat import build_chatbot
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
print(response)
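To make the query caching idea mentioned above more concrete, here is a minimal sketch of a cache wrapped around a chatbot's predict call. It is not NeuralChat's plugin implementation: `CachedChatbot`, `normalize`, and `fake_predict` are hypothetical names used only for illustration, and exact-match lookup on a normalized query is an assumed simplification.

```python
from typing import Callable, Dict


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so near-identical queries share a key."""
    return " ".join(query.lower().split())


class CachedChatbot:
    """Hypothetical wrapper that caches responses by normalized query text."""

    def __init__(self, predict: Callable[[str], str]) -> None:
        self._predict = predict
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def predict(self, query: str) -> str:
        key = normalize(query)
        if key in self._cache:
            # Cache hit: skip the expensive model call.
            self.hits += 1
            return self._cache[key]
        # Cache miss: call the model once and remember the answer.
        self.misses += 1
        response = self._predict(query)
        self._cache[key] = response
        return response


def fake_predict(query: str) -> str:
    # Stand-in for a real model call such as chatbot.predict(query).
    return f"(answer to: {query})"


bot = CachedChatbot(fake_predict)
first = bot.predict("Tell me about Intel Xeon Scalable Processors.")
second = bot.predict("tell me about  Intel Xeon Scalable processors.")
print(f"hits={bot.hits} misses={bot.misses}")
```

The second call differs only in case and spacing, so it is served from the cache rather than from the model. A production cache would typically also bound its size and may match on semantic similarity instead of exact text.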
Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, and it is particularly effective on 4th Gen Intel Xeon Scalable processors (codenamed Sapphire Rapids). The toolkit provides the following key features and examples:
- Seamless user experience of model compression on Transformer-based models by extending Hugging Face transformers APIs and leveraging Intel® Neural Compressor
- Advanced software optimizations and a unique compression-aware runtime (released with the NeurIPS 2022 papers Fast Distilbert on CPUs and QuaLA-MiniLM: a Quantized Length Adaptive MiniLM, and the NeurIPS 2021 paper Prune Once for All: Sparse Pre-Trained Language Models)
- Optimized Transformer-based model packages such as Stable Diffusion, GPT-J-6B, GPT-NEOX, BLOOM-176B, T5, and Flan-T5, and end-to-end workflows such as SetFit-based text classification and document-level sentiment analysis (DLSA)
- NeuralChat, a custom chatbot trained on Intel CPUs through parameter-efficient fine-tuning (PEFT) on domain knowledge
- Inference of Large Language Models (LLMs) in pure C/C++ with weight-only quantization kernels, already enabled for GPT-NEOX, LLAMA-7B, MPT-7B, and FALCON-7B
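As a rough illustration of the weight-only quantization mentioned in the last item, the sketch below quantizes weights group by group to signed 4-bit integers with one floating-point scale per group, while activations are left untouched. This is a generic textbook scheme written in plain Python, not the toolkit's C/C++ kernels; the function names, the group size, and the bit width are all assumptions chosen for readability.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Symmetric quantization of one weight group to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_group(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floating-point weights from integers and a scale."""
    return [q * scale for q in quantized]


def quantize_weights(
    weights: List[float], group_size: int = 4, bits: int = 4
) -> List[Tuple[List[int], float]]:
    """Split a flat weight list into groups and quantize each group separately."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


weights = [0.7, -0.3, 0.1, 0.0, 2.8, -1.4, 0.4, 0.0]
groups = quantize_weights(weights, group_size=4, bits=4)
for quantized, scale in groups:
    print(quantized, round(scale, 4), dequantize_group(quantized, scale))
```

Using one scale per small group rather than one per tensor keeps a single large weight from stretching the quantization step for every other weight, at the cost of storing more scales.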
Installation
Install from PyPI
pip install intel-extension-for-transformers
For more installation methods, please refer to the Installation Page.
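After installing, one way to confirm that the package is visible to the current Python environment is to query its distribution metadata. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, and the distribution name is taken from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "intel-extension-for-transformers") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("intel-extension-for-transformers is not installed in this environment")
else:
    print(f"intel-extension-for-transformers {version} is installed")
```

This check reads package metadata only, so it does not import the package or load any models.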
Getting Started
Sentiment Analysis with Quantization
Prepare Dataset
from datasets import load_dataset
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer
raw_datasets = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
raw_datasets = raw_datasets.map(lambda e: tokenizer(e['sentence'], truncation=True, padding='max_length', max_length=128), batched=True)
Quantization
from intel_extension_for_transformers.transformers import QuantizationConfig, metrics, objectives
from intel_extension_for_transformers.transformers.trainer import NLPTrainer
config = AutoConfig.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", num_labels=2)
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", config=config)
model.config.label2id = {0: 0, 1: 1}
model.config.id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}
# Replace transformers.Trainer with NLPTrainer
# trainer = transformers.Trainer(...)
trainer = NLPTrainer(
    model=model,
    train_dataset=raw_datasets["train"],
    eval_dataset=raw_datasets["validation"],
    tokenizer=tokenizer,
)
q_config = QuantizationConfig(metrics=[metrics.Metric(name="eval_loss", greater_is_better=False)])
model = trainer.quantize(quant_config=q_config)
# Run the quantized model on a sample sentence and take the predicted class id
inputs = tokenizer("I like Intel Extension for Transformers", return_tensors="pt")
output = model(**inputs).logits.argmax().item()
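For readers new to quantization, the following sketch shows the basic arithmetic behind mapping floating-point values to INT8 with a scale and a zero point. It is the generic per-tensor asymmetric scheme, written in plain Python; it is not a description of what `trainer.quantize` does internally, and every function name here is hypothetical.

```python
from typing import List, Tuple


def compute_qparams(
    x_min: float, x_max: float, qmin: int = -128, qmax: int = 127
) -> Tuple[float, int]:
    """Derive scale and zero point so that [x_min, x_max] maps onto [qmin, qmax]."""
    # The representable range must include 0.0 so that zero is encoded exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0
    zero_point = int(round(qmin - x_min / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(
    values: List[float], scale: float, zero_point: int, qmin: int = -128, qmax: int = 127
) -> List[int]:
    """Map floats to clamped integers: q = clamp(round(x / scale) + zero_point)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(quantized: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integers back to floats: x is approximately (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]


values = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, zero_point = compute_qparams(min(values), max(values))
quantized = quantize(values, scale, zero_point)
restored = dequantize(quantized, scale, zero_point)
print(scale, zero_point)
print(quantized)
print(restored)
```

The restored values differ from the originals by a small rounding error on the order of one quantization step, which is the accuracy cost that the metric passed to the quantization config is meant to keep in check.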
For more quick samples, please refer to the Get Started Page. For more validated examples, please refer to the Support Model Matrix.
Validated Performance
Model | FP32 | BF16 | INT8 |
---|---|---|---|
EleutherAI/gpt-j-6B | 4163.67 (ms) | 1879.61 (ms) | 1612.24 (ms) |
CompVis/stable-diffusion-v1-4 | 10.33 (s) | 3.02 (s) | N/A |
Note: For the GPT-J-6B software/hardware configuration, please refer to text-generation. For the Stable Diffusion software/hardware configuration, please refer to text-to-image.
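The figures in the table can be turned into relative speedups over FP32 with a few lines of Python. The sketch below assumes the values are latencies (lower is better), which the ms and s units suggest but the table does not state explicitly; the dictionary layout and the `speedup` helper are illustrative only.

```python
from typing import Dict

# Values copied from the table above; GPT-J-6B is in ms, Stable Diffusion in s.
# Units cancel in the ratio, so each model only needs to be self-consistent.
measurements: Dict[str, Dict[str, float]] = {
    "EleutherAI/gpt-j-6B": {"FP32": 4163.67, "BF16": 1879.61, "INT8": 1612.24},
    "CompVis/stable-diffusion-v1-4": {"FP32": 10.33, "BF16": 3.02},
}


def speedup(baseline: float, optimized: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return baseline / optimized


speedups: Dict[str, Dict[str, float]] = {}
for model_name, by_dtype in measurements.items():
    fp32 = by_dtype["FP32"]
    speedups[model_name] = {
        dtype: speedup(fp32, value)
        for dtype, value in by_dtype.items()
        if dtype != "FP32"
    }

for model_name, by_dtype in speedups.items():
    for dtype, factor in by_dtype.items():
        print(f"{model_name}: {dtype} is {factor:.2f}x faster than FP32")
```

Under that assumption, GPT-J-6B comes out at roughly 2.22x for BF16 and 2.58x for INT8, and Stable Diffusion at roughly 3.42x for BF16.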
Documentation
OVERVIEW | | | |
---|---|---|---|
Model Compression | NeuralChat | Neural Engine | Kernel Libraries |
MODEL COMPRESSION | | | |
Quantization | Pruning | Distillation | Orchestration |
Neural Architecture Search | Export | Metrics/Objectives | Pipeline |
NEURAL ENGINE | | | |
Model Compilation | Custom Pattern | Deployment | Profiling |
KERNEL LIBRARIES | | | |
Sparse GEMM Kernels | Custom INT8 Kernels | Profiling | Benchmark |
ALGORITHMS | | | |
Length Adaptive | Data Augmentation | | |
TUTORIALS AND RESULTS | | | |
Tutorials | Supported Models | Model Performance | Kernel Performance |
Selected Publications/Events
- Blog published on Medium: Faster Stable Diffusion Inference with Intel Extension for Transformers (July 2023)
- Blog of Intel Developer News: The Moat Is Trust, Or Maybe Just Responsible AI (July 2023)
- Blog of Intel Developer News: Create Your Own Custom Chatbot (July 2023)
- Blog of Intel Developer News: Accelerate Llama 2 with Intel AI Hardware and Software Optimizations (July 2023)
- arXiv: An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs (June 2023)
- Blog published on Medium: Simplify Your Custom Chatbot Deployment (June 2023)
- Blog published on Medium: Create Your Own Custom Chatbot (April 2023)
View Full Publication List.
Additional Content
Collaborations
We welcome any interesting ideas on model compression techniques and LLM-based chatbot development! Feel free to reach out to us. We look forward to collaborating with you on Intel Extension for Transformers!
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Hashes for intel_extension_for_transformers-1.1.1.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | d0b419c7619fd0964fb5ef4f1dc7494dab134bb8cfc2ff46f0170f1f01419bed |
MD5 | 31b04622e1f2abac89b3c6e15aaeac7c |
BLAKE2b-256 | 8ab320205fc9c4df5e3fb8bc00971c2d5e38d5a35a7e7f126c537afa0f0000c1 |
Hashes for intel_extension_for_transformers-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 15f440270d2c3d4aef28c2423ae4b6cb22863af0aba814b9b23458e1a51b26e0 |
MD5 | bb84bd605666c59ff7abe0f0f912cee2 |
BLAKE2b-256 | 35229f05fe548d25d4513a7fd7f2c8f12c3c65da3bf6f99905bf8f5bff46a213 |
Hashes for intel_extension_for_transformers-1.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | e5b3d85edba97c051d7975a8990b3cb71f8a05574752caa46dfd6d33bc22c192 |
MD5 | ac54dc558bb0ce3b85842ed49b54ec35 |
BLAKE2b-256 | f4877a81b73360679bd92e360155a9f70dddb0dd240ce99f611eca3753f955c5 |
Hashes for intel_extension_for_transformers-1.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 001046307d80aabb51878ccb30a3b83db9b7e34fdcb95c1a208e79f59ea005da |
MD5 | e3a9b7cc9166eb13811e946f6b4e8238 |
BLAKE2b-256 | 1d9c87137e9091db8fbd0b28507f755b269f14b1d3e7df543358e6be94984dde |
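The digests listed above can be used to check a downloaded file before installing it. The sketch below is one way to do that with the standard library; the helper names are hypothetical, and the local file path assumes the source distribution was saved to the current directory under its published name.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Union[str, Path], expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# SHA256 of intel_extension_for_transformers-1.1.1.tar.gz as listed above.
EXPECTED_SHA256 = "d0b419c7619fd0964fb5ef4f1dc7494dab134bb8cfc2ff46f0170f1f01419bed"

# Assumed local path; adjust it to wherever the archive was downloaded.
archive = Path("intel_extension_for_transformers-1.1.1.tar.gz")
if archive.exists():
    print("digest OK" if verify_download(archive, EXPECTED_SHA256) else "digest MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of file size. pip can perform the same check automatically when requirements are pinned with `--hash` entries and installed with `--require-hashes`.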