
Repository of Intel® Extension for Transformers

Project description

Intel® Extension for Transformers

An innovative toolkit to accelerate Transformer-based models on Intel platforms

Architecture   |   NeuralChat   |   Inference   |   Examples   |   Documentation

🚀 Latest News

NeuralChat, a customizable chatbot framework under Intel® Extension for Transformers, is now available: create your own chatbot within minutes on multiple architectures.

NeuralChat offers a rich set of plugins to make your personalized chatbot smarter with knowledge retrieval, more interactive through speech, faster through query caching, and more secure with guardrails.

Check out the sample code below and give it a try!

# follow the installation instructions
from intel_extension_for_transformers.neural_chat import build_chatbot
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")

Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, and is especially effective on 4th Gen Intel® Xeon® Scalable processors (codenamed Sapphire Rapids). Its key features include model compression, the NeuralChat chatbot framework, the Neural Engine inference runtime, and sparse kernel libraries; see the Documentation section below for the full list.

Installation

Install from PyPI

pip install intel-extension-for-transformers

For other installation methods, please refer to the Installation page.

Getting Started

Sentiment Analysis with Quantization

Prepare Dataset

from datasets import load_dataset
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

raw_datasets = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
raw_datasets = raw_datasets.map(lambda e: tokenizer(e['sentence'], truncation=True, padding='max_length', max_length=128), batched=True)

Quantization

from intel_extension_for_transformers.transformers import QuantizationConfig, metrics, objectives
from intel_extension_for_transformers.transformers.trainer import NLPTrainer

config = AutoConfig.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english",num_labels=2)
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english",config=config)
model.config.label2id = {0: 0, 1: 1}
model.config.id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}
# Replace transformers.Trainer with NLPTrainer
# trainer = transformers.Trainer(...)
trainer = NLPTrainer(model=model, 
    train_dataset=raw_datasets["train"], 
    eval_dataset=raw_datasets["validation"],
    tokenizer=tokenizer
)
q_config = QuantizationConfig(metrics=[metrics.Metric(name="eval_loss", greater_is_better=False)])
model = trainer.quantize(quant_config=q_config)

inputs = tokenizer("I like Intel Extension for Transformers", return_tensors="pt")
output = model(**inputs).logits.argmax().item()
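
Under the hood, INT8 quantization maps floating-point tensors to 8-bit integers through a scale factor. The toy sketch below is plain Python for illustration only (not the toolkit's actual implementation) and shows the symmetric per-tensor scheme:

```python
# Illustrative sketch only -- not the toolkit's implementation.
# Symmetric per-tensor INT8 quantization: x_q = round(x / scale),
# where scale = max(|x|) / 127.

def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)        # q = [50, -127, 0, 100]
restored = dequantize_int8(q, scale)     # close to the original weights
```

Roughly speaking, trainer.quantize() applies this idea per tensor with calibrated scales, while tuning so that the metric in QuantizationConfig (here eval_loss) stays within tolerance.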

For more quick samples, please refer to the Getting Started page. For more validated examples, please refer to the Supported Model Matrix.

Validated Performance

Model                         | FP32       | BF16       | INT8
------------------------------|------------|------------|-----------
EleutherAI/gpt-j-6B           | 4163.67 ms | 1879.61 ms | 1612.24 ms
CompVis/stable-diffusion-v1-4 | 10.33 s    | 3.02 s     | N/A

Note: for the GPT-J-6B software/hardware configuration, please refer to text-generation; for the Stable Diffusion configuration, please refer to text-to-image.
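
To put the GPT-J-6B latencies in perspective, the relative speedups over FP32 work out to roughly 2.2x for BF16 and 2.6x for INT8:

```python
# Latencies (ms) for EleutherAI/gpt-j-6B from the table above.
fp32, bf16, int8 = 4163.67, 1879.61, 1612.24

bf16_speedup = fp32 / bf16   # ~2.22x over FP32
int8_speedup = fp32 / int8   # ~2.58x over FP32
print(f"BF16: {bf16_speedup:.2f}x, INT8: {int8_speedup:.2f}x")
```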

Documentation

OVERVIEW: Model Compression | NeuralChat | Neural Engine | Kernel Libraries
MODEL COMPRESSION: Quantization | Pruning | Distillation | Orchestration | Neural Architecture Search | Export | Metrics/Objectives | Pipeline
NEURAL ENGINE: Model Compilation | Custom Pattern | Deployment | Profiling
KERNEL LIBRARIES: Sparse GEMM Kernels | Custom INT8 Kernels | Profiling | Benchmark
ALGORITHMS: Length Adaptive | Data Augmentation
TUTORIALS AND RESULTS: Tutorials | Supported Models | Model Performance | Kernel Performance

Selected Publications/Events

View Full Publication List.

Additional Content

Collaborations

We welcome any interesting ideas on model compression techniques and LLM-based chatbot development! Feel free to reach out to us; we look forward to collaborating with you on Intel® Extension for Transformers.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

intel_extension_for_transformers-1.1.1.tar.gz

Built Distributions

intel_extension_for_transformers-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77.6 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

intel_extension_for_transformers-1.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77.6 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

intel_extension_for_transformers-1.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77.6 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

File details

Details for the file intel_extension_for_transformers-1.1.1.tar.gz.

File metadata

File hashes

Hashes for intel_extension_for_transformers-1.1.1.tar.gz
Algorithm Hash digest
SHA256 d0b419c7619fd0964fb5ef4f1dc7494dab134bb8cfc2ff46f0170f1f01419bed
MD5 31b04622e1f2abac89b3c6e15aaeac7c
BLAKE2b-256 8ab320205fc9c4df5e3fb8bc00971c2d5e38d5a35a7e7f126c537afa0f0000c1

See more details on using hashes here.
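
If you download a distribution manually, you can verify it against the published hashes before installing. A minimal sketch using Python's standard hashlib (the filename is the sdist listed above):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "d0b419c7619fd0964fb5ef4f1dc7494dab134bb8cfc2ff46f0170f1f01419bed"
# Compare against the published SHA256 before installing:
# assert sha256_of("intel_extension_for_transformers-1.1.1.tar.gz") == expected
```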

File details

Details for the file intel_extension_for_transformers-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for intel_extension_for_transformers-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 15f440270d2c3d4aef28c2423ae4b6cb22863af0aba814b9b23458e1a51b26e0
MD5 bb84bd605666c59ff7abe0f0f912cee2
BLAKE2b-256 35229f05fe548d25d4513a7fd7f2c8f12c3c65da3bf6f99905bf8f5bff46a213


File details

Details for the file intel_extension_for_transformers-1.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for intel_extension_for_transformers-1.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e5b3d85edba97c051d7975a8990b3cb71f8a05574752caa46dfd6d33bc22c192
MD5 ac54dc558bb0ce3b85842ed49b54ec35
BLAKE2b-256 f4877a81b73360679bd92e360155a9f70dddb0dd240ce99f611eca3753f955c5


File details

Details for the file intel_extension_for_transformers-1.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for intel_extension_for_transformers-1.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 001046307d80aabb51878ccb30a3b83db9b7e34fdcb95c1a208e79f59ea005da
MD5 e3a9b7cc9166eb13811e946f6b4e8238
BLAKE2b-256 1d9c87137e9091db8fbd0b28507f755b269f14b1d3e7df543358e6be94984dde

