
Repository of Intel® Extension for Transformers

Project description

Intel® Extension for Transformers

An Innovative Transformer-based Toolkit to Accelerate GenAI/LLM Everywhere

Release Notes

🏭Architecture   |   💬NeuralChat   |   😃Inference   |   💻Examples   |   📖Documentation

🚀Latest News

  • NeuralChat has been showcased in Intel Innovation’23 Keynote and Google Cloud Next'23 to demonstrate GenAI/LLM capabilities on Intel Xeon Scalable Processors.
  • NeuralChat supports custom chatbot development and deployment on a broad range of Intel hardware, such as Xeon Scalable Processors, Gaudi2, Xeon CPU Max Series, Data Center GPU Max Series, Arc Series, and Core Processors. Check out the Notebooks and the sample code below.
# pip install intel-extension-for-transformers
from intel_extension_for_transformers.neural_chat import build_chatbot
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
  • LLM runtime extends the Hugging Face Transformers API to provide seamless low-precision inference for popular LLMs, supporting mainstream low-precision data types such as INT8/FP8/INT4/FP4/NF4; switching between them is a one-line change to the quantization config, as sketched below.
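For example, moving from 4-bit integer to 4-bit NormalFloat weights only requires changing the weight dtype in the quantization config. The snippet below is a minimal sketch that assumes weight_dtype="nf4" is accepted by WeightOnlyQuantConfig in the installed version; check the LLM Runtime documentation for the exact list of supported dtypes.

from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModel, WeightOnlyQuantConfig

model_name = "EleutherAI/gpt-j-6B"
# NF4 weights with INT8 compute; "nf4" is assumed to be a valid weight_dtype here.
config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="nf4")

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time, a little girl", return_tensors="pt").input_ids

model = AutoModel.from_pretrained(model_name, quantization_config=config)
gen_tokens = model.generate(inputs, max_new_tokens=300)
print(tokenizer.batch_decode(gen_tokens)[0])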

🏃Installation

Quick Install from Pypi

pip install intel-extension-for-transformers

For more installation methods, please refer to the Installation Page.

🌟Introduction

Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, and is particularly effective on the 4th Gen Intel Xeon Scalable processor (codenamed Sapphire Rapids). The toolkit provides the key features and examples described below.

🌱Getting Started

Below is sample code to enable weight-only low-precision inference. See more examples.

INT4 Inference

from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModel, WeightOnlyQuantConfig

model_name = "EleutherAI/gpt-j-6B"
config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")
prompt = "Once upon a time, a little girl"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids

model = AutoModel.from_pretrained(model_name, quantization_config=config)
gen_tokens = model.generate(inputs, max_new_tokens=300)
gen_text = tokenizer.batch_decode(gen_tokens)

INT8 Inference

from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModel, WeightOnlyQuantConfig

model_name = "EleutherAI/gpt-j-6B" 
config = WeightOnlyQuantConfig(compute_dtype="bf16", weight_dtype="int8")
prompt = "Once upon a time, a little girl"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids

model = AutoModel.from_pretrained(model_name, quantization_config=config)
gen_tokens = model.generate(inputs, max_new_tokens=300)
gen_text = tokenizer.batch_decode(gen_tokens)
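
Because the quantized model keeps the familiar generate() interface, token streaming can be sketched with the standard Hugging Face TextStreamer. This assumes the installed version forwards the streamer argument to generation, so treat it as an illustrative pattern rather than a guaranteed API.

from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModel, WeightOnlyQuantConfig

model_name = "EleutherAI/gpt-j-6B"
config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time, a little girl", return_tensors="pt").input_ids

# TextStreamer prints tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True)

model = AutoModel.from_pretrained(model_name, quantization_config=config)
model.generate(inputs, streamer=streamer, max_new_tokens=300)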

🎯Validated Models

Here is the average accuracy of the validated models on Lambada (OpenAI), HellaSwag, Winogrande, PIQA, and WikiText. The next-token latency is measured with 32 input tokens and greedy search on Intel's 4th Gen Xeon Scalable (Sapphire Rapids) processor; a rough way to time generation yourself is sketched after the table.

| Model | FP32 | INT4 (group size 32) | INT4 (group size 128) | Next-Token Latency |
|---|---|---|---|---|
| EleutherAI/gpt-j-6B | 0.643 | 0.644 | 0.64 | 21.98 ms |
| meta-llama/Llama-2-7b-hf | 0.69 | 0.69 | 0.685 | 24.55 ms |
| decapoda-research/llama-7b-hf | 0.689 | 0.682 | 0.68 | 24.84 ms |
| EleutherAI/gpt-neox-20b | 0.674 | 0.672 | 0.669 | 80.16 ms |
| mosaicml/mpt-7b-chat | 0.672 | 0.67 | 0.666 | 35.84 ms |
| tiiuae/falcon-7b | 0.698 | 0.694 | 0.693 | 36.1 ms |
| baichuan-inc/baichuan-7B | 0.474 | 0.471 | 0.47 | Coming Soon |
| facebook/opt-6.7b | 0.65 | 0.647 | 0.643 | Coming Soon |
| databricks/dolly-v2-3b | 0.613 | 0.609 | 0.609 | 22.02 ms |
| tiiuae/falcon-40b-instruct | 0.756 | 0.757 | 0.755 | Coming Soon |
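
As a rough way to reproduce a next-token-latency style measurement on your own hardware, the sketch below times a greedy decode and divides by the number of generated tokens. It only approximates the benchmark setup above (the average includes the first-token/prefill cost, and the prompt's token count depends on the tokenizer), so treat the numbers as indicative rather than comparable to the table.

import time
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModel, WeightOnlyQuantConfig

model_name = "EleutherAI/gpt-j-6B"
config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Pick a prompt that tokenizes to roughly 32 tokens to mirror the benchmark input length.
prompt = ("Once upon a time, there was a little girl who loved to explore the world "
          "around her and ask questions about everything she saw.")
inputs = tokenizer(prompt, return_tensors="pt").input_ids

model = AutoModel.from_pretrained(model_name, quantization_config=config)

new_tokens = 32
start = time.time()
model.generate(inputs, max_new_tokens=new_tokens)  # greedy search is the default decoding mode
elapsed = time.time() - start
print(f"approx. per-token latency: {elapsed / new_tokens * 1000:.2f} ms")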

Find other models, such as ChatGLM, ChatGLM2, and StarCoder, in LLM Runtime.

📖Documentation

OVERVIEW: Model Compression, NeuralChat, Neural Engine, Kernel Libraries
MODEL COMPRESSION: Quantization, Pruning, Distillation, Orchestration, Neural Architecture Search, Export, Metrics/Objectives, Pipeline
NEURAL ENGINE: Model Compilation, Custom Pattern, Deployment, Profiling
KERNEL LIBRARIES: Sparse GEMM Kernels, Custom INT8 Kernels, Profiling, Benchmark
ALGORITHMS: Length Adaptive, Data Augmentation
TUTORIALS AND RESULTS: Tutorials, Supported Models, Model Performance, Kernel Performance

📃Selected Publications/Events

View Full Publication List.

Additional Content

Acknowledgements

💁Collaborations

You are welcome to raise any interesting ideas on model compression techniques and LLM-based chatbot development! Feel free to reach out to us; we look forward to collaborating with you on Intel Extension for Transformers!
