Intel® NPU Acceleration Library

The Intel® NPU Acceleration Library is a Python library designed to boost the efficiency of your applications by leveraging the power of the Intel Neural Processing Unit (NPU) to perform high-speed computations on compatible hardware.

_Note: The Intel® NPU Acceleration Library is under active development, and our team is working to introduce a variety of features that are expected to dramatically enhance performance. For performant, production-ready solutions, please refer to projects such as OpenVINO or DirectML._

Intel NPU

The Intel NPU is an AI accelerator integrated into Intel Core Ultra processors, characterized by a unique architecture comprising compute acceleration and data transfer capabilities. Its compute acceleration is facilitated by Neural Compute Engines, which consist of hardware acceleration blocks for AI operations like Matrix Multiplication and Convolution, alongside Streaming Hybrid Architecture Vector Engines for general computing tasks.

To optimize performance, the NPU features DMA engines for efficient data transfers between system memory and a managed cache, supported by device MMU and IOMMU for security isolation. The NPU's software utilizes compiler technology to optimize AI workloads by directing compute and data flow in a tiled fashion, maximizing compute utilization primarily from scratchpad SRAM while minimizing data transfers between SRAM and DRAM for optimal performance and power efficiency.
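The tiled execution strategy described above can be illustrated with a plain NumPy sketch. This is only an illustration of the general technique (processing fixed-size tiles that would fit in fast local memory), not the NPU compiler's actual implementation:

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Multiply A @ B by iterating over fixed-size tiles.

    Each (tile x tile) block is small enough to live in fast local
    memory (SRAM on the NPU), minimizing traffic to/from DRAM.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                # Accumulate one tile of the output from one tile of each input.
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.random.rand(128, 96).astype(np.float32)
B = np.random.rand(96, 64).astype(np.float32)
C = tiled_matmul(A, B, tile=32)
```

The result matches an untiled matmul up to floating-point rounding; the point of tiling is that each inner step touches only small, cache-resident blocks.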

Some useful links

  • Intel AI PC (link)
  • Intel Core Ultra Processor line (link)
  • AI Acceleration and NPU explained (video)

Feature roadmap

In our quest to significantly improve the library's performance, we are directing our efforts toward implementing a range of key features, including:

  • 8-bit quantization
  • 4-bit Quantization and GPTQ
  • NPU-Native mixed precision inference
  • Float16 support
  • BFloat16 (Brain Floating Point Format)
  • torch.compile support
  • LLM MLP horizontal fusion implementation
  • Static shape inference
  • MHA NPU inference
  • NPU/GPU hetero compute
  • Paper

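As background for the quantization items above, here is a minimal NumPy sketch of symmetric per-tensor 8-bit quantization. This is illustrative only and not the library's implementation:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: w ~= scale * q, with q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).clip(-127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.random.uniform(-1, 1, (64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
max_err = np.abs(dequantize_int8(q, scale) - w).max()
```

Storing weights as int8 plus one float scale cuts memory traffic roughly 4x versus float32, at the cost of a bounded rounding error (at most half the scale per element in this scheme).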
Make sure to stay updated with the project's progress as these exciting enhancements are on the horizon. External contributions are very welcome! If you want to participate in this library's development, please check the Contributing guide, the developer guide, and the list of open issues.

Setup

Check that your system has an available NPU (how-to).

You can install the package on your machine with

   pip install intel-npu-acceleration-library

You can also install the package on Windows and Linux from source by typing

pip install "intel-npu-acceleration-library @ git+https://github.com/intel/intel-npu-acceleration-library.git"

To build the package you need a compiler on your system (Visual Studio 2019 is suggested for Windows builds). macOS is not yet supported, and at the moment Ubuntu is the only supported Linux distribution for building. If you need the library for a different OS, please open an issue.

The library is intended to be used with Intel Core Ultra processors, which have an integrated NPU (Neural Processing Unit). For best performance please install/update the NPU drivers to the latest version. (Windows, Linux).

Usage

For implemented examples, please check the examples folder.

Run a single MatMul in the NPU

from intel_npu_acceleration_library.backend import MatMul
import numpy as np

inC, outC, batch = ... # Define your own values

# Create both inputs
X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

mm = MatMul(inC, outC, batch, profile=False)

result = mm.run(X1, X2)
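Note that the second operand has shape (outC, inC), i.e. it is laid out like a linear-layer weight matrix, so the result should correspond to X1 @ X2.T. A NumPy-only reference for sanity-checking the NPU output (the sizes here are arbitrary example values, and the transpose convention is inferred from the shapes above):

```python
import numpy as np

batch, inC, outC = 16, 32, 8  # example sizes

X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

# CPU reference: X2 is stored as (outC, inC), so transpose before multiplying.
# Accumulate in float32 for a cleaner reference than pure float16 arithmetic.
reference = X1.astype(np.float32) @ X2.astype(np.float32).T
```

Comparing the NPU result against such a reference with a float16-appropriate tolerance (e.g. np.allclose with atol around 1e-2) is a quick way to validate the setup.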

Compile a model for the NPU

If you have PyTorch >= 2.0.0 installed, you can use torch.compile to optimize your model for the NPU:

import intel_npu_acceleration_library
import torch

# Compile the model for the NPU
# model is a torch.nn.Module instance; it can be quantized just-in-time
optimized_model = torch.compile(model, backend="npu")

# Use the model as usual

On Windows, torch.compile is not yet supported, so you might want to use the explicit function intel_npu_acceleration_library.compile instead. The same applies if you use a PyTorch version < 2.0.0:

import intel_npu_acceleration_library
import torch

optimized_model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

# Use the model as usual

Run a TinyLlama model on the NPU

from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM
import torch

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

model = NPUModelForCausalLM.from_pretrained(model_id, use_cache=True, dtype=torch.int8).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, use_default_system_prompt=True)
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)


query = input("Ask something: ")
prefix = tokenizer(query, return_tensors="pt")["input_ids"]


generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

print("Run inference")
_ = model.generate(**generation_kwargs)

Project details

Download files

Source distribution: intel_npu_acceleration_library-1.4.0.tar.gz (70.0 kB)

Built distributions: win32 and win_amd64 wheels (41.1 MB each) for CPython 3.8, 3.9, 3.10, 3.11, and 3.12.

File details

Hashes for intel_npu_acceleration_library-1.4.0.tar.gz:

SHA256: 85bc37169189a0bfdb536a74de454925a4750adc1000d29a6c87de1f1a3a8d7d
MD5: 47ff3fb2946f27332afd06831141b6a6
BLAKE2b-256: b292cb0db9baaf1a21a98ef7a7f37a852836eab84da245c9f938db409a96b136

