Intel® NPU Acceleration Library

The Intel® NPU Acceleration Library is a Python library designed to boost the efficiency of your applications by leveraging the power of the Intel Neural Processing Unit (NPU) to perform high-speed computations on compatible hardware.

_Note: The Intel® NPU Acceleration Library is under active development, and our team is working to introduce a variety of features that are expected to dramatically enhance performance. For performant, production-ready solutions, please refer to projects such as OpenVINO or DirectML._

Intel NPU

The Intel NPU is an AI accelerator integrated into Intel Core Ultra processors, characterized by a unique architecture comprising compute acceleration and data transfer capabilities. Its compute acceleration is facilitated by Neural Compute Engines, which consist of hardware acceleration blocks for AI operations like Matrix Multiplication and Convolution, alongside Streaming Hybrid Architecture Vector Engines for general computing tasks.

To optimize performance, the NPU features DMA engines for efficient data transfers between system memory and a managed cache, supported by device MMU and IOMMU for security isolation. The NPU's software utilizes compiler technology to optimize AI workloads by directing compute and data flow in a tiled fashion, maximizing compute utilization primarily from scratchpad SRAM while minimizing data transfers between SRAM and DRAM for optimal performance and power efficiency.
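The tiled execution strategy described above can be illustrated with a plain NumPy sketch. This is only an illustration of the general technique (processing fixed-size tiles that would fit in fast local memory), not the NPU compiler's actual implementation:

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Multiply A @ B by iterating over fixed-size tiles.

    Each (tile x tile) block is small enough to live in fast local
    memory (SRAM on the NPU), minimizing traffic to/from DRAM.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                # Accumulate one tile of the output from one tile of each input.
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.random.rand(128, 96).astype(np.float32)
B = np.random.rand(96, 64).astype(np.float32)
C = tiled_matmul(A, B, tile=32)
```

The result matches an untiled matmul up to floating-point rounding; the point of tiling is that each inner step touches only small, cache-resident blocks.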

Some useful links

  • Intel AI PC (link)
  • Intel Core Ultra Processor line (link)
  • AI Acceleration and NPU explained (video)

Feature roadmap

In our quest to significantly improve the library's performance, we are directing our efforts toward implementing a range of key features, including:

  • 8-bit quantization
  • 4-bit Quantization and GPTQ
  • NPU-Native mixed precision inference
  • Float16 support
  • BFloat16 (Brain Floating Point Format)
  • torch.compile support
  • LLM MLP horizontal fusion implementation
  • Static shape inference
  • MHA NPU inference
  • NPU/GPU hetero compute
  • Paper

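As background for the quantization items above, here is a minimal NumPy sketch of symmetric per-tensor 8-bit quantization. This is illustrative only and not the library's implementation:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: w ~= scale * q, with q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).clip(-127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.random.uniform(-1, 1, (64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
max_err = np.abs(dequantize_int8(q, scale) - w).max()
```

Storing weights as int8 plus one float scale cuts memory traffic roughly 4x versus float32, at the cost of a bounded rounding error (at most half the scale per element in this scheme).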
Make sure to stay updated with the project's progress as these exciting enhancements are on the horizon. External contributions are very welcome! If you want to participate in this library's development, please check the Contributing guide, the developer guide, and the list of open issues.

Setup

Check that your system has an available NPU (how-to).

You can install the package on your machine with

   pip install intel-npu-acceleration-library

You can also install the package on Windows and Linux from source by typing

pip install "intel-npu-acceleration-library @ git+https://github.com/intel/intel-npu-acceleration-library.git"

To build the package you need a compiler on your system (Visual Studio 2019 is suggested for Windows builds). macOS is not yet supported, and at the moment Ubuntu is the only supported Linux distribution for building. If you need the library for a different OS, please open an issue.

The library is intended to be used with Intel Core Ultra processors, which have an integrated NPU (Neural Processing Unit). For best performance please install/update the NPU drivers to the latest version. (Windows, Linux).

Usage

For implemented examples, please check the examples folder.

Run a single MatMul in the NPU

from intel_npu_acceleration_library.backend import MatMul
import numpy as np

inC, outC, batch = ... # Define your own values

# Create both inputs
X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

mm = MatMul(inC, outC, batch, profile=False)

result = mm.run(X1, X2)
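Note that the second operand has shape (outC, inC), i.e. it is laid out like a linear-layer weight matrix, so the result should correspond to X1 @ X2.T. A NumPy-only reference for sanity-checking the NPU output (the sizes here are arbitrary example values, and the transpose convention is inferred from the shapes above):

```python
import numpy as np

batch, inC, outC = 16, 32, 8  # example sizes

X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

# CPU reference: X2 is stored as (outC, inC), so transpose before multiplying.
# Accumulate in float32 for a cleaner reference than pure float16 arithmetic.
reference = X1.astype(np.float32) @ X2.astype(np.float32).T
```

Comparing the NPU result against such a reference with a float16-appropriate tolerance (e.g. np.allclose with atol around 1e-2) is a quick way to validate the setup.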

Compile a model for the NPU

If you have PyTorch >= 2.0.0 installed, you can use torch.compile to optimize your model for the NPU:

import intel_npu_acceleration_library
import torch

# Compile the model for the NPU
# model is a torch.nn.Module instance; it can be quantized just-in-time
optimized_model = torch.compile(model, backend="npu")

# Use the model as usual

On Windows, torch.compile is not yet supported, so you might want to use the explicit function intel_npu_acceleration_library.compile instead. The same applies if you use a PyTorch version < 2.0.0:

import intel_npu_acceleration_library
import torch

optimized_model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

# Use the model as usual

Run a TinyLlama model on the NPU

from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM
import torch

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

model = NPUModelForCausalLM.from_pretrained(model_id, use_cache=True, dtype=torch.int8).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, use_default_system_prompt=True)
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)


query = input("Ask something: ")
prefix = tokenizer(query, return_tensors="pt")["input_ids"]


generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

print("Run inference")
_ = model.generate(**generation_kwargs)

Project details

Download files

Source distribution: intel_npu_acceleration_library-1.4.0.tar.gz (70.0 kB)

Built distributions: win32 and win_amd64 wheels (41.1 MB each) for CPython 3.8, 3.9, 3.10, 3.11, and 3.12.

File details

Hashes for intel_npu_acceleration_library-1.4.0.tar.gz:

SHA256: 85bc37169189a0bfdb536a74de454925a4750adc1000d29a6c87de1f1a3a8d7d
MD5: 47ff3fb2946f27332afd06831141b6a6
BLAKE2b-256: b292cb0db9baaf1a21a98ef7a7f37a852836eab84da245c9f938db409a96b136

