
Intel® NPU Acceleration Library

Project description


The Intel® NPU Acceleration Library is a Python library designed to boost the efficiency of your applications by leveraging the power of the Intel Neural Processing Unit (NPU) to perform high-speed computations on compatible hardware.

Note: The Intel® NPU Acceleration Library is currently in active development, with our team working to introduce a variety of features that are anticipated to dramatically enhance performance.

Intel NPU

The Intel NPU is an AI accelerator integrated into Intel Core Ultra processors, characterized by a unique architecture comprising compute acceleration and data transfer capabilities. Its compute acceleration is facilitated by Neural Compute Engines, which consist of hardware acceleration blocks for AI operations like Matrix Multiplication and Convolution, alongside Streaming Hybrid Architecture Vector Engines for general computing tasks.

To optimize performance, the NPU features DMA engines for efficient data transfers between system memory and a managed cache, supported by device MMU and IOMMU for security isolation. The NPU's software utilizes compiler technology to optimize AI workloads by directing compute and data flow in a tiled fashion, maximizing compute utilization primarily from scratchpad SRAM while minimizing data transfers between SRAM and DRAM for optimal performance and power efficiency.

Some useful links

  • Intel AI PC (link)
  • Intel Core Ultra Processor line (link)
  • AI Acceleration and NPU explained (video)

Feature roadmap

In our quest to significantly improve the library's performance, we are directing our efforts toward implementing a range of key features, including:

  • 8-bit quantization
  • 4-bit Quantization and GPTQ
  • NPU-Native mixed precision inference
  • Float16 support
  • BFloat16 (Brain Floating Point Format)
  • torch.compile support
  • LLM MLP horizontal fusion implementation
  • Static shape inference
  • MHA NPU inference
  • NPU/GPU hetero compute
  • Paper

Make sure to stay updated with the project's progress as these exciting enhancements are on the horizon. External contributions are very welcome! If you want to participate in this library's development, please check the Contributing guide, the developer guide, and the list of open issues.

Setup

Check that your system has an available NPU (how-to).
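
Since this library builds on OpenVINO, a quick programmatic check is to ask the OpenVINO runtime whether an NPU device is visible. A minimal sketch, assuming a recent OpenVINO release (older releases expose Core under openvino.runtime):

from openvino import Core

# An Intel NPU shows up as "NPU" in the list of available devices
devices = Core().available_devices
print("NPU available:", "NPU" in devices)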

You can install the package on your machine with

   pip install intel-npu-acceleration-library

You can also install the package from source on Windows and Linux by typing

pip install "intel-npu-acceleration-library @ git+https://github.com/intel/intel-npu-acceleration-library.git"

To build the package you need a compiler on your system (Visual Studio 2019 is suggested for the Windows build). macOS is not yet supported. At the moment only Ubuntu is supported for the Linux build. If you need the library for your specific OS, please open an issue.

The library is intended to be used with Intel Core Ultra processors, which have an integrated NPU (Neural Processing Unit). For best performance please install/update the NPU drivers to the latest version. (Windows, Linux).

Usage

For implemented examples, please check the examples folder.

Run a single MatMul in the NPU

from intel_npu_acceleration_library.backend import MatMul
import numpy as np

inC, outC, batch = ... # Define your own values

# Create both inputs
X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

mm = MatMul(inC, outC, batch, profile=False)

result = mm.run(X1, X2)
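
To sanity-check the NPU output, you can compare it against a NumPy reference. Note that the second operand is laid out as (outC, inC), so the reference result is X1 @ X2.T. A sketch, with a loose tolerance to account for float16 rounding:

# Reference matmul on the CPU; result has shape (batch, outC)
reference = X1 @ X2.T
assert np.allclose(result, reference, rtol=1e-2, atol=1e-2)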

Compile a model for the NPU

If you have pytorch>=2.0.0 installed, you can use torch.compile to optimize your model for the NPU

import intel_npu_acceleration_library
import torch

# Compile model for the NPU
# model is a torch.nn.Module instance; it can be quantized just-in-time (JIT)
optimized_model = torch.compile(model, backend="npu")

# Use the model as usual

On Windows, torch.compile is not supported yet, so you might want to use the explicit function intel_npu_acceleration_library.compile. The same applies if you use a PyTorch version < 2.0.0.

import intel_npu_acceleration_library
import torch

optimized_model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

# Use the model as usual
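
As a concrete illustration, here is a minimal end-to-end sketch; the TinyMLP module below is a made-up example, not part of the library, and any torch.nn.Module would do:

import torch
import intel_npu_acceleration_library

# Toy model for illustration only
class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(128, 256)
        self.fc2 = torch.nn.Linear(256, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyMLP().eval()

# Quantize the model JIT and offload supported layers to the NPU
optimized_model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

with torch.no_grad():
    output = optimized_model(torch.rand(4, 128))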

Run a Tiny-llama model on the NPU

from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM
import torch

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

model = NPUModelForCausalLM.from_pretrained(model_id, use_cache=True, dtype=torch.int8).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, use_default_system_prompt=True)
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)


query = input("Ask something: ")
prefix = tokenizer(query, return_tensors="pt")["input_ids"]


generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

print("Run inference")
_ = model.generate(**generation_kwargs)
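
Since this checkpoint is chat-tuned, you may get better completions by formatting the query with the tokenizer's chat template (a standard transformers API) instead of tokenizing the raw string. A sketch that would replace the prefix line above:

messages = [{"role": "user", "content": query}]
prefix = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)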

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • intel_npu_acceleration_library-1.3.0.tar.gz (65.1 kB, Source)

Built Distributions

  • intel_npu_acceleration_library-1.3.0-cp312-cp312-win_amd64.whl (39.7 MB, CPython 3.12, Windows x86-64)
  • intel_npu_acceleration_library-1.3.0-cp312-cp312-win32.whl (39.7 MB, CPython 3.12, Windows x86)
  • intel_npu_acceleration_library-1.3.0-cp311-cp311-win_amd64.whl (39.7 MB, CPython 3.11, Windows x86-64)
  • intel_npu_acceleration_library-1.3.0-cp311-cp311-win32.whl (39.7 MB, CPython 3.11, Windows x86)
  • intel_npu_acceleration_library-1.3.0-cp310-cp310-win_amd64.whl (39.7 MB, CPython 3.10, Windows x86-64)
  • intel_npu_acceleration_library-1.3.0-cp310-cp310-win32.whl (39.7 MB, CPython 3.10, Windows x86)
  • intel_npu_acceleration_library-1.3.0-cp39-cp39-win_amd64.whl (39.7 MB, CPython 3.9, Windows x86-64)
  • intel_npu_acceleration_library-1.3.0-cp39-cp39-win32.whl (39.7 MB, CPython 3.9, Windows x86)
  • intel_npu_acceleration_library-1.3.0-cp38-cp38-win_amd64.whl (39.7 MB, CPython 3.8, Windows x86-64)
  • intel_npu_acceleration_library-1.3.0-cp38-cp38-win32.whl (39.7 MB, CPython 3.8, Windows x86)

File details

Hashes for intel_npu_acceleration_library-1.3.0.tar.gz
  SHA256: a6bbe15385aa4a63814d1f8489efa1f3296869d6c6d9246cc995f1421e393e80
  MD5: 08234e242682bf62fec238d872825eaf
  BLAKE2b-256: d1a98ed16d003fdec94177f334d525b5204eb9818d329f29bf5d5a2af4a582b2

Hashes for intel_npu_acceleration_library-1.3.0-cp312-cp312-win_amd64.whl
  SHA256: c745fe51174d6ddd21212f3a09478b76a473740a8b7680f461fe1b75f3a2a998
  MD5: 18a45cf0f44ab9a0ffdc1a2d79824e5f
  BLAKE2b-256: e84800c7038023453fdeef8ed9f3c2bd958f5fe742c655ce9ebaebf97d2ab26a

Hashes for intel_npu_acceleration_library-1.3.0-cp312-cp312-win32.whl
  SHA256: 239df3615cf5a0e8c10840dc614d08d9815a1e61a5d9ec43dc3c76955fffb128
  MD5: 0690350728ec804069d17a998536e3db
  BLAKE2b-256: 17c21aa69fe0f8b059e019bdea32cbf7ec7f8c73abb9a5e3ec78090907edbbe5

Hashes for intel_npu_acceleration_library-1.3.0-cp311-cp311-win_amd64.whl
  SHA256: 59dabeebcd76399479d7320d3006e2172641fba8ea3ea1f06cf2fe78ed7de868
  MD5: 86c71e02bec93b53d05cb46c75458126
  BLAKE2b-256: 22ff5e3e6fe67f95b8142236dc58aefc7aa90ce5bfbff3e3cc72d231c143d2d8

Hashes for intel_npu_acceleration_library-1.3.0-cp311-cp311-win32.whl
  SHA256: ba9a269217875e04a4185fefe43650f12af14265aef299eb38d5a770e283bfe8
  MD5: d9383c25b5642d3d3fd7a9208b35e3c3
  BLAKE2b-256: 8afa5335ae0c755d6a51d4917838d98fd6a872dc4425f837edef8f929928cae0

Hashes for intel_npu_acceleration_library-1.3.0-cp310-cp310-win_amd64.whl
  SHA256: 12f80ce3ec6e3bd60cb50b25d5ce0dae1438ef9432f772703196368cb4b04a3a
  MD5: 12ca6dc92615cbe3e27276939607cbf2
  BLAKE2b-256: d66e66b2750c47ade55dfa2774f1655aa822f8337a5500d39b3cf7c110c2d206

Hashes for intel_npu_acceleration_library-1.3.0-cp310-cp310-win32.whl
  SHA256: 02ee17980cce4c9be4e8fa5efa09a7332fb0f858caa9ba4cafb0208f4cf52d2c
  MD5: b2e8931ec0695d16885522b5e255ff9b
  BLAKE2b-256: 0d1f83abf7086f5055d8dc6177c98e8f671600a3866f47fdd5baaee3ac801029

Hashes for intel_npu_acceleration_library-1.3.0-cp39-cp39-win_amd64.whl
  SHA256: d9ef23d06b9f984bc2eb4048267033b3878f45625f99fc51d4aa16ed8436af34
  MD5: 77da8fa7cf07469d3454a54f0451e153
  BLAKE2b-256: b5f4f0e34a0446ea233e8dde47682e71f2336561a165375c0f3d74a0aa0c804c

Hashes for intel_npu_acceleration_library-1.3.0-cp39-cp39-win32.whl
  SHA256: e3df4fe77cf1041cf6a725500a30b1611361577556caf072deb21382fa37410e
  MD5: 8b883cbc73ea06e4adf59c6b4e7732c6
  BLAKE2b-256: 7c4b94ebb693781f5057094313de659730cf2b4c22d5e220783af80148f3f267

Hashes for intel_npu_acceleration_library-1.3.0-cp38-cp38-win_amd64.whl
  SHA256: ea36ca858194ce0def7086d1bdb7538c23d52987cc79d8996297d8a876bcef08
  MD5: a413d55c9ab22c2f7af5889cf6803c72
  BLAKE2b-256: 83d1b234623ad146f3486f13fd78e5172c28ca53a9bb74d23ca22e18d394c213

Hashes for intel_npu_acceleration_library-1.3.0-cp38-cp38-win32.whl
  SHA256: c96d48cb3b1f6ee07d21b7d2bac75646f84c8e35ac938cec03eb9630286ae61a
  MD5: a34fb567036379e0d518582095e6b76d
  BLAKE2b-256: 58cef64c9ce3e33466a3538a0227b8d6033180297277e9a75306599ddb49ec69
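
To verify a download against the digests above, you can compute its SHA256 locally with the standard library. A minimal sketch for the source distribution:

import hashlib

# Compare the local file's SHA256 with the published digest
with open("intel_npu_acceleration_library-1.3.0.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

expected = "a6bbe15385aa4a63814d1f8489efa1f3296869d6c6d9246cc995f1421e393e80"
print("OK" if digest == expected else "MISMATCH")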
