Intel® NPU Acceleration Library

The Intel® NPU Acceleration Library is a Python library designed to boost the efficiency of your applications by leveraging the power of the Intel Neural Processing Unit (NPU) to perform high-speed computations on compatible hardware.

Note: The Intel® NPU Acceleration Library is currently in active development, with our team working to introduce a variety of features that are anticipated to dramatically enhance performance.

Intel NPU

The Intel NPU is an AI accelerator integrated into Intel Core Ultra processors, characterized by a unique architecture comprising compute acceleration and data transfer capabilities. Its compute acceleration is facilitated by Neural Compute Engines, which consist of hardware acceleration blocks for AI operations like Matrix Multiplication and Convolution, alongside Streaming Hybrid Architecture Vector Engines for general computing tasks.

To optimize performance, the NPU features DMA engines for efficient data transfers between system memory and a managed cache, supported by device MMU and IOMMU for security isolation. The NPU's software utilizes compiler technology to optimize AI workloads by directing compute and data flow in a tiled fashion, maximizing compute utilization primarily from scratchpad SRAM while minimizing data transfers between SRAM and DRAM for optimal performance and power efficiency.
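The tiled compute pattern described above can be illustrated with a toy example. This is only a sketch: the tile size and loop order are arbitrary choices for illustration, not the NPU compiler's actual schedule.

```python
import numpy as np

def tiled_matmul(a, b, tile=64):
    """Multiply a (M, K) by b (K, N) one tile at a time.

    Each (tile x tile) sub-problem stands in for a block small enough to
    live in on-chip scratchpad SRAM, so off-chip traffic is limited to
    streaming each tile in once and the accumulated result back out.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # The work inside this loop body is what a compute engine
                # would execute entirely out of local SRAM.
                out[i:i + tile, j:j + tile] += (
                    a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
                )
    return out

a = np.random.rand(128, 96).astype(np.float32)
b = np.random.rand(96, 160).astype(np.float32)
print(np.allclose(tiled_matmul(a, b), a @ b, atol=1e-4))  # True
```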

Some useful links

  • Intel AI PC (link)
  • Intel Core Ultra Processor line (link)
  • AI Acceleration and NPU explained (video)

Feature roadmap

In our quest to significantly improve the library's performance, we are directing our efforts toward implementing a range of key features, including:

  • 8-bit quantization
  • 4-bit quantization and GPTQ
  • NPU-native mixed-precision inference
  • Float16 support
  • BFloat16 (Brain Floating Point Format)
  • torch.compile support
  • LLM MLP horizontal fusion implementation
  • Static shape inference
  • MHA NPU inference
  • NPU/GPU hetero compute
  • Paper

Make sure to stay updated with the project's progress as these exciting enhancements are on the horizon. External contributions are very welcome! If you want to participate in this library's development, please check the Contributing guide, the developer guide, and the list of open issues.

Setup

Check that your system has an available NPU (how-to).

You can install the package on your machine with

   pip install intel-npu-acceleration-library

You can also install the package on Windows and Linux from source by typing

pip install "intel-npu-acceleration-library @ git+https://github.com/intel/intel-npu-acceleration-library.git"

To build the package you need a compiler installed on your system (Visual Studio 2019 is suggested for the Windows build). macOS is not yet supported. At the moment only Ubuntu is supported for the Linux build. If you need the library for a different OS, please open an issue.

The library is intended to be used with Intel Core Ultra processors, which have an integrated NPU (Neural Processing Unit). For best performance please install/update the NPU drivers to the latest version. (Windows, Linux).

Usage

For implemented examples, please check the examples folder.

Run a single MatMul in the NPU

from intel_npu_acceleration_library.backend import MatMul
import numpy as np

inC, outC, batch = ... # Define your own values

# Create both inputs
X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

mm = MatMul(inC, outC, batch, profile=False)

result = mm.run(X1, X2)
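Note the second operand is laid out as (outC, inC). Assuming the usual linear-layer convention, the NPU result should therefore match `X1 @ X2.T`; a CPU-only NumPy sketch of that reference computation, with example sizes filled in:

```python
import numpy as np

batch, inC, outC = 16, 128, 256  # example sizes, chosen arbitrarily

X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

# Because X2 is (outC, inC), the product is X1 @ X2.T,
# yielding a (batch, outC) result.
reference = X1 @ X2.T
print(reference.shape)  # (16, 256)
```

Comparing `mm.run(X1, X2)` against such a reference (within float16 tolerance) is a quick sanity check that the shapes and layout are what you expect.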

Compile a model for the NPU

If you have PyTorch >= 2.0.0 installed, you can use torch.compile to optimize your model for the NPU:

import intel_npu_acceleration_library
import torch

# Compile model for the NPU
# model is a torch.nn.Module instance; it can be quantized JIT
optimized_model = torch.compile(model, backend="npu")

# Use the model as usual

On Windows, torch.compile is not yet supported, so you might want to use the explicit function intel_npu_acceleration_library.compile. The same applies if you use a PyTorch version < 2.0.0:

import intel_npu_acceleration_library
import torch

optimized_model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

# Use the model as usual
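The dtype=torch.int8 path relies on 8-bit weight quantization. The library's exact scheme is not shown here; the sketch below is a generic per-channel symmetric int8 quantizer, written in NumPy purely to illustrate what mapping a float weight matrix to int8 involves.

```python
import numpy as np

def quantize_int8(w):
    """Per-output-channel symmetric int8 quantization of a weight matrix.

    Each row is scaled so that its largest magnitude maps to 127; the
    scale is kept so results can be dequantized back to float.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

w = np.random.randn(8, 32).astype(np.float32)
q, scale = quantize_int8(w)

# Dequantize and check the rounding error, which is bounded by half a
# quantization step per weight.
w_hat = q.astype(np.float32) * scale
print(np.abs(w - w_hat).max() <= scale.max() / 2 + 1e-6)  # True
```

The 4x size reduction versus float32 is what makes large models fit comfortably in memory-constrained accelerators; the trade-off is the per-weight rounding error checked above.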

Run a TinyLlama model on the NPU

from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM
import intel_npu_acceleration_library
import torch

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, use_default_system_prompt=True)
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)


print("Compile model for the NPU")
model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

query = input("Ask something: ")
prefix = tokenizer(query, return_tensors="pt")["input_ids"]


generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

print("Run inference")
_ = model.generate(**generation_kwargs)

Download files

Source Distribution

  • intel_npu_acceleration_library-1.0.0.tar.gz (34.9 kB)

Built Distributions (each 33.4 MB)

  • intel_npu_acceleration_library-1.0.0-cp312-cp312-win_amd64.whl: CPython 3.12, Windows x86-64
  • intel_npu_acceleration_library-1.0.0-cp312-cp312-win32.whl: CPython 3.12, Windows x86
  • intel_npu_acceleration_library-1.0.0-cp311-cp311-win_amd64.whl: CPython 3.11, Windows x86-64
  • intel_npu_acceleration_library-1.0.0-cp311-cp311-win32.whl: CPython 3.11, Windows x86
  • intel_npu_acceleration_library-1.0.0-cp310-cp310-win_amd64.whl: CPython 3.10, Windows x86-64
  • intel_npu_acceleration_library-1.0.0-cp310-cp310-win32.whl: CPython 3.10, Windows x86
  • intel_npu_acceleration_library-1.0.0-cp39-cp39-win_amd64.whl: CPython 3.9, Windows x86-64
  • intel_npu_acceleration_library-1.0.0-cp39-cp39-win32.whl: CPython 3.9, Windows x86
  • intel_npu_acceleration_library-1.0.0-cp38-cp38-win_amd64.whl: CPython 3.8, Windows x86-64
  • intel_npu_acceleration_library-1.0.0-cp38-cp38-win32.whl: CPython 3.8, Windows x86

File details

Hashes for each distribution file:

intel_npu_acceleration_library-1.0.0.tar.gz
  SHA256      60438e62baf5659b1402b7e0fc3acf8a8b1e7e875c2dd04e3b23b1f8db79a02c
  MD5         46f04009b9c3d6543265f596a1cb9a90
  BLAKE2b-256 9c77f400412e2e53a2824cb9b0edfc9f6afcca86af6780552ded0b8528943f5e

intel_npu_acceleration_library-1.0.0-cp312-cp312-win_amd64.whl
  SHA256      4c58b46aa545f9ff4f62d6bd2ae2d11440c111bc86e7d691eb839e75b02c1db3
  MD5         515d999cd44a6c43f45c1585130a46a9
  BLAKE2b-256 963a1456c59ab75a862adce030701d341c978e7e17d3da98e74eca6c24dbdf38

intel_npu_acceleration_library-1.0.0-cp312-cp312-win32.whl
  SHA256      45309373a01a650091596c9e1b11ae6d4991cd185ece173f2e11765f0fb5eb39
  MD5         e0f5a1c96ae99f8201b2fc8005682074
  BLAKE2b-256 9fe6afc14adb48e43b632b6831107cd8a714bd8b3386884edba1d3dd3ee4fd79

intel_npu_acceleration_library-1.0.0-cp311-cp311-win_amd64.whl
  SHA256      bbc612da59e378679bea667a064237a1f58793c807bbb069169fdf5f5d5b1226
  MD5         d0df34b94b701b1699255de595810255
  BLAKE2b-256 fe23ec791889480772212e683835a7a55665fbb9b792c68aa94b2d1c79b1c947

intel_npu_acceleration_library-1.0.0-cp311-cp311-win32.whl
  SHA256      8d9f5964ed424fdbd2b8ea7e40b48cc76d3c9d4af717854d853411080bd53f1c
  MD5         4639639a7016393544e1650e5d3c428c
  BLAKE2b-256 f05fdab4123966f34f10889d0ed84400e849fe803d1ea098a8106f8ba3c63ae2

intel_npu_acceleration_library-1.0.0-cp310-cp310-win_amd64.whl
  SHA256      f8bd9f22e411b531a5264a1ef14070f433efca8b869f60ebf868f90f923a5831
  MD5         921c1b1a11aaa309ceadde7816e198ca
  BLAKE2b-256 969f4dfd30c83c15529ed5e9254ac0b6df0dc85b39cf2b18583f77b4ad0b5685

intel_npu_acceleration_library-1.0.0-cp310-cp310-win32.whl
  SHA256      12647c9d793246b4dbd6f9ff3d0e22d7c0ee081e4d8eef041085d509a5c808b9
  MD5         828c96a5157e9518774bb4ea4393bd02
  BLAKE2b-256 6226b606e5223507c4a6648fefeb95d55bee27b8eac8c57a574bd6e834720497

intel_npu_acceleration_library-1.0.0-cp39-cp39-win_amd64.whl
  SHA256      5ee0580d3d04713d0ee1e41900685cdd297ac99959fe4457e2f4174b47c178a6
  MD5         261801baa186a21909abf67702d68998
  BLAKE2b-256 ce0bb2f8bc1eec35fc5c2824e86d791220744987f2138e27bb669808fa14e0bf

intel_npu_acceleration_library-1.0.0-cp39-cp39-win32.whl
  SHA256      813ffd374fed98be6fd435e935eb880d943efa5ae4fafe74bae1c31b80a546cf
  MD5         4ca69965709d64f831cfcad5e593ee90
  BLAKE2b-256 4a7f6204ec1a27be8b0832732ad33ff0fd7ada8295355a1feba9f574959c4c55

intel_npu_acceleration_library-1.0.0-cp38-cp38-win_amd64.whl
  SHA256      a3230a8ce205eb6fe677ac3247481dd1845d8b25ba828435200e3a381101f881
  MD5         dffeabec258e7e96d555373bb5a89957
  BLAKE2b-256 0185bc88c61558da9a6b6e4281509ece5a5f65c907f8fd3ec87d9955b4121f57

intel_npu_acceleration_library-1.0.0-cp38-cp38-win32.whl
  SHA256      e3518484ad671653019c928f431ed67fc83aa9b992e5b038b6d90d9f8494859c
  MD5         b12eaf01a11e879b51f53aa2ee8b0908
  BLAKE2b-256 1269cd92aadb2c21d0652efbd92c08b41425eecad0186ee0c1b80a6adf460435

See more details on using hashes here.
