
Intel® NPU Acceleration Library

Project description


The Intel® NPU Acceleration Library is a Python library designed to boost the efficiency of your applications by leveraging the power of the Intel Neural Processing Unit (NPU) to perform high-speed computations on compatible hardware.

Note: The Intel® NPU Acceleration Library is currently in active development, with our team working to introduce a variety of features that are anticipated to dramatically enhance performance.

Intel NPU

The Intel NPU is an AI accelerator integrated into Intel Core Ultra processors, characterized by a unique architecture comprising compute acceleration and data transfer capabilities. Its compute acceleration is facilitated by Neural Compute Engines, which consist of hardware acceleration blocks for AI operations like Matrix Multiplication and Convolution, alongside Streaming Hybrid Architecture Vector Engines for general computing tasks.

To optimize performance, the NPU features DMA engines for efficient data transfers between system memory and a managed cache, supported by device MMU and IOMMU for security isolation. The NPU's software utilizes compiler technology to optimize AI workloads by directing compute and data flow in a tiled fashion, maximizing compute utilization primarily from scratchpad SRAM while minimizing data transfers between SRAM and DRAM for optimal performance and power efficiency.

Some useful links

  • Intel AI PC (link)
  • Intel Core Ultra Processor line (link)
  • AI Acceleration and NPU explained (video)

Feature roadmap

In our quest to significantly improve the library's performance, we are directing our efforts toward implementing a range of key features, including:

  • 8-bit quantization
  • 4-bit quantization and GPTQ
  • NPU-Native mixed precision inference
  • Float16 support
  • BFloat16 (Brain Floating Point Format)
  • torch.compile support
  • LLM MLP horizontal fusion implementation
  • Static shape inference
  • MHA NPU inference
  • NPU/GPU hetero compute
  • Paper

Stay tuned for the project's progress, as these enhancements are on the horizon. External contributions are very welcome! If you want to participate in the development of this library, please check the Contributing guide, the developer guide, and the list of open issues.

Setup

Check that your system has an available NPU (how-to).
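If you prefer to check from Python, recent releases of this library expose an npu_available() helper in the backend module; treat the exact import path as an assumption and fall back to the how-to above if the import fails.

from intel_npu_acceleration_library.backend import npu_available

# True when a supported NPU device and driver are detected
print("NPU available:", npu_available())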

You can install the package on your machine with:

   pip install intel-npu-acceleration-library

You can also install the package on Windows and Linux from source by typing:

pip install "intel-npu-acceleration-library @ git+https://github.com/intel/intel-npu-acceleration-library.git"

To build the package you need a compiler installed on your system (Visual Studio 2019 is suggested for Windows builds). macOS is not yet supported, and at the moment only Ubuntu is supported for Linux builds. If you need the library for your specific OS, please open an issue.

The library is intended to be used with Intel Core Ultra processors, which have an integrated NPU (Neural Processing Unit). For best performance, please install or update the NPU drivers to the latest version (Windows, Linux).

Usage

For implemented examples, please check the examples folder.

Run a single MatMul on the NPU

from intel_npu_acceleration_library.backend import MatMul
import numpy as np

inC, outC, batch = 128, 128, 32  # example sizes; define your own values

# Create the two inputs: X1 is (batch, inC), X2 is (outC, inC)
X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

mm = MatMul(inC, outC, batch, profile=False)

result = mm.run(X1, X2)
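Since X2 is laid out as (outC, inC), the result is expected to match X1 @ X2.T; a quick NumPy cross-check (the transposed layout is an inference from the shapes above, not a documented contract, and the NPU computes in float16, so use a loose tolerance):

reference = X1 @ X2.T
assert np.allclose(result, reference, rtol=1e-2, atol=1e-2)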

Compile a model for the NPU

If you have PyTorch >= 2.0.0 installed, you can use torch.compile to optimize your model for the NPU:

import intel_npu_acceleration_library
import torch

# Compile model for the NPU
# model is a torch.nn.Module instance; it can be quantized just-in-time
optimized_model = torch.compile(model, backend="npu")

# Use the model as usual

On Windows, torch.compile is not supported yet, so you might want to use the explicit function intel_npu_acceleration_library.compile instead. The same applies if you use a PyTorch version < 2.0.0:

import intel_npu_acceleration_library
import torch

optimized_model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

# Use the model as usual
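For reference, a minimal end-to-end sketch using the explicit compile function, assuming only the entry point shown above (the toy model and its shapes are purely illustrative):

import intel_npu_acceleration_library
import torch

# A small illustrative model
model = torch.nn.Sequential(
    torch.nn.Linear(256, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
).eval()

# Offload to the NPU; float16 here, int8 works as shown above
optimized_model = intel_npu_acceleration_library.compile(model, dtype=torch.float16)

with torch.no_grad():
    x = torch.rand(1, 256)
    y = optimized_model(x)  # runs on the NPU

print(y.shape)  # torch.Size([1, 10])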

Run a TinyLlama model on the NPU

from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM
import torch

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Load the model with int8 weights and the KV cache enabled
model = NPUModelForCausalLM.from_pretrained(model_id, use_cache=True, dtype=torch.int8).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, use_default_system_prompt=True)
tokenizer.pad_token_id = tokenizer.eos_token_id
# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, skip_special_tokens=True)


query = input("Ask something: ")
prefix = tokenizer(query, return_tensors="pt")["input_ids"]


generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

print("Run inference")
_ = model.generate(**generation_kwargs)
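TinyLlama-1.1B-Chat is an instruction-tuned model, so wrapping the query in its chat template (a standard transformers API, independent of this library) typically yields better answers than the raw prompt used above:

# Build the prompt with the model's chat template instead of raw text
messages = [{"role": "user", "content": query}]
prefix = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)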



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

intel_npu_acceleration_library-1.2.0.tar.gz (43.6 kB): Source

Built Distributions

intel_npu_acceleration_library-1.2.0-cp312-cp312-win_amd64.whl (39.7 MB): CPython 3.12, Windows x86-64
intel_npu_acceleration_library-1.2.0-cp312-cp312-win32.whl (39.7 MB): CPython 3.12, Windows x86
intel_npu_acceleration_library-1.2.0-cp311-cp311-win_amd64.whl (39.7 MB): CPython 3.11, Windows x86-64
intel_npu_acceleration_library-1.2.0-cp311-cp311-win32.whl (39.7 MB): CPython 3.11, Windows x86
intel_npu_acceleration_library-1.2.0-cp310-cp310-win_amd64.whl (39.7 MB): CPython 3.10, Windows x86-64
intel_npu_acceleration_library-1.2.0-cp310-cp310-win32.whl (39.7 MB): CPython 3.10, Windows x86
intel_npu_acceleration_library-1.2.0-cp39-cp39-win_amd64.whl (39.7 MB): CPython 3.9, Windows x86-64
intel_npu_acceleration_library-1.2.0-cp39-cp39-win32.whl (39.7 MB): CPython 3.9, Windows x86
intel_npu_acceleration_library-1.2.0-cp38-cp38-win_amd64.whl (39.7 MB): CPython 3.8, Windows x86-64
intel_npu_acceleration_library-1.2.0-cp38-cp38-win32.whl (39.7 MB): CPython 3.8, Windows x86

File hashes

Hashes for intel_npu_acceleration_library-1.2.0.tar.gz:

SHA256: 2608cb51b10fd91a8112d03afcfb18b9e3638f3c70a5ae637b68d060a697ae49
MD5: 5efb92c9dcca6a031ca5609ee0fff68f
BLAKE2b-256: 0dfbe7f55739ea988c05318c7bd31ea9c6fcc089b37a24c6465f53c6d4569076

Hashes for intel_npu_acceleration_library-1.2.0-cp312-cp312-win_amd64.whl:

SHA256: 6338551f46b51d1c3bbc94649355871dfa8d2fb1405fb1f80ecd1659f7abd7b2
MD5: 87d359f1c7cdd3667f14d883d4db8462
BLAKE2b-256: 7f9163c7d37346e5a097bc694f45b494be81bb4319b9f2e93c15fc93128cb8af

Hashes for intel_npu_acceleration_library-1.2.0-cp312-cp312-win32.whl:

SHA256: e783f64bfd940b45e120547e21ae0a0a1d35c1ff4ab93925d2996d4501ad290d
MD5: f28da45fa721c901ebe3049c6a7426fd
BLAKE2b-256: aa6bf1fd9b6d4bc5d1d9a605b0171d5dad27f00384d65bd6445441ae8dfa9599

Hashes for intel_npu_acceleration_library-1.2.0-cp311-cp311-win_amd64.whl:

SHA256: 67b7cafb7f8e4f197a6f760f9d58a8882df8a3f1d6c8635c0ad052c84092a750
MD5: 01bca8106170beabd99b5f6a85fc9007
BLAKE2b-256: 7a1d49eb8e2d6f85dfea79d8ff889fe1863e0610697332ec9396d6a04e520fac

Hashes for intel_npu_acceleration_library-1.2.0-cp311-cp311-win32.whl:

SHA256: 867a82a6da6f8be2b1ef77aafaaa2c94fb6cc408efc634c3e5fff2658068832c
MD5: a2ac23545600387193c3bf82f3b9290f
BLAKE2b-256: 29b4372d8a002dbd6a137b4cb08313a37fbecd0ac45ca72f1ec7418772892559

Hashes for intel_npu_acceleration_library-1.2.0-cp310-cp310-win_amd64.whl:

SHA256: 49febe2a28b4052188f4fd9bd6049248cade15e4ec84101620882d365b48466d
MD5: ccdddd7059c58b23aae318d3bb733dcb
BLAKE2b-256: a44d07950cc70087e6b7ce6f96c1bc8474078ac33cb531bf21b799441550dad1

Hashes for intel_npu_acceleration_library-1.2.0-cp310-cp310-win32.whl:

SHA256: 94aa4c3ab24ca343e845f1b70fe54f9729cb4fb6c81b8641f9c0f8f2f0f24101
MD5: ffc629d125d80bfefcc8005f93671389
BLAKE2b-256: c1fc3e14009f81b17f8c4be76c1b522d1f5bef7f8b8f5f0efcc8086f439d1086

Hashes for intel_npu_acceleration_library-1.2.0-cp39-cp39-win_amd64.whl:

SHA256: 4459e2555c441d02608ec7f043ec200a7afd1515626fddab840c629425e265d8
MD5: 0739264b52cf58fa6b0ac9d381a7d9c7
BLAKE2b-256: 3d44c6e1ac8c02d51345f1e99bb9de40342db4bdd91345d26e62ac966ab124e9

Hashes for intel_npu_acceleration_library-1.2.0-cp39-cp39-win32.whl:

SHA256: b252b6b3d81f37c81c048a1fe5789e80e88b89d1cbc30b38c65fc9e36e5692f7
MD5: b74b273df98afb33ac9f32869c60f1fa
BLAKE2b-256: fc5d2205b5daf50ad327a8fdecf0eddea5a14d660e37b46217cd689b64c4803b

Hashes for intel_npu_acceleration_library-1.2.0-cp38-cp38-win_amd64.whl:

SHA256: 678d6fc2e0ea597e9b892ba1058e9d7c23d19749ef7f3e9adba80a94d7eb26b2
MD5: ee2988459ff94c38ab96ebb7a310b6e5
BLAKE2b-256: 22be05f0d11f243b2cb7b314198f6f5ee6e31f9ee9ccccd62eeb67094fafb65a

Hashes for intel_npu_acceleration_library-1.2.0-cp38-cp38-win32.whl:

SHA256: d51d7bd326eb5e29f609388bc9ffb8d855ebce1736d6e48d018fa27fc9c87c7a
MD5: 79f3c190e9cd7fdd52c0da50bc81e53e
BLAKE2b-256: 39e7ba84b16bfdc9fc453d88b69f7366c760157897ec236a67d47ab3169d80dd
