Intel® NPU Acceleration Library

The Intel® NPU Acceleration Library is a Python library designed to boost the efficiency of your applications by leveraging the power of the Intel Neural Processing Unit (NPU) to perform high-speed computations on compatible hardware.

Note: The Intel® NPU Acceleration Library is currently in active development, with our team working to introduce a variety of features that are anticipated to dramatically enhance performance.

Intel NPU

The Intel NPU is an AI accelerator integrated into Intel Core Ultra processors, characterized by a unique architecture comprising compute acceleration and data transfer capabilities. Its compute acceleration is facilitated by Neural Compute Engines, which consist of hardware acceleration blocks for AI operations like Matrix Multiplication and Convolution, alongside Streaming Hybrid Architecture Vector Engines for general computing tasks.

To optimize performance, the NPU features DMA engines for efficient data transfers between system memory and a managed cache, supported by device MMU and IOMMU for security isolation. The NPU's software utilizes compiler technology to optimize AI workloads by directing compute and data flow in a tiled fashion, maximizing compute utilization primarily from scratchpad SRAM while minimizing data transfers between SRAM and DRAM for optimal performance and power efficiency.
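
To make the tiling idea concrete, here is a toy NumPy sketch of a tiled matrix multiplication. It only illustrates the scheduling concept (process one block at a time, so the working set fits in fast local memory); it says nothing about the NPU compiler's actual tile sizes or data layout:

import numpy as np

def tiled_matmul(A, B, tile=64):
    # Toy illustration of tiled scheduling: each (i, j, k) step touches
    # only tile x tile blocks, the kind of working set a scratchpad
    # memory could hold, minimizing traffic to the large, slow memory.
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.random.rand(128, 128).astype(np.float32)
B = np.random.rand(128, 128).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-4)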

Some useful links

  • Intel AI PC (link)
  • Intel Core Ultra Processor line (link)
  • AI Acceleration and NPU explained (video)

Feature roadmap

In our quest to significantly improve the library's performance, we are directing our efforts toward implementing a range of key features, including:

  • 8-bit quantization
  • 4-bit quantization and GPTQ
  • NPU-native mixed-precision inference
  • Float16 support
  • BFloat16 (Brain Floating Point Format)
  • torch.compile support
  • LLM MLP horizontal fusion implementation
  • Static shape inference
  • MHA NPU inference
  • NPU/GPU heterogeneous compute
  • Paper

Make sure to stay updated with the project's progress as these enhancements are on the horizon. External contributions are very welcome! If you want to participate in the development of this library, please check the Contributing guide, the developer guide, and the list of open issues.

Setup

Check that your system has an available NPU (how-to).
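
As a quick programmatic check, the snippet below looks for the device node the NPU driver typically exposes. The /dev/accel/accel0 path on Linux and the "Intel(R) AI Boost" Device Manager entry on Windows are assumptions about common driver behavior, not part of this library's API:

import os
import platform

def npu_device_visible() -> bool:
    # Minimal sketch, not part of the library's API.
    if platform.system() == "Linux":
        # The Linux NPU driver is assumed to expose an accel
        # device node such as /dev/accel/accel0.
        return os.path.exists("/dev/accel/accel0")
    # On Windows, the NPU is assumed to appear in Device Manager as
    # "Intel(R) AI Boost"; there is no simple stdlib check, so report
    # that a manual check is needed.
    return False  # check Device Manager manually on Windows

print("NPU device node found:", npu_device_visible())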

You can install the package on your machine with

   pip install intel-npu-acceleration-library

You can also install the package on Windows and Linux from source by typing

pip install "intel-npu-acceleration-library @ git+https://github.com/intel/intel-npu-acceleration-library.git"

To build the package you need a compiler on your system (Visual Studio 2019 is suggested for Windows builds). macOS is not yet supported. At the moment, only Ubuntu is supported for Linux builds. If you need the library for a specific OS, please open an issue.

The library is intended to be used with Intel Core Ultra processors, which have an integrated NPU (Neural Processing Unit). For best performance please install/update the NPU drivers to the latest version. (Windows, Linux).

Usage

For implemented examples, please check the examples folder.

Run a single MatMul in the NPU

from intel_npu_acceleration_library.backend import MatMul
import numpy as np

inC, outC, batch = 128, 128, 32  # Example sizes; define your own values

# Create both inputs
X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

mm = MatMul(inC, outC, batch, profile=False)

result = mm.run(X1, X2)
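
Continuing the snippet above: since the second operand is laid out as (outC, inC), the result should match the NumPy product X1 @ X2.T up to float16 tolerance. A quick sanity check (the tolerances below are an assumption, not a library guarantee):

# NumPy reference for the MatMul above: X2 has shape (outC, inC),
# so the expected output is X1 @ X2.T with shape (batch, outC).
reference = X1 @ X2.T

# Loose tolerances: float16 accumulation order differs across devices.
assert np.allclose(result, reference, rtol=1e-2, atol=1e-2)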

Compile a model for the NPU

If you have PyTorch >= 2.0.0 installed, you can use torch.compile to optimize your model for the NPU

import intel_npu_acceleration_library
import torch

# Compile model for the NPU
# model is a torch.nn.Module instance; it can be quantized just-in-time
optimized_model = torch.compile(model, backend="npu")

# Use the model as usual

On Windows, torch.compile is not supported yet, so you might want to use the explicit function intel_npu_acceleration_library.compile instead. The same applies if you use a PyTorch version < 2.0.0

import intel_npu_acceleration_library
import torch

optimized_model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

# Use the model as usual
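
For reference, here is a minimal end-to-end sketch. The two-layer network is a made-up example, and dtype=torch.float16 is assumed to be the half-precision alternative to int8 quantization:

import intel_npu_acceleration_library
import torch

# A tiny example network, for illustration only.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
).eval()

# Offload to the NPU in half precision (assumed dtype option).
optimized_model = intel_npu_acceleration_library.compile(model, dtype=torch.float16)

with torch.no_grad():
    out = optimized_model(torch.rand(1, 256))
print(out.shape)  # torch.Size([1, 10])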

Run a Tiny-llama model on the NPU

from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM
import intel_npu_acceleration_library
import torch

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, use_default_system_prompt=True)
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)


print("Compile model for the NPU")
model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

query = input("Ask something: ")
prefix = tokenizer(query, return_tensors="pt")["input_ids"]


generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

print("Run inference")
_ = model.generate(**generation_kwargs)
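
TinyLlama-1.1B-Chat is a chat-tuned model, so wrapping the query in the model's chat template usually yields better answers than tokenizing the raw text. A variant of the prompt-building step above, using the standard transformers apply_chat_template API:

# Optional: build the prompt with the model's chat template
# instead of tokenizing the raw query directly.
messages = [{"role": "user", "content": query}]
prefix = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)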

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • intel_npu_acceleration_library-1.1.0.tar.gz (38.3 kB, Source)

Built Distributions

  • intel_npu_acceleration_library-1.1.0-cp312-cp312-win_amd64.whl (39.3 MB, CPython 3.12, Windows x86-64)
  • intel_npu_acceleration_library-1.1.0-cp312-cp312-win32.whl (39.3 MB, CPython 3.12, Windows x86)
  • intel_npu_acceleration_library-1.1.0-cp311-cp311-win_amd64.whl (39.3 MB, CPython 3.11, Windows x86-64)
  • intel_npu_acceleration_library-1.1.0-cp311-cp311-win32.whl (39.3 MB, CPython 3.11, Windows x86)
  • intel_npu_acceleration_library-1.1.0-cp310-cp310-win_amd64.whl (39.3 MB, CPython 3.10, Windows x86-64)
  • intel_npu_acceleration_library-1.1.0-cp310-cp310-win32.whl (39.3 MB, CPython 3.10, Windows x86)
  • intel_npu_acceleration_library-1.1.0-cp39-cp39-win_amd64.whl (39.3 MB, CPython 3.9, Windows x86-64)
  • intel_npu_acceleration_library-1.1.0-cp39-cp39-win32.whl (39.3 MB, CPython 3.9, Windows x86)
  • intel_npu_acceleration_library-1.1.0-cp38-cp38-win_amd64.whl (39.3 MB, CPython 3.8, Windows x86-64)
  • intel_npu_acceleration_library-1.1.0-cp38-cp38-win32.whl (39.3 MB, CPython 3.8, Windows x86)

File hashes

Hashes for each distribution file:

intel_npu_acceleration_library-1.1.0.tar.gz
  SHA256: cbb145f92be5613b46d8af9a414d4bef31c9393e4cfa12fa4eafe1888b227028
  MD5: 481a56bc9540bb8c51741186ba08ff56
  BLAKE2b-256: 97dfec3c329b7a85f740d99384df92b389b1d6e2171b150f68ffd480a7bd6ee0

intel_npu_acceleration_library-1.1.0-cp312-cp312-win_amd64.whl
  SHA256: 2a70b2362b88629cc5a1fb63ba5d671634f3c8b953e827d0eed6d64b0ac7188b
  MD5: 46319bc36ec0a20d6e3e15a74365cb08
  BLAKE2b-256: d64860d559601c8be9ae27e55bce0da71b943c38f9f33da48e5d9d5e62fd6761

intel_npu_acceleration_library-1.1.0-cp312-cp312-win32.whl
  SHA256: 80fb580e1dc02ee8ccba929c299664ef8b684a230e6a9921d1d30ec6234b7a43
  MD5: 2968f17014886966ea395eaeb724e25c
  BLAKE2b-256: 80e6475f6c460ea8f466c08dadebdb4f44d5ec103c91ec0f1ddf54f474b8c8a6

intel_npu_acceleration_library-1.1.0-cp311-cp311-win_amd64.whl
  SHA256: dd8d77adec11174dac96ce3a9e1dbf4dccae07e15bfe14c8b62d9cc735c7043f
  MD5: 0bd09d5ca9971979bf45d629597c27d9
  BLAKE2b-256: 99ab9ff1e43fb0c0f4cbcbab069a306ca6dfb242379cb8d51816faa881252c24

intel_npu_acceleration_library-1.1.0-cp311-cp311-win32.whl
  SHA256: e629b82cef97e33adf93ad71c8489e67aa2d6e61c993a6563eab4aa7d35b75de
  MD5: c05ced41695574a56943e7ebbe68bb69
  BLAKE2b-256: a4198d82bffecaebc7db900a405676cc70f25211ba3e470466e36f38ec0912a2

intel_npu_acceleration_library-1.1.0-cp310-cp310-win_amd64.whl
  SHA256: a470918edf55b703b6032d71f6a664c49ba4911562b34739bc962fbccc4f9472
  MD5: c4f07bc03a2e00ab2df30d3aaa7ef0d6
  BLAKE2b-256: cc6a5c7b833b5a0dbdd42725da047077f6060549a6670fef25aae8cefd32408c

intel_npu_acceleration_library-1.1.0-cp310-cp310-win32.whl
  SHA256: aa53568b9dbe90b37e106bd50e48f2cb45c23a9009e35a02aeb7f35b55305ad4
  MD5: 89d48e873a1d381b82aa52015a73414d
  BLAKE2b-256: 3706dda1479838fbf74f9ec022c32f6766e44f2c0b84fe6da542fe2e7e1977e6

intel_npu_acceleration_library-1.1.0-cp39-cp39-win_amd64.whl
  SHA256: c136fb85fb7560cd3d03c66f0f43d6b51e45be9d2b5b6850707c5b4fdc97b5e7
  MD5: bc08bf87a7b4cd8140b2da03a9581ae5
  BLAKE2b-256: 7db2af639149216359724470a881abc6166ee224cf3ebd64b8f29929296451ff

intel_npu_acceleration_library-1.1.0-cp39-cp39-win32.whl
  SHA256: 2f4b8902dacf73d3914278b77f3f16a60fb372602a6f41199cad4a4f2d9b4cc9
  MD5: d716cb9cad648287053b943d1dc76dc6
  BLAKE2b-256: f80e546bee9461a400ba94dee3ec09a9111a492fb29512ee0e6f61b13db6ebb1

intel_npu_acceleration_library-1.1.0-cp38-cp38-win_amd64.whl
  SHA256: d50df9ec0ef465a9ee90fd50512383b060bd761d13bb09a35eb5180d7cf6eb2c
  MD5: a77ed5627047d786102a678da9cda48f
  BLAKE2b-256: 690537b8aa60ca7d2033d1ca5882ef044095ef1a9c401d13153ad04b6d76381b

intel_npu_acceleration_library-1.1.0-cp38-cp38-win32.whl
  SHA256: 80e0b87379d2a4df1bec3c293447c1a97c8824f106ca7a6df3ff00e2ae55ef6e
  MD5: ebab5690d82f7268a837f804b8366e84
  BLAKE2b-256: b79eb45e80900662c102003f21f4135acd160e522d178265e2abac739e3258af
