Python Bindings for ARA-2
Python bindings for the ARA-2 neural network accelerator client library, providing efficient NPU inference from Python via a proxy service running on NXP i.MX platforms with Kinara ARA-2 hardware.
Published to PyPI as edgefirst-ara2.
Architecture
```
Python Application ──(UNIX/TCP socket)──▶ ara2-proxy ──(PCIe)──▶ ARA-2 NPU
        │                             (system service)     (Kinara hardware)
        │
edgefirst-hal ──(DMA-BUF fd)──▶ GPU preprocessing (zero-copy)
```
Your Python code connects to the ara2-proxy system service (not directly
to the hardware). The proxy manages device access and must be running before
your application starts.
Installation
From PyPI
```
pip install edgefirst-ara2
```
For zero-copy preprocessing with edgefirst-hal:
```
pip install edgefirst-ara2[hal]
```
Prerequisites for Development
- Python 3.11 or higher
- Rust stable toolchain (edition 2024)
- maturin (`pip install maturin`)
- ARA-2 client library (`libaraclient.so.1`)
Development Install
```
cd crates/ara2-py
maturin develop --release --features abi3
```
Quick Start
```python
import edgefirst_ara2

# Connect to the ARA-2 proxy
session = edgefirst_ara2.Session.create_via_unix_socket("/var/run/ara2.sock")

# Get version information
versions = session.versions()
print(f"Proxy version: {versions['proxy']}")

# List endpoints
endpoints = session.list_endpoints()
print(f"Found {len(endpoints)} endpoints")

# Check endpoint status
for endpoint in endpoints:
    state = endpoint.check_status()
    stats = endpoint.dram_statistics()
    print(f"State: {state}, Free DRAM: {stats.free_size / stats.dram_size * 100:.1f}%")
```
Inference with numpy
```python
import numpy as np
import edgefirst_ara2

session = edgefirst_ara2.Session.create_via_unix_socket("/var/run/ara2.sock")
endpoints = session.list_endpoints()
model = endpoints[0].load_model("model.dvm")

# Allocate tensors and run inference
model.allocate_tensors()
input_data = np.zeros(model.input_size(0), dtype=np.uint8)
model.set_input_tensor(0, input_data)
timing = model.run()
print(f"Inference: {timing.run_time_us} us")

output = model.get_output_tensor(0)
dequantized = model.dequantize(0)
```
Zero-Copy DMA-BUF Pipeline
For maximum throughput, use DMA-BUF tensors with edgefirst-hal for GPU-accelerated preprocessing. This eliminates CPU memory copies between preprocessing and inference:
| Path | CPU copies | Flow |
|---|---|---|
| Standard (numpy) | 2 | numpy → shared memory → NPU |
| DMA-BUF | 0 | GPU writes directly to NPU input buffer |
How it works: `allocate_tensors("dma")` allocates the model's input tensor
in a DMA-BUF — a Linux kernel buffer accessible by multiple hardware devices.
`input_tensor_fd(0)` returns a file descriptor to that buffer. You pass this
FD to `edgefirst_hal.import_image()`, which maps it as a GPU image surface.
The GPU writes the preprocessed frame directly into the NPU's input buffer —
no CPU copies involved.
```python
import os
import edgefirst_ara2 as ara2
import edgefirst_hal as hal

session = ara2.Session.create_via_unix_socket(ara2.DEFAULT_SOCKET)
endpoint = session.list_endpoints()[0]

with endpoint.load_model("yolov8s.dvm") as model:
    model.allocate_tensors("dma")  # Must use "dma" for tensor FD access

    # Get DMA-BUF FD for the model's input tensor
    input_fd = model.input_tensor_fd(0)
    c, h, w = model.input_shape(0)
    try:
        # Import as PlanarRgb (CHW layout) to match ARA-2 tensor format
        dst = hal.import_image(input_fd, w, h, hal.PixelFormat.PlanarRgb)
    finally:
        os.close(input_fd)  # FD duplicated by import_image; close original

    # GPU-accelerated convert: camera frame -> model input (zero CPU copies)
    processor = hal.ImageProcessor()
    src = hal.load_image("image.jpg", format=hal.PixelFormat.Rgba, mem=hal.TensorMemory.DMA)
    processor.convert(src, dst)

    # Run inference — NPU reads from the same DMA-BUF
    timing = model.run()
    print(f"Inference: {timing.run_time_us} us")
```
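The PlanarRgb (CHW) layout targeted above can be illustrated with plain numpy. This sketch is independent of the ARA-2 library; on the CPU the conversion is a transpose plus a copy, which is exactly the work the GPU absorbs in the zero-copy path:

```python
import numpy as np

# A tiny 4x3 RGB frame in interleaved HWC layout (what a camera or
# decoder typically produces): shape (height, width, channels).
hwc = np.arange(4 * 3 * 3, dtype=np.uint8).reshape(4, 3, 3)

# Planar CHW layout (what the ARA-2 input tensor expects): the whole R
# plane, then G, then B, laid out contiguously.
chw = np.ascontiguousarray(hwc.transpose(2, 0, 1))

print(chw.shape)  # (3, 4, 3)
```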
Performance
Benchmarked on NXP FRDM i.MX 95 + ARA-2 with YOLOv8m-seg (640×640). The Python API adds minimal overhead over native Rust thanks to DMA-BUF zero-copy — GPU and NPU operate on the same physical memory buffers.
| Stage | Rust | Python | Overhead |
|---|---|---|---|
| GPU preprocess (letterbox + RGBA→CHW) | 2.85 ms | 2.88 ms | +0.03 ms |
| NPU inference (wall clock) | 34.53 ms | 34.63 ms | +0.10 ms |
| NPU execution | 26.04 ms | 26.04 ms | — |
| DMA input upload | 2.02 ms | 2.05 ms | — |
| DMA output download | 3.68 ms | 3.68 ms | — |
| Decode (NMS + dequant) | 4.05 ms | 4.31 ms | +0.26 ms |
| Materialize (CPU coeff × proto → bitmaps) | 5.67 ms | 5.98 ms | +0.31 ms |
| Draw (GL mask overlay) | 5.54 ms | 5.71 ms | +0.17 ms |
| Total pipeline | 52.64 ms | 53.52 ms | +0.88 ms |
| Throughput | 19.0 FPS | 18.7 FPS | — |
Steady-state mean over 30 iterations after warmup. Python overhead is under 1 ms across the entire pipeline.
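As a sanity check, the throughput rows follow directly from the total-pipeline row (1000 ms divided by the steady-state latency):

```python
# Throughput derived from the total pipeline latencies in the table above.
rust_fps = 1000 / 52.64    # Rust total pipeline
python_fps = 1000 / 53.52  # Python total pipeline
print(f"{rust_fps:.1f} FPS vs {python_fps:.1f} FPS")  # 19.0 FPS vs 18.7 FPS
```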
Run the benchmark yourself. Create a virtual environment on the target and install the packages from PyPI:
```
python3 -m venv ~/venv
~/venv/bin/pip install edgefirst-ara2 edgefirst-hal
~/venv/bin/python3 yolov8.py model.dvm image.jpg --benchmark 30 --save
```
DVM Metadata
Read model metadata without loading onto the NPU:
```python
import edgefirst_ara2

metadata = edgefirst_ara2.read_metadata("model.dvm")
if metadata:
    print(f"Task: {metadata.task}")
    print(f"Classes: {metadata.classes}")
    if metadata.compilation and metadata.compilation.ppa:
        print(f"IPS: {metadata.compilation.ppa.ips}")

labels = edgefirst_ara2.read_labels("model.dvm")
```
API Reference
Session
Connection to the ARA-2 proxy service.
Static Methods:
- `create_via_unix_socket(socket_path: str) -> Session`
- `create_via_tcp_ipv4_socket(ip: str, port: int) -> Session`
Methods:
- `versions() -> dict[str, str]` - Get component versions
- `list_endpoints() -> list[Endpoint]` - List available endpoints
Properties:
- `socket_type: str` - "unix" or "tcp"
Endpoint
Represents an ARA-2 accelerator device.
Methods:
- `check_status() -> State` - Get device state
- `dram_statistics() -> DramStatistics` - Get memory usage
- `load_model(model_path: str) -> Model` - Load a .dvm model
Model
Loaded neural network model.
Lifecycle:
- `allocate_tensors(memory: str | None = None)` - Allocate tensors ("dma", "shm", "mem", or None)
- `set_timeout_ms(timeout_ms: int)` - Set inference timeout
- `run() -> ModelTiming` - Execute inference
Tensor I/O (numpy):
- `set_input_tensor(index: int, data: np.ndarray)` - Copy data into input
- `get_output_tensor(index: int) -> np.ndarray` - Copy output data out
- `dequantize(index: int) -> np.ndarray` - Dequantize output to float32
DMA-BUF Zero-Copy:
- `input_tensor_fd(index: int) -> int` - Get input tensor FD (pass to `hal.ImageProcessor.import_image`, which dups it — close after)
- `output_tensor_fd(index: int) -> int` - Get output tensor FD (pass to `hal.Tensor.from_fd`, which takes ownership — do not close after)
- `input_tensor_memory(index: int) -> str` - Input memory type
- `output_tensor_memory(index: int) -> str` - Output memory type
Introspection:
- `n_inputs: int`, `n_outputs: int` - Tensor counts
- `input_shape(i) -> (C, H, W)`, `output_shape(i) -> (C, H, W)`
- `input_size(i) -> int`, `output_size(i) -> int` - Size in bytes
- `input_bpp(i) -> int`, `output_bpp(i) -> int` - Bytes per element
- `input_info(i) -> InputTensorInfo`, `output_info(i) -> OutputTensorInfo`
- `input_quants(i) -> InputQuantization`, `output_quants(i) -> OutputQuantization`
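Assuming dense, unpadded tensors (an assumption; hardware formats may pad), the introspection calls satisfy a simple invariant: the byte size is the product of the shape times the bytes per element. A sketch with hypothetical values for a 640×640 RGB uint8 input:

```python
# Hypothetical values mirroring what input_shape(0) and input_bpp(0)
# might report for a 640x640 RGB uint8 input; real models may differ.
c, h, w = 3, 640, 640   # input_shape(0)
bpp = 1                 # input_bpp(0): one byte per uint8 element
size = c * h * w * bpp  # what input_size(0) would report if unpadded
print(size)  # 1228800
```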
Metadata Functions
- `read_metadata(path: str) -> DvmMetadata | None`
- `read_labels(path: str) -> list[str]`
- `has_metadata(path: str) -> bool`
Supporting Types
- State (enum): Init, Idle, Active, ActiveSlow, ActiveBoosted, ThermalInactive, ThermalUnknown, Inactive, Fault
- ModelOutputType (enum): Classification, Detection, SemanticSegmentation, Raw
- DramStatistics: dram_size, free_size, model_occupancy_size, ...
- ModelTiming: run_time_us, input_time_us, output_time_us
- InputQuantization: qn, scale, mean, is_signed
- OutputQuantization: qn, scale, offset, is_signed
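`model.dequantize(index)` is the supported path for converting raw output to float32. Purely to illustrate the fields listed on `OutputQuantization`, a generic affine scheme would look like the sketch below; whether ARA-2 applies exactly this formula (rather than a Qn fixed-point scheme driven by `qn`) is an assumption, so prefer the built-in method in real code:

```python
import numpy as np

# Illustrative affine dequantization using fields named like those on
# OutputQuantization (scale, offset). This exact formula is an
# assumption, not the documented ARA-2 behavior.
def dequantize(raw: np.ndarray, scale: float, offset: float) -> np.ndarray:
    return (raw.astype(np.float32) - offset) * scale

q = np.array([0, 128, 255], dtype=np.uint8)
result = dequantize(q, scale=0.5, offset=128.0)
```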
Exceptions
```
Ara2Error (RuntimeError)
 +-- LibraryError  - libaraclient.so loading failures
 +-- HardwareError - NPU faults, endpoint errors
 +-- ProxyError    - Proxy connection failures
 +-- ModelError    - Model load/inference failures
 +-- TensorError   - Tensor allocation, DMA-BUF errors
 +-- MetadataError - DVM metadata parsing errors
```
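Because every error derives from `Ara2Error`, which itself derives from `RuntimeError`, one handler catches any binding failure. A plain-Python mirror of the hierarchy (for illustration only; the real classes come from the `edgefirst_ara2` extension module):

```python
# Mirror of the exception tree above, so the catch behavior can be
# demonstrated without the extension module installed.
class Ara2Error(RuntimeError): ...
class LibraryError(Ara2Error): ...
class HardwareError(Ara2Error): ...
class ProxyError(Ara2Error): ...
class ModelError(Ara2Error): ...
class TensorError(Ara2Error): ...
class MetadataError(Ara2Error): ...

# One handler catches any of the subclasses; code that only knows
# RuntimeError still degrades gracefully.
try:
    raise ModelError("model load failed")
except Ara2Error as e:
    print(f"ARA-2 error: {e}")
```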
Building Wheels
```
cd crates/ara2-py
maturin build --release --features abi3
```
Wheels are created in `target/wheels/`.
Stable ABI
The bindings use PyO3's stable ABI (abi3-py311):
- A single wheel works across Python 3.11, 3.12, 3.13, and future versions
- Minimum supported Python version is 3.11
Troubleshooting
"libaraclient.so.1 not found"
```
export LD_LIBRARY_PATH=/path/to/ara2/lib:$LD_LIBRARY_PATH
```
Verify Installation
```
python -c "import edgefirst_ara2; print(edgefirst_ara2.__version__)"
```
License
Licensed under the Apache License 2.0.
Copyright 2025 Au-Zone Technologies. All Rights Reserved.