# edgefirst-tflite

Drop-in replacement Python API for TensorFlow Lite inference with EdgeFirst extensions for DMA-BUF zero-copy, NPU-accelerated camera preprocessing, and model metadata extraction.
The core inference API (`Interpreter`, `get_input_details`, `get_output_details`, `set_tensor`, `invoke`, `get_output_tensor`) is compatible with the standard `tflite_runtime.interpreter.Interpreter`, so existing TFLite Python code works with minimal changes. On top of the standard API, edgefirst-tflite exposes NXP i.MX platform extensions for DMA-BUF zero-copy and CameraAdaptor NPU preprocessing that require only a few extra lines of code.
Built on the edgefirst-tflite Rust crate with native performance via PyO3.
## Installation

```shell
pip install edgefirst-tflite
```
Requires Python 3.9+ and NumPy 1.24+. The package ships as a native wheel with the TFLite runtime loaded dynamically at startup; no separate TFLite installation is needed as long as `libtensorflowlite_c.so` is available on the system library path.
To specify a custom library path:

```python
interp = Interpreter(model_path="model.tflite", library_path="/usr/lib/libtensorflowlite_c.so")
```
## Quick Start

```python
import numpy as np
from edgefirst_tflite import Interpreter

# Load model and inspect tensors
interp = Interpreter(model_path="model.tflite", num_threads=4)
print(interp.get_input_details())
print(interp.get_output_details())

# Run inference
input_data = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
interp.set_tensor(0, input_data)
interp.invoke()
output = interp.get_output_tensor(0)
print(output)
```
## TFLite API Compatibility

The `Interpreter` class is designed as a drop-in replacement for `tflite_runtime.interpreter.Interpreter`. The core inference path is compatible:

| Method | Description |
|---|---|
| `Interpreter(model_path=, model_content=, num_threads=, experimental_delegates=)` | Load a model |
| `allocate_tensors()` | Re-allocate tensors (required after `resize_tensor_input`) |
| `resize_tensor_input(input_index, tensor_size)` | Resize an input tensor |
| `invoke()` | Run inference |
| `get_input_details()` / `get_output_details()` | Tensor metadata dicts |
| `get_input_tensor(index)` / `get_output_tensor(index)` | Copy tensor data to NumPy |
| `set_tensor(input_index, value)` | Copy NumPy data into an input tensor |
| `tensor(index)` | Zero-copy NumPy view (callable returning array) |
Note on tensor indices: `get_input_tensor`, `get_output_tensor`, and `set_tensor` use 0-based indices relative to the input or output tensor lists. The `"index"` field returned by `get_input_details()` / `get_output_details()` matches these relative indices.
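The details dicts also carry quantization parameters for int8/uint8 models. Recovering real values follows the standard TFLite convention `real = scale * (q - zero_point)`; the sketch below uses a hypothetical details entry with illustrative values, not output from a real model:

```python
import numpy as np

# Hypothetical output-details entry for an int8 tensor (illustrative values)
detail = {"index": 0, "dtype": np.int8, "quantization": (0.00390625, -128)}

q = np.array([-128, 0, 127], dtype=np.int8)  # raw quantized output
scale, zero_point = detail["quantization"]

# Standard TFLite dequantization: real = scale * (q - zero_point)
real = scale * (q.astype(np.float32) - zero_point)
# real == [0.0, 0.5, 0.99609375]
```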
## Hardware Acceleration with Delegates

Delegates provide hardware acceleration (e.g., NPU offload via VxDelegate on NXP i.MX platforms):

```python
from edgefirst_tflite import Interpreter, load_delegate

delegate = load_delegate("libvx_delegate.so", options={
    "cache_file_path": "/tmp/vx_cache",
})
interp = Interpreter(
    model_path="model.tflite",
    experimental_delegates=[delegate],
)
interp.invoke()
```
### XNNPACK (CPU Acceleration)

XNNPACK accelerates floating-point and quantized models on ARM and x86 CPUs using SIMD instructions. No external delegate library is needed; XNNPACK is built into the TFLite library when compiled with `-DTFLITE_ENABLE_XNNPACK=ON`.

```python
from edgefirst_tflite import Interpreter, xnnpack_delegate

delegate = xnnpack_delegate(num_threads=4)
interp = Interpreter(
    model_path="model.tflite",
    experimental_delegates=[delegate],
)
interp.invoke()
```

Note: If you use `library_path=` on the `Interpreter`, pass the same path to `xnnpack_delegate(library_path=...)` so both use the same TFLite shared library.
## EdgeFirst Extensions

### DMA-BUF Zero-Copy Inference

DMA-BUF enables zero-copy data transfer between camera, CPU, and NPU by binding DMA-BUF file descriptors directly to TFLite tensors. This eliminates memory copies in the inference pipeline.

Import mode: register an externally allocated DMA-BUF (e.g., from V4L2 camera capture):

```python
from edgefirst_tflite import Interpreter, load_delegate

delegate = load_delegate("libvx_delegate.so")
interp = Interpreter(model_path="model.tflite", experimental_delegates=[delegate])

dmabuf = interp.dmabuf()
if dmabuf and dmabuf.is_supported():
    # Register a DMA-BUF fd from the camera driver
    handle = dmabuf.register(camera_fd, buffer_size, sync_mode="none")
    dmabuf.bind_to_tensor(handle, tensor_index=0)

    # Run inference: data flows camera → NPU with zero CPU copies
    interp.invoke()
    output = interp.get_output_tensor(0)

    # Cleanup
    dmabuf.unregister(handle)
```
Export mode: let the delegate allocate DMA-BUF buffers:

```python
dmabuf = interp.dmabuf()
handle, desc = dmabuf.request(tensor_index=0, ownership="delegate")
print(f"Allocated buffer: fd={desc['fd']}, size={desc['size']}")
dmabuf.bind_to_tensor(handle, tensor_index=0)
interp.invoke()
dmabuf.release(handle)
```
Buffer cycling for multi-buffer pipelines (e.g., triple-buffering with V4L2):

```python
handles = [dmabuf.register(fd, size) for fd, size in camera_buffers]
for frame in camera_stream:
    dmabuf.set_active(tensor_index=0, handle=handles[frame.index])
    interp.invoke()
    result = interp.get_output_tensor(0)
```
Cache synchronization for coherent CPU access:

```python
dmabuf.begin_cpu_access(handle, mode="read")
# ... read tensor data on CPU ...
dmabuf.end_cpu_access(handle, mode="read")

dmabuf.sync_for_device(handle)  # Before NPU access
dmabuf.sync_for_cpu(handle)     # Before CPU access
```
### CameraAdaptor — NPU-Accelerated Preprocessing

CameraAdaptor offloads camera format conversion (e.g., RGBA → RGB, YUV → RGB) to the NPU, eliminating CPU-side preprocessing. The conversion is injected directly into the TIM-VX inference graph.

```python
delegate = load_delegate("libvx_delegate.so")

# Configure BEFORE building the interpreter: CameraAdaptor modifies
# the delegate's graph compilation
adaptor = delegate.camera_adaptor
if adaptor:
    # Simple format conversion: camera sends RGBA, model expects RGB
    adaptor.set_format(tensor_index=0, format="rgba")
```
Format conversion with resize and letterboxing:

```python
adaptor.set_format_ex(
    tensor_index=0,
    format="rgba",
    width=1920,
    height=1080,
    letterbox=True,
    letterbox_color=0,
)
```
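Letterboxing scales the source frame to fit the model input while preserving aspect ratio, then pads the remainder with `letterbox_color`. The geometry can be sketched in pure Python; the 640x640 model input here is an assumption for illustration (the actual target comes from the model's input tensor):

```python
def letterbox_geometry(src_w, src_h, dst_w, dst_h):
    """Compute the scaled size and padding used when letterboxing
    a src_w x src_h frame into a dst_w x dst_h input."""
    scale = min(dst_w / src_w, dst_h / src_h)   # fit without cropping
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst_w - new_w) // 2   # left/right padding
    pad_y = (dst_h - new_h) // 2   # top/bottom padding
    return new_w, new_h, pad_x, pad_y

# 1920x1080 camera frame into a 640x640 model input:
print(letterbox_geometry(1920, 1080, 640, 640))  # (640, 360, 0, 140)
```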
Explicit camera and model format specification:

```python
adaptor.set_formats(
    tensor_index=0,
    camera_format="rgba",
    model_format="rgb",
)
```
Query format capabilities:

```python
adaptor.is_supported("rgba")     # True
adaptor.input_channels("rgba")   # 4
adaptor.output_channels("rgba")  # 3
adaptor.fourcc("rgba")           # "RGBP" (V4L2 FourCC)
adaptor.from_fourcc("NV12")      # "nv12"
```
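For context, a V4L2 FourCC is four ASCII characters packed little-endian into a 32-bit integer. This is a generic illustration of the encoding, not part of the edgefirst-tflite API:

```python
import struct

def fourcc(code: str) -> int:
    """Pack a 4-character FourCC string into its little-endian u32 value."""
    return struct.unpack("<I", code.encode("ascii"))[0]

def fourcc_str(value: int) -> str:
    """Unpack a u32 back into the 4-character FourCC string."""
    return struct.pack("<I", value).decode("ascii")

nv12 = fourcc("NV12")
print(hex(nv12))         # 0x3231564e
print(fourcc_str(nv12))  # NV12
```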
## Model Metadata

Extract metadata embedded in TFLite model files:

```python
interp = Interpreter(model_path="model.tflite")
meta = interp.get_metadata()
if meta:
    print(f"Model: {meta.name}")
    print(f"Version: {meta.version}")
    print(f"Author: {meta.author}")
    print(f"License: {meta.license}")
    print(f"Description: {meta.description}")
```
## Zero-Copy Tensor Views

The `tensor()` method returns a callable that produces a NumPy array sharing memory with the TFLite C-allocated buffer:

```python
# Get a zero-copy accessor for output tensor 0
# (index = input_count + output_offset)
accessor = interp.tensor(interp.input_count + 0)

interp.invoke()
view = accessor()  # Zero-copy NumPy view of the output
print(view)        # Reflects the latest inference results

interp.invoke()
view = accessor()  # Updated in-place; no copy needed
```

The accessor is invalidated by `allocate_tensors()` or `resize_tensor_input()`. Call `tensor()` again to get a fresh one.
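What "sharing memory" means here can be illustrated without the library at all. This generic NumPy sketch (not the package's internals) shows a view reflecting in-place updates to its backing buffer:

```python
import numpy as np

backing = np.zeros(4, dtype=np.float32)  # stands in for the C-allocated tensor buffer
view = backing.view()                    # zero-copy view; no data duplicated

backing[:] = [1.0, 2.0, 3.0, 4.0]        # simulate the runtime writing new results
print(view)  # the view sees the update without any copy
```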
## YOLOv8 Example

A complete YOLOv8 detection and segmentation example is included at `examples/yolov8/python/yolov8.py`, demonstrating the full pipeline with edgefirst-tflite + edgefirst-hal:

```shell
# Detection on i.MX8MP with VxDelegate
python yolov8.py yolov8n-int8.tflite zidane.jpg \
    --delegate /usr/lib/libvx_delegate.so --warmup 3 --iters 10 --save

# Segmentation on i.MX95 with Neutron
python yolov8.py yolov8n-seg-int8.imx95.tflite zidane.jpg \
    --delegate /usr/lib/libneutron_delegate.so --warmup 3 --iters 10 --save
```

The example supports `--warmup N` and `--iters N` for benchmarking with min/max/avg/p95/p99 statistics. Image loading and preprocessing run once; only inference, decoding, and rendering are timed per iteration.
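A warmup/iters benchmarking loop of this shape can be sketched in pure Python. The `benchmark` helper and nearest-rank percentile below are assumptions for illustration, not the example's actual implementation:

```python
import math
import time

def percentile(sorted_ms, p):
    """Nearest-rank percentile of an ascending-sorted list (assumed convention)."""
    k = max(0, math.ceil(p / 100 * len(sorted_ms)) - 1)
    return sorted_ms[k]

def benchmark(fn, warmup=3, iters=10):
    """Time fn() after untimed warmup runs; return stats in milliseconds."""
    for _ in range(warmup):          # warmup: delegate caches, allocator, etc.
        fn()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1000.0)
    times.sort()
    return {
        "min": times[0],
        "max": times[-1],
        "avg": sum(times) / len(times),
        "p95": percentile(times, 95),
        "p99": percentile(times, 99),
    }

# Stand-in workload; in the real example this would be interp.invoke() + decode
stats = benchmark(lambda: sum(range(10_000)), warmup=3, iters=10)
print({k: round(v, 3) for k, v in stats.items()})
```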
## Performance

Benchmarked with YOLOv8n int8 models on zidane.jpg (1280x720). Both Rust and Python use the same underlying TFLite C API and NPU delegates; the Python overhead from PyO3 FFI is negligible.

### i.MX 8M Plus (VxDelegate NPU)
| Test | Rust | Python | Overhead |
|---|---|---|---|
| Detection (infer avg) | 69.9ms | 69.6ms | ~0% |
| Segmentation (infer avg) | 84.2ms | 83.8ms | ~0% |
| Detection CPU-only | 482.5ms | 484.9ms | ~0.5% |
VxDelegate NPU speedup: ~7x over CPU. DMA-BUF zero-copy and CameraAdaptor RGBA→RGB conversion active.
### i.MX 95 (Neutron NPU)
| Test | Rust | Python | Overhead |
|---|---|---|---|
| Detection (infer avg) | 46.2ms | 46.4ms | ~0.4% |
| Segmentation (infer avg) | 49.9ms | 49.3ms | ~0% |
| Detection CPU-only | 266.6ms | 266.1ms | ~0% |
Neutron NPU speedup: ~5.8x over CPU. No first-run compilation overhead (Neutron models are pre-compiled).
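The quoted speedups follow directly from the CPU-only and NPU rows of the tables above (Rust columns):

```python
# Speedups derived from the benchmark tables
vx_speedup = 482.5 / 69.9       # i.MX 8M Plus: Detection CPU-only / VxDelegate
neutron_speedup = 266.6 / 46.2  # i.MX 95: Detection CPU-only / Neutron

print(round(vx_speedup, 1))       # 6.9 (~7x)
print(round(neutron_speedup, 1))  # 5.8
```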
### Key Observations

- Python overhead is negligible: inference time is dominated by TFLite/NPU execution, not the Python↔Rust FFI boundary
- Same detections: both Rust and Python produce identical results (2-3 objects: persons + tie in the reference image)
- 10-iteration benchmarks with 3 warmup iterations, `--save` enabled (includes overlay rendering)
## Error Handling

```python
from edgefirst_tflite import (
    TfLiteError,           # Base exception
    LibraryError,          # TFLite library not found
    DelegateError,         # Delegate error status
    InvalidArgumentError,  # Bad arguments (index out of range, etc.)
)

try:
    interp = Interpreter(model_path="missing.tflite")
except InvalidArgumentError as e:
    print(f"Bad argument: {e}")
except LibraryError as e:
    print(f"Library not found: {e}")
except TfLiteError as e:
    print(f"TFLite error: {e}")
```
## Platform Support
| Platform | Architecture | Delegate | DMA-BUF | CameraAdaptor |
|---|---|---|---|---|
| i.MX 8M Plus | aarch64 | VxDelegate | Yes | Yes |
| i.MX 95 | aarch64 | Neutron | No | No |
| Linux | x86_64 | CPU only | No | No |
| macOS | arm64 | CPU only | No | No |
| Windows | x86_64 | CPU only | No | No |
## License
Apache-2.0. See LICENSE.