Highly optimized inference engine for binarized neural networks.
Larq Compute Engine
Larq Compute Engine (LCE) is a highly optimized inference engine for deploying extremely quantized neural networks, such as Binarized Neural Networks (BNNs). It currently supports various mobile platforms and has been benchmarked on a Pixel 1 phone and a Raspberry Pi. LCE provides a collection of hand-optimized TensorFlow Lite custom Ops for supported instruction sets, developed in inline assembly or in C++ using compiler intrinsics. LCE leverages optimization techniques such as tiling to maximize the number of cache hits, vectorization to maximize the computational throughput, and multi-threading parallelization to take advantage of multi-core modern desktop and mobile CPUs.
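The speedup of binarized networks comes from the fact that, with weights and activations constrained to ±1, a dot product reduces to an XNOR followed by a popcount over packed bit-words. The following is an illustrative pure-Python sketch of that idea only, not LCE's actual kernel code (which is hand-written assembly and compiler intrinsics):

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an integer bitword (1 bit per value)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word

def binary_dot(a_word, b_word, n):
    """Dot product of two n-element +1/-1 vectors packed as bitwords.

    XNOR marks the positions where the signs agree; popcount counts them:
    dot = (#agreements) - (#disagreements) = 2 * popcount(xnor) - n
    """
    mask = (1 << n) - 1
    xnor = ~(a_word ^ b_word) & mask
    agreements = bin(xnor).count("1")
    return 2 * agreements - n

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(binary_dot(pack_bits(a), pack_bits(b), len(a)))  # same as sum(x*y): 0
```

On real hardware the packed words are 64-bit (or wider SIMD) registers and popcount is a single instruction, which is what the hand-optimized LCE kernels exploit.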
Key Features
- Effortless end-to-end integration from training to deployment:
  - Tight integration of LCE with Larq and TensorFlow provides a smooth end-to-end training and deployment experience.
  - A collection of Larq pre-trained BNN models for common machine learning tasks is available in Larq Zoo and can be used out-of-the-box with LCE.
  - LCE provides a custom MLIR-based model converter which is fully compatible with TensorFlow Lite and performs additional network level optimizations for Larq models.
- Lightning fast deployment on a variety of mobile platforms:
  - LCE enables high performance, on-device machine learning inference by providing hand-optimized kernels and network level optimizations for BNN models.
  - LCE currently supports ARM64-based mobile platforms such as Android phones and Raspberry Pi boards.
  - Thread parallelism support in LCE is essential for modern mobile devices with multi-core CPUs.
Performance
The table below presents the single-threaded performance of Larq Compute Engine on multiple generations of Larq BNN models, measured on a Pixel 1 phone (2016) and a Raspberry Pi 4 (BCM2711) board:
Model | Accuracy | Pixel 1, ms | RPi 4 (BCM2711), ms
---|---|---|---
TODO | TODO | TODO | TODO |
TODO | TODO | TODO | TODO |
TODO | TODO | TODO | TODO |
TODO | TODO | TODO | TODO |
The following table presents multi-threaded performance of Larq Compute Engine on a Pixel 1 phone and a Raspberry Pi 4 board:
Model | Accuracy | Pixel 1, ms | RPi 4 (BCM2711), ms
---|---|---|---
TODO | TODO | TODO | TODO |
TODO | TODO | TODO | TODO |
TODO | TODO | TODO | TODO |
TODO | TODO | TODO | TODO |
Benchmarked in February TODO with the LCE custom TFLite Model Benchmark Tool (see here), using BNN models with randomized weights and inputs.
Getting started
Follow these steps to deploy a BNN with LCE:
- Pick a Larq model

  You can use Larq to build and train your own model or pick a pre-trained model from Larq Zoo.
- Convert the Larq model

  LCE is built on top of TensorFlow Lite and uses the TensorFlow Lite FlatBuffer format to convert and serialize Larq models for inference. We provide an LCE Converter with additional optimization passes that increase the execution speed of Larq models on the supported target platforms.
- Build LCE

  The LCE documentation provides build instructions for Android and for ARM64-based boards such as the Raspberry Pi. Please follow the provided instructions to create a native LCE build or to cross-compile for one of the supported targets.
- Run inference

  LCE uses the TensorFlow Lite Interpreter to perform inference. In addition to the built-in TensorFlow Lite ops, optimized LCE ops are registered with the interpreter to execute the Larq-specific subgraphs of the model. An example of creating and building an LCE-compatible TensorFlow Lite interpreter for use in your own application is provided here.
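The invocation pattern is the standard TensorFlow Lite one, sketched below in Python with a self-contained stand-in model. Note the hedge in the comments: a model containing LCE custom ops must be run with an interpreter that has those ops registered (such as the LCE-built interpreter from the previous step); the stock `tf.lite.Interpreter` shown here only runs models made of built-in ops.

```python
import numpy as np
import tensorflow as tf

# Build and convert a tiny stand-in model so this sketch is self-contained.
# In practice you would load the FlatBuffer produced by the LCE converter.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(784,))])
flatbuffer = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# num_threads enables the multi-threaded kernels on multi-core CPUs.
interpreter = tf.lite.Interpreter(model_content=flatbuffer, num_threads=4)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
y = interpreter.get_tensor(out["index"])
print(y.shape)
```

The same allocate/set/invoke/get sequence applies in the C++ Interpreter API used on-device.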
Next steps
- Explore Larq pre-trained models.
- Learn how to build and train BNNs for your own application with Larq.
- Try our example programs.