
Highly optimized inference engine for binarized neural networks.

Project description

Larq Compute Engine


Larq Compute Engine (LCE) is a highly optimized inference engine for deploying extremely quantized neural networks, such as Binarized Neural Networks (BNNs). It currently supports various mobile platforms and has been benchmarked on a Pixel 1 phone and a Raspberry Pi. LCE provides a collection of hand-optimized TensorFlow Lite custom operators for supported instruction sets, developed in inline assembly or in C++ using compiler intrinsics. LCE leverages optimization techniques such as tiling to maximize the number of cache hits, vectorization to maximize computational throughput, and multi-threading to exploit the multiple cores of modern desktop and mobile CPUs.
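The speed advantage of binarized inference comes from replacing floating-point multiply-accumulates with bitwise operations: once ±1 weights and activations are packed into machine words, a dot product reduces to an XOR followed by a population count. The pure-Python sketch below illustrates only the arithmetic identity (function names are illustrative; LCE's actual kernels implement this with vectorized assembly and compiler intrinsics):

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int bitmask (1 bit per value).

    Bit i is set when values[i] is negative.
    """
    mask = 0
    for i, v in enumerate(values):
        if v < 0:
            mask |= 1 << i
    return mask

def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n ±1 vectors packed as bitmasks.

    XOR marks the positions where the signs differ; each differing
    position contributes -1 and each matching position +1, so the
    result is n - 2 * popcount(a XOR b).
    """
    diff = bin(a_bits ^ b_bits).count("1")
    return n - 2 * diff

a = [1, -1, 1, 1, -1, 1, -1, -1]
b = [1, 1, -1, 1, -1, -1, -1, 1]
ref = sum(x * y for x, y in zip(a, b))
fast = binary_dot(pack_bits(a), pack_bits(b), len(a))
print(ref, fast)  # → 0 0
```

On real hardware the popcount maps to a single instruction (e.g. NEON `CNT` on ARM), which is why a 1-bit dot product over 64 values costs only a couple of cycles.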

Larq Compute Engine is part of a family of libraries for BNN development; you can also check out Larq for building and training BNNs and Larq Zoo for pre-trained models.

Key Features

  • Effortless end-to-end integration from training to deployment:

    • Tight integration of LCE with Larq and TensorFlow provides a smooth end-to-end training and deployment experience.

    • A collection of Larq pre-trained BNN models for common machine learning tasks is available in Larq Zoo and can be used out-of-the-box with LCE.

    • LCE provides a custom MLIR-based model converter which is fully compatible with TensorFlow Lite and performs additional network level optimizations for Larq models.

  • Lightning fast deployment on a variety of mobile platforms:

    • LCE enables high performance, on-device machine learning inference by providing hand-optimized kernels and network level optimizations for BNN models.

    • LCE currently supports 64-bit ARM-based mobile platforms such as Android phones and Raspberry Pi boards.

    • LCE supports thread parallelism, which is essential for exploiting the multi-core CPUs of modern mobile devices.

Performance

The table below presents single-threaded performance of Larq Compute Engine on different versions of a novel BNN model called QuickNet (trained on the ImageNet dataset, released on Larq Zoo) on a Raspberry Pi 4 Model B (BCM2711 at 1.5GHz), a Pixel 1 Android phone (2016), and a Mac Mini with an Apple M1 CPU:

| Model | Top-1 Accuracy | RPi 4B 1.5GHz, 1 thread (ms) | Pixel 1, 1 thread (ms) | Mac Mini M1, 1 thread (ms) |
|---|---|---|---|---|
| QuickNetSmall | 59.4% | 27.7 | 16.8 | 4.0 |
| QuickNet | 63.3% | 45.0 | 25.5 | 5.8 |
| QuickNetLarge | 66.9% | 77.0 | 44.2 | 9.9 |

For reference, dabnn (the other main BNN library) reports an inference time of 61.3 ms for Bi-RealNet (56.4% accuracy) on the Pixel 1 phone, while LCE achieves an inference time of 41.6 ms for Bi-RealNet on the same device. The dabnn authors furthermore present a modified version, BiRealNet-Stem, which achieves the same 56.4% accuracy in 43.2 ms.

The following table presents multi-threaded performance of Larq Compute Engine on the same Raspberry Pi 4 Model B, Pixel 1, and Mac Mini M1 devices:

| Model | Top-1 Accuracy | RPi 4B 1.5GHz, 4 threads (ms) | Pixel 1, 4 threads (ms) | Mac Mini M1, 4 threads (ms) |
|---|---|---|---|---|
| QuickNetSmall | 59.4% | 12.1 | 8.9 | 1.8 |
| QuickNet | 63.3% | 20.8 | 12.6 | 2.5 |
| QuickNetLarge | 66.9% | 31.7 | 22.8 | 3.9 |

Benchmarked on 2021-06-11 (Pixel 1), 2021-06-13 (Mac Mini M1), and 2022-04-20 (RPi 4B) with the LCE custom TFLite Model Benchmark Tool, with XNNPack enabled and BNN models run on randomized inputs.

Getting started

Follow these steps to deploy a BNN with LCE:

  1. Pick a Larq model

    You can use Larq to build and train your own model or pick a pre-trained model from Larq Zoo.

  2. Convert the Larq model

    LCE is built on top of TensorFlow Lite and uses the TensorFlow Lite FlatBuffer format to convert and serialize Larq models for inference. We provide an LCE Converter with additional optimization passes to increase the speed of execution of Larq models on supported target platforms.

  3. Build LCE

    The LCE documentation provides the build instructions for Android and 64-bit ARM-based boards such as Raspberry Pi. Please follow the provided instructions to create a native LCE build or cross-compile for one of the supported targets.

  4. Run inference

    LCE uses the TensorFlow Lite Interpreter to perform inference. In addition to the built-in TensorFlow Lite operators, optimized LCE operators are registered with the interpreter to execute the Larq-specific subgraphs of the model. An example of creating and building an LCE-compatible TensorFlow Lite interpreter for your own applications is provided in the LCE documentation.


About

Larq Compute Engine is being developed by a team of deep learning researchers and engineers at Plumerai to help accelerate both our own research and the general adoption of Binarized Neural Networks.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See the tutorial on generating distribution archives.

Built Distributions

| File | Size | Python version | Platform |
|---|---|---|---|
| larq_compute_engine-0.16.0-cp312-cp312-win_amd64.whl | 43.8 MB | CPython 3.12 | Windows x86-64 |
| larq_compute_engine-0.16.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 58.3 MB | CPython 3.12 | manylinux: glibc 2.17+ x86-64 |
| larq_compute_engine-0.16.0-cp312-cp312-macosx_12_0_arm64.whl | 47.4 MB | CPython 3.12 | macOS 12.0+ ARM64 |
| larq_compute_engine-0.16.0-cp312-cp312-macosx_10_15_x86_64.whl | 56.0 MB | CPython 3.12 | macOS 10.15+ x86-64 |
| larq_compute_engine-0.16.0-cp311-cp311-win_amd64.whl | 43.8 MB | CPython 3.11 | Windows x86-64 |
| larq_compute_engine-0.16.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 58.3 MB | CPython 3.11 | manylinux: glibc 2.17+ x86-64 |
| larq_compute_engine-0.16.0-cp311-cp311-macosx_12_0_arm64.whl | 47.4 MB | CPython 3.11 | macOS 12.0+ ARM64 |
| larq_compute_engine-0.16.0-cp311-cp311-macosx_10_15_x86_64.whl | 56.0 MB | CPython 3.11 | macOS 10.15+ x86-64 |
| larq_compute_engine-0.16.0-cp310-cp310-win_amd64.whl | 43.8 MB | CPython 3.10 | Windows x86-64 |
| larq_compute_engine-0.16.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 58.3 MB | CPython 3.10 | manylinux: glibc 2.17+ x86-64 |
| larq_compute_engine-0.16.0-cp310-cp310-macosx_12_0_arm64.whl | 47.4 MB | CPython 3.10 | macOS 12.0+ ARM64 |
| larq_compute_engine-0.16.0-cp310-cp310-macosx_10_15_x86_64.whl | 56.0 MB | CPython 3.10 | macOS 10.15+ x86-64 |

File details

Hashes for each distribution file are listed below. See more details on using hashes here.

| File | SHA256 | MD5 | BLAKE2b-256 |
|---|---|---|---|
| larq_compute_engine-0.16.0-cp312-cp312-win_amd64.whl | 11f0f326640ce676de124a8039d5c6a534875dfcf32bce8d13b7929bde1470d9 | 36367a6679a3478979c787444bee628b | 00818f41c71d9af9dbbe64ed13cccf07a23b8ff74a1da8b73f37c402a20d9478 |
| larq_compute_engine-0.16.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 9446ecd36d6033158e7125e6db9a67b70aa6d542c37f0e1becd4ed202c26dad3 | 8ebf4b7dccf39249c85163303a955b54 | 0365e6560b530529b66586043e77b4c464d5a948b7d37e0dc7766a0dc5b9569c |
| larq_compute_engine-0.16.0-cp312-cp312-macosx_12_0_arm64.whl | 031388418b7aeb4cc274eeda76a2a76544fa54514b4b104da75b23f24b535e98 | d940cb9c01d83ee63080f2e3875e496f | f92aa776d27f34b96e24fc226759e4723aa040ea55cfc4d8d77a4da16b5f41f7 |
| larq_compute_engine-0.16.0-cp312-cp312-macosx_10_15_x86_64.whl | 6f93122e43986e69ae1c73fe48e0b755e791a6f242071700e8a4efab0b25fede | c97b55b1d16d593e0a69efef8045a8b6 | 2cdc73542c9cc079b14a349cf3aee5787a97ac970be4fdbff768678c1a884943 |
| larq_compute_engine-0.16.0-cp311-cp311-win_amd64.whl | c4d16eb6ac0ed739c4a2be05b7fdd4b2140880a9d2680735a2c6eac51d8694b2 | 5d95734844cec3c55f1bd8b3d6938878 | 2abefefc4922ac55281ffe14b6b31c678386af7b2dc39dddfc792d5a1ade1f7d |
| larq_compute_engine-0.16.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | d66d0d0c4afe5f241eec28bc92aeb37dc7056a8f5dc3ea69209b18ac1ad82f36 | 918b6323c038236132477cd4a6058aa7 | 279f8af32811bd823ccf01b06bc6bc67c02f8b6424740e3dfc703322a02ba00b |
| larq_compute_engine-0.16.0-cp311-cp311-macosx_12_0_arm64.whl | bd330b196fb813ccdf420549658d8def91f98b56b1a166a670707b71bc4d00d9 | 648fc8e8235ef7fe8d6ae7b42ec5abdc | c2d682b9504af1d200555896972da4aa20f19a69b86e8a1c1ede9483a7b0371e |
| larq_compute_engine-0.16.0-cp311-cp311-macosx_10_15_x86_64.whl | 6c8189e53b1737d74ed6712bb072283d9a89fb080c6853e7029670f84fed393a | 86e8e782bcf91b0a33ba45e6606a6b13 | 5bfefd9bc7f4a169f3bd62c6e86f8d7344fab7f1df530b1c3ec73269f4682683 |
| larq_compute_engine-0.16.0-cp310-cp310-win_amd64.whl | 2359471bbd5196d7e143d3bcfcd38c1cc403cb426eea54116dfc4a71fe483677 | babff868c5dcc98524f02cf6975540e0 | ff4e5b4ddfe92d1cac4f3a6570a7dbad7c115e325c44b2edee3a53ce72b53eb3 |
| larq_compute_engine-0.16.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 4a24a14577827674a847291ca190a869d946d986ac0789cb0b023e63d385479c | d8d085494721b4b476519e1a85101f4a | f361940685b48e1e66269e9dbe428f0cbfef41dca82e52a3f2556e9eece2b9ae |
| larq_compute_engine-0.16.0-cp310-cp310-macosx_12_0_arm64.whl | 297b0f5194b91eea45dd37ea8f64736707c2140daf81125e10b448354cb4afde | 97468dc47c7715950d65308b6fb818b3 | 8d0e0db4e2dc797445f797057c02bace42bc03e0cfb27ba474620306f76fd3e2 |
| larq_compute_engine-0.16.0-cp310-cp310-macosx_10_15_x86_64.whl | 7a469365a41d404d5d8c9c7e9ddfa73770e656b4f59a902d01ee55543c07f978 | 7718bb19e0a669d83573a7f657235c1c | e25f293d34d873470a8f2eae643702ff8438acc0128503c25aa936e0f69bc48f |
