BitNeural32: 1.58-bit Ternary Neural Network Compiler & QAT Library for ESP32

A Python library for training, quantizing, and compiling neural networks to ultra-efficient 1.58-bit (ternary) format for deployment on ESP32 microcontrollers.

See also: BitNeural32 Inference Library

Features

1.58-Bit Quantization: Ternary weights {-1, 0, 1} carry log2(3) ≈ 1.58 bits of information each and are stored as 2-bit values (4 weights per byte)

Quantization-Aware Training (QAT): Custom Keras layers that apply quantization during training for better post-export accuracy

Production-Ready Compiler: Convert Keras models to optimized C bytecode with automatic weight flattening, packing, and metadata generation

Inference Metrics: Estimate inference time, RAM usage, and Flash size for different ESP32 variants (ESP32, ESP32-S3, ESP32-C3)

15+ Layer Types: Dense, Conv1D, Conv2D, LSTM, GRU, ReLU, LeakyReLU, Softmax, Sigmoid, Tanh, MaxPooling1D, Flatten, Dropout, and more

Type Safe: Full Python 3.9+ support with comprehensive type hints

Installation

From PyPI (recommended)

pip install bitneural32

Requirements

  • Python: 3.9 or higher
  • Keras: 3.0+
  • TensorFlow: 2.16+ (or standalone Keras 3.x)
  • NumPy: 1.21+

Quick Start

1. Train with Quantization-Aware Training (Recommended)

import numpy as np
import keras
from bitneural32.qat import TernaryDense, TernaryConv1D

# Build a QAT model
model = keras.Sequential([
    keras.Input(shape=(100, 1)),  # Keras 3 prefers an explicit Input over input_shape=
    TernaryConv1D(filters=32, kernel_size=5, padding='same'),
    keras.layers.ReLU(),
    keras.layers.MaxPooling1D(2),
    keras.layers.Flatten(),
    TernaryDense(64),
    keras.layers.ReLU(),
    TernaryDense(10, activation='softmax')
])

# Train as usual; the QAT layers quantize their weights in the forward pass
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Placeholder random data: replace with your own dataset
X_train = np.random.randn(1000, 100, 1).astype('float32')
Y_train = keras.utils.to_categorical(np.random.randint(0, 10, 1000), 10)
model.fit(X_train, Y_train, epochs=10, batch_size=32, verbose=1)

# Save for export
model.save('qat_model.keras')

2. Compile to ESP32 Bytecode

import keras
from bitneural32.compiler import BitNeuralCompiler

# Load the trained QAT model and compile it for the target board.
# X_train is the training data from step 1.
compiler = BitNeuralCompiler(board_type='ESP32-S3')
model = keras.models.load_model('qat_model.keras')
compiler.compile_model(model, input_data=X_train)
compiler.save_c_header('model_data.h', include_metrics=True)

# View metrics
report = compiler.get_compilation_report()
print(report)

Output example:

{
  "board_type": "ESP32-S3",
  "total_size_bytes": 24576,
  "num_layers": 8,
  "inference_time_ms": 12.5,
  "ram_usage_bytes": 1024,
  "total_macs": 2500000,
  "layers": [...]
}
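
Since the report is described as a dict, it can be checked against a memory budget in a build script. The sketch below assumes the key names shown in the example output; `check_budget` and the budget values are hypothetical and not part of BitNeural32.

```python
def check_budget(report: dict, flash_budget: int, ram_budget: int) -> list[str]:
    """Return a list of budget violations (empty if the model fits)."""
    problems = []
    if report["total_size_bytes"] > flash_budget:
        problems.append(
            f"flash: {report['total_size_bytes']} B exceeds {flash_budget} B"
        )
    if report["ram_usage_bytes"] > ram_budget:
        problems.append(
            f"RAM: {report['ram_usage_bytes']} B exceeds {ram_budget} B"
        )
    return problems


# Values copied from the example report; in practice this would be
# the dict returned by compiler.get_compilation_report().
report = {
    "board_type": "ESP32-S3",
    "total_size_bytes": 24576,
    "ram_usage_bytes": 1024,
}

# Hypothetical budgets: 64 KB of flash and 8 KB of RAM for the model.
problems = check_budget(report, flash_budget=64 * 1024, ram_budget=8 * 1024)
print(problems or "model fits the budget")
```

Failing the build when the list is non-empty catches an oversized model before it is flashed.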

3. Run on ESP32

Learn more at Deployment Guide

API Reference

QAT Layers

All custom QAT layers support standard Keras layer interfaces and compile seamlessly:

TernaryDense(units, **kwargs)

Fully-connected layer with ternary quantization.

layer = TernaryDense(64, activation='relu')

TernaryConv1D(filters, kernel_size, strides=1, padding='same', **kwargs)

1D convolution optimized for single-channel inputs (e.g., time-series).

layer = TernaryConv1D(32, kernel_size=5, padding='same')

TernaryConv2D(filters, kernel_size, strides=1, padding='same', **kwargs)

2D convolution supporting multi-channel inputs and outputs.

layer = TernaryConv2D(16, kernel_size=3, padding='same')

TernaryLSTM(units, return_sequences=False, **kwargs)

LSTM recurrent layer with quantized weights and float32 biases.

layer = TernaryLSTM(32, return_sequences=True)

TernaryGRU(units, return_sequences=False, **kwargs)

GRU recurrent layer with quantized weights and float32 biases.

layer = TernaryGRU(32, return_sequences=False)

Compiler API

BitNeuralCompiler(model=None, board_type='ESP32')

Parameters:

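  • model (optional): Keras model to compile (default: None); it can also be passed to compile_model() later
  • board_type (str): Target ESP32 variant ('ESP32', 'ESP32-S3', 'ESP32-C3')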

Methods:

  • compile_model(model, input_data=None, allow_metrics=False): Compile a Keras model
  • save_c_header(filepath, include_metrics=False): Export to C header file
  • get_compilation_report(): Get human-readable report (dict)
  • export_model(filepath, allow_metrics=False): Convenience export function

Example:

compiler = BitNeuralCompiler(board_type='ESP32-S3')
compiler.compile_model(model, input_data=X_train, allow_metrics=True)
compiler.save_c_header('model.h', include_metrics=True)

Quantization Utilities

quantize_weights_ternary(weights)

Quantize float32 weights to {-1, 0, 1} using median-based thresholding.

import numpy as np
from bitneural32.quantize import quantize_weights_ternary
quantized = quantize_weights_ternary(np.random.randn(100, 100))

pack_weights_2bit(quantized_weights)

Pack ternary weights into 2-bit format (4 weights per byte).

from bitneural32.quantize import pack_weights_2bit
packed = pack_weights_2bit(quantized)
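
To make the packed layout concrete, here is a minimal pure-Python sketch of 2-bit packing and unpacking using the codes from this document's encoding table (00 → 0, 01 → 1, 10 → -1). The bit order within a byte (first weight in the lowest two bits) and the helper names `pack_2bit` and `unpack_2bit` are assumptions for illustration; the library's `pack_weights_2bit` may lay bits out differently.

```python
# 2-bit codes from the encoding table: 0 -> 00, 1 -> 01, -1 -> 10 (11 is reserved).
ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}
DECODE = {code: value for value, code in ENCODE.items()}


def pack_2bit(weights: list[int]) -> bytes:
    """Pack ternary weights four per byte.

    Assumption: the first weight of each group goes in the lowest two bits.
    """
    out = bytearray((len(weights) + 3) // 4)  # round up to whole bytes
    for i, w in enumerate(weights):
        out[i // 4] |= ENCODE[w] << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed: bytes, count: int) -> list[int]:
    """Recover `count` ternary weights; trailing padding bits are ignored."""
    return [
        DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
        for i in range(count)
    ]


weights = [1, 0, -1, 1, -1]
packed = pack_2bit(weights)
print(packed.hex())  # 6102
assert unpack_2bit(packed, len(weights)) == weights
```

The weight count has to be stored alongside the bytes, because padding in the last byte decodes as zeros and is indistinguishable from real zero weights.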

Architecture Overview

Quantization Strategy

BitNeural32 uses ternary quantization:

  1. Median-based thresholding: Set threshold = median(|weights|)
  2. Ternary encoding:
    • Weight > threshold → 1
    • Weight < -threshold → -1
    • Otherwise → 0
  3. 2-bit packing: 4 weights per byte (2 bits each)

Encoding:

  • 00 → 0
  • 01 → 1
  • 10 → -1
  • 11 → reserved
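
The thresholding rule above can be sketched in a few lines of standard-library Python. This is an illustrative reimplementation under the stated rule (threshold = median(|w|)), not the library's own code; `quantize_ternary` is a hypothetical name.

```python
import random
from statistics import median


def quantize_ternary(weights: list[float]) -> list[int]:
    """Map floats to {-1, 0, 1} using threshold = median(|w|)."""
    threshold = median(abs(w) for w in weights)
    return [1 if w > threshold else -1 if w < -threshold else 0 for w in weights]


# Small deterministic case: median(|w|) is 0.3, so only -0.4 and 0.5 survive.
print(quantize_ternary([0.1, -0.2, 0.3, -0.4, 0.5]))  # [0, 0, 0, -1, 1]

# On a larger sample, about half of the weights end up as zero.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
q = quantize_ternary(sample)
zero_fraction = q.count(0) / len(q)
print(f"zeros: {zero_fraction:.1%}")
```

One consequence of using the median as the threshold is that roughly half of the weights in each tensor map to 0, regardless of how the weights are scaled.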

QAT Training

Quantization-aware training applies quantization in-the-loop:

  1. Forward pass: Weights quantized to {-1, 0, 1} with learnable scale
  2. Backward pass: Straight-through estimator (STE) for gradient computation
  3. Result: Network adapts to quantization → typically 2-5% higher accuracy after export than post-training quantization
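
The forward/backward split can be illustrated on a single linear neuron in plain Python. This is a sketch of the straight-through idea only: the scale is passed in as a fixed number rather than learned, and `ste_step` is a hypothetical helper, not the code inside the QAT layers.

```python
from statistics import median


def quantize_ternary(weights: list[float]) -> list[int]:
    """Ternary quantization with threshold = median(|w|)."""
    t = median(abs(w) for w in weights)
    return [1 if w > t else -1 if w < -t else 0 for w in weights]


def ste_step(latent, x, target, scale, lr):
    """One SGD step on y = scale * sum(q_i * x_i) with squared-error loss.

    Forward pass: uses the quantized weights q.
    Backward pass: the gradient w.r.t. each quantized weight is applied
    directly to the latent float weight, as if quantization had slope 1
    (the straight-through estimator).
    """
    q = quantize_ternary(latent)
    y = scale * sum(qi * xi for qi, xi in zip(q, x))
    error = y - target
    loss = 0.5 * error ** 2
    grads = [error * scale * xi for xi in x]  # dLoss/dq_i
    new_latent = [w - lr * g for w, g in zip(latent, grads)]
    return new_latent, loss


latent = [0.5, -0.1, 0.2]
new_latent, loss = ste_step(latent, x=[1.0, 2.0, -1.0], target=2.0, scale=1.0, lr=0.1)
print(new_latent, loss)
```

Note that the second and third weights quantize to 0 in the forward pass yet still receive an update. Without the straight-through approximation their gradient would be zero almost everywhere and they could never leave the zero bucket.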

Compilation Pipeline

Keras Model
    ↓
[Per-Layer Compilation]
    ↓
Weight Flattening (layer-specific order)
    ↓
Ternary Quantization + 2-Bit Packing
    ↓
Binary Blob Generation
    ↓
C Header Export
    ↓
model_data.h (ready for ESP32 inclusion)
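
The last stage of the pipeline, turning a binary blob into a C header, can be sketched as follows. The actual layout written by `save_c_header` is not documented here, so the array name, the include guard, and the `to_c_header` helper are all assumptions for illustration.

```python
def to_c_header(blob: bytes, name: str = "model_data") -> str:
    """Render a byte blob as a C array definition, 12 bytes per line."""
    guard = f"{name.upper()}_H"
    lines = [
        f"#ifndef {guard}",
        f"#define {guard}",
        "",
        "#include <stdint.h>",
        "",
        f"static const uint8_t {name}[{len(blob)}] = {{",
    ]
    for i in range(0, len(blob), 12):
        chunk = ", ".join(f"0x{b:02x}" for b in blob[i:i + 12])
        lines.append(f"    {chunk},")
    lines += [
        "};",
        f"static const unsigned int {name}_len = {len(blob)};",
        "",
        f"#endif  /* {guard} */",
    ]
    return "\n".join(lines) + "\n"


# Two example bytes standing in for a packed weight blob.
print(to_c_header(bytes([0x61, 0x02])))
```

Declaring the array `static const` lets the toolchain keep the weights in flash rather than copying them into RAM at startup.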

Performance Characteristics

Memory Footprint

Example: 10→64→32→10 network

Format               Size
Float32              40 KB
Ternary (1.58-bit)   2.5 KB
Compression          94%
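
The compression figure follows from storing each weight in 2 bits instead of 32. The sketch below works through that arithmetic for a fully connected network. It counts weight matrices only and ignores biases, scales, and metadata, so it illustrates the ratio rather than reproducing the absolute sizes in the table; the helper names are hypothetical.

```python
def weight_count(layer_sizes: list[int]) -> int:
    """Number of weights in a fully connected network, biases excluded."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def float32_bytes(n_weights: int) -> int:
    """Storage for float32 weights: four bytes each."""
    return 4 * n_weights


def packed_2bit_bytes(n_weights: int) -> int:
    """Storage for 2-bit packed weights: four per byte, rounded up."""
    return (n_weights + 3) // 4


n = weight_count([10, 64, 32, 10])
saving = 1 - packed_2bit_bytes(n) / float32_bytes(n)
print(f"{n} weights: {float32_bytes(n)} B float32, "
      f"{packed_2bit_bytes(n)} B packed, {saving:.2%} smaller")
```

The saving is 1 - 2/32 = 93.75% for any layer whose weight count is a multiple of four, which rounds to the 94% shown above.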

Inference Speed (ESP32 @ 240 MHz)

Layer Type     Input→Output                        Approx. Time
Dense          1000→1000                           10-50 ms
Conv1D         100 inputs, 32 filters, kernel 5    5-20 ms
Conv2D         28×28→14×14, 32 filters             20-100 ms
LSTM           32 hidden, 50 timesteps             15-80 ms
Full Network   10→64→32→10                         1-5 ms
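
One way to read these timings is as clock cycles per multiply-accumulate (MAC). The sketch below does that conversion for the Dense row, assuming one MAC per weight and the 240 MHz clock from the heading; `cycles_per_mac` is a hypothetical helper.

```python
def cycles_per_mac(macs: int, time_ms: float, clock_hz: float = 240e6) -> float:
    """Clock cycles spent per multiply-accumulate for a given runtime."""
    return (time_ms / 1000.0) * clock_hz / macs


# A Dense 1000→1000 layer performs one MAC per weight.
dense_macs = 1000 * 1000

low = cycles_per_mac(dense_macs, time_ms=10)   # about 2.4 cycles per MAC
high = cycles_per_mac(dense_macs, time_ms=50)  # about 12 cycles per MAC
print(f"{low:.1f} to {high:.1f} cycles per MAC")
```

Comparing a measured on-device time against this range gives a quick sanity check on whether a layer is running as efficiently as the table suggests.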

Supported Layers

Layer          QAT Version     Notes
Dense          TernaryDense    ✅ Full support
Conv1D         TernaryConv1D   ✅ Mono-channel optimized
Conv2D         TernaryConv2D   ✅ Multi-channel support
LSTM           TernaryLSTM     ✅ Quantized kernel & recurrent
GRU            TernaryGRU      ✅ Quantized kernel & recurrent
ReLU           Standard        ✅ No quantization needed
LeakyReLU      Standard        ✅ Works as-is
Softmax        Standard        ✅ Uses float32 for stability
Sigmoid        Standard        ✅ Fast Padé approximation on ESP32
Tanh           Standard        ✅ Fast Padé approximation on ESP32
MaxPooling1D   Standard        ✅ No quantization
Flatten        Standard        ✅ Memory layout only
Dropout        Standard        ✅ No-op at inference
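
For readers unfamiliar with Padé approximations, the sketch below shows one common choice: the [3/2] Padé approximant of tanh, clamped to [-1, 1], with sigmoid derived from it. Which approximant the ESP32 runtime actually uses is not stated in this document, so treat this as an illustration of the technique rather than the library's implementation.

```python
import math


def tanh_pade(x: float) -> float:
    """[3/2] Padé approximant of tanh, clamped to the valid range [-1, 1]."""
    x2 = x * x
    y = x * (15.0 + x2) / (15.0 + 6.0 * x2)
    return max(-1.0, min(1.0, y))


def sigmoid_pade(x: float) -> float:
    """Sigmoid via the identity sigmoid(x) = 0.5 * (1 + tanh(x / 2))."""
    return 0.5 * (1.0 + tanh_pade(0.5 * x))


# Compare against the exact function on a grid over [-8, 8].
worst = max(
    abs(tanh_pade(i / 100) - math.tanh(i / 100)) for i in range(-800, 801)
)
print(f"max |tanh error| on [-8, 8]: {worst:.4f}")
```

The approximation needs only a handful of multiplications and one division, avoiding a call to `exp`; the clamp is required because the rational function grows without bound for large |x|.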

Tips & Best Practices

Model Design

  • Start with QAT layers for better accuracy after quantization
  • Use smaller models: Ternary networks benefit from depth over width
  • Avoid BatchNormalization before quantized layers; if you need it, fuse it into the adjacent layer's weights before export
  • Use ReLU/LeakyReLU for better quantization robustness

Training

  • Learning rate: Use 10× lower LR than standard training
  • Epochs: Train 20-50% longer to adapt to quantization
  • Batch size: 32-128 works well for most models
  • Monitor accuracy: QAT models may drop 1-3% initially, then recover

Compilation

  • Always provide input_data: Needed for input normalization statistics
  • Check metrics: Use allow_metrics=True to estimate ESP32 performance
  • Board selection: ESP32-S3 has more RAM; ESP32-C3 is power-efficient

Deployment

  • Test on target hardware: Simulator timings differ from real ESP32
  • Use dual-core (ESP32 and ESP32-S3 only; the ESP32-C3 is single-core): Enable Core 1 for real-time audio/sensor processing
  • Monitor UART: Check inference logs for bottlenecks

Troubleshooting

"Unsupported layer type"

Make sure every layer is either a QAT version or a supported standard Keras layer. For a custom layer, register your own layer compiler:

# Add to compiler mapping (MyLayerCompiler is a class you define yourself)
from bitneural32.compiler import BitNeuralCompiler
BitNeuralCompiler.LAYER_COMPILER_MAP['MyLayer'] = MyLayerCompiler()

Model accuracy drops significantly after quantization

  • Use QAT layers instead of post-training quantization
  • Train longer (2-3× epochs)
  • Lower learning rate by 10×
  • Use warm-up training (standard float → gradual quantization)
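
One way to read "gradual quantization" is as a blend between the float weight and its quantized value that ramps up over the first training steps. The sketch below shows such a schedule; it is an assumption about what a warm-up could look like, and neither `warmup_lambda` nor `effective_weight` is a BitNeural32 function.

```python
def warmup_lambda(step: int, warmup_steps: int) -> float:
    """Linear ramp from 0.0 (pure float) to 1.0 (fully quantized)."""
    return min(1.0, max(0.0, step / warmup_steps))


def effective_weight(w: float, q: float, lam: float) -> float:
    """Blend the float weight w with its quantized value q."""
    return (1.0 - lam) * w + lam * q


# A float weight of 0.3 whose quantized value is 1.0, over a 1000-step warm-up.
for step in (0, 500, 1000, 2000):
    lam = warmup_lambda(step, warmup_steps=1000)
    print(step, lam, effective_weight(0.3, 1.0, lam))
```

Early in training the network behaves like a float model; once the ramp reaches 1.0 the forward pass sees only quantized weights, as it will after export.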

Compiled model is too large

  • Reduce model size (fewer filters/units)
  • Use depthwise separable convolutions
  • Remove dense layers, use global pooling instead
  • Prune weights before compilation

ESP32 inference is slow

  • Check clock speed (maximum is 240 MHz on ESP32 and ESP32-S3, 160 MHz on ESP32-C3)
  • Profile with bn_run_inference() timing
  • Use Conv1D instead of Dense for temporal data
  • Consider smaller input resolution

Citation

If you use BitNeural32 in your research, please cite:

@software{bitneural32,
  title = {BitNeural32: 1.58-Bit Ternary Neural Network Compiler for ESP32},
  author = {Aizhee},
  year = {2025},
  url = {https://github.com/aizhee/python-bitneural32}
}

License

MIT License - See LICENSE file for details.


Made with ❤️ by Aizhee for embedded machine learning
