BitNeural32: 1.58-Bit Ternary Neural Network Compiler for ESP32
A Python library for training, quantizing, and compiling neural networks to ultra-efficient 1.58-bit (ternary) format for deployment on ESP32 microcontrollers.
See also: BitNeural32 Inference Library
Features
1.58-Bit Quantization: Extreme compression—weights packed as 2-bit values (4 weights per byte) using ternary {-1, 0, 1}
Quantization-Aware Training (QAT): Custom Keras layers that apply quantization during training for better post-export accuracy
Production-Ready Compiler: Convert Keras models to optimized C bytecode with automatic weight flattening, packing, and metadata generation
Inference Metrics: Estimate inference time, RAM usage, and Flash size for different ESP32 variants (ESP32, ESP32-S3, ESP32-C3)
15+ Layer Types: Dense, Conv1D, Conv2D, LSTM, GRU, ReLU, LeakyReLU, Softmax, Sigmoid, Tanh, MaxPooling1D, Flatten, Dropout, and more
Type Safe: Full Python 3.9+ support with comprehensive type hints
Installation
From PyPI (recommended)
pip install bitneural32
Requirements
- Python: 3.9 or higher
- Keras: 3.0+
- TensorFlow: 2.16+ (or standalone Keras 3.x)
- NumPy: 1.21+
Quick Start
1. Train with Quantization-Aware Training (Recommended)
import numpy as np
import keras
from bitneural32.qat import TernaryDense, TernaryConv1D
# Build a QAT model
model = keras.Sequential([
    keras.Input(shape=(100, 1)),  # explicit Input is the idiomatic Keras 3 form
    TernaryConv1D(filters=32, kernel_size=5, padding='same'),
    keras.layers.ReLU(),
    keras.layers.MaxPooling1D(2),
    keras.layers.Flatten(),
    TernaryDense(64),
    keras.layers.ReLU(),
    TernaryDense(10, activation='softmax')
])
# Train normally—quantization happens automatically
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
X_train = np.random.randn(1000, 100, 1).astype('float32')
Y_train = keras.utils.to_categorical(np.random.randint(0, 10, 1000), 10)
model.fit(X_train, Y_train, epochs=10, batch_size=32, verbose=1)
# Save for export
model.save('qat_model.keras')
2. Compile to ESP32 Bytecode
from bitneural32.compiler import BitNeuralCompiler
# Load the trained QAT model and compile it for the target board
compiler = BitNeuralCompiler(board_type='ESP32-S3')
model = keras.models.load_model('qat_model.keras')
compiler.compile_model(model, input_data=X_train)
compiler.save_c_header('model_data.h', include_metrics=True)
# View metrics
report = compiler.get_compilation_report()
print(report)
Output example:
{
  "board_type": "ESP32-S3",
  "total_size_bytes": 24576,
  "num_layers": 8,
  "inference_time_ms": 12.5,
  "ram_usage_bytes": 1024,
  "total_macs": 2500000,
  "layers": [...]
}
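If keras.models.load_model cannot deserialize the QAT layers, passing them explicitly via custom_objects usually resolves it (a hedged workaround; the class names below match the imports from step 1):
from bitneural32.qat import TernaryConv1D, TernaryDense

model = keras.models.load_model(
    'qat_model.keras',
    custom_objects={'TernaryConv1D': TernaryConv1D, 'TernaryDense': TernaryDense},
)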
3. Run on ESP32
Learn more in the Deployment Guide.
API Reference
QAT Layers
All custom QAT layers support standard Keras layer interfaces and compile seamlessly:
TernaryDense(units, **kwargs)
Fully-connected layer with ternary quantization.
layer = TernaryDense(64, activation='relu')
TernaryConv1D(filters, kernel_size, strides=1, padding='same', **kwargs)
1D convolution optimized for single-channel inputs (e.g., time-series).
layer = TernaryConv1D(32, kernel_size=5, padding='same')
TernaryConv2D(filters, kernel_size, strides=1, padding='same', **kwargs)
2D convolution supporting multi-channel inputs and outputs.
layer = TernaryConv2D(16, kernel_size=3, padding='same')
TernaryLSTM(units, return_sequences=False, **kwargs)
LSTM recurrent layer with quantized weights and float32 biases.
layer = TernaryLSTM(32, return_sequences=True)
TernaryGRU(units, return_sequences=False, **kwargs)
GRU recurrent layer with quantized weights and float32 biases.
layer = TernaryGRU(32, return_sequences=False)
Compiler API
BitNeuralCompiler(model=None, board_type='ESP32')
Parameters:
- model (keras.Model, optional): Model to compile immediately; you can also pass one to compile_model() later
- board_type (str): Target ESP32 variant ('ESP32', 'ESP32-S3', 'ESP32-C3')
Methods:
- compile_model(model, input_data=None, allow_metrics=False): Compile a Keras model
- save_c_header(filepath, include_metrics=False): Export to C header file
- get_compilation_report(): Get a human-readable report (dict)
- export_model(filepath, allow_metrics=False): Convenience export function
Example:
compiler = BitNeuralCompiler(board_type='ESP32-S3')
compiler.compile_model(model, input_data=X_train, allow_metrics=True)
compiler.save_c_header('model.h', include_metrics=True)
Quantization Utilities
quantize_weights_ternary(weights)
Quantize float32 weights to {-1, 0, 1} using median-based thresholding.
from bitneural32.quantize import quantize_weights_ternary
quantized = quantize_weights_ternary(np.random.randn(100, 100))
pack_weights_2bit(quantized_weights)
Pack ternary weights into 2-bit format (4 weights per byte).
from bitneural32.quantize import pack_weights_2bit
packed = pack_weights_2bit(quantized)
Architecture Overview
Quantization Strategy
BitNeural32 uses ternary quantization (a NumPy sketch follows the encoding table below):
- Median-based thresholding: set threshold = median(|weights|)
- Ternary encoding:
  - weight > threshold → 1
  - weight < -threshold → -1
  - otherwise → 0
- 2-bit packing: 4 weights per byte (2 bits each)
Encoding:
- 00 → 0
- 01 → 1
- 10 → -1
- 11 → reserved
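The sketch below mirrors those steps in plain NumPy; it is illustrative only, and the byte ordering of the four 2-bit codes is an assumption rather than a documented detail of the library:
import numpy as np

def ternarize(weights):
    # Median-based thresholding: threshold = median(|weights|)
    threshold = np.median(np.abs(weights))
    q = np.zeros(weights.shape, dtype=np.int8)
    q[weights > threshold] = 1
    q[weights < -threshold] = -1
    return q

def pack_2bit(q):
    # Map {-1, 0, 1} to the 2-bit codes above: 0 -> 00, 1 -> 01, -1 -> 10
    codes = np.where(q == 1, 0b01, np.where(q == -1, 0b10, 0b00)).astype(np.uint8)
    flat = np.pad(codes.ravel(), (0, (-codes.size) % 4))  # pad to a multiple of 4
    b = flat.reshape(-1, 4)
    # Assumed ordering: first weight in the low bits of each byte
    return (b[:, 0] | (b[:, 1] << 2) | (b[:, 2] << 4) | (b[:, 3] << 6)).astype(np.uint8)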
QAT Training
Quantization-aware training applies quantization in the training loop (a sketch follows this list):
- Forward pass: Weights quantized to {-1, 0, 1} with learnable scale
- Backward pass: Straight-through estimator (STE) for gradient computation
- Result: Network adapts to quantization → 2-5% higher accuracy after export
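A minimal TensorFlow sketch of the forward/backward trick (illustrative; the learnable scale mentioned above is omitted, and this is not the library's exact implementation):
import tensorflow as tf

def ste_ternarize(w):
    # Median of |w| as the ternary threshold
    flat = tf.sort(tf.reshape(tf.abs(w), [-1]))
    threshold = flat[tf.size(flat) // 2]
    # Forward pass: quantize to {-1, 0, 1}
    q = tf.where(w > threshold, tf.ones_like(w),
                 tf.where(w < -threshold, -tf.ones_like(w), tf.zeros_like(w)))
    # Backward pass: stop_gradient hides the quantization step from autodiff,
    # so gradients flow straight through to w (the straight-through estimator)
    return w + tf.stop_gradient(q - w)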
Compilation Pipeline
Keras Model
↓
[Per-Layer Compilation]
↓
Weight Flattening (layer-specific order)
↓
Ternary Quantization + 2-Bit Packing
↓
Binary Blob Generation
↓
C Header Export
↓
model_data.h (ready for ESP32 inclusion)
Performance Characteristics
Memory Footprint
Example: 10→64→32→10 network
| Format | Size |
|---|---|
| Float32 | 40 KB |
| Ternary (1.58-bit) | 2.5 KB |
| Compression | 94% |
Inference Speed (ESP32 @ 240 MHz)
| Layer Type | Input→Output | Approx. Time |
|---|---|---|
| Dense | 1000→1000 | 10-50 ms |
| Conv1D | 100 inputs, 32 filters, kernel 5 | 5-20 ms |
| Conv2D | 28×28→14×14, 32 filters | 20-100 ms |
| LSTM | 32 hidden, 50 timesteps | 15-80 ms |
| Full Network | 10→64→32→10 | 1-5 ms |
Supported Layers
| Layer | QAT Version | Notes |
|---|---|---|
| Dense | TernaryDense | ✅ Full support |
| Conv1D | TernaryConv1D | ✅ Mono-channel optimized |
| Conv2D | TernaryConv2D | ✅ Multi-channel support |
| LSTM | TernaryLSTM | ✅ Quantized kernel & recurrent |
| GRU | TernaryGRU | ✅ Quantized kernel & recurrent |
| ReLU | Standard | ✅ No quantization needed |
| LeakyReLU | Standard | ✅ Works as-is |
| Softmax | Standard | ✅ Uses float32 for stability |
| Sigmoid | Standard | ✅ Fast Padé approximation on ESP32 |
| Tanh | Standard | ✅ Fast Padé approximation on ESP32 |
| MaxPooling1D | Standard | ✅ No quantization |
| Flatten | Standard | ✅ Memory layout only |
| Dropout | Standard | ✅ No-op at inference |
Tips & Best Practices
Model Design
- Start with QAT layers for better accuracy after quantization
- Use smaller models: Ternary networks benefit from depth over width
- Avoid BatchNormalization before quantized layers (fuse into weights)
- Use ReLU/LeakyReLU for better quantization robustness
Training
- Learning rate: Use a 10× lower LR than standard training (see the sketch after this list)
- Epochs: Train 20-50% longer to adapt to quantization
- Batch size: 32-128 works well for most models
- Monitor accuracy: QAT models may drop 1-3% initially, then recover
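Putting those rules of thumb together, a QAT training setup might look like this (the values are the guidelines above, not tuned settings; model, X_train, and Y_train are from the Quick Start):
import keras

optimizer = keras.optimizers.Adam(learning_rate=1e-4)  # ~10x below the usual 1e-3 default
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
# Train longer than a float baseline and keep the best weights while accuracy recovers
model.fit(X_train, Y_train, epochs=30, batch_size=64, validation_split=0.1,
          callbacks=[keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=5,
                                                   restore_best_weights=True)])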
Compilation
- Always provide input_data: Needed for input normalization statistics
- Check metrics: Use allow_metrics=True to estimate ESP32 performance
- Board selection: ESP32-S3 has more RAM; ESP32-C3 is power-efficient
Deployment
- Test on target hardware: Simulator timings differ from real ESP32
- Use dual-core: Enable Core 1 for real-time audio/sensor processing
- Monitor UART: Check inference logs for bottlenecks
Troubleshooting
"Unsupported layer type"
Make sure you're using QAT versions or standard Keras layers. If you need to support a custom layer, register a compiler for it:
# Add your layer to the compiler mapping
# (MyLayerCompiler is your own implementation of the layer-compiler interface)
from bitneural32.compiler import BitNeuralCompiler
BitNeuralCompiler.LAYER_COMPILER_MAP['MyLayer'] = MyLayerCompiler()
Model accuracy drops significantly after quantization
- Use QAT layers instead of post-training quantization
- Train longer (2-3× epochs)
- Lower learning rate by 10×
- Use warm-up training (standard float → gradual quantization); a sketch follows
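One way to implement that warm-up, assuming the Ternary layers are drop-in, weight-compatible replacements for their float counterparts (an assumption; verify weight shapes before copying):
import keras
from bitneural32.qat import TernaryDense

def build(dense_cls):
    # Identical topology; only the layer class changes between the two phases
    return keras.Sequential([
        keras.Input(shape=(100,)),
        dense_cls(64, activation='relu'),
        dense_cls(10, activation='softmax'),
    ])

float_model = build(keras.layers.Dense)
float_model.compile(optimizer='adam', loss='categorical_crossentropy')
# ...fit float_model for a few epochs first...

qat_model = build(TernaryDense)
qat_model.set_weights(float_model.get_weights())  # assumes matching weight shapes
qat_model.compile(optimizer=keras.optimizers.Adam(1e-4), loss='categorical_crossentropy')
# ...continue training qat_model so it adapts to quantization...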
Compiled model is too large
- Reduce model size (fewer filters/units)
- Use depthwise separable convolutions
- Remove dense layers, use global pooling instead
- Prune weights before compilation
ESP32 inference is slow
- Check clock speed (set to 240 MHz max)
- Profile with bn_run_inference() timing
- Use Conv1D instead of Dense for temporal data
- Consider smaller input resolution
Citation
If you use BitNeural32 in your research, please cite:
@software{bitneural32,
  title  = {BitNeural32: 1.58-Bit Ternary Neural Network Compiler for ESP32},
  author = {Aizhee},
  year   = {2025},
  url    = {https://github.com/aizhee/python-bitneural32}
}
License
MIT License - See LICENSE file for details.
References
- BitNet Paper: arxiv.org/abs/2310.11453
- Ternary Networks: arxiv.org/abs/1609.00222
- ESP32 Docs: docs.espressif.com
- Keras API: keras.io
Made with ❤️ by Aizhee for embedded machine learning
File details
Details for the file bitneural32-0.0.13.tar.gz.
File metadata
- Download URL: bitneural32-0.0.13.tar.gz
- Upload date:
- Size: 22.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d46bccf6a72fa4fabf77818a2127232a1ce51605516e0db06ca95269a66147c |
| MD5 | 6dbd82fa6a2a448fda2624ee7763d905 |
| BLAKE2b-256 | 61f71308bd1c2dd41f74f1bc2f9dfc437238ec32c115bb4381a83a082461ddf3 |
File details
Details for the file bitneural32-0.0.13-py3-none-any.whl.
File metadata
- Download URL: bitneural32-0.0.13-py3-none-any.whl
- Upload date:
- Size: 20.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8b09f29b8cd03f3d515afe676e4e1b4d6a0ff5ee16b466a9968a9702577103f2 |
| MD5 | 11a838c0c430780c12503059e1b07924 |
| BLAKE2b-256 | aca66470b9a1ab4f0b18cdb863b9a2edb7a34946c3a817c477c5815507dd8991 |