Convert ML models to optimized code for embedded systems
BlackBox2C
Convert scikit-learn models to native embedded code — C, C++, Arduino, MicroPython
BlackBox2C converts any trained scikit-learn model into a minimal if-else decision tree in your target language. The generated code has zero runtime dependencies, runs on any microcontroller with a C compiler, and fits in a few hundred bytes of FLASH.
How It Works
- Surrogate extraction: A lightweight DecisionTree is trained to mimic any black-box model (Random Forest, SVM, MLP, etc.) by generating synthetic boundary samples and labeling them with the original model's predictions.
- Rule optimization: Redundant branches are pruned and similar leaves are merged to minimize code size.
- Code generation: The optimized tree is serialized as a pure if-else function in the target language.
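The surrogate-extraction step can be sketched with plain scikit-learn. This is a minimal illustration, not BlackBox2C's actual implementation: it assumes uniform sampling inside the bounding box of the training data (the library describes boundary-focused sampling), and the helper names `extract_surrogate` and `fidelity` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def extract_surrogate(black_box, X, n_samples=10000, max_depth=5, seed=0):
    """Fit a small decision tree that mimics `black_box` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Assumption: sample uniformly inside the bounding box of the training data.
    X_syn = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_samples, X.shape[1]))
    # Label the synthetic points with the black-box model's own predictions.
    y_syn = black_box.predict(X_syn)
    surrogate = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    surrogate.fit(X_syn, y_syn)
    return surrogate


def fidelity(black_box, surrogate, X):
    """Fraction of rows in X on which the surrogate agrees with the black box."""
    return float(np.mean(black_box.predict(X) == surrogate.predict(X)))


iris = load_iris()
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(iris.data, iris.target)

surrogate = extract_surrogate(model, iris.data)
score = fidelity(model, surrogate, iris.data)
print(f"depth={surrogate.get_depth()} fidelity={score:.3f}")
```

Fidelity here measures agreement with the original model, not accuracy against the true labels, so a surrogate can score high while reproducing the original model's mistakes.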
Supported Models and Targets
| Input models | Output formats |
|---|---|
| Any scikit-learn estimator with predict() | Pure C (C99) |
| Decision Tree, Random Forest, SVM, MLP... | C++11 (class + namespace) |
| Classification and Regression tasks | Arduino (.h with PROGMEM) |
| | MicroPython (.py module) |
Installation
pip install blackbox2c
Requirements: Python 3.8+, NumPy >= 1.21, scikit-learn >= 1.0.
Tip: Use a virtual environment to keep your project isolated:
# python -m venv .venv && source .venv/bin/activate # Linux/macOS
python -m venv .venv && .venv\Scripts\activate # Windows
pip install blackbox2c
For development (from source):
git clone https://github.com/AxelSkrauba/BlackBox2C.git
cd BlackBox2C
pip install -e ".[dev]"
Quick Start
Classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from blackbox2c import convert
iris = load_iris()
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(iris.data, iris.target)
# Convert to C (default target)
c_code = convert(
    model,
    iris.data,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    max_depth=5,
)
print(c_code)
Generated output:
/*
* Auto-generated C code by BlackBox2C
* - Input features: 4
* - Output classes: 3
* - Precision: 8-bit
*/
#include <stdint.h>
#define setosa 0
#define versicolor 1
#define virginica 2
uint8_t predict(float features[4]) {
    if (features[2] <= 2.449999f) {
        return 0;
    } else {
        if (features[3] <= 1.750000f) {
            return 1;
        } else {
            return 2;
        }
    }
}
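Output of this shape can be produced by a recursive walk over the tree. The sketch below is an illustration only, not BlackBox2C's code generator: the nested-dict tree layout and the names `emit_c` and `generate_predict` are hypothetical, and the tree is written by hand to mirror the Iris example.

```python
def emit_c(node, indent=1):
    """Recursively render a tree node as nested C if-else statements."""
    pad = "    " * indent
    if "label" in node:  # leaf: return the class index
        return f"{pad}return {node['label']};\n"
    return (
        f"{pad}if (features[{node['feature']}] <= {node['threshold']:.6f}f) {{\n"
        + emit_c(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + emit_c(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


def generate_predict(tree, n_features, name="predict"):
    """Wrap the rendered tree body in a C function definition."""
    return f"uint8_t {name}(float features[{n_features}]) {{\n" + emit_c(tree) + "}\n"


# Hand-written tree mirroring the Iris output (hypothetical layout).
iris_tree = {
    "feature": 2, "threshold": 2.449999,
    "left": {"label": 0},
    "right": {
        "feature": 3, "threshold": 1.75,
        "left": {"label": 1},
        "right": {"label": 2},
    },
}

c_source = generate_predict(iris_tree, n_features=4)
print(c_source)
```

Each internal node becomes one comparison and each leaf one `return`, so the size of the generated function grows with the number of nodes in the tree.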
Export to Other Formats
# Arduino header (.h with PROGMEM)
arduino_code = convert(model, iris.data, target='arduino')
# C++ class
cpp_code = convert(model, iris.data, target='cpp')
# MicroPython module
mp_code = convert(model, iris.data, target='micropython')
Regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes
from blackbox2c import convert
data = load_diabetes()
model = GradientBoostingRegressor(random_state=42)
model.fit(data.data, data.target)
c_code = convert(model, data.data, max_depth=5)
# Generates: float predict(float features[10]) { ... }
Feature Analysis
from blackbox2c.analysis import FeatureSensitivityAnalyzer
# Assumes a fitted model plus X_train, y_train, and feature_names from your own pipeline
analyzer = FeatureSensitivityAnalyzer(n_repeats=10, random_state=42)
results = analyzer.analyze(model, X_train, y_train, feature_names=feature_names)
print(results.summary())
# Get top 3 most important features by index
top3 = results.get_top_features(3)
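The `n_repeats` and `random_state` parameters resemble permutation importance, although the source does not say which method `FeatureSensitivityAnalyzer` uses. As a point of comparison under that assumption, scikit-learn's own `permutation_importance` produces a similar ranking without BlackBox2C:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

iris = load_iris()
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(iris.data, iris.target)

# Shuffle each feature column n_repeats times and record the drop in score.
result = permutation_importance(
    model, iris.data, iris.target, n_repeats=10, random_state=42
)

# Feature indices sorted from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
top3 = [int(i) for i in ranking[:3]]
for i in top3:
    print(f"{iris.feature_names[i]}: {result.importances_mean[i]:.3f}")
```

Features whose shuffling barely changes the score are candidates for removal before conversion, which means fewer sensor readings on the device.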
Configuration
from blackbox2c import Converter, ConversionConfig
config = ConversionConfig(
    max_depth=5,              # Surrogate tree depth (1-10, default 5)
    optimize_rules='medium',  # 'low' | 'medium' | 'high'
    use_fixed_point=False,    # Use integer arithmetic instead of float
    precision=8,              # Bit width for fixed-point: 8 | 16 | 32
    function_name='predict',  # Name of the generated function
    n_samples=10000,          # Synthetic samples for surrogate training
    feature_threshold=None,   # Auto-select N most important features
    memory_budget_kb=None,    # Auto-tune params to fit a KB budget
)
converter = Converter(config)
code = converter.convert(model, X_train, target='arduino')
metrics = converter.get_metrics()
# {'fidelity': 0.97, 'complexity': {...}, 'size_estimate': {...}}
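The `use_fixed_point` and `precision` options replace float comparisons with integer ones. The source does not document the scaling scheme, so the following is only a sketch under one common assumption: each feature is mapped linearly from its observed range onto the unsigned integer range of the chosen bit width. The helper name `to_fixed` is hypothetical.

```python
def to_fixed(value, lo, hi, bits=8):
    """Map `value` from [lo, hi] onto the integers 0 .. 2**bits - 1 (assumed scheme)."""
    max_int = (1 << bits) - 1
    clipped = min(max(value, lo), hi)  # saturate out-of-range inputs
    return round((clipped - lo) / (hi - lo) * max_int)


# Iris petal length spans roughly 1.0 to 6.9 cm.
lo, hi = 1.0, 6.9
threshold_q = to_fixed(2.45, lo, hi, bits=8)

# On the device, the float test `x <= 2.45` would become an integer comparison
# between the quantized reading and the quantized threshold.
for x in (1.4, 2.4, 4.7):
    x_q = to_fixed(x, lo, hi, bits=8)
    print(f"x={x} -> {x_q:3d}, left branch: {x_q <= threshold_q}")
```

Narrower bit widths avoid the need for floating-point support, but readings close to a threshold may land on the other side after rounding, so it is worth re-checking fidelity after enabling fixed-point output.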
CLI
# Convert a pickled model to C
blackbox2c convert model.pkl X_train.npy -o output.c
# Export to Arduino
blackbox2c convert model.pkl X_train.npy -t arduino -o predict.h
# Analyze feature importance
blackbox2c analyze model.pkl X_train.npy --top-n 5
# Export a decision tree directly (no surrogate extraction)
blackbox2c export model.pkl -f cpp -o predictor.hpp
# Help
blackbox2c --help
blackbox2c convert --help
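The CLI examples take a pickled model and a `.npy` array as input. Assuming the CLI reads them with the standard `pickle` and NumPy loaders (the source does not say), both files could be produced as follows. The temporary directory is only there to keep the sketch self-contained.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(iris.data, iris.target)

out_dir = Path(tempfile.mkdtemp())
model_path = out_dir / "model.pkl"
data_path = out_dir / "X_train.npy"

# Serialize the fitted estimator and the training matrix.
with open(model_path, "wb") as fh:
    pickle.dump(model, fh)
np.save(data_path, iris.data)

# Round-trip check: the reloaded model should predict identically.
with open(model_path, "rb") as fh:
    reloaded = pickle.load(fh)
X_loaded = np.load(data_path)
same = bool(np.array_equal(model.predict(iris.data), reloaded.predict(X_loaded)))
print(model_path, data_path, same)
```

The two paths can then be passed to `blackbox2c convert`. Since unpickling executes arbitrary code, only load model files from sources you trust.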
Benchmarks
python benchmarks/benchmark_classic_datasets.py --output results.md
Covers Iris, Wine, Diabetes, and California Housing with Decision Trees, Random Forests, SVMs, and Neural Networks. Metrics: fidelity, estimated FLASH size, tree depth, conversion time.
Note: Code size figures are estimates from BlackBox2C's built-in size estimator, not measurements on real hardware.
Project Structure
blackbox2c/
├── blackbox2c/
│ ├── __init__.py # Public API: convert(), Converter, ConversionConfig
│ ├── converter.py # Main orchestration pipeline
│ ├── config.py # ConversionConfig dataclass
│ ├── surrogate.py # Surrogate tree extraction
│ ├── codegen.py # C code generation
│ ├── optimizer.py # Rule pruning and merging
│ ├── exporters.py # C++, Arduino, MicroPython exporters
│ ├── analysis.py # Feature sensitivity analysis
│ └── cli.py # Command-line interface
├── tests/ # 167 tests, >91% coverage
├── notebooks/ # Jupyter notebook examples (runnable on Colab)
├── benchmarks/ # Classic dataset benchmarks
├── examples/ # Script-based end-to-end examples
└── docs/ # MkDocs documentation source
Comparison with Alternatives
| Feature | BlackBox2C | emlearn | MicroMLGen | TFLite Micro |
|---|---|---|---|---|
| Any sklearn model | ✅ | ⚠️ Trees only | ⚠️ Trees only | ❌ TF only |
| Pure if-else output | ✅ | ✅ | ✅ | ❌ |
| C++ / Arduino / MicroPython | ✅ | ⚠️ Partial | ❌ | ⚠️ Partial |
| Feature selection built-in | ✅ | ❌ | ❌ | ❌ |
| Memory budget control | ✅ | ❌ | ❌ | ⚠️ |
| Zero runtime dependencies | ✅ | ✅ | ✅ | ❌ |
Roadmap (v0.2)
- Quine-McCluskey and BDD rule optimization
- Hardware-validated benchmarks on real MCUs
- Quantization-aware training integration
License
MIT — see LICENSE.
Contributing
Issues and PRs welcome at github.com/AxelSkrauba/BlackBox2C.