
Convert ML models to optimized code for embedded systems


BlackBox2C

Convert scikit-learn models to native embedded code — C, C++, Arduino, MicroPython


BlackBox2C converts any trained scikit-learn model that exposes predict() into a minimal if-else decision tree in your target language. The generated code has zero runtime dependencies, and the C output runs on any microcontroller with a C compiler. For small trees it fits in a few hundred bytes of FLASH.


How It Works

  1. Surrogate extraction — A lightweight DecisionTree is trained to mimic any black-box model (Random Forest, SVM, MLP, etc.) by generating synthetic boundary samples and labeling them with the original model's predictions.
  2. Rule optimization — Redundant branches are pruned and similar leaves are merged to minimize code size.
  3. Code generation — The optimized tree is serialized as a pure if-else function in the target language.
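
The surrogate step above can be sketched with plain scikit-learn. This is a minimal illustration under stated assumptions, not BlackBox2C's implementation: it samples uniformly inside the bounding box of the training data (the library describes boundary-focused sampling, which is not reproduced here), and `make_surrogate` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def make_surrogate(black_box, X, n_samples=5000, max_depth=5, seed=0):
    """Fit a small tree that imitates `black_box` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Assumption: uniform sampling inside the data's bounding box.
    X_syn = rng.uniform(lo, hi, size=(n_samples, X.shape[1]))
    # Labels come from the black box, not from the original targets.
    y_syn = black_box.predict(X_syn)
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    tree.fit(X_syn, y_syn)
    return tree


iris = load_iris()
forest = RandomForestClassifier(n_estimators=50, random_state=42)
forest.fit(iris.data, iris.target)
surrogate = make_surrogate(forest, iris.data)

# Fidelity: how often the surrogate agrees with the black box on fresh samples.
rng = np.random.default_rng(1)
X_check = rng.uniform(iris.data.min(axis=0), iris.data.max(axis=0), size=(2000, 4))
fidelity = float(np.mean(surrogate.predict(X_check) == forest.predict(X_check)))
print(f"depth={surrogate.get_depth()} fidelity={fidelity:.3f}")
```

Fidelity here measures agreement with the black-box model rather than accuracy on the true labels, which is the quantity a surrogate is meant to maximise.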

Supported Models and Targets

| Input models                              | Output formats            |
|-------------------------------------------|---------------------------|
| Any scikit-learn estimator with predict() | Pure C (C99)              |
| Decision Tree, Random Forest, SVM, MLP... | C++11 (class + namespace) |
| Classification and Regression tasks       | Arduino (.h with PROGMEM) |
|                                           | MicroPython (.py module)  |

Installation

pip install blackbox2c

Requirements: Python 3.8+, NumPy >= 1.21, scikit-learn >= 1.0.

Tip: Use a virtual environment to keep your project isolated:

python -m venv .venv && source .venv/bin/activate  # Linux/macOS
python -m venv .venv && .venv\Scripts\activate     # Windows
pip install blackbox2c

For development (from source):

git clone https://github.com/AxelSkrauba/BlackBox2C.git
cd BlackBox2C
pip install -e ".[dev]"

Quick Start

Classification

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from blackbox2c import convert

iris = load_iris()
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(iris.data, iris.target)

# Convert to C (default target)
c_code = convert(
    model,
    iris.data,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    max_depth=5,
)
print(c_code)

Generated output:

/*
 * Auto-generated C code by BlackBox2C
 *   - Input features: 4
 *   - Output classes: 3
 *   - Precision: 8-bit
 */
#include <stdint.h>

#define setosa 0
#define versicolor 1
#define virginica 2

uint8_t predict(float features[4]) {
    if (features[2] <= 2.449999f) {
        return 0;
    } else {
        if (features[3] <= 1.750000f) {
            return 1;
        } else {
            return 2;
        }
    }
}
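
To show how a tree could be turned into nested if-else code of this shape, here is a small sketch. The tuple-based tree layout and the `emit_c` function are hypothetical, chosen for illustration; BlackBox2C's own code generator may be organised differently.

```python
def emit_c(node, indent=1):
    """Serialize a tree to nested C if-else lines.

    A leaf is an int class id; an internal node is the tuple
    (feature_index, threshold, left_subtree, right_subtree).
    """
    pad = "    " * indent
    if isinstance(node, int):
        return [f"{pad}return {node};"]
    feature, threshold, left, right = node
    lines = [f"{pad}if (features[{feature}] <= {threshold:.6f}f) {{"]
    lines += emit_c(left, indent + 1)
    lines.append(f"{pad}}} else {{")
    lines += emit_c(right, indent + 1)
    lines.append(f"{pad}}}")
    return lines


# The Iris tree from the generated output, written in the assumed tuple layout.
tree = (2, 2.449999, 0, (3, 1.75, 1, 2))
body = "\n".join(emit_c(tree))
c_source = f"uint8_t predict(float features[4]) {{\n{body}\n}}"
print(c_source)
```

Because every leaf becomes a `return` and every split becomes one comparison, the emitted function needs no arrays, loops, or library calls.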

Export to Other Formats

# Arduino header (.h with PROGMEM)
arduino_code = convert(model, iris.data, target='arduino')

# C++ class
cpp_code = convert(model, iris.data, target='cpp')

# MicroPython module
mp_code = convert(model, iris.data, target='micropython')

Regression

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes
from blackbox2c import convert

data = load_diabetes()
model = GradientBoostingRegressor(random_state=42)
model.fit(data.data, data.target)

c_code = convert(model, data.data, max_depth=5)
# Generates: float predict(float features[10]) { ... }

Feature Analysis

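from blackbox2c.analysis import FeatureSensitivityAnalyzer

# `model` is a fitted estimator; X_train, y_train and feature_names come from
# your own training data (e.g. iris.data, iris.target, list(iris.feature_names)).
analyzer = FeatureSensitivityAnalyzer(n_repeats=10, random_state=42)
results = analyzer.analyze(model, X_train, y_train, feature_names=feature_names)
print(results.summary())

# Get top 3 most important features by index
top3 = results.get_top_features(3)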
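
The `n_repeats` parameter suggests a permutation-style analysis, though the source does not say which method the analyzer uses. As an illustration of that general idea only, the sketch below shuffles one feature at a time and measures the drop in accuracy. `permutation_drop` and the toy `rule_model` are hypothetical and use only the standard library.

```python
import random


def rule_model(row):
    """Toy classifier that only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_drop(model, X, y, feature, n_repeats=10, seed=42):
    """Mean accuracy drop after shuffling one feature column (hypothetical helper)."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        # Rebuild each row with the shuffled value in place of the original one.
        X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
        drops.append(base - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats


rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [rule_model(row) for row in X]

drops = [permutation_drop(rule_model, X, y, f) for f in range(2)]
ranking = sorted(range(2), key=lambda f: drops[f], reverse=True)
print(drops, ranking)
```

Feature 1 is ignored by the toy model, so shuffling it leaves accuracy unchanged and it ranks last. A feature with a drop near zero is a candidate for removal before conversion.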

Configuration

from blackbox2c import Converter, ConversionConfig

config = ConversionConfig(
    max_depth=5,             # Surrogate tree depth (1-10, default 5)
    optimize_rules='medium', # 'low' | 'medium' | 'high' | 'qm' | 'bdd' | 'auto'
    qm_max_literals=12,      # Hard cap on literals processed by QM (v0.2+)
    bdd_max_literals=24,     # Hard cap on literals processed by BDD (v0.2+)
    use_fixed_point=False,   # Use integer arithmetic instead of float
    precision=8,             # Bit width for fixed-point: 8 | 16 | 32
    function_name='predict', # Name of the generated function
    n_samples=10000,         # Synthetic samples for surrogate training
    feature_threshold=None,  # Auto-select N most important features
    memory_budget_kb=None,   # Auto-tune params to fit a KB budget
)

converter = Converter(config)
code = converter.convert(model, X_train, target='arduino')
metrics = converter.get_metrics()
# {'fidelity': 0.97, 'complexity': {...}, 'size_estimate': {...}}
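
To make `use_fixed_point` and `precision` more concrete, here is one way float thresholds could be mapped to integers. The linear min/max scaling and the `quantize` function are assumptions for illustration; the source does not describe BlackBox2C's actual fixed-point format.

```python
def quantize(value, lo, hi, bits=8):
    """Map a float in [lo, hi] to an unsigned integer with `bits` bits.

    Assumed scheme: linear scaling over the feature's observed range,
    with out-of-range values clipped to the ends.
    """
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)


# Petal length in the Iris dataset spans 1.0 to 6.9 cm.
lo, hi = 1.0, 6.9
threshold_q = quantize(2.45, lo, hi, bits=8)

# The same scaling is applied to each reading, so the comparison
# `reading_q <= threshold_q` needs integer arithmetic only.
for reading in (1.4, 4.7):
    reading_q = quantize(reading, lo, hi, bits=8)
    print(reading, reading_q, reading_q <= threshold_q)
```

With 8 bits each step covers about 0.023 cm of this range, so readings very close to a threshold can land on the other side after rounding. A wider `precision` narrows that band at the cost of larger integer types.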

Rule optimization levels

| Level  | Algorithm | Notes |
|--------|-----------|-------|
| low    | No post-processing. | Returns the surrogate tree as-is. |
| medium | Prune internal nodes whose direct children are same-class leaves. | Default. Backward-compatible with v0.1. |
| high   | medium + merge sibling leaves with very similar class distributions. | Backward-compatible with v0.1. |
| qm     | Multi-valued Quine-McCluskey boolean minimisation lifted to continuous splits. | (v0.2) Classification only. Capped by qm_max_literals. |
| bdd    | Frequency-ordered Reduced Ordered BDD rebuilt as a tree. | (v0.2) Classification only. Capped by bdd_max_literals. |
| auto   | Runs every applicable optimiser plus the no-op baseline; returns the smallest. | (v0.2) Recommended default if you only care about code size. |

The advanced levels (qm, bdd, auto) are functionally equivalent to the surrogate tree, which the test suite verifies. On regression models they emit a single UserWarning and fall back to the legacy 'high' path. The best-case saving in the benchmark (benchmarks/results/v0.2.md) is −47 % FLASH on Iris + RandomForest. See notebooks/07_advanced_optimization.ipynb for a guided walkthrough.
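
The `medium` rule (collapse a split whose two children are leaves of the same class) can be illustrated on a tiny hand-built tree. The tuple layout and `prune_same_class` are hypothetical, and the sketch applies the rule bottom-up in one recursive pass, which may differ from how the library's optimiser iterates.

```python
def prune_same_class(node):
    """Collapse splits whose two children are leaves with the same class.

    A leaf is an int class id; an internal node is
    (feature_index, threshold, left_subtree, right_subtree).
    """
    if isinstance(node, int):
        return node
    feature, threshold, left, right = node
    left, right = prune_same_class(left), prune_same_class(right)
    if isinstance(left, int) and left == right:
        return left  # both branches give the same answer, so the test is redundant
    return (feature, threshold, left, right)


def count_splits(node):
    """Number of if-statements the tree would generate."""
    if isinstance(node, int):
        return 0
    return 1 + count_splits(node[2]) + count_splits(node[3])


# The inner split on feature 0 returns class 1 on both sides.
tree = (2, 2.45, 0, (3, 1.75, (0, 6.0, 1, 1), 2))
pruned = prune_same_class(tree)
print(count_splits(tree), "->", count_splits(pruned))
print(pruned)
```

Pruning of this kind never changes a prediction, since it only removes comparisons whose outcome does not affect the returned class.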


CLI

# Convert a pickled model to C
blackbox2c convert model.pkl X_train.npy -o output.c

# Export to Arduino
blackbox2c convert model.pkl X_train.npy -t arduino -o predict.h

# Analyze feature importance
blackbox2c analyze model.pkl X_train.npy --top-n 5

# Export a decision tree directly (no surrogate extraction)
blackbox2c export model.pkl -f cpp -o predictor.hpp

# Help
blackbox2c --help
blackbox2c convert --help
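
The commands above take a pickled model and a `.npy` array. Below is a sketch of producing such files, assuming the CLI reads a standard `pickle` dump and an array written with `np.save` (the source does not spell out the exact serialisation it expects).

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(iris.data, iris.target)

out_dir = Path(tempfile.mkdtemp())  # use your project directory in practice
model_path = out_dir / "model.pkl"
data_path = out_dir / "X_train.npy"

with model_path.open("wb") as fh:
    pickle.dump(model, fh)
np.save(data_path, iris.data)

# Round-trip check before handing the files to the CLI.
with model_path.open("rb") as fh:
    restored = pickle.load(fh)
X_loaded = np.load(data_path)
print(restored.predict(X_loaded[:3]), X_loaded.shape)
```

Pickle files can execute arbitrary code when loaded, so only pass the CLI model files that you created or otherwise trust.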

Benchmarks

python benchmarks/benchmark_classic_datasets.py --output results.md

Covers Iris, Wine, Diabetes, and California Housing with Decision Trees, Random Forests, SVMs, and Neural Networks. Metrics: fidelity, estimated FLASH size, tree depth, conversion time.

Note: Code size figures are estimates from BlackBox2C's built-in size estimator, not measurements on real hardware.


Project Structure

blackbox2c/
├── blackbox2c/
│   ├── __init__.py      # Public API: convert(), Converter, ConversionConfig
│   ├── converter.py     # Main orchestration pipeline
│   ├── config.py        # ConversionConfig dataclass
│   ├── surrogate.py     # Surrogate tree extraction
│   ├── codegen.py       # C code generation
│   ├── optimizer.py     # Rule pruning and merging
│   ├── exporters.py     # C++, Arduino, MicroPython exporters
│   ├── analysis.py      # Feature sensitivity analysis
│   └── cli.py           # Command-line interface
├── tests/               # 182 tests, >91% coverage
├── notebooks/           # Jupyter notebook examples (runnable on Colab)
├── benchmarks/          # Classic dataset benchmarks
├── examples/            # Script-based end-to-end examples
└── docs/                # MkDocs documentation source

Comparison with Alternatives

Feature BlackBox2C emlearn MicroMLGen TFLite Micro
Any sklearn model ⚠️ Trees only ⚠️ Trees only ❌ TF only
Pure if-else output
C++ / Arduino / MicroPython ⚠️ Partial ⚠️ Partial
Feature selection built-in
Memory budget control ⚠️
Zero runtime dependencies

Roadmap (v0.2)

  • Quine-McCluskey and BDD rule optimization
  • Hardware-validated benchmarks on real MCUs
  • Quantization-aware training integration

License

MIT — see LICENSE.

Contributing

Issues and PRs welcome at github.com/AxelSkrauba/BlackBox2C.
