
BlackBox2C

Convert scikit-learn models to native embedded code — C, C++, Arduino, MicroPython


BlackBox2C converts any trained scikit-learn model into a minimal if-else decision tree in your target language. The generated code has zero runtime dependencies, runs on any microcontroller with a C compiler, and fits in a few hundred bytes of FLASH.


How It Works

  1. Surrogate extraction — A lightweight DecisionTree is trained to mimic any black-box model (Random Forest, SVM, MLP, etc.) by generating synthetic boundary samples and labeling them with the original model's predictions.
  2. Rule optimization — Redundant branches are pruned and similar leaves are merged to minimize code size.
  3. Code generation — The optimized tree is serialized as a pure if-else function in the target language.
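
The three stages can be illustrated end to end with a deliberately tiny stand-in (a hand-rolled 1-D decision stump instead of scikit-learn's DecisionTree; this is a sketch of the idea, not BlackBox2C's actual code):

```python
# Toy pipeline sketch: synthetic samples are labeled by an opaque model,
# a depth-1 surrogate is fitted, and a pure if-else C function is emitted.
import random

def black_box(x):
    # Stand-in for any trained model: class 1 above a hidden boundary.
    return 1 if x > 0.37 else 0

# 1. Surrogate extraction: generate synthetic inputs and label them with
#    the black box's own predictions.
random.seed(0)
samples = [random.uniform(0.0, 1.0) for _ in range(1000)]
labels = [black_box(x) for x in samples]

def fit_stump(xs, ys):
    # Exhaustively pick the threshold that best reproduces the labels.
    best_t, best_err = 0.0, len(xs) + 1
    for t in sorted(set(xs)):
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

threshold = fit_stump(samples, labels)
# 2. Rule optimization is a no-op for a single split.

# 3. Code generation: serialize the surrogate as a pure if-else function.
c_code = (
    "uint8_t predict(float features[1]) {\n"
    f"    if (features[0] <= {threshold:.6f}f) return 0;\n"
    "    return 1;\n"
    "}\n"
)

# Fidelity: how often the surrogate agrees with the original model.
fidelity = sum((x > threshold) == bool(y)
               for x, y in zip(samples, labels)) / len(samples)
print(c_code)
print("fidelity:", fidelity)
```

With enough synthetic samples the stump recovers the hidden boundary almost exactly, which is why fidelity against the original model is the key quality metric.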

Supported Models and Targets

Input models                                 Output formats
Any scikit-learn estimator with predict()    Pure C (C99)
Decision Tree, Random Forest, SVM, MLP...    C++11 (class + namespace)
Classification and Regression tasks          Arduino (.h with PROGMEM)
                                             MicroPython (.py module)

Installation

pip install blackbox2c

Requirements: Python 3.8+, NumPy >= 1.21, scikit-learn >= 1.0.

Tip: Use a virtual environment to keep your project isolated:

python -m venv .venv && source .venv/bin/activate  # Linux/macOS
python -m venv .venv && .venv\Scripts\activate     # Windows
pip install blackbox2c

For development (from source):

git clone https://github.com/AxelSkrauba/BlackBox2C.git
cd BlackBox2C
pip install -e ".[dev]"

Quick Start

Classification

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from blackbox2c import convert

iris = load_iris()
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(iris.data, iris.target)

# Convert to C (default target)
c_code = convert(
    model,
    iris.data,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    max_depth=5,
)
print(c_code)

Generated output:

/*
 * Auto-generated C code by BlackBox2C
 *   - Input features: 4
 *   - Output classes: 3
 *   - Precision: 8-bit
 */
#include <stdint.h>

#define setosa 0
#define versicolor 1
#define virginica 2

uint8_t predict(float features[4]) {
    if (features[2] <= 2.449999f) {
        return 0;
    } else {
        if (features[3] <= 1.750000f) {
            return 1;
        } else {
            return 2;
        }
    }
}
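
The generated function is a plain transliteration of the surrogate tree, so it is easy to mirror in Python and spot-check against the original model before flashing it to a device:

```python
# Python mirror of the generated C predict() above, useful for
# verifying the exported logic on known samples.
SETOSA, VERSICOLOR, VIRGINICA = 0, 1, 2

def predict(features):
    # features = [sepal length, sepal width, petal length, petal width] (cm)
    if features[2] <= 2.449999:
        return SETOSA
    if features[3] <= 1.750000:
        return VERSICOLOR
    return VIRGINICA

print(predict([5.1, 3.5, 1.4, 0.2]))  # typical setosa measurements → 0
```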

Export to Other Formats

# Arduino header (.h with PROGMEM)
arduino_code = convert(model, iris.data, target='arduino')

# C++ class
cpp_code = convert(model, iris.data, target='cpp')

# MicroPython module
mp_code = convert(model, iris.data, target='micropython')

Regression

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes
from blackbox2c import convert

data = load_diabetes()
model = GradientBoostingRegressor(random_state=42)
model.fit(data.data, data.target)

c_code = convert(model, data.data, max_depth=5)
# Generates: float predict(float features[10]) { ... }

Feature Analysis

from blackbox2c.analysis import FeatureSensitivityAnalyzer

analyzer = FeatureSensitivityAnalyzer(n_repeats=10, random_state=42)
results = analyzer.analyze(model, X_train, y_train, feature_names=feature_names)
print(results.summary())

# Get top 3 most important features by index
top3 = results.get_top_features(3)
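
Permutation-style sensitivity is one plausible mechanism behind such an analyzer (a sketch of the general technique, not necessarily FeatureSensitivityAnalyzer's exact algorithm): shuffle one feature column at a time and record how much accuracy drops.

```python
# Minimal permutation-sensitivity sketch with stdlib Python only.
import random

def model(row):
    # Toy "trained model": only feature 0 influences the prediction.
    return 1 if row[0] > 0.5 else 0

random.seed(42)
X = [[random.random() for _ in range(3)] for _ in range(500)]
y = [model(row) for row in X]  # baseline labels from the model itself

def accuracy(rows):
    return sum(model(r) == t for r, t in zip(rows, y)) / len(y)

baseline = accuracy(X)
importance = []
for j in range(3):
    col = [r[j] for r in X]
    random.shuffle(col)  # break the feature/label relationship
    shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
    importance.append(baseline - accuracy(shuffled))

print(importance)  # feature 0 dominates; features 1 and 2 score 0.0
```

A large drop means the model relies on that feature, which is exactly the signal feature_threshold needs when auto-selecting inputs for the embedded code.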

Configuration

from blackbox2c import Converter, ConversionConfig

config = ConversionConfig(
    max_depth=5,             # Surrogate tree depth (1-10, default 5)
    optimize_rules='medium', # 'low' | 'medium' | 'high'
    use_fixed_point=False,   # Use integer arithmetic instead of float
    precision=8,             # Bit width for fixed-point: 8 | 16 | 32
    function_name='predict', # Name of the generated function
    n_samples=10000,         # Synthetic samples for surrogate training
    feature_threshold=None,  # Auto-select N most important features
    memory_budget_kb=None,   # Auto-tune params to fit a KB budget
)

converter = Converter(config)
code = converter.convert(model, X_train, target='arduino')
metrics = converter.get_metrics()
# {'fidelity': 0.97, 'complexity': {...}, 'size_estimate': {...}}
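
The use_fixed_point and precision options trade float comparisons for integer ones; the underlying idea can be sketched as standard fixed-point scaling (an illustration of the general technique, not necessarily BlackBox2C's exact scheme):

```python
# Fixed-point sketch: pre-scale thresholds and inputs by 2**FRAC_BITS so
# every comparison in the generated code is pure integer arithmetic.
FRAC_BITS = 4  # Q3.4 layout: sign + 3 integer bits + 4 fractional bits

def to_fixed(x):
    return round(x * (1 << FRAC_BITS))

threshold = to_fixed(2.449999)   # the petal-length split from the C output
assert -128 <= threshold <= 127  # fits an int8_t at precision=8

def predict_fixed(petal_len_fixed):
    # Integer version of `if (features[2] <= THRESHOLD)`.
    return 0 if petal_len_fixed <= threshold else 1

print(threshold, predict_fixed(to_fixed(1.4)), predict_fixed(to_fixed(4.5)))
# → 39 0 1
```

On MCUs without an FPU this removes soft-float calls entirely, at the cost of quantization error in the thresholds, which is why 8, 16, and 32-bit widths are offered.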

CLI

# Convert a pickled model to C
blackbox2c convert model.pkl X_train.npy -o output.c

# Export to Arduino
blackbox2c convert model.pkl X_train.npy -t arduino -o predict.h

# Analyze feature importance
blackbox2c analyze model.pkl X_train.npy --top-n 5

# Export a decision tree directly (no surrogate extraction)
blackbox2c export model.pkl -f cpp -o predictor.hpp

# Help
blackbox2c --help
blackbox2c convert --help

Benchmarks

python benchmarks/benchmark_classic_datasets.py --output results.md

Covers Iris, Wine, Diabetes, and California Housing with Decision Trees, Random Forests, SVMs, and Neural Networks. Metrics: fidelity, estimated FLASH size, tree depth, conversion time.

Note: Code size figures are estimates from BlackBox2C's built-in size estimator, not measurements on real hardware.


Project Structure

blackbox2c/
├── blackbox2c/
│   ├── __init__.py      # Public API: convert(), Converter, ConversionConfig
│   ├── converter.py     # Main orchestration pipeline
│   ├── config.py        # ConversionConfig dataclass
│   ├── surrogate.py     # Surrogate tree extraction
│   ├── codegen.py       # C code generation
│   ├── optimizer.py     # Rule pruning and merging
│   ├── exporters.py     # C++, Arduino, MicroPython exporters
│   ├── analysis.py      # Feature sensitivity analysis
│   └── cli.py           # Command-line interface
├── tests/               # 167 tests, >91% coverage
├── notebooks/           # Jupyter notebook examples (runnable on Colab)
├── benchmarks/          # Classic dataset benchmarks
├── examples/            # Script-based end-to-end examples
└── docs/                # MkDocs documentation source

Comparison with Alternatives

Feature                       BlackBox2C   emlearn          MicroMLGen       TFLite Micro
Any sklearn model             ✅           ⚠️ Trees only    ⚠️ Trees only    ❌ TF only
Pure if-else output           ✅
C++ / Arduino / MicroPython   ✅           ⚠️ Partial       ⚠️ Partial
Feature selection built-in    ✅
Memory budget control         ✅           ⚠️
Zero runtime dependencies     ✅

Roadmap (v0.2)

  • Quine-McCluskey and BDD rule optimization
  • Hardware-validated benchmarks on real MCUs
  • Quantization-aware training integration

License

MIT — see LICENSE.

Contributing

Issues and PRs welcome at github.com/AxelSkrauba/BlackBox2C.
