
Convert ML models to optimized code for embedded systems


BlackBox2C

Convert scikit-learn models to native embedded code — C, C++, Arduino, MicroPython


BlackBox2C converts any trained scikit-learn model into a minimal if-else decision tree in your target language. The generated code has zero runtime dependencies, runs on any microcontroller with a C compiler, and fits in a few hundred bytes of FLASH.


How It Works

  1. Surrogate extraction — A lightweight DecisionTree is trained to mimic any black-box model (Random Forest, SVM, MLP, etc.) by generating synthetic boundary samples and labeling them with the original model's predictions.
  2. Rule optimization — Redundant branches are pruned and similar leaves are merged to minimize code size.
  3. Code generation — The optimized tree is serialized as a pure if-else function in the target language.
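The three steps above can be sketched with plain scikit-learn. This is an illustration of the idea only, not BlackBox2C's actual implementation; the uniform sampler, sample count, and seed are assumptions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
black_box = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# 1. Surrogate extraction: draw synthetic samples spanning the feature
#    ranges and label them with the black-box model's predictions.
rng = np.random.default_rng(42)
X_syn = rng.uniform(X.min(axis=0), X.max(axis=0), size=(10_000, X.shape[1]))
y_syn = black_box.predict(X_syn)

# 2./3. Fit a small tree on those labels; its if-else structure is what a
#       code generator would serialize to C.
surrogate = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_syn, y_syn)

# Fidelity: how often the surrogate agrees with the original model.
fidelity = (surrogate.predict(X_syn) == y_syn).mean()
print(round(fidelity, 3))
```

The key point is that the surrogate never sees the training labels, only the black-box model's outputs, so the technique works for any estimator exposing predict().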

Supported Models and Targets

Input models                                   Output formats
---------------------------------------------  --------------------------
Any scikit-learn estimator with predict()      Pure C (C99)
Decision Tree, Random Forest, SVM, MLP, ...    C++11 (class + namespace)
Classification and regression tasks            Arduino (.h with PROGMEM)
                                               MicroPython (.py module)

Installation

# From source (development)
pip install -e .

# With dev dependencies (pytest, mkdocs, etc.)
pip install -e ".[dev]"

Requirements: Python 3.8+, NumPy >= 1.21, scikit-learn >= 1.0.


Quick Start

Classification

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from blackbox2c import convert

iris = load_iris()
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(iris.data, iris.target)

# Convert to C (default target)
c_code = convert(
    model,
    iris.data,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    max_depth=5,
)
print(c_code)

Generated output:

/*
 * Auto-generated C code by BlackBox2C
 *   - Input features: 4
 *   - Output classes: 3
 *   - Precision: 8-bit
 */
#include <stdint.h>

#define setosa 0
#define versicolor 1
#define virginica 2

uint8_t predict(float features[4]) {
    if (features[2] <= 2.449999f) {
        return 0;
    } else {
        if (features[3] <= 1.750000f) {
            return 1;
        } else {
            return 2;
        }
    }
}

Export to Other Formats

# Arduino .h file
arduino_code = convert(model, iris.data, target='arduino')

# C++ class
cpp_code = convert(model, iris.data, target='cpp')

# MicroPython module
mp_code = convert(model, iris.data, target='micropython')

Regression

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes
from blackbox2c import convert

data = load_diabetes()
model = GradientBoostingRegressor(random_state=42)
model.fit(data.data, data.target)

c_code = convert(model, data.data, max_depth=5)
# Generates: float predict(float features[10]) { ... }

Feature Analysis

from blackbox2c.analysis import FeatureSensitivityAnalyzer

analyzer = FeatureSensitivityAnalyzer(n_repeats=10, random_state=42)
results = analyzer.analyze(model, X_train, y_train, feature_names=feature_names)
print(results.summary())

# Get top 3 most important features by index
top3 = results.get_top_features(3)
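The idea behind this kind of sensitivity analysis, permuting one feature at a time and measuring the accuracy drop, can be reproduced with scikit-learn alone. The snippet below is a sketch of that idea, not FeatureSensitivityAnalyzer's implementation:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Shuffle each feature n_repeats times; the mean score drop is its importance.
result = permutation_importance(model, X, y, n_repeats=10, random_state=42)

# Rank feature indices by mean importance, most important first.
ranking = result.importances_mean.argsort()[::-1]
top3 = ranking[:3].tolist()
print(top3)
```

On Iris the petal measurements (indices 2 and 3) dominate, which is also why the generated C code above branches only on features[2] and features[3].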

Configuration

from blackbox2c import Converter, ConversionConfig

config = ConversionConfig(
    max_depth=5,             # Surrogate tree depth (1-10, default 5)
    optimize_rules='medium', # 'low' | 'medium' | 'high'
    use_fixed_point=False,   # Use integer arithmetic instead of float
    precision=8,             # Bit width for fixed-point: 8 | 16 | 32
    function_name='predict', # Name of the generated function
    n_samples=10000,         # Synthetic samples for surrogate training
    feature_threshold=None,  # Auto-select N most important features
    memory_budget_kb=None,   # Auto-tune params to fit a KB budget
)

converter = Converter(config)
code = converter.convert(model, X_train, target='arduino')
metrics = converter.get_metrics()
# {'fidelity': 0.97, 'complexity': {...}, 'size_estimate': {...}}
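A rough sketch of what use_fixed_point and precision imply. The scaling scheme below is an assumed illustration, not BlackBox2C's codegen: values are scaled by 2^frac_bits so threshold comparisons run in integer arithmetic, which many small MCUs handle far faster than software floating point.

```python
def to_fixed(x: float, frac_bits: int = 4) -> int:
    """Quantize a float to a signed fixed-point integer (Q-format)."""
    return round(x * (1 << frac_bits))

# The float comparison `feature <= 2.45` becomes an integer comparison.
threshold_q = to_fixed(2.45)   # 39 with 4 fractional bits
feature_q = to_fixed(1.4)      # 22
assert (feature_q <= threshold_q) == (1.4 <= 2.45)
```

The trade-off is resolution: with 8-bit precision two thresholds closer than one quantization step collapse to the same integer, which is one way fidelity can drop.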

CLI

# Convert a pickled model to C
blackbox2c convert model.pkl X_train.npy -o output.c

# Export to Arduino
blackbox2c convert model.pkl X_train.npy -t arduino -o predict.h

# Analyze feature importance
blackbox2c analyze model.pkl X_train.npy --top-n 5

# Export a decision tree directly (no surrogate extraction)
blackbox2c export model.pkl -f cpp -o predictor.hpp

# Help
blackbox2c --help
blackbox2c convert --help

Benchmarks

python benchmarks/benchmark_classic_datasets.py --output results.md

Covers Iris, Wine, Diabetes, and California Housing with Decision Trees, Random Forests, SVMs, and Neural Networks. Metrics: fidelity, estimated FLASH size, tree depth, conversion time.

Note: Code size figures are estimates from BlackBox2C's built-in size estimator, not measurements on real hardware.
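For intuition, a back-of-envelope estimate in the same spirit: code size grows with the number of tree nodes, since each internal node compiles to roughly one compare-and-branch and each leaf to a return. The 16-bytes-per-node cost model below is an assumption for illustration, not the library's estimator:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X, y)

# Hypothetical cost model: average bytes of machine code per tree node
# on a Cortex-M0-class target (assumed figure, for illustration only).
BYTES_PER_NODE = 16
flash_estimate = tree.tree_.node_count * BYTES_PER_NODE
print(f"{tree.tree_.node_count} nodes ~= {flash_estimate} bytes")
```

Real figures depend on the compiler, optimization level, and ISA, hence the roadmap item below on hardware-validated benchmarks.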


Project Structure

blackbox2c/
├── blackbox2c/
│   ├── __init__.py      # Public API: convert(), Converter, ConversionConfig
│   ├── converter.py     # Main orchestration pipeline
│   ├── config.py        # ConversionConfig dataclass
│   ├── surrogate.py     # Surrogate tree extraction
│   ├── codegen.py       # C code generation
│   ├── optimizer.py     # Rule pruning and merging
│   ├── exporters.py     # C++, Arduino, MicroPython exporters
│   ├── analysis.py      # Feature sensitivity analysis
│   └── cli.py           # Command-line interface
├── tests/               # 167 tests, >91% coverage
├── benchmarks/          # Classic dataset benchmarks
├── examples/            # End-to-end usage examples
└── docs/                # MkDocs documentation source

Comparison with Alternatives

Feature                        BlackBox2C  emlearn        MicroMLGen     TFLite Micro
-----------------------------  ----------  -------------  -------------  ------------
Any sklearn model              ✅          ⚠️ Trees only  ⚠️ Trees only  ❌ TF only
Pure if-else output            ✅
C++ / Arduino / MicroPython    ✅          ⚠️ Partial     ⚠️ Partial
Feature selection built-in     ✅
Memory budget control          ✅          ⚠️
Zero runtime dependencies      ✅

Roadmap (v0.2)

  • Quine-McCluskey and BDD rule optimization
  • Hardware-validated benchmarks on real MCUs
  • Quantization-aware training integration

License

MIT — see LICENSE.

Contributing

Issues and PRs welcome at github.com/AxelSkrauba/BlackBox2C.
