
Compiled scikit-learn decision trees for faster evaluation

Installation

Released under the MIT License.

pip install sklearn-compiledtrees

Rationale

In some use cases, prediction with a trained model is on the hot path, so speeding up decision tree evaluation is very useful.

An effective way of speeding up evaluation of decision trees can be to generate code representing the evaluation of the tree, compile that to optimized object code, and dynamically load that file via dlopen/dlsym or equivalent.

See https://courses.cs.washington.edu/courses/cse501/10au/compile-machlearn.pdf for a detailed discussion, and http://tullo.ch/articles/decision-tree-evaluation/ for a more pedagogical explanation and more benchmarks in C++.
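To make the code-generation step concrete, here is a minimal sketch of turning a tree into C source as nested if/else statements. It is an illustration of the general technique only, not this package's actual generator: the `Node` class, `emit_c`, `generate_source`, and `evaluate_interpreted` are hypothetical names, and the sketch assumes finite thresholds and leaf values and the scikit-learn convention that samples with feature <= threshold go to the left child.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical tree node: a leaf uses `value`, a split uses the other fields."""
    value: float = 0.0
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def emit_c(node: Node, indent: int = 1) -> str:
    """Recursively emit the C statements for the subtree rooted at `node`."""
    pad = "    " * indent
    if node.is_leaf:
        return f"{pad}return {node.value!r};\n"
    return (
        f"{pad}if (f[{node.feature}] <= {node.threshold!r}) {{\n"
        + emit_c(node.left, indent + 1)
        + f"{pad}}} else {{\n"
        + emit_c(node.right, indent + 1)
        + f"{pad}}}\n"
    )


def generate_source(root: Node, name: str = "evaluate") -> str:
    """Wrap the emitted statements in a C function taking a feature vector."""
    return f"double {name}(const float* f) {{\n" + emit_c(root) + "}\n"


def evaluate_interpreted(node: Node, features: Sequence[float]) -> float:
    """Reference evaluation by walking the tree, for comparison."""
    while not node.is_leaf:
        go_left = features[node.feature] <= node.threshold
        node = node.left if go_left else node.right
    return node.value


# A small example tree with two splits and three leaves.
tree = Node(
    feature=0,
    threshold=0.5,
    left=Node(value=1.0),
    right=Node(feature=1, threshold=2.0, left=Node(value=2.0), right=Node(value=3.0)),
)
source = generate_source(tree)
print(source)
```

The generated string would then be written to a file, compiled to a shared object with a C compiler, and loaded at runtime (for example via `ctypes.CDLL`, which wraps dlopen/dlsym on POSIX systems). Those steps are omitted here because they depend on the local toolchain.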

This package implements compiled decision tree evaluation for the simple case of a single-output regression tree or ensemble.

It has been tested to work on both OS X and Linux. Windows is not currently supported for compiled evaluation, although adding support should not require a significant amount of work.

Usage

import compiledtrees
from sklearn import ensemble

X_train, y_train, X_test, y_test = ...

clf = ensemble.GradientBoostingRegressor()
clf.fit(X_train, y_train)

compiled_predictor = compiledtrees.CompiledRegressionPredictor(clf)
predictions = compiled_predictor.predict(X_test)

Benchmarks

For random forests, we see a 5x to 8x speedup in evaluation. For gradient boosted ensembles, the speedup is between 1.5x and 3x, because gradient boosted trees already have an optimized prediction implementation.

There is a benchmark script attached that allows us to examine the performance of evaluation across a range of ensemble configurations and datasets.
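For a quick check on your own model, a speedup figure like the ones above can be estimated with a small timing helper. The sketch below is not the bundled benchmark script; `median_seconds` and `speedup` are hypothetical helper names, and the two stand-in functions only simulate a slow and a fast predictor so the example runs on its own.

```python
import time


def median_seconds(predict, X, iterations=10):
    """Time predict(X) several times and return the (upper) median in seconds."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        predict(X)
        timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[len(timings) // 2]


def speedup(baseline_predict, candidate_predict, X, iterations=10):
    """Ratio of baseline time to candidate time; above 1 means the candidate is faster."""
    baseline = median_seconds(baseline_predict, X, iterations)
    candidate = median_seconds(candidate_predict, X, iterations)
    return baseline / candidate


# Stand-ins for illustration only: they sleep instead of predicting.
def slow_predict(X):
    time.sleep(0.02)


def fast_predict(X):
    time.sleep(0.001)


ratio = speedup(slow_predict, fast_predict, X=None, iterations=3)
print(f"speedup: {ratio:.1f}x")
```

With the objects from the Usage section, the call would be `speedup(clf.predict, compiled_predictor.predict, X_test)`.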

In the graphs, GB is Gradient Boosted and RF is Random Forest. Labels such as D1 correspond to setting max_depth=1 (and likewise for other depths), and B10 corresponds to setting max_leaf_nodes=10.

Graphs

for dataset in friedman1 friedman2 friedman3 uniform hastie; do
    python ../benchmarks/bench_compiled_tree.py \
        --iterations=10 \
        --num_examples=1000 \
        --num_features=50 \
        --dataset=$dataset \
        --max_estimators=300 \
        --num_estimator_values=6
done

[Figures: five timing plots produced by the benchmark runs above]
