
Compiled scikit-learn decision trees for faster evaluation

Project Description

Released under the MIT License.

Installation

pip install sklearn-compiledtrees

Rationale

In some use cases, prediction with a trained model is in the hot path, so speeding up decision tree evaluation is very useful.

An effective way to speed up evaluation of decision trees is to generate code representing the evaluation of the tree, compile it to optimized object code, and dynamically load that object via dlopen/dlsym or an equivalent mechanism.

See https://courses.cs.washington.edu/courses/cse501/10au/compile-machlearn.pdf for a detailed discussion, and http://tullo.ch/articles/decision-tree-evaluation/ for a more pedagogical explanation and more benchmarks in C++.
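As a minimal, self-contained sketch of that flow (not the package's actual code generator), the snippet below hand-writes the C source a depth-1 regression tree might compile to, builds it into a shared library, and calls it via ctypes. It assumes a C compiler is available on the PATH as cc; the function name evaluate and the tree constants are illustrative only.

import ctypes
import os
import subprocess
import tempfile

# Illustrative C source for a toy depth-1 regression tree: compare feature 0
# against a threshold and return a leaf value.
TREE_SOURCE = """
double evaluate(const double *features) {
    if (features[0] <= 0.5) {
        return 1.5;
    }
    return -2.25;
}
"""

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "tree.c")
lib_path = os.path.join(workdir, "tree.so")
with open(src_path, "w") as f:
    f.write(TREE_SOURCE)

# Compile the generated source to optimized, position-independent object code
# packaged as a shared library.
subprocess.check_call(["cc", "-O3", "-shared", "-fPIC", src_path, "-o", lib_path])

# dlopen/dlsym the compiled evaluator; ctypes wraps both calls.
lib = ctypes.CDLL(lib_path)
lib.evaluate.restype = ctypes.c_double
lib.evaluate.argtypes = [ctypes.POINTER(ctypes.c_double)]

features = (ctypes.c_double * 1)(0.3)
print(lib.evaluate(features))  # 1.5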

This package implements compiled decision tree evaluation for the simple case of a single-output regression tree or ensemble.

It has been tested on both OS X and Linux. We do not currently support Windows for compiled evaluation, although adding it should not be a significant amount of work.

Usage

import compiledtrees
from sklearn import ensemble

X_train, y_train, X_test, y_test = ...  # load or split your data here

clf = ensemble.GradientBoostingRegressor()
clf.fit(X_train, y_train)

compiled_predictor = compiledtrees.CompiledRegressionPredictor(clf)
predictions = compiled_predictor.predict(X_test)
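Continuing from the snippet above, here is a quick sanity check and rough timing comparison (an illustrative sketch, not part of the package's documented examples); the compiled predictor should reproduce scikit-learn's output up to floating-point rounding.

import timeit
import numpy as np

# The compiled predictor should agree with sklearn's own predictions.
assert np.allclose(clf.predict(X_test), compiled_predictor.predict(X_test))

# Rough timing comparison; actual speedups depend on the ensemble and data.
print(timeit.timeit(lambda: clf.predict(X_test), number=100))
print(timeit.timeit(lambda: compiled_predictor.predict(X_test), number=100))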

Benchmarks

For random forests, we see a 5x to 8x speedup in evaluation. For gradient boosted ensembles, the speedup is between 1.5x and 3x; the smaller gain is because gradient boosted trees already have an optimized prediction implementation.

A benchmark script is included that lets us examine evaluation performance across a range of ensemble configurations and datasets.

In the graphs produced by this script, GB is Gradient Boosted, RF is Random Forest, D1 etc. correspond to setting max_depth=1, and B10 corresponds to setting max_leaf_nodes=10.

Graphs

for dataset in friedman1 friedman2 friedman3 uniform hastie; do
    python ../benchmarks/bench_compiled_tree.py \
        --iterations=10 \
        --num_examples=1000 \
        --num_features=50 \
        --dataset=$dataset \
        --max_estimators=300 \
        --num_estimator_values=6
done

Release History

1.2 (this version)
1.1.1
1.1
1.0.5
1.0.4
1.0.3
1.0.2
1.0.1
1.0

Download Files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

File Name                                                        Version  File Type    Upload Date
sklearn-compiledtrees-1.2.macosx-10.9-x86_64.tar.gz (30.1 kB)    2.7      Dumb Binary  Apr 4, 2014
sklearn-compiledtrees-1.2.tar.gz (46.7 kB)                       -        Source       Apr 4, 2014
