Modern decision trees in Python

scikit-tree

scikit-tree is a scikit-learn compatible API for building state-of-the-art decision trees. These include unsupervised trees, oblique trees, uncertainty trees, quantile trees and causal trees.

Tree models have withstood the test of time and are consistently used in modern data science and machine learning applications. They perform especially well when samples are limited, and they are flexible learners that apply to a wide variety of settings, such as tabular data, images, time series, genomics, and EEG data.

Documentation

See here for the documentation for our dev version: https://docs.neurodata.io/scikit-tree/dev/index.html

Why oblique trees and why trees beyond those in scikit-learn?

In 2001, Leo Breiman proposed two types of Random Forests. The first, Forest-RI, is the traditional axis-aligned random forest. The second, Forest-RC, is the oblique random forest, which splits on random linear combinations of features. Manifold Oblique Random Forests (MORF) [1] build upon Forest-RC by proposing additional functions to combine features. Other modern tree variants, such as Canonical Correlation Forests (CCF), Extended Isolation Forests, Quantile Forests, and unsupervised random forests, are also important for solving real-world problems with robust decision tree models.
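To make the distinction concrete, here is a minimal sketch contrasting an axis-aligned split candidate with a Forest-RC-style oblique candidate. It is not scikit-tree's implementation: the helper names (`axis_aligned_projection`, `oblique_projection`, `split`), the toy data, and the choice of drawing weights uniformly from [-1, 1] are all assumptions made for illustration.

```python
import random


def axis_aligned_projection(X, feature):
    """Forest-RI style: a candidate split looks at a single feature."""
    return [row[feature] for row in X]


def oblique_projection(X, n_combine, rng):
    """Forest-RC style: project samples onto a random linear combination
    of `n_combine` randomly chosen features (illustrative assumption:
    weights are drawn uniformly from [-1, 1])."""
    n_features = len(X[0])
    features = rng.sample(range(n_features), n_combine)
    weights = [rng.uniform(-1.0, 1.0) for _ in features]
    projected = [sum(w * row[f] for f, w in zip(features, weights)) for row in X]
    return projected, features, weights


def split(values, threshold):
    """Partition sample indices by thresholding the projected values."""
    left = [i for i, v in enumerate(values) if v <= threshold]
    right = [i for i, v in enumerate(values) if v > threshold]
    return left, right


rng = random.Random(0)
X = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0], [3.0, 1.0, 1.0]]

# Axis-aligned: threshold feature 0 directly.
axis_values = axis_aligned_projection(X, feature=0)
left, right = split(axis_values, threshold=1.5)

# Oblique: threshold a random linear combination of two features.
oblique_values, features, weights = oblique_projection(X, n_combine=2, rng=rng)
```

In both cases the split itself is a one-dimensional threshold. The only difference is whether the thresholded value is a raw feature or a weighted sum of several features, which is what lets an oblique tree cut the feature space along non-axis-aligned directions.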

Installation

Our installation follows the scikit-learn installation as closely as possible, because our Cython code subclasses, or is inspired by, the scikit-learn tree submodule.

Dependencies

We minimally require:

* Python (>=3.9)
* numpy
* scipy
* scikit-learn >= 1.3
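As a quick sanity check before installing, the following sketch reports whether the current environment satisfies the minimums listed above. The helper names (`release_tuple`, `meets_minimum`, `check_environment`) are hypothetical and not part of scikit-tree. The version comparison is deliberately simple: it reads only the leading numeric components and ignores pre-release tags.

```python
import re
import sys
from importlib import metadata


def release_tuple(version):
    """Extract the leading numeric components, e.g. '1.3.2' -> (1, 3, 2).

    Pre-release or local suffixes such as 'rc1' or '.dev0' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_environment():
    """Map each requirement name to True (satisfied) or False."""
    report = {"python": sys.version_info[:2] >= (3, 9)}
    requirements = [("numpy", None), ("scipy", None), ("scikit-learn", "1.3")]
    for name, minimum in requirements:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = False
            continue
        report[name] = minimum is None or meets_minimum(installed, minimum)
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```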

Installation with Pip (https://pypi.org/project/scikit-tree/)

Installing with pip in a conda environment is the recommended route.

pip install scikit-tree

Building locally with Meson (For developers)

Make sure you have the necessary packages installed:

# install build dependencies
pip install numpy scipy meson ninja meson-python Cython scikit-learn scikit-learn-tree

# you may need these optional dependencies to build scikit-learn locally
conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

We use the spin CLI to abstract away build details:

# run the build using Meson/Ninja
./spin build

# you can run the following command to see what other options there are
./spin --help
./spin build --help

# For example, you might want to start from a clean build
./spin build --clean

# or build in parallel for faster builds
./spin build -j 2

# you will need to double check the build-install has the proper path
# this might be different from machine to machine
export PYTHONPATH=${PWD}/build-install/usr/lib/python3.9/site-packages

# run specific unit tests
./spin test -- sktree/tree/tests/test_tree.py

You can also do the same thing using Meson/Ninja itself. Run the following to build the local files:

# generate the Ninja build files
meson setup build --prefix=$PWD/build

# compile
ninja -C build

# install scikit-tree package
meson install -C build

export PYTHONPATH=${PWD}/build/lib/python3.9/site-packages

# to check the installation, you need to be in a different directory
cd docs
python -c "from sktree import tree"
python -c "import sklearn; print(sklearn.__version__)"
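The `PYTHONPATH` exports above hard-code `python3.9`, which only matches a Python 3.9 interpreter. As a hedged sketch, the hypothetical helper below (`site_packages_dir` is not part of scikit-tree or spin) derives the directory name from the running interpreter. It assumes the POSIX-style `lib/pythonX.Y/site-packages` layout used in the commands above. Other platforms and distributions may use a different layout, so verify the path against your build output.

```python
import sys
from pathlib import PurePosixPath


def site_packages_dir(prefix, version_info=None):
    """Return <prefix>/lib/pythonX.Y/site-packages.

    Defaults to the version of the interpreter running this script.
    """
    major, minor = (version_info or sys.version_info)[:2]
    return PurePosixPath(prefix) / "lib" / f"python{major}.{minor}" / "site-packages"


# Reproduces the relative part of the Meson example for Python 3.9.
print(site_packages_dir("build", (3, 9)))  # build/lib/python3.9/site-packages

# Uses whichever interpreter is running, matching the spin build layout.
print(site_packages_dir("build-install/usr"))
```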

After building locally, you can use an editable install (warning: this only registers changes to Python files, not to compiled code):

pip install --no-build-isolation --editable .

Development

We welcome contributions for modern tree-based algorithms. We use Cython to achieve fast C/C++ speeds, while abiding by a scikit-learn compatible (tested) API. Moreover, our Cython internals are easily extensible because they follow the internal Cython API of scikit-learn as well.

Due to the current state of scikit-learn's internal Cython code for trees, we instead rely on a fork of scikit-learn at https://github.com/neurodata/scikit-learn when extending the decision tree model API. Specifically, our submodule extends the Python and Cython API of scikit-learn's tree submodule so that we can introduce the tree models housed in this package. These models extend the functionality of decision-tree-based models in ways that are not yet possible in scikit-learn itself. As one example, we introduce an abstract API that allows users to implement their own oblique splits. In the future, we plan to benchmark these functionalities and introduce them upstream to scikit-learn where applicable and where the inclusion criteria are met.

References

[1]: Li, Adam, et al. "Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks." SIAM Journal on Mathematics of Data Science, 5(1), 77-96, 2023.
