Modern decision trees in Python

scikit-tree

scikit-tree is a scikit-learn compatible API for building state-of-the-art decision trees. These include unsupervised trees, oblique trees, uncertainty trees, quantile trees and causal trees.

Tree models have withstood the test of time and are consistently used in modern data science and machine learning applications. They perform especially well when samples are limited, and they are flexible learners that can be applied to a wide variety of settings, such as tabular data, images, time series, genomics, EEG data and more.

We welcome contributions for modern tree-based algorithms. We use Cython to achieve C/C++-level speed while abiding by a scikit-learn compatible (tested) API. Moreover, our Cython internals are easily extensible because they follow the internal Cython API of scikit-learn.

Submodule dependency on a fork of scikit-learn

Due to the current state of scikit-learn's internal Cython code for trees, we instead rely on a maintained fork of scikit-learn at https://github.com/neurodata/scikit-learn, where specifically the fork branch is used to build and install this repo. We keep that fork up to date with the main scikit-learn repo. The only difference is the refactoring of the tree/ submodule. The fork is used internally under the namespace sktree._lib.sklearn. You must use this fork for anything related to:

  • RandomForest*
  • ExtraTrees*
  • or any importable items from the tree/ submodule, whether it is a Cython or Python object

If you are developing for scikit-tree, note that we always depend on the most up-to-date commit of https://github.com/neurodata/scikit-learn/submodulev2 as a submodule within scikit-tree. This branch is continually updated with upstream changes to the scikit-learn tree submodule, so our fork picks up upstream bug fixes and improvements.

Documentation

See here for the documentation for our dev version: https://docs.neurodata.io/scikit-tree/dev/index.html

Why oblique trees and why trees beyond those in scikit-learn?

In 2001, Leo Breiman proposed two types of Random Forests. One, known as Forest-RI, is the traditional axis-aligned random forest. The other, known as Forest-RC, is the oblique random forest, which splits on random linear combinations of features. MORF builds upon Forest-RC by proposing additional functions to combine features. Other modern tree variants, such as Canonical Correlation Forests (CCF) and unsupervised random forests, are also important for solving real-world problems with robust decision tree models.
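To make the difference between the two split types concrete, here is a minimal pure-Python sketch. It is not scikit-tree code: the helper names (best_split_accuracy, project) and the toy data are hypothetical, and the oblique weights are fixed at (1, 1) for clarity, whereas Forest-RC draws its combinations at random.

```python
import random


def best_split_accuracy(values, labels):
    """Best accuracy achievable by one threshold on 1-D values (binary labels)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = 0.0
    for i in range(n + 1):
        # Send the i smallest values left (predict 0) and the rest right (predict 1).
        correct = sum(1 for _, y in pairs[:i] if y == 0)
        correct += sum(1 for _, y in pairs[i:] if y == 1)
        # Either side may carry either label, so also score the flipped prediction.
        best = max(best, correct / n, (n - correct) / n)
    return best


def project(X, weights):
    """Linear combination of the features of each sample."""
    return [sum(w * x for w, x in zip(weights, row)) for row in X]


rng = random.Random(0)
X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
# The true class boundary is the diagonal x0 + x1 = 0, which no single axis matches.
y = [1 if x0 + x1 > 0 else 0 for x0, x1 in X]

# Axis-aligned (Forest-RI style): threshold one raw feature at a time.
axis_acc = max(best_split_accuracy([row[f] for row in X], y) for f in range(2))

# Oblique (Forest-RC style): threshold a linear combination of features.
# Weights are fixed at (1, 1) for clarity; Forest-RC would draw them at random.
oblique_acc = best_split_accuracy(project(X, (1.0, 1.0)), y)

print(f"axis-aligned: {axis_acc:.2f}, oblique: {oblique_acc:.2f}")
```

On this toy problem a single oblique split separates the classes, while an axis-aligned tree would need many splits to approximate the diagonal boundary.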

Installation

Our installation follows the scikit-learn installation as closely as possible, because we contain Cython code that is subclassed from, or inspired by, the scikit-learn tree submodule.

AS OF NOW, scikit-tree is in the development stage, and installation is still finicky because upstream scikit-learn's refactoring PRs for the tree submodule have stalled. Once those are merged, the installation will be simpler. The current recommended installation is a local build with Meson.

Dependencies

We minimally require:

* Python (>=3.8)
* numpy
* scipy
* scikit-learn >= 1.3
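As a rough way to check a minimum such as scikit-learn >= 1.3 against a version string, the sketch below compares numeric release components. The helper names release_tuple and meets_minimum are hypothetical and not part of scikit-tree. The comparison ignores pre-release ordering, so packaging.version is the more robust choice if that package is available.

```python
import re


def release_tuple(version):
    """Extract the leading numeric release components, e.g. '1.3.0rc1' -> (1, 3, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        # Stop once a component carries a suffix such as 'rc1' or 'dev0'.
        if match.end() != len(piece):
            break
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed release is at least the minimum release."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.3' and '1.3.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    import sys

    python_version = ".".join(str(n) for n in sys.version_info[:3])
    print("Python >= 3.8:", meets_minimum(python_version, "3.8"))
```

Comparing integers rather than strings matters here: "1.10.0" sorts before "1.3" as text but is the newer release.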

Installation with Pip

pip install scikit-tree

Building locally with Meson (RECOMMENDED)

Make sure you have the necessary packages installed:

# install build dependencies
pip install numpy scipy meson ninja meson-python Cython scikit-learn scikit-learn-tree

# you may need these optional dependencies to build scikit-learn locally
conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

We use the spin CLI to abstract away build details:

# run the build using Meson/Ninja
./spin build

# you can run the following command to see what other options there are
./spin --help
./spin build --help

# For example, you might want to start from a clean build
./spin build --clean

# or build in parallel for faster builds
./spin build -j 2

# you will need to double check the build-install has the proper path 
# this might be different from machine to machine
export PYTHONPATH=${PWD}/build-install/usr/lib/python3.9/site-packages

# run specific unit tests
./spin test -- sktree/tree/tests/test_tree.py

# you can bring up the CLI menu
./spin --help
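Because the site-packages path under build-install depends on the machine and the Python version, a small helper can locate it instead of hard-coding python3.9. This is a sketch using only the standard library. The function name find_site_packages is hypothetical, and it assumes the build output lives in a build-install directory, as in the export line above.

```python
from pathlib import Path


def find_site_packages(build_root):
    """Return every site-packages directory found under build_root, sorted."""
    root = Path(build_root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("site-packages") if p.is_dir())


if __name__ == "__main__":
    # Print one candidate export line per directory found; prints nothing
    # if build-install does not exist in the current directory.
    for path in find_site_packages("build-install"):
        print(f"export PYTHONPATH={path.resolve()}")
```

Run it from the repository root after the build, then copy the printed line that matches your interpreter.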

You can also do the same thing with Meson/Ninja directly. Run the following to build the local files:

# generate Ninja build files
meson setup build --prefix=$PWD/build

# compile
ninja -C build

# install scikit-tree package
meson install -C build

export PYTHONPATH=${PWD}/build/lib/python3.9/site-packages

# to check installation, you need to be in a different directory
cd docs
python -c "from sktree import tree"
python -c "import sklearn; print(sklearn.__version__);"
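The same check can be written as a short script that reports a status instead of raising when something is missing. This is a sketch using only the standard library. The function name describe_install is hypothetical, and the distribution names passed to it (for example "scikit-tree" for the sktree module) are assumptions. A build placed on PYTHONPATH may carry no distribution metadata, in which case the version is reported as unknown.

```python
import importlib.util
from importlib import metadata


def describe_install(module_name, dist_name):
    """Return a one-line status for a top-level module and its distribution."""
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name}: not importable"
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The module is importable but no installed distribution has this name.
        version = "unknown version"
    return f"{module_name}: importable ({dist_name} {version})"


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust them to your environment.
    print(describe_install("sktree", "scikit-tree"))
    print(describe_install("sklearn", "scikit-learn"))
```

As with the commands above, run it from outside the repository root so that the installed package is checked rather than the source directory.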

Alternatively, you can use an editable install:

pip install --no-build-isolation --editable .

References

[1]: Li, Adam, et al. "Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks." arXiv preprint arXiv:1909.11799 (2019)
