Modern decision trees in Python

scikit-tree

scikit-tree is a scikit-learn compatible API for building state-of-the-art decision trees. These include unsupervised trees, oblique trees, uncertainty trees, quantile trees and causal trees.

Tree models have withstood the test of time and are consistently used in modern data science and machine learning applications. They perform especially well when only limited samples are available for a problem, and they are flexible learners that can be applied to a wide variety of settings, such as tabular data, images, time series, genomics, EEG data and more.

Documentation

See here for the documentation for our dev version: https://docs.neurodata.io/scikit-tree/dev/index.html

Why oblique trees and why trees beyond those in scikit-learn?

In 2001, Leo Breiman proposed two types of Random Forests. One was Forest-RI, the traditional axis-aligned random forest. The other was Forest-RC, the random forest of oblique splits, which uses random linear combinations of features to perform splits. MORF (Manifold Oblique Random Forests) [1] builds upon Forest-RC by proposing additional functions to combine features. Other modern tree variants such as Canonical Correlation Forests (CCF), Extended Isolation Forests, Quantile Forests, and unsupervised random forests are also important for solving real-world problems with robust decision tree models.
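
To make the distinction concrete, here is a minimal pure-Python sketch contrasting the two split types. It is not scikit-tree's implementation: the function names and the toy dataset are assumptions made for illustration, and the weights are drawn uniformly from [-1, 1] as in Breiman's description of Forest-RC.

```python
import random


def axis_aligned_split(x, feature, threshold):
    """Forest-RI style: compare a single feature against a threshold."""
    return x[feature] <= threshold


def random_projection(n_features, n_combined, rng):
    """Forest-RC style: pick a few features and give each a random weight in [-1, 1]."""
    features = rng.sample(range(n_features), n_combined)
    return {f: rng.uniform(-1.0, 1.0) for f in features}


def oblique_split(x, weights, threshold):
    """Compare a weighted sum of several features against a threshold."""
    projected = sum(w * x[f] for f, w in weights.items())
    return projected <= threshold


# Two classes separated by the diagonal x0 = x1. The classes overlap along
# each individual axis, so no single axis-aligned split separates them.
points = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (1.0, 0.0), (2.0, 1.0), (3.0, 2.0)]
labels = [0, 0, 0, 1, 1, 1]

# The hand-picked combination x0 - x1 separates the classes with one split.
diagonal = {0: 1.0, 1: -1.0}
left = [oblique_split(p, diagonal, 0.0) for p in points]
```

In a Forest-RC style learner the weights would come from something like `random_projection` rather than being chosen by hand, and the best candidate projection would be kept at each node.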

Installation

Our installation follows the scikit-learn installation as closely as possible, because this package contains Cython code that is subclassed from, or inspired by, the scikit-learn tree submodule.

Dependencies

We minimally require:

* Python (>=3.9)
* numpy
* scipy
* scikit-learn >= 1.3
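
As a rough way to check these minimums in an existing environment, the following sketch uses only the standard library. The helper names `major_minor` and `check_requirements` are hypothetical and not part of scikit-tree; only the version floors listed above are taken from this document.

```python
import re
import sys
from importlib import metadata


def major_minor(version):
    """Extract (major, minor) from a version string such as '1.3.2' or '1.4.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_requirements():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < (3, 9):
        problems.append("Python >= 3.9 is required")
    for dist in ("numpy", "scipy", "scikit-learn"):
        try:
            found = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        # Only scikit-learn has a minimum version listed above.
        if dist == "scikit-learn" and major_minor(found) < (1, 3):
            problems.append(f"scikit-learn >= 1.3 is required, found {found}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```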

Installation with pip (https://pypi.org/project/scikit-tree/)

Installing with pip in a conda environment is the recommended route.

pip install scikit-tree

Building locally with Meson (For developers)

Make sure you have the necessary packages installed:

# install build dependencies
pip install -r build_requirements.txt

# you may need these optional dependencies to build scikit-learn locally
conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

We use the spin CLI to abstract away build details:

# run the build using Meson/Ninja
./spin build

# you can run the following command to see what other options there are
./spin --help
./spin build --help

# For example, you might want to start from a clean build
./spin build --clean

# or build in parallel for faster builds
./spin build -j 2

# you will need to double check the build-install has the proper path
# this might be different from machine to machine
export PYTHONPATH=${PWD}/build-install/usr/lib/python3.9/site-packages

# run specific unit tests
./spin test -- sktree/tree/tests/test_tree.py

You can also do the same thing using Meson/Ninja itself. Run the following to build the local files:

# generate the ninja build files
meson setup build --prefix=$PWD/build

# compile
ninja -C build

# install scikit-tree package
meson install -C build

export PYTHONPATH=${PWD}/build/lib/python3.9/site-packages

# to check installation, you need to be in a different directory
cd docs
python -c "from sktree import tree"
python -c "import sklearn; print(sklearn.__version__);"
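
Both PYTHONPATH exports above hard-code python3.9 and a directory layout that can differ from machine to machine. As a rough aid, the sketch below searches an install prefix for site-packages directories so you can see which path applies locally. The helper name `find_site_packages` is hypothetical, and the two roots it scans are simply the build-install and build directories used in the commands above.

```python
from pathlib import Path


def find_site_packages(install_root):
    """Return every site-packages directory found under install_root, sorted."""
    root = Path(install_root)
    if not root.is_dir():
        # Nothing has been installed under this prefix yet.
        return []
    return sorted(p for p in root.rglob("site-packages") if p.is_dir())


if __name__ == "__main__":
    # Run from the repository root after building.
    for root in ("build-install", "build"):
        for path in find_site_packages(root):
            print(f"export PYTHONPATH={path.resolve()}")
```

If more than one line is printed, pick the directory that actually contains the sktree package.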

After building locally, you can use an editable install (warning: this only registers Python changes locally):

pip install --no-build-isolation --editable .

Or, if you have spin v0.8+ installed, you can run it directly:

spin install

Development

We welcome contributions for modern tree-based algorithms. We use Cython to achieve fast C/C++ speeds, while abiding by a scikit-learn compatible (tested) API. Moreover, our Cython internals are easily extensible because they follow the internal Cython API of scikit-learn as well.

Due to the current state of scikit-learn's internal Cython code for trees, we leverage a fork of scikit-learn at https://github.com/neurodata/scikit-learn when extending the decision tree model API of scikit-learn. Specifically, we extend the Python and Cython API of scikit-learn's tree submodule in our submodule so that we can introduce the tree models housed in this package. These models extend the functionality of decision-tree-based models in ways that are not yet possible in scikit-learn itself. As one example, we introduce an abstract API that allows users to implement their own oblique splits. Our plan is to benchmark these functionalities and introduce them upstream to scikit-learn where applicable and where the inclusion criteria are met.

References

[1]: Li, Adam, et al. "Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks." SIAM Journal on Mathematics of Data Science, 5(1), 77-96, 2023.
