A Python package for fitting Quinlan's Cubist regression model.

Cubist

A Python package for fitting Quinlan's Cubist v2.07 regression model. It is inspired by and based on the R wrapper for Cubist, and its estimator is designed after and inherits from the scikit-learn framework.

Background

Cubist is a regression algorithm developed by John Ross Quinlan for generating rule-based predictive models. It has been available in R thanks to the work of Max Kuhn and his colleagues. This package introduces it to the Python ecosystem and makes it scikit-learn compatible for easy use with existing data and model pipelines.

Advantages

Unlike other ensemble models such as RandomForest and XGBoost, Cubist generates a set of rules, making it easy to understand precisely how the model makes its predictive decisions. Tools such as SHAP and LIME are therefore not needed, as Cubist doesn't exhibit black-box behavior. Like XGBoost, Cubist can perform boosting by adding more models (here called committees) that correct for the error of prior models (i.e. the second model corrects for the prediction error of the first, the third for the error of the second, etc.). In addition to boosting, the model can perform instance-based (nearest-neighbor) corrections to create composite models, thus combining the advantages of these two methods.
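To make the idea of a rule set concrete, here is a minimal pure-Python sketch of how a rule-based model with committees could produce a prediction. It is an illustration only, not this package's implementation: the rules, coefficients, feature names (rooms, age) and the helper functions predict_rules and predict_committees are all hypothetical, and it assumes that overlapping rules and committees are combined by simple averaging.

```python
from statistics import mean

# Hypothetical toy rules, invented for illustration (not produced by Cubist).
# Each rule pairs a readable condition with a linear model.
FIRST_COMMITTEE = [
    {
        "condition": lambda row: row["rooms"] <= 6.0,
        "intercept": 5.0,
        "coefs": {"rooms": 3.0},
    },
    {
        "condition": lambda row: row["rooms"] > 6.0,
        "intercept": -2.0,
        "coefs": {"rooms": 6.0, "age": -0.25},
    },
]

# A second hypothetical committee with a single rule that covers every row.
SECOND_COMMITTEE = [
    {
        "condition": lambda row: True,
        "intercept": 4.0,
        "coefs": {"rooms": 4.0},
    },
]


def predict_rules(rules, row):
    """Average the linear models of every rule whose condition covers the row."""
    predictions = [
        rule["intercept"] + sum(c * row[name] for name, c in rule["coefs"].items())
        for rule in rules
        if rule["condition"](row)
    ]
    if not predictions:
        raise ValueError("no rule covers this row")
    return mean(predictions)


def predict_committees(committees, row):
    """Average the predictions of several rule sets (assumed combination scheme)."""
    return mean(predict_rules(rules, row) for rules in committees)


row = {"rooms": 7.0, "age": 40.0}
single = predict_rules(FIRST_COMMITTEE, row)
combined = predict_committees([FIRST_COMMITTEE, SECOND_COMMITTEE], row)
print(single, combined)
```

Because every prediction traces back to a handful of conditions and linear coefficients, the reasoning behind a given output can be read directly from the rules that fired.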

Use

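>>> # load_boston was removed in scikit-learn 1.2; load_diabetes is a bundled regression dataset
>>> from sklearn.datasets import load_diabetes
>>> from cubist import Cubist
>>> X, y = load_diabetes(return_X_y=True)
>>> model = Cubist()
>>> model.fit(X, y)
>>> model.predict(X)
>>> model.score(X, y)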

Model Parameters

The following parameters can be passed as arguments to the Cubist() class instantiation:

  • n_rules (int, default=500): Limit of the number of rules Cubist will build. Recommended value is 500.
  • n_committees (int, default=1): Number of committees to construct. Each committee is a rule-based model; every committee after the first tries to correct the prediction errors of the previously constructed model. Recommended value is 5.
  • neighbors (int, default=1): Number between 1 and 9 for how many instances should be used to correct the rule-based prediction.
  • unbiased (bool, default=False): Should unbiased rules be used? Since Cubist minimizes the mean absolute error (MAE) of the predicted values, the rules may be biased and the mean predicted value may differ from the actual mean. Enabling this option is recommended when there are frequent occurrences of the same value in a training dataset. Note that MAE may be slightly higher.
  • extrapolation (float, default=0.05): Controls how much rule predictions are adjusted to be consistent with the training dataset. Recommended value is 5% as a decimal (0.05).
  • sample (float, default=0.0): Percentage of the data set to be randomly selected for model building.
  • random_state (int, default=randint(0, 4095)): An integer to set the random seed for the C Cubist code.
  • target_label (str, default="outcome"): A label for the outcome variable. This is only used for printing rules.
  • verbose (int, default=0): Should the Cubist output be printed? 1 if yes, 0 if no.
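The neighbors parameter controls the instance-based correction. The following pure-Python sketch shows one way such a correction can work. It is not the package's code and rests on stated assumptions: neighbors are found by Manhattan distance, each neighbor contributes its known target plus the difference between the model's prediction for the new case and for that neighbor, and the contributions are averaged without weights. The names knn_corrected_prediction and toy_model are hypothetical.

```python
def knn_corrected_prediction(model, train_X, train_y, x_new, neighbors=1):
    """Adjust a model prediction using the nearest training instances."""
    if not 1 <= neighbors <= 9:
        raise ValueError("neighbors must be between 1 and 9")

    # Manhattan distance to the new case (an assumption made for this sketch).
    def distance(row):
        return sum(abs(a - b) for a, b in zip(row, x_new))

    # Indices of the closest training rows; ties keep their original order.
    nearest = sorted(range(len(train_X)), key=lambda i: distance(train_X[i]))
    nearest = nearest[:neighbors]

    base = model(x_new)
    # Each neighbor's target, shifted by how the model sees the new case
    # relative to that neighbor.
    adjusted = [train_y[i] + (base - model(train_X[i])) for i in nearest]
    return sum(adjusted) / len(adjusted)


def toy_model(row):
    """Hypothetical rule-based model that under-predicts everywhere by 2."""
    return 3.0 * row[0]


train_X = [[1.0], [2.0], [4.0]]
train_y = [5.0, 8.0, 14.0]  # generated from y = 3 * x + 2

raw = toy_model([3.0])
corrected = knn_corrected_prediction(toy_model, train_X, train_y, [3.0], neighbors=2)
print(raw, corrected)
```

In this toy setup the model's systematic error is visible in the training targets of the nearby instances, so the corrected value moves from the raw prediction toward the true relationship.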

Model Attributes

The following attributes are exposed to understand the Cubist model results:

  • feature_importances_ (pd.DataFrame): Table of how training data variables are used in the Cubist model.
  • rules_ (pd.DataFrame): Table of the rules built by the Cubist model.
  • coeff_ (pd.DataFrame): Table of the regression coefficients found by the Cubist model.
  • variables_ (dict): Information about all the variables passed to the model and those that were actually used.

Benchmarks

In the literature, there are examples of Cubist outperforming RandomForest and other bootstrapped/boosted models. To demonstrate this, benchmarks comparing the models are provided. The scripts that produced these results are in the benchmarks folder.

Installing

pip install cubist

or

pip install --upgrade cubist

Literature for Cubist Model

Publications Using Cubist

To Do

  • Continue adding tests
  • Add visualization utilities
  • Enable more features from the C-code model
  • Make Windows-compatible and continue verifying sklearn API integration
