Learn rule lists from data for classification, regression or subgroup discovery

Project description

MDL Rule Lists for prediction and subgroup discovery.

License: MIT

This repository contains the code for using rule lists for univariate or multivariate classification or regression, and for their counterparts in data mining and subgroup discovery. These models use the Minimum Description Length (MDL) principle as their optimality criterion.
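As a rough illustration of the MDL idea (a toy, not the encoding rulelist actually uses), the sketch below scores a model by the total number of bits needed to describe the model plus the data given the model. Under this view a rule is worth adding only if it shortens the total. The 5-bit cost charged for the rule is an arbitrary assumption.

```python
import math
from collections import Counter


def data_code_length(labels):
    """Bits needed to encode `labels` with an optimal (Shannon) code built
    from their own empirical class frequencies: -sum log2 p(y)."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c * math.log2(c / n) for c in counts.values())


def two_part_length(model_bits, partitions):
    """Toy two-part MDL score L(M) + L(D | M): the model splits the data
    into `partitions` and each part is encoded on its own."""
    return model_bits + sum(data_code_length(p) for p in partitions)


labels = ["a"] * 8 + ["b"] * 8
# No rule: one partition with a 50/50 class split costs 16 bits.
empty = two_part_length(0, [labels])
# One hypothetical rule that separates "a" from "b" perfectly, charged 5 bits.
one_rule = two_part_length(5, [labels[:8], labels[8:]])
print(empty, one_rule)
```

Here the rule pays for itself (5 bits versus 16), so an MDL criterion would prefer the model that includes it.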

Dependencies

This project was written for Python 3.7. All required packages from PyPI are specified in requirements.txt.

NOTE: This list of packages includes the gmpy2 package.

Installation

The latest release can be installed using pip:

pip install rulelist

If you run into issues with the gmpy2 package mentioned above, please refer to its documentation for help.
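Before digging into gmpy2's documentation, it can help to confirm which packages Python can actually find. The snippet below is a minimal sketch using only the standard library. The package list is an assumption; check requirements.txt for the real one.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed module names; requirements.txt is the authoritative list.
required = ["gmpy2", "numpy", "pandas"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```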

To use the current development version, clone the repository and install the dependencies locally:

git clone https://github.com/HMProenca/RuleList.git
cd RuleList
pip install -r requirements.txt

Example of usage for prediction:

import pandas as pd
from rulelist import RuleList
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

task = 'prediction'
target_model = 'categorical'

data = datasets.load_breast_cancer()
Y = pd.Series(data.target)
X = pd.DataFrame(data.data)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)

model = RuleList(task=task, target_model=target_model)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)
# Fraction of test samples assigned the correct class
print(accuracy_score(y_test.values, y_pred))

print(model)
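A rule list is an ordered sequence of if-then rules followed by a default rule. A new example is handled by the first rule whose condition it satisfies. The sketch below shows only these first-match semantics in plain Python. It is not the library's internal code, the feature thresholds are made up, and each rule here returns a fixed label, whereas in MDL rule lists each rule carries a probability distribution over the target.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, float]
Rule = Tuple[Callable[[Row], bool], Any]


def predict_rule_list(rules: List[Rule], default: Any, row: Row) -> Any:
    """Return the prediction of the first rule whose condition holds for
    `row`, falling back to the default rule when no condition matches."""
    for condition, prediction in rules:
        if condition(row):
            return prediction
    return default


# Hypothetical two-rule list with made-up thresholds.
rules = [
    (lambda r: r["mean radius"] >= 17.0, 0),
    (lambda r: r["worst concave points"] < 0.1, 1),
]
print(predict_rule_list(rules, default=0,
                        row={"mean radius": 12.0, "worst concave points": 0.05}))
```

Because rules are checked in order, a later rule only applies to examples that no earlier rule covers. This is what makes a rule list read as a chain of if / else-if / else statements.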

Example of usage for subgroup discovery:

import pandas as pd
from rulelist.rulelist import RuleList
from sklearn import datasets

task = 'discovery'
target_model = 'gaussian'

# load_boston was removed in scikit-learn 1.2; load_diabetes also has a numeric target
data = datasets.load_diabetes()
y = pd.Series(data.target)
X = pd.DataFrame(data.data)

model = RuleList(task=task, target_model=target_model)

model.fit(X, y)

print(model)
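With target_model = 'gaussian', a subgroup is interesting when the numeric target inside it is distributed differently from the target over the whole dataset. The sketch below simply compares summary statistics for a hypothetical subgroup (given as a boolean mask) against the full data. It is only meant to make that idea concrete. rulelist itself scores subgroups with MDL-based criteria, not with this function.

```python
import statistics


def subgroup_summary(y, mask):
    """Compare the target inside a subgroup (rows where mask is True)
    with the target over the whole dataset."""
    inside = [v for v, m in zip(y, mask) if m]
    return {
        "coverage": len(inside),
        "subgroup_mean": statistics.mean(inside),
        "subgroup_std": statistics.pstdev(inside),
        "dataset_mean": statistics.mean(y),
    }


# Made-up target values and a hypothetical subgroup covering two rows.
y = [10.0, 12.0, 30.0, 34.0, 11.0, 13.0]
mask = [False, False, True, True, False, False]
print(subgroup_summary(y, mask))
```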

Contact

If there are any questions or issues, please contact me by mail at hugo.manuel.proenca@gmail.com or open an issue here on GitHub.

Citation

In a machine learning (prediction) context, i.e., for classification, regression, multi-label classification, multi-category classification, or multivariate regression, please cite the first application of MDL rule lists to classification:

@article{proencca2020interpretable,
  title={Interpretable multiclass classification by MDL-based rule lists},
  author={Proen{\c{c}}a, Hugo M and van Leeuwen, Matthijs},
  journal={Information Sciences},
  volume={512},
  pages={1372--1393},
  year={2020},
  publisher={Elsevier}
}

In the context of data mining and subgroup discovery, please cite the work on subgroup lists:

@article{proencca2020discovering,
  title={Discovering outstanding subgroup lists for numeric targets using MDL},
  author={Proen{\c{c}}a, Hugo M and Gr{\"u}nwald, Peter and B{\"a}ck, Thomas and van Leeuwen, Matthijs},
  journal={arXiv preprint arXiv:2006.09186},
  year={2020}
} 

and

@article{proencca2021robust,
  title={Robust subgroup discovery},
  author={Proen{\c{c}}a, Hugo Manuel and B{\"a}ck, Thomas and van Leeuwen, Matthijs},
  journal={arXiv preprint arXiv:2103.13686},
  year={2021}
}


Download files

Download the file for your platform.

Source Distribution

rulelist-0.2.0.tar.gz (38.3 kB)

Uploaded Source

Built Distribution

rulelist-0.2.0-py3-none-any.whl (55.3 kB)

Uploaded Python 3

File details

Details for the file rulelist-0.2.0.tar.gz.

File metadata

  • Download URL: rulelist-0.2.0.tar.gz
  • Upload date:
  • Size: 38.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.8

File hashes

Hashes for rulelist-0.2.0.tar.gz
Algorithm Hash digest
SHA256 58dbd8bffde004c012a0dc5162b45f92267a4dc9109ecea2fdb057204069172a
MD5 d1ecac397a75fec47803d9c4d671b337
BLAKE2b-256 59c8fd020e685dc41032ee35fd9b6cbecbe1fc131c0038c0f3d755661a68b09c


File details

Details for the file rulelist-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: rulelist-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 55.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.8

File hashes

Hashes for rulelist-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 01333fa8a6400de71374afb0e16ac00701b7c47e56b02283da78cdbf89e6fcef
MD5 970690a8e53af8d822445077c4d20d9e
BLAKE2b-256 09362677f239a15cd409f78932a983acc99cbd7496f05e8e7be8c86986dd0835
