Forest-Guided Clustering - Explainability for Random Forest Models

This Python package provides explainability for Random Forest models. Standard explainability methods (e.g. feature importance) assume that model features are independent and are therefore not well suited to data with correlated features. The Forest-Guided Clustering algorithm makes no such assumption: it computes feature importance from subgroups of instances that follow similar decision rules within the Random Forest model. This makes the method well suited to cases with high correlation among model features.
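The description above does not say how "following similar decision rules" is measured. One common choice for Random Forests, assumed here purely for illustration, is the proximity between two instances: the fraction of trees in which both end up in the same leaf. The sketch below computes that proximity for a toy set of leaf indices; the function name `proximity_matrix` and the leaf indices are hypothetical and not part of the fgclustering API.

```python
from itertools import combinations


def proximity_matrix(leaf_ids):
    """Fraction of trees in which each pair of samples shares a leaf.

    leaf_ids[i][t] is the index of the leaf that sample i reaches in tree t.
    """
    n_samples = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    # Every sample shares all of its leaves with itself.
    prox = [[1.0 if i == j else 0.0 for j in range(n_samples)]
            for i in range(n_samples)]
    for i, j in combinations(range(n_samples), 2):
        shared = sum(a == b for a, b in zip(leaf_ids[i], leaf_ids[j]))
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Toy input: 4 samples passed through 3 trees (leaf indices are made up).
leaf_ids = [
    [1, 4, 2],
    [1, 4, 3],
    [5, 6, 7],
    [5, 6, 2],
]
prox = proximity_matrix(leaf_ids)

# A distance suitable for clustering: samples that rarely share leaves are far apart.
distance = [[1.0 - p for p in row] for row in prox]
```

With a fitted scikit-learn forest, leaf indices of this shape can be obtained from `rf.apply(X)`. A clustering of the resulting distance matrix then groups instances that the forest treats alike, regardless of how the input features correlate.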

For detailed documentation and usage examples, please visit the Read the Docs documentation.

Installation

Requirements:

  • Python >= 3.8

  • pandas, numpy, tqdm

  • scikit-learn, scikit-learn-extra, scipy, statsmodels

  • matplotlib, seaborn

All required packages are installed automatically when the package is installed via pip.

Install Options:

PyPI install:

pip install fgclustering

Usage

To explain your Random Forest model with Forest-Guided Clustering, run the following commands:

from fgclustering import FgClustering

# initialize and run fgclustering object
fgc = FgClustering(model=rf, data=data_boston, target_column='target')
fgc.run()

# visualize results
fgc.plot_global_feature_importance()
fgc.plot_local_feature_importance()
fgc.plot_decision_paths()

# obtain optimal number of clusters and vector that contains the cluster label of each data point
optimal_number_of_clusters = fgc.k
cluster_labels = fgc.cluster_labels

where

  • model=rf is a Random Forest Classifier or Regressor object,

  • data=data_boston is the dataset on which the Random Forest model was trained, e.g. the Boston housing dataset, and

  • target_column='target' is the name of the target column (i.e. target) in the provided dataset.
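The usage snippet assumes that rf and data_boston already exist. A minimal sketch of how such inputs could be prepared is shown below. It assumes the dataset is a pandas DataFrame that holds the features together with the target column, and it uses synthetic data from `make_regression` with hypothetical feature names, since the Boston housing loader was removed from scikit-learn in version 1.2. The helper `explain_with_fgc` is also hypothetical; it only wraps the two fgclustering calls shown in the usage snippet.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for a housing dataset (feature names are hypothetical).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=42)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# One DataFrame holding the features plus the target column.
data = pd.DataFrame(X, columns=feature_names)
data["target"] = y

# Train the Random Forest on the feature columns only.
rf = RandomForestRegressor(n_estimators=50, random_state=42)
rf.fit(data[feature_names], data["target"])


def explain_with_fgc(model, data, target_column="target"):
    """Run Forest-Guided Clustering with the calls from the usage snippet."""
    # Imported lazily so the data preparation works without fgclustering installed.
    from fgclustering import FgClustering

    fgc = FgClustering(model=model, data=data, target_column=target_column)
    fgc.run()
    return fgc
```

Calling `explain_with_fgc(rf, data)` would then return an object on which the plotting methods from the usage snippet can be invoked.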

For a detailed tutorial see the IPython Notebook tutorial.ipynb.

License

The fgclustering package is MIT licensed.

Contributing

Contributions are more than welcome! Everything from code to notebooks to examples and documentation is equally valuable, so please don’t feel you can’t contribute. To contribute, please fork the project, make your changes, and submit a pull request. We will do our best to work through any issues with you and get your code merged into the main branch.

How to cite

If Forest-Guided Clustering is useful for your research, consider citing the package:

@software{lisa_sousa_2022_6445529,
  author       = {Lisa Barros de Andrade e Sousa and
                  Dominik Thalmeier and
                  Helena Pelin and
                  Marie Piraud},
  title        = {{Forest-Guided Clustering - Explainability for Random Forest Models}},
  month        = apr,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {v0.2.0},
  doi          = {10.5281/zenodo.6445529},
  url          = {https://doi.org/10.5281/zenodo.6445529}
}

