
Exact computation of Shapley R-squared in polynomial time


Q-SHAP: Feature-Specific $R^2$ Values for Tree Ensembles


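This package computes feature-specific $R^2$ values for tree ensembles in polynomial time, following a Shapley decomposition of the total $R^2$. The method is described in the paper "Feature-Specific Coefficients of Determination in Tree Ensembles" (Jiang, Zhang, and Zhang, 2024).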

This version supports only models from XGBoost, the scikit-learn decision tree, and scikit-learn GBDT. Support for random forests is planned for the next version. See Q-SHAP Tutorial.ipynb for more details on using Q-SHAP.
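To make the idea of a Shapley decomposition of $R^2$ concrete, here is a minimal brute-force sketch. It is not the Q-SHAP algorithm and does not use qshap: as an assumption for illustration only, it defines the value of a feature subset as the $R^2$ of an ordinary least squares fit on that subset, and it enumerates every subset, so its cost is exponential in the number of features. The function names `r2_of_subset` and `shapley_r2` and the synthetic data are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r2_of_subset(x, y, subset):
    """R^2 of an OLS fit (with intercept) using only the columns in `subset`."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [x[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def shapley_r2(x, y):
    """Brute-force Shapley split of R^2 across features (exponential cost)."""
    p = x.shape[1]
    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight for coalitions of this size that exclude feature j
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for s in combinations(others, size):
                gain = r2_of_subset(x, y, s + (j,)) - r2_of_subset(x, y, s)
                phi[j] += weight * gain
    return phi


# Synthetic data (illustrative only): feature 0 matters most, feature 2 is noise
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = 2.0 * x[:, 0] + 0.5 * x[:, 1] + rng.normal(scale=0.5, size=200)

phi = shapley_r2(x, y)
total_r2 = r2_of_subset(x, y, (0, 1, 2))
print("feature-specific R^2:", phi)
print("sum:", phi.sum(), "total R^2:", total_r2)
```

The per-feature values add up to the total $R^2$, which is the property that makes them readable as feature-specific shares. The enumeration above becomes impractical beyond a handful of features, which is why a polynomial-time method for tree ensembles is useful.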

Installation

qshap can be installed through PyPI:

pip install qshap

Quick Start

# Import necessary libraries
from sklearn.datasets import fetch_california_housing
from qshap import gazer, vis
import xgboost as xgb
import numpy as np

# Load the California Housing dataset and fit an XGBoost regressor
housing = fetch_california_housing()
x, y, feature_names = housing.data, housing.target, housing.feature_names
model = xgb.XGBRegressor(max_depth=2, n_estimators=50, random_state=42).fit(x, y)

# Obtain feature-specific R^2 with qshap, using 5% of the data (around 1,000 samples)
gazer_rsq = gazer(model)
phi_rsq = gazer.rsq(gazer_rsq, x, y, nfrac=0.05, random_state=42)

# Visualize top values of feature-specific R^2
vis.rsq(phi_rsq, label=np.array(feature_names), rotation=30, save_name="cal_housing", color_map_name="Pastel2")
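
If you prefer a plain-text ranking to a plot, a small helper like the one below may be enough. This is a sketch that assumes the result of the Quick Start (`phi_rsq`) behaves like a one-dimensional array with one value per feature column; the helper `rank_rsq` is hypothetical and not part of qshap, and the numbers in the demo are placeholders, not results.

```python
import numpy as np


def rank_rsq(phi, names, top=None):
    """Return (name, value) pairs sorted by decreasing feature-specific R^2."""
    phi = np.asarray(phi, dtype=float)
    order = np.argsort(-phi)  # indices from largest to smallest value
    if top is not None:
        order = order[:top]
    return [(names[i], float(phi[i])) for i in order]


# Placeholder values for illustration only (not output from qshap)
demo_phi = np.array([0.10, 0.40, 0.05])
demo_names = ["a", "b", "c"]

for name, value in rank_rsq(demo_phi, demo_names):
    print(f"{name:>4s}  {value:.3f}")
```

With the Quick Start variables, the call would look like `rank_rsq(phi_rsq, feature_names, top=5)`, under the same assumption about the shape of `phi_rsq`.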

Citation

@article{jiang2024feature,
 title={Feature-Specific Coefficients of Determination in Tree Ensembles},
 author={Jiang, Zhongli and Zhang, Dabao and Zhang, Min},
 journal={arXiv preprint arXiv:2407.03515},
 year={2024}
}

References

  • Jiang, Zhongli, Dabao Zhang, and Min Zhang. "Feature-specific coefficients of determination in tree ensembles." arXiv preprint arXiv:2407.03515 (2024).
  • Lundberg, Scott M., et al. "From local explanations to global understanding with explainable AI for trees." Nature Machine Intelligence 2.1 (2020): 56-67.
  • Karczmarz, Adam, et al. "Improved feature importance computation for tree models based on the Banzhaf value." Uncertainty in Artificial Intelligence. PMLR, 2022.
  • Bifet, Albert, Jesse Read, and Chao Xu. "Linear TreeShap." Advances in Neural Information Processing Systems 35 (2022): 25818-25828.
  • Chen, Tianqi, and Carlos Guestrin. "XGBoost: A scalable tree boosting system." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

Container Images

We provide pre-built images, available for both Docker and Singularity, with all necessary packages for Q-SHAP in Python 3.11:

  • Docker:
    You can pull the Docker image using the following command:
    docker pull catstat/xai
    
  • Singularity:
    You can pull the image with Singularity using the following command:
    singularity pull docker://catstat/xai:0.0
    

Task List

  • Task 1: LightGBM version
  • Task 2: CatBoost version
