Explainable Boosted Scoring

xBooster 🚀

xBooster is a Python package designed to enhance the interpretability and explainability of XGBoost models.

It provides tools for constructing gradient boosted scorecards, generating local interpretations, and visualizing model explanations.

Features ✨

1️⃣ Construct (credit) scorecards for XGBoost models and run inference with them.

2️⃣ Visualize feature importances using several metrics (e.g., Points, Likelihood) and two methods (global and local).

3️⃣ Generate local explanations for model predictions.

4️⃣ Generate SQL queries for boosted scorecards for easy deployment (e.g., with DuckDB).

The explainer methodology leverages the concepts of Weight-of-Evidence (WOE) and Fisher's likelihood to calculate feature importances and local explanations. 🎲 For instance, the booster's margins are treated as likelihoods and are conceptually similar to WOE. 📈 A scorecard can then be constructed from WOE (the natural logarithm of the likelihood ratio) using the booster's split information.
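As a rough illustration of the WOE idea (not xBooster's internal code, and the function name here is hypothetical), WOE for a single split bin can be computed from event and non-event counts as the log of the rate ratio, under one common credit-scoring convention:

```python
import math

def weight_of_evidence(events, non_events, total_events, total_non_events):
    """WOE for one bin: log of (event rate / non-event rate).

    Under this sign convention, a positive WOE means the bin is
    riskier than average; a negative WOE means it is safer.
    """
    event_rate = events / total_events
    non_event_rate = non_events / total_non_events
    return math.log(event_rate / non_event_rate)

# Example: a bin capturing 30 of 100 events and 20 of 400 non-events
woe = weight_of_evidence(30, 20, 100, 400)
print(f"WOE: {woe:.3f}")  # ln(0.30 / 0.05) = ln(6) ≈ 1.792
```

In a boosted scorecard, an analogous log-likelihood quantity is read off from each split's event/non-event statistics rather than computed from pre-binned features.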

The explainer's results are highly consistent with SHAP values but do not require significant computational resources, since all information is taken from the fitted booster. 💡 This means you can gain valuable insights into your model's behavior without the heavy computational overhead typically associated with SHAP. 🚀

Installation 🛠️

You can install xBooster via pip:

pip install xbooster

Usage 📝

Here's a quick example of how to use xBooster to construct a scorecard for an XGBoost model:

import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from xbooster.constructor import XGBScorecardConstructor

# Load data and split into train/test sets
data = pd.read_csv("data.csv")
X = data.drop(columns=["target"])
y = data["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train an XGBoost model
model = xgb.XGBClassifier()
model.fit(X_train, y_train)

# Initialize XGBScorecardConstructor
scorecard_constructor = XGBScorecardConstructor(model, X_train, y_train)
scorecard_constructor.construct_scorecard()

# Print the scorecard
print(scorecard_constructor.scorecard)

After this, we can assign scoring points to the scorecard and test its discriminatory power (Gini coefficient):

from sklearn.metrics import roc_auc_score

# Create scoring points
xgb_scorecard_with_points = scorecard_constructor.create_points(
    pdo=50, target_points=600, target_odds=50
)
# Make predictions using the scorecard
credit_scores = scorecard_constructor.predict_score(X_test)

# Higher scores indicate lower risk, hence the sign flip
gini = roc_auc_score(y_test, -credit_scores) * 2 - 1
print(f"Test Gini score: {gini:.2%}")
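The pdo, target_points, and target_odds arguments follow the standard credit-scorecard scaling convention. As a sketch of that convention (not necessarily xBooster's exact implementation; the helper name is hypothetical), the points scale is derived as:

```python
import math

def points_scale(pdo, target_points, target_odds):
    """Standard PDO (points-to-double-the-odds) scaling.

    `factor` converts log-odds into points; `offset` anchors the scale
    so that odds of `target_odds`:1 map to exactly `target_points`.
    """
    factor = pdo / math.log(2)
    offset = target_points - factor * math.log(target_odds)
    return factor, offset

factor, offset = points_scale(pdo=50, target_points=600, target_odds=50)
# A score is then: offset + factor * log(odds)
print(f"factor={factor:.2f}, offset={offset:.2f}")
```

With pdo=50, every 50 points doubles the odds of being a non-event, which is why the scale is insensitive to the raw magnitude of the booster's margins.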

We can also visualize the score distribution for events versus non-events:

from xbooster import explainer

explainer.plot_score_distribution(
    y_test, 
    credit_scores,
    num_bins=30, 
    figsize=(8, 3),
    dpi=100
)

We can further examine feature importances.

Below, we visualize the global feature importances using Points as the metric:

from xbooster import explainer

explainer.plot_importance(
    scorecard_constructor,
    metric='Points',
    method='global',
    normalize=True,
    figsize=(3, 3)
)

Alternatively, we can calculate local feature importances, which is useful for boosters with a maximum depth greater than 1, where feature interactions come into play.

from xbooster import explainer

explainer.plot_importance(
    scorecard_constructor,
    metric='Likelihood',
    method='local',
    normalize=True,
    color='#ffd43b',
    edgecolor='#1e1e1e',
    figsize=(3, 3)
)

Finally, we can generate a scorecard in SQL format.

sql_query = scorecard_constructor.generate_sql_query(table_name='my_table')
print(sql_query)
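The exact SQL emitted depends on the trained model, so the following is only an illustrative sketch of the general shape such a query can take: per-tree CASE expressions over the split bins, summed into a final score (column names and point values here are made up).

```sql
-- Illustrative shape only; the actual query is model-dependent.
SELECT
    *,
    -- one CASE block per tree/bin, summed into the final score
      (CASE WHEN feature_1 < 0.5 THEN 25 ELSE 40 END)
    + (CASE WHEN feature_2 < 100.0 THEN 30 ELSE 15 END)
    AS score
FROM my_table;
```

Because the scorecard reduces to plain CASE logic, the generated query can be executed by virtually any SQL engine, including in-process engines such as DuckDB.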

For more detailed examples and documentation, please refer to the documentation and check out the notebooks/ directory.

Contributing 🤝

Contributions are welcome! For bug reports or feature requests, please open an issue.

For code contributions, please open a pull request.

License 📄

This project is licensed under the MIT License - see the LICENSE file for details.
