
A Python package for data analysis and model optimization.


apyxl

The apyxl package (Another PYthon package for eXplainable Learning) is a simple wrapper around xgboost, hyperopt, and shap. It lets the user build a performant regression or classification model and use SHAP analysis to better understand the links the model builds between its inputs and outputs. With apyxl, processing categorical features, fitting the model using Bayesian hyperparameter search, and instantiating the associated SHAP explainer can all be accomplished in a single line of code, streamlining the process from data preparation to model explanation.

Current Features:

  • Automatic one-hot encoding of categorical variables
  • Basic hyperparameter optimization using hyperopt with K-fold cross-validation
  • Simple explainability visualizations using shap (beeswarm, decision, force, scatter)
  • Focus on classification and regression tasks
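
As a rough illustration of the K-fold cross-validation mentioned in the list above, the sketch below splits row indices into k folds and averages a score over them. It uses only the standard library and is not apyxl's implementation; `kfold_indices`, `cross_val_mean`, and `score_fn` are hypothetical names introduced here for illustration.

```python
from statistics import mean


def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous (train, validation) index pairs."""
    # The first (n_samples % k) folds get one extra row so every row is used once.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, val))
        start = stop
    return folds


def cross_val_mean(score_fn, n_samples, k=5):
    """Average a caller-supplied score_fn(train_idx, val_idx) over the k folds."""
    return mean(score_fn(train, val) for train, val in kfold_indices(n_samples, k))


folds = kfold_indices(10, 3)
print([len(val) for _, val in folds])  # [4, 3, 3]
```

In a hyperparameter search, a score averaged this way would typically be computed once per candidate set of hyperparameters, and the best candidate kept.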

Planned Enhancements:

  • Time-series data handling and normalization
  • A/B test analysis capabilities

Installation

To install the package, use:

pip install apyxl

Basic Usage

Regression

from apyxl import XGBRegressorWrapper
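from sklearn.datasets import fetch_california_housing

# Load the features and target as a pandas DataFrame and Series
X, y = fetch_california_housing(as_frame=True, return_X_y=True)
X.shape, y.shape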
>>> ((20640, 8), (20640,))

model = XGBRegressorWrapper().fit(X, y)
# defaults to r2 score
model.best_score
>>> 0.6671771984999055

# Plot methods can compute the SHAP values internally
model.beeswarm(X=X.sample(2_500))
model.scatter(X=X.sample(2_500), feature='Latitude')
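
For readers unfamiliar with the R2 score reported by `best_score` above, here is a minimal standard-library sketch of the metric. It only illustrates the definition (1 minus the ratio of residual to total sum of squares); how apyxl evaluates and aggregates the score internally is not shown here, and the function name is chosen for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
print(r2_score(y_true, y_true))     # 1.0: perfect predictions
print(r2_score(y_true, [6.0] * 4))  # 0.0: always predicting the mean
```

A score of about 0.667 therefore sits between a mean-only baseline (0) and a perfect fit (1).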

Classification

from apyxl import XGBClassifierWrapper
from sklearn.datasets import fetch_covtype

X, y = fetch_covtype(as_frame=True, return_X_y=True)
# Shift the class labels from 1..7 to 0..6
y -= 1
y.unique()
>>> array([4, 1, 0, 6, 2, 5, 3])

X.shape, y.shape
>>> ((581012, 54), (581012,))

# To speed up the process, Bayesian hyperparameter optimization can be performed on a subset of the dataset.
# The model is then fitted on the entire dataset using the optimized hyperparameters.
model = XGBClassifierWrapper().fit(X, y, n=25_000)
# defaults to Matthews correlation coefficient
model.best_score
>>> 0.5892932365687379

# Computing SHAP values can be resource-intensive, so it is advisable to compute them once and reuse them,
# especially in multiclass classification, where the cost is higher than in binary classification
# (the SHAP values have shape (n_samples, n_features, n_classes)).
shap_values = model.compute_shap_values(X.sample(1_000))
shap_values.shape
>>> (1000, 54, 7)
# The `output` argument selects the SHAP values associated with the desired class
model.beeswarm(shap_values=shap_values, output=2, max_display=15)
model.scatter(shap_values=shap_values, feature='Elevation', output=4)
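
The classifier's `best_score` above is a Matthews correlation coefficient (MCC). As a hedged illustration of what that number measures, the sketch below computes the multiclass MCC from label lists using only the standard library. It follows the commonly used multiclass generalization of the metric and is not taken from apyxl's source; the function name is chosen for illustration.

```python
from collections import Counter
from math import sqrt


def matthews_corrcoef(y_true, y_pred):
    """Multiclass MCC computed from true and predicted label sequences."""
    s = len(y_true)                                    # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))    # correctly predicted samples
    t_counts = Counter(y_true)                         # occurrences of each true class
    p_counts = Counter(y_pred)                         # occurrences of each predicted class
    labels = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_tt = s * s - sum(t_counts[k] ** 2 for k in labels)
    cov_pp = s * s - sum(p_counts[k] ** 2 for k in labels)
    denom = sqrt(cov_tt * cov_pp)
    # By convention, return 0.0 when the denominator vanishes
    # (for example, when every prediction is the same class).
    return cov_tp / denom if denom else 0.0


y_true = [0, 1, 2, 0, 1, 2]
print(matthews_corrcoef(y_true, y_true))   # 1.0: perfect agreement
print(matthews_corrcoef(y_true, [0] * 6))  # 0.0: constant prediction
```

Unlike plain accuracy, the MCC stays near 0 for a model that always predicts the majority class, which makes it a reasonable default for imbalanced multiclass data.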

Note

Please note that this package is still under development, and features may change or expand in future versions.

