A Python package for data analysis and model optimization.
apyxl
The `apyxl` package (Another PYthon package for eXplainable Learning) is a simple wrapper around `xgboost`, `hyperopt`, and `shap`. It provides the user with the ability to build a performant regression or classification model and use the power of the SHAP analysis to gain a better understanding of the links the model builds between its inputs and outputs. With `apyxl`, processing categorical features, fitting the model using Bayesian hyperparameter search, and instantiating the associated SHAP explainer can all be accomplished in a single line of code, streamlining the entire process from data preparation to model explanation.
Current Features:
- Automatic One-Hot-Encoding for categorical variables
- Basic hyperparameter optimization using `hyperopt` with K-Folds cross-validation
- Simple explainability visualizations using `shap` (`beeswarm`, `decision`, `force`, `scatter`)
- Focus on classification and regression tasks
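To make the first feature concrete, here is a minimal, dependency-free sketch of what one-hot encoding does to a categorical column. It is not the code `apyxl` runs internally (the README does not show that); the `one_hot_encode` helper and the sample rows are hypothetical and only illustrate the transformation.

```python
def one_hot_encode(rows, column):
    """Replace `column` in each row dict with one 0/1 indicator per category."""
    # Sort the categories so the order of the new columns is deterministic.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


# Hypothetical toy data: one numeric column and one categorical column.
rows = [
    {"area": 50.0, "city": "Paris"},
    {"area": 72.5, "city": "Lyon"},
    {"area": 64.0, "city": "Paris"},
]
encoded_rows = one_hot_encode(rows, "city")
print(encoded_rows[0])
```

Each distinct category becomes its own 0/1 column, which is the numeric form a gradient-boosted tree model can consume. On a DataFrame, `pandas.get_dummies` performs the same kind of transformation.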
Planned Enhancements:
- Time-series data handling and normalization
- A/B test analysis capabilities
Installation
To install the package, use:
pip install apyxl
Basic Usage
Regression
from apyxl import XGBRegressorWrapper
from sklearn.datasets import fetch_california_housing
X, y = fetch_california_housing(as_frame=True, return_X_y=True)
X.shape, y.shape
>>> ((20640, 8), (20640,))
model = XGBRegressorWrapper().fit(X, y)
# defaults to r2 score
model.best_score
>>> 0.6671771984999055
# Plot methods can compute the SHAP values internally
model.beeswarm(X=X.sample(2_500))
model.scatter(X=X.sample(2_500), feature='Latitude')
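For readers unfamiliar with the default regression metric, the following sketch computes the R² score (coefficient of determination) by hand. It only illustrates the standard formula, 1 minus the ratio of residual to total sum of squares; the `r_squared` helper and the toy values are hypothetical, and how `apyxl` aggregates the score across cross-validation folds is not shown here.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
score = r_squared(y_true, y_pred)
print(score)
```

A score of 1.0 means perfect predictions, and 0.0 means the model does no better than always predicting the mean of the targets.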
Classification
from apyxl import XGBClassifierWrapper
from sklearn.datasets import fetch_covtype
X, y = fetch_covtype(as_frame=True, return_X_y=True)
y -= 1
y.unique()
>>> array([4, 1, 0, 6, 2, 5, 3])
X.shape, y.shape
>>> ((581012, 54), (581012,))
# To speed up the process, Bayesian hyperparameter optimization can be performed on a subset of the dataset.
# The model is then fitted on the entire dataset using the optimized hyperparameters.
model = XGBClassifierWrapper().fit(X, y, n=25_000)
# defaults to Matthews correlation coefficient
model.best_score
>>> 0.5892932365687379
# Computing SHAP values can be resource-intensive, so it's advisable to calculate them once for multiple future
# uses, especially in multiclass classification scenarios where the cost is even higher compared to binary
# classification (shap values shape equals (n_samples, n_features, n_classes))
shap_values = model.compute_shap_values(X.sample(1_000))
shap_values.shape
>>> (1000, 54, 7)
# The `output` argument selects the SHAP values associated with the desired class
model.beeswarm(shap_values=shap_values, output=2, max_display=15)
model.scatter(shap_values=shap_values, feature='Elevation', output=4)
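The default classification metric mentioned above, the Matthews correlation coefficient (MCC), extends to multiclass problems. The sketch below implements the usual multiclass generalization from class counts alone. The `multiclass_mcc` helper and the toy labels are hypothetical, and this is an illustration of the metric itself, not of how `apyxl` computes it.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient, computed from class counts."""
    n_samples = len(y_true)
    n_correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)   # how often each class really occurs
    pred_counts = Counter(y_pred)   # how often each class is predicted
    classes = set(true_counts) | set(pred_counts)

    # Covariance-like terms between the true and predicted labelings.
    cov_true_pred = n_correct * n_samples - sum(
        true_counts[k] * pred_counts[k] for k in classes
    )
    cov_true_true = n_samples ** 2 - sum(v * v for v in true_counts.values())
    cov_pred_pred = n_samples ** 2 - sum(v * v for v in pred_counts.values())

    denominator = sqrt(cov_true_true * cov_pred_pred)
    # Convention: return 0.0 when the score is undefined (e.g. constant predictions).
    return cov_true_pred / denominator if denominator else 0.0


# Hypothetical toy labels for a 3-class problem with one mistake.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
mcc = multiclass_mcc(y_true, y_pred)
print(round(mcc, 4))
```

The MCC ranges up to 1.0 for perfect predictions and sits at 0.0 for predictions no better than chance, which makes it more informative than plain accuracy on imbalanced datasets.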
Note
Please note that this package is still under development, and features may change or expand in future versions.