A Python package for ordinal regression and model-agnostic interpretation methods
Project description
Ordinal XAI
A Python package for explainable ordinal regression models and interpretation methods.
Overview
This package provides a comprehensive suite of ordinal regression models and interpretation methods, designed to handle ordinal data while providing transparent and interpretable results. The implementation follows scikit-learn's API design, making it easy to integrate with existing machine learning workflows.
Features
Models
- Cumulative Link Model (CLM)
  - Supports both logit and probit link functions
  - Handles categorical and numerical features automatically
  - Provides probability estimates for each class
- Ordinal Neural Network (ONN)
  - Neural network architecture for ordinal regression
  - Configurable hidden layers
  - Supports various output layers and loss functions
  - Tunable training parameters such as learning rate, batch size, and dropout
  - Automatic GPU acceleration when available
  - Early stopping to prevent overfitting
- Ordinal Binary Decomposition (OBD)
  - Decomposes the ordinal problem into binary classification tasks
  - Supports two decomposition strategies (illustrated in the sketch after this list):
    - One-vs-following: each class vs. all higher classes
    - One-vs-next: each class vs. the next class only
  - Multiple base classifier options:
    - Logistic Regression
    - SVM
    - Random Forest
    - XGBoost
  - Base classifier parameters can be tuned
- Ordinal Gradient Boosting (OGBoost)
  - Gradient boosting implementation for ordinal regression
  - Based on the ogboost.GradientBoostingOrdinal model
  - Supports standard gradient boosting parameters:
    - n_estimators: number of boosting stages
    - learning_rate: shrinks the contribution of each tree
    - max_depth: maximum depth of the individual trees
    - min_samples_split: minimum samples required to split a node
  - Handles both categorical and numerical features
  - Provides feature importance scores
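To make the two OBD decomposition strategies concrete, here is a small illustrative sketch in plain NumPy (not the package API) of one common way to binarize an ordered target with classes 0 < 1 < 2 < 3; the package's exact construction may differ.

import numpy as np

# Ordered target with four classes 0 < 1 < 2 < 3
y = np.array([0, 1, 2, 3, 2, 1, 0, 3])
classes = np.sort(np.unique(y))

# One-vs-following: for each class k (except the last),
# predict whether a sample belongs to a higher class (y > k)
one_vs_following = {k: (y > k).astype(int) for k in classes[:-1]}

# One-vs-next: for each class k, keep only samples from classes k and k+1
# and predict whether a sample belongs to class k+1
one_vs_next = {}
for k in classes[:-1]:
    mask = np.isin(y, [k, k + 1])
    one_vs_next[k] = (y[mask] == k + 1).astype(int)

OBD fits one binary base classifier per such task and combines their outputs into ordinal predictions.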
The models provided in this package are only examples: the interpretation methods are model-agnostic and designed to work with any ordinal regression model, provided it implements the BaseOrdinalModel and scikit-learn BaseEstimator interfaces.
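As a rough illustration of that requirement, a minimal custom model might look like the skeleton below. This is a hypothetical sketch: it assumes BaseOrdinalModel is importable from ordinal_xai.models and follows the usual scikit-learn fit/predict/predict_proba conventions; the actual abstract methods may differ.

import numpy as np
from sklearn.base import BaseEstimator
from ordinal_xai.models import BaseOrdinalModel  # assumed import path

class MajorityClassModel(BaseOrdinalModel, BaseEstimator):
    """Toy ordinal model that always predicts the most frequent class."""

    def fit(self, X, y):
        self.classes_ = np.sort(np.unique(y))
        counts = np.array([(y == c).sum() for c in self.classes_])
        self.proba_ = counts / counts.sum()          # empirical class distribution
        self.majority_ = self.classes_[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)

    def predict_proba(self, X):
        return np.tile(self.proba_, (len(X), 1))

Any model exposing this interface can be passed directly to the interpretation methods described below.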
Interpretation Methods
- Feature Effects Analysis
  - Partial Dependence Plots (PDP)
    - Shows the average effect of features on predictions
    - Handles both categorical and numerical features
    - Automatic subplot arrangement
  - PDP with Probabilities (PDPProb)
    - Visualizes average probability distributions across feature values
    - Shows class probability changes
    - Detailed probability annotations
  - Individual Conditional Expectation (ICE)
    - Analyzes individual instance behavior
    - Shows heterogeneous effects across samples
    - Displays either the entire population or individual instances (recommended)
  - ICE with Probabilities (ICEProb)
    - Visualizes individual probability distributions across feature values
    - Shows class probability changes per instance
    - Detailed probability annotations at original values
    - Displays either the entire population or individual instances (recommended)
- Feature Importance Analysis
  - Permutation Feature Importance (PFI)
    - Global feature importance through permutation
    - Supports multiple evaluation metrics; a subset can be specified
    - Handles both categorical and numerical features
    - Analysis can be restricted to a subset of features
    - Visualizes feature importance scores in a bar plot
  - Leave-One-Covariate-Out (LOCO)
    - Global feature importance through feature removal and refitting
    - Uses a train/test split to fit the models and evaluate performance
    - Supports multiple evaluation metrics; a subset can be specified
    - Visualizes feature importance scores in a bar plot
- Local Explanations
  - Local Interpretable Model-agnostic Explanations (LIME)
    - Provides local explanations for individual predictions
    - Supports both logistic regression and decision tree surrogate models
    - Multiple sampling strategies (grid, uniform, permutation)
    - Customizable kernel functions for sample weighting
    - Visualizes feature importance through coefficients or a tree plot
Datasets
The package includes several benchmark datasets for ordinal regression:
- Wine Quality (winequality.csv)
  - Combined dataset of red and white wines
  - 6,499 samples, 12 features
  - Separate datasets available for red (winequality-red.csv) and white (winequality-white.csv) wines
  - Ordinal target: wine quality rating from 3 to 9
  - Features include physicochemical properties (acidity, sugar, alcohol, etc.)
- Student Performance (student-mat.csv, student-por.csv)
  - Two datasets: Mathematics and Portuguese
  - Mathematics: 395 samples, 33 features
  - Portuguese: 649 samples, 33 features
  - Ordinal targets: G1, G2, G3 (grades in three periods)
  - Features include demographic, social, and educational factors
  - Mixed numerical and categorical features
- Abalone (abalone.csv)
  - 4,177 samples, 8 features
  - Predicts the age of abalone from physical measurements
  - Features include length, diameter, height, and various weight measurements
  - Target variable 'Rings' represents age (1-29)
- Automobile (automobile.csv)
  - 205 samples, 26 features
  - Predicts automobile price based on various specifications
  - Mixed numerical and categorical features
  - Features include make, fuel-type, engine-size, horsepower, etc.
- Balance Scale (balance_scale.csv)
  - 625 samples, 4 features
  - Predicts balance scale state (left, balanced, right)
  - Simple dataset useful for testing ordinal classification
- Car Evaluation (car_evaluation.csv)
  - 1,728 samples, 6 features
  - Predicts car acceptability (unacc, acc, good, vgood)
  - All features are categorical
- FI Test Dataset (FI_test.csv)
  - Synthetic benchmark for feature importance methods
  - 1,000 samples, 10 features
  - First 5 features have predefined importances: [1.0, 0.8, 0.4, 0.2, 0.1]
  - Last 5 features are irrelevant noise variables
  - Includes both clean (y) and noisy (y_noisy) target variables
  - Designed to evaluate feature importance ranking and robustness to label noise
- Dummy Dataset (dummy.csv)
  - 1,000 samples of synthetic data with known patterns
  - Used for testing and demonstration purposes
  - Labels represent a health-risk level influenced by age, height, and gender
  - Latent variable: 0.006·age + 0.0005·height + 0.3·gender_w + 0.05·Normal(0, 1)
  - Ranks generated by splitting the latent variable into 3 equal-sized buckets (see the generation sketch after this list)
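For reference, data of this form can be generated roughly as in the sketch below. This is illustrative only: the column names, value ranges, and random seed are assumptions, not the exact recipe behind the shipped dummy.csv.

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000

# Hypothetical feature columns (names and ranges are assumptions)
age = rng.integers(18, 80, n)
height = rng.normal(170, 10, n)
gender_w = rng.integers(0, 2, n)  # 1 = woman, 0 = man

# Latent health-risk score as described above
latent = 0.006 * age + 0.0005 * height + 0.3 * gender_w + 0.05 * rng.normal(0, 1, n)

# Three equal-sized ordinal buckets based on the latent variable
rank = pd.qcut(latent, q=3, labels=False)  # bucket indices 0, 1, 2

dummy = pd.DataFrame({"age": age, "height": height, "gender_w": gender_w, "rank": rank})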
Installation
pip install ordinal-xai
Quick Start
from ordinal_xai.models import CLM, ONN, OBD, OGBoost
from ordinal_xai.interpretation import LIME, LOCO, ICE, ICEProb, PDP, PDPProb, PFI
import pandas as pd
import numpy as np
# Create sample data with named feature columns
X = pd.DataFrame(np.random.randn(100, 5), columns=[f'feature{i}' for i in range(5)])
y = pd.Series(np.random.randint(0, 3, 100))
# Initialize and train model
model = CLM(link='logit')
model.fit(X, y)
# Make predictions
predictions = model.predict(X)
probabilities = model.predict_proba(X)
# Generate explanations
# Feature Effects
pdp = PDP(model, X, y)
pdp_effects = pdp.explain(features=['feature1', 'feature2'], plot=True)
pdp_prob = PDPProb(model, X, y)
pdp_prob_effects = pdp_prob.explain(features=['feature1', 'feature2'], plot=True)
ice = ICE(model, X, y)
ice_effects = ice.explain(features=['feature1', 'feature2'], plot=True)
ice_prob = ICEProb(model, X, y)
ice_prob_effects = ice_prob.explain(features=['feature1', 'feature2'], plot=True)
# Feature Importance
pfi = PFI(model, X, y)
pfi_importance = pfi.explain(plot=True)
loco = LOCO(model, X, y)
loco_importance = loco.explain(plot=True)
# Local Explanations
lime = LIME(model, X, y)
lime_explanation = lime.explain(observation_idx=0, plot=True)
Documentation
For detailed documentation, including API reference and examples, visit our documentation page.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
Source Distribution
Built Distribution
File details
Details for the file ordinal_xai-0.2.1.tar.gz.
File metadata
- Download URL: ordinal_xai-0.2.1.tar.gz
- Upload date:
- Size: 429.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 45ca8c52eddfeb443d1ca20c1e574611762db4039885fb3eabb20fadacf1848b |
| MD5 | cef6d6156306903b3c7e351537ba1f0b |
| BLAKE2b-256 | 51912c188e38853ff876c3cce47c26f5373c86a75583c7ebf14337c0d69747a6 |
File details
Details for the file ordinal_xai-0.2.1-py3-none-any.whl.
File metadata
- Download URL: ordinal_xai-0.2.1-py3-none-any.whl
- Upload date:
- Size: 447.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ceeab04d52fb8f75d9b400ffff41d8619e314c3b8c95605063d3c3fb1dc2a9c0 |
| MD5 | c48d90e9ab6a1892c1f2e72dfa23559b |
| BLAKE2b-256 | 79ac9c991add8affb217990fc5a49f57d3a5308405b0c17f2e4c7010b83e0225 |