
Ordinal XAI

A Python package for explainable ordinal regression models and interpretation methods.

Overview

This package provides a comprehensive suite of ordinal regression models and interpretation methods, designed to handle ordinal data while providing transparent and interpretable results. The implementation follows scikit-learn's API design, making it easy to integrate with existing machine learning workflows.

Features

Models

  1. Cumulative Link Model (CLM)

    • Supports both logit and probit link functions
    • Handles categorical and numerical features automatically
    • Provides probability estimates for each class
  2. Ordinal Neural Network (ONN)

    • Neural network architecture for ordinal regression
    • Configurable hidden layers
    • Supports various output layers and loss functions
    • Tunable training parameters (learning rate, batch size, dropout, etc.)
    • Automatic GPU acceleration when available
    • Early stopping to prevent overfitting
  3. Ordinal Binary Decomposition (OBD)

    • Decomposes ordinal problem into binary classification tasks
    • Supports two decomposition strategies:
      • One-vs-following: Each class vs all higher classes
      • One-vs-next: Each class vs next class only
    • Multiple base classifier options:
      • Logistic Regression
      • SVM
      • Random Forest
      • XGBoost
    • Base classifier parameters can be tuned
  4. Ordinal Gradient Boosting (OGBoost)

    • Gradient Boosting implementation for ordinal regression
    • Based on the ogboost.GradientBoostingOrdinal model
    • Supports standard gradient boosting parameters:
      • n_estimators: Number of boosting stages
      • learning_rate: Shrinks the contribution of each tree
      • max_depth: Maximum depth of the individual trees
      • min_samples_split: Minimum samples required to split a node
    • Handles both categorical and numerical features
    • Provides feature importance scores
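The one-vs-following decomposition used by OBD can be illustrated generically: for each threshold k, fit a binary classifier for P(y > k) and recover class probabilities from the differences of the cumulative estimates. This is a minimal sketch of the idea on synthetic data, not the package's internal implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic ordinal data: 3 ordered classes (0 < 1 < 2) driven by one feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.digitize(X[:, 0] + 0.1 * rng.normal(size=300), bins=[-0.5, 0.5])

# One-vs-following: for each threshold k, fit a binary classifier for P(y > k).
# With K = 3 classes this yields 2 binary problems.
classes = np.unique(y)
binary_models = [LogisticRegression().fit(X, (y > k).astype(int))
                 for k in classes[:-1]]

# Recover class probabilities from the cumulative estimates:
# P(y=0) = 1 - P(y>0), P(y=1) = P(y>0) - P(y>1), P(y=2) = P(y>1).
# (With independently fitted models the differences can be slightly
# negative; shown here uncorrected.)
cum = np.column_stack([m.predict_proba(X)[:, 1] for m in binary_models])
proba = np.column_stack([1 - cum[:, 0],
                         cum[:, 0] - cum[:, 1],
                         cum[:, 1]])
pred = proba.argmax(axis=1)
```

The one-vs-next strategy works analogously, except each binary problem contrasts only adjacent classes.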

The models provided in this package serve only as examples; the interpretation methods are model-agnostic and designed to work with any ordinal regression model. The only requirement for custom models is that they implement the BaseOrdinalModel and sklearn BaseEstimator interfaces.
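As a rough sketch of what a custom model might look like, the toy estimator below exposes the fit/predict/predict_proba interface that model-agnostic interpretation methods typically rely on. BaseOrdinalModel is package-specific and is omitted here so the example runs standalone; in practice you would inherit from it as well:

```python
import numpy as np
from sklearn.base import BaseEstimator

class MajorityOrdinalModel(BaseEstimator):
    """Trivial baseline: always predicts the empirical class distribution."""

    def fit(self, X, y):
        # Store the observed classes and their relative frequencies.
        self.classes_, counts = np.unique(y, return_counts=True)
        self.proba_ = counts / counts.sum()
        return self

    def predict_proba(self, X):
        # Same class distribution for every input row.
        return np.tile(self.proba_, (len(X), 1))

    def predict(self, X):
        # Always the most frequent class seen during fit.
        return np.full(len(X), self.classes_[np.argmax(self.proba_)])

model = MajorityOrdinalModel().fit(np.zeros((6, 2)),
                                   np.array([0, 0, 1, 1, 1, 2]))
```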

Interpretation Methods

  1. Feature Effects Analysis

    • Partial Dependence Plots (PDP)
      • Shows average effect of features on predictions
      • Handles both categorical and numerical features
      • Automatic subplot arrangement
    • PDP with Probabilities (PDPProb)
      • Visualizes average probability distributions across feature values
      • Shows class probability changes
      • Detailed probability annotations
    • Individual Conditional Expectation (ICE)
      • Analyzes individual instance behavior
      • Shows heterogeneous effects across samples
      • Displays either the entire population or selected individual instances (the latter is recommended)
    • ICE with Probabilities (ICEProb)
      • Visualizes individual probability distributions across feature values
      • Shows class probability changes per instance
      • Detailed probability annotations at original values
      • Displays either the entire population or selected individual instances (the latter is recommended)
  2. Feature Importance Analysis

    • Permutation Feature Importance (PFI)
      • Global feature importance through permutation
      • Supports multiple evaluation metrics; the metric subset can be specified
      • Handles both categorical and numerical features
      • Analysis can be restricted to a subset of features
      • Visualizes feature importance scores in a bar plot
    • Leave-One-Covariate-Out (LOCO)
      • Global feature importance through feature removal and refitting
      • Uses a train-test split to fit the models and evaluate performance
      • Supports multiple evaluation metrics; the metric subset can be specified
      • Visualizes feature importance scores in a bar plot
  3. Local Explanations

    • Local Interpretable Model-agnostic Explanations (LIME)
      • Provides local explanations for individual predictions
      • Supports both logistic regression and decision tree surrogate models
      • Multiple sampling strategies (grid, uniform, permutation)
      • Customizable kernel functions for sample weighting
      • Visualizes feature importance through coefficients or tree plot
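The computation at the heart of a model-agnostic PDP can be sketched in a few lines: for each grid value, overwrite the chosen feature for every row and average the model's predictions. This is a generic illustration of the technique, not the package's PDP class:

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """For each grid value v, set the chosen feature column to v for
    every row and average the model's predictions."""
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)

# Stand-in "model": prediction depends linearly on feature 0 only.
predict = lambda X: 2.0 * X[:, 0]
X = np.random.default_rng(1).normal(size=(50, 3))
grid = np.linspace(-1, 1, 5)
effect = partial_dependence(predict, X, feature=0, grid=grid)
# For this linear model the PDP recovers the slope: effect == 2 * grid.
```

ICE keeps the per-row curves instead of averaging them, which is what reveals heterogeneous effects across samples.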

Datasets

The package includes several benchmark datasets for ordinal regression:

  1. Wine Quality (winequality.csv)

    • Combined dataset of red and white wines
    • 6499 samples, 12 features
    • Separate datasets available for red (winequality-red.csv) and white (winequality-white.csv) wines
    • Ordinal target: 3-9 (wine quality rating)
    • Features include physicochemical properties (acidity, sugar, alcohol, etc.)
  2. Student Performance (student-mat.csv, student-por.csv)

    • Two datasets: Mathematics and Portuguese
    • Mathematics: 395 samples, 33 features
    • Portuguese: 649 samples, 33 features
    • Ordinal targets: G1, G2, G3 (grades in three periods)
    • Features include demographic, social, and educational factors
    • Mixed numerical and categorical features
  3. Abalone (abalone.csv)

    • 4,177 samples, 8 features
    • Predicts the age of abalone from physical measurements
    • Features include length, diameter, height, and various weight measurements
    • Target variable 'Rings' represents age (1-29)
  4. Automobile (automobile.csv)

    • 205 samples, 26 features
    • Predicts automobile price based on various specifications
    • Mixed numerical and categorical features
    • Features include make, fuel-type, engine-size, horsepower, etc.
  5. Balance Scale (balance_scale.csv)

    • 625 samples, 4 features
    • Predicts balance scale state (left, balanced, right)
    • Simple dataset useful for testing ordinal classification
  6. Car Evaluation (car_evaluation.csv)

    • 1,728 samples, 6 features
    • Predicts car acceptability (unacc, acc, good, vgood)
    • All features are categorical
  7. FI Test Dataset (FI_test.csv)

    • Synthetic benchmark for feature importance methods
    • 1,000 samples, 10 features
    • First 5 features have predefined importance: [1.0, 0.8, 0.4, 0.2, 0.1]
    • Last 5 features are irrelevant noise variables
    • Includes both clean (y) and noisy (y_noisy) target variables
    • Designed to evaluate feature importance ranking and robustness to label noise
  8. Dummy Dataset (dummy.csv)

    • 1,000 samples of synthetic data
    • Used for testing and demonstration purposes
    • Contains synthetic data with known patterns
    • Labels represent a health risk level influenced by health and gender
    • Latent variable: 0.006 · age + 0.0005 · height + 0.3 · gender_w + 0.05 · N(0, 1)
    • Ranks generated by creating 3 equal-sized buckets based on the latent variable
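The dummy-data recipe above can be sketched as follows; the feature ranges are illustrative assumptions, not the shipped data:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000
# Assumed feature ranges, for illustration only.
age = rng.integers(18, 80, n)
height = rng.integers(150, 200, n)        # in cm
gender_w = rng.integers(0, 2, n)          # 1 = woman, 0 = man

# Latent health-risk score as described above.
latent = (0.006 * age + 0.0005 * height
          + 0.3 * gender_w + 0.05 * rng.normal(size=n))

# Three equal-sized buckets via the 1/3 and 2/3 quantiles of the score.
cuts = np.quantile(latent, [1 / 3, 2 / 3])
y = np.digitize(latent, cuts)             # ordinal labels 0, 1, 2
```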

Installation

pip install ordinal-xai

Quick Start

from ordinal_xai.models import CLM, ONN, OBD, OGBoost
from ordinal_xai.interpretation import LIME, LOCO, ICE, ICEProb, PDP, PDPProb, PFI
import pandas as pd
import numpy as np

# Create sample data (column names match the explain() calls below)
X = pd.DataFrame(np.random.randn(100, 5), columns=[f'feature{i}' for i in range(1, 6)])
y = pd.Series(np.random.randint(0, 3, 100))

# Initialize and train model
model = CLM(link='logit')
model.fit(X, y)

# Make predictions
predictions = model.predict(X)
probabilities = model.predict_proba(X)

# Generate explanations
# Feature Effects
pdp = PDP(model, X, y)
pdp_effects = pdp.explain(features=['feature1', 'feature2'], plot=True)

pdp_prob = PDPProb(model, X, y)
pdp_prob_effects = pdp_prob.explain(features=['feature1', 'feature2'], plot=True)

ice = ICE(model, X, y)
ice_effects = ice.explain(features=['feature1', 'feature2'], plot=True)

ice_prob = ICEProb(model, X, y)
ice_prob_effects = ice_prob.explain(features=['feature1', 'feature2'], plot=True)

# Feature Importance
pfi = PFI(model, X, y)
pfi_importance = pfi.explain(plot=True)

loco = LOCO(model, X, y)
loco_importance = loco.explain(plot=True)

# Local Explanations
lime = LIME(model, X, y)
lime_explanation = lime.explain(observation_idx=0, plot=True)

Documentation

For detailed documentation, including API reference and examples, visit our documentation page.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
