Ordinal Gradient Boosting

Ordinal Gradient Boosting (`OGBoost`)

Overview

OGBoost is a scikit-learn-compatible Python package for gradient boosting tailored to ordinal regression problems. It works by alternating between:

Fitting a Machine Learning (ML) regression model - such as a decision tree - to predict a latent score that specifies the mean of a probability density function (PDF), and
Fitting a set of thresholds that generate discrete outcomes from the PDF.

In other words, OGBoost implements coordinate-descent optimization that combines functional gradient descent (for updating the regression function) with ordinary gradient descent (for updating the threshold vector).
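The alternating scheme can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not OGBoost's implementation: it uses a logit link, a least-squares linear fit as a stand-in base learner (the package defaults to a decision tree), fixed learning rates, and synthetic data. All function and variable names below are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def interval_terms(g, theta, y):
    """Probability of each observed label, plus the logistic density at the
    upper and lower thresholds that bracket that label."""
    cuts = np.concatenate(([-np.inf], theta, [np.inf]))
    F_u, F_l = sigmoid(cuts[y + 1] - g), sigmoid(cuts[y] - g)
    p = np.clip(F_u - F_l, 1e-12, None)  # guard against log(0)
    return p, F_u * (1 - F_u), F_l * (1 - F_l)

def neg_log_lik(g, theta, y):
    return -np.log(interval_terms(g, theta, y)[0]).sum()

# Synthetic ordinal data with K = 3 classes (assumption, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
latent = X @ np.array([1.5, -1.0]) + rng.logistic(size=300)
y = np.digitize(latent, [-1.0, 1.0])
K = 3
A = np.column_stack([np.ones(len(X)), X])  # design matrix for the stand-in learner

# Initialise: zero latent score, thresholds from empirical cumulative frequencies
g = np.zeros(len(y))
cum = np.cumsum(np.bincount(y, minlength=K))[:-1] / len(y)
theta = np.log(cum / (1 - cum))
initial_loss = neg_log_lik(g, theta, y)

lr_g, lr_theta = 0.5, 0.1
for _ in range(50):
    # (1) Functional gradient step: fit the learner to the negative gradient w.r.t. g
    p, f_u, f_l = interval_terms(g, theta, y)
    pseudo_residual = (f_l - f_u) / p
    coef, *_ = np.linalg.lstsq(A, pseudo_residual, rcond=None)
    g = g + lr_g * (A @ coef)

    # (2) Ordinary gradient step on the threshold vector
    p, f_u, f_l = interval_terms(g, theta, y)
    grad = (-np.bincount(y, weights=f_u / p, minlength=K)[:-1]
            + np.bincount(y, weights=f_l / p, minlength=K)[1:])
    theta = np.sort(theta - lr_theta * grad / len(y))  # sorting keeps thresholds ordered

final_loss = neg_log_lik(g, theta, y)
print(f"negative log-likelihood: {initial_loss:.1f} -> {final_loss:.1f}")
```

Sorting the thresholds after each step is a crude way to keep them ordered; it is used here only to keep the sketch short.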

The main class of the package, GradientBoostingOrdinal, is designed to have the same look and feel as scikit-learn's GradientBoostingClassifier. It includes many of the same features, such as sample weighting, early stopping using a validation set, and staged predictions, and it additionally supports custom link functions.

There are, however, important differences as well.

Unique Features of `OGBoost`

Latent-Score Prediction

The decision_function method of GradientBoostingOrdinal behaves differently from that of scikit-learn's classifiers. Assuming the target variable has K distinct classes, a nominal classifier's decision function returns K values for each sample. In contrast, decision_function in ogboost returns a single value per sample: the latent score. This latent score can be considered a high-resolution alternative to class labels, and may therefore rank samples better than the labels do.
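A small, self-contained example shows why a continuous score can rank better than the labels derived from it. The concordance_index helper, the scores, and the thresholds below are all made up for illustration. The helper implements one common definition of concordance, which may differ in detail from what the package computes.

```python
from bisect import bisect_right

def concordance_index(y_true, scores):
    """Fraction of pairs with different true labels whose scores are ordered
    the same way; tied scores count as half a concordant pair."""
    concordant, comparable = 0.0, 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true labels are not comparable
            comparable += 1
            lo, hi = (i, j) if y_true[i] < y_true[j] else (j, i)
            if scores[lo] < scores[hi]:
                concordant += 1.0
            elif scores[lo] == scores[hi]:
                concordant += 0.5
    return concordant / comparable

y_true = [0, 0, 1, 1, 2, 2]
latent = [-1.5, -0.4, 0.2, 0.9, 0.95, 2.3]  # made-up latent scores
thresholds = [0.0, 1.0]                     # made-up thresholds
labels = [bisect_right(thresholds, s) for s in latent]  # [0, 0, 1, 1, 1, 2]

c_latent = concordance_index(y_true, latent)
c_label = concordance_index(y_true, labels)
print(f"latent: {c_latent:.3f}, labels: {c_label:.3f}")  # latent: 1.000, labels: 0.917
```

The fifth sample is mislabeled once the score is discretized, which creates ties with the class-1 samples; the latent score still orders it correctly.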

Early Stopping using Cross-Validation (CV)

In addition to using a single validation set for early stopping, as GradientBoostingClassifier does, ogboost implements early stopping using CV, which means all of the data is used for calculating out-of-sample performance. This can improve the robustness of early stopping, especially for small and/or imbalanced datasets.
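One way to picture CV-based early stopping is to average the validation-loss curves of the folds and apply a patience rule to the averaged curve. The sketch below assumes exactly that rule; the function name, the loss values, and the way patience is counted are illustrative assumptions, and the package's internal logic may differ.

```python
def early_stop_iteration(fold_losses, n_iter_no_change):
    """Return the best iteration of the fold-averaged validation loss, stopping
    once n_iter_no_change iterations pass without a new minimum."""
    n_iters = len(fold_losses[0])
    mean_loss = [
        sum(fold[t] for fold in fold_losses) / len(fold_losses)
        for t in range(n_iters)
    ]
    best_iter, best_loss = 0, mean_loss[0]
    for t in range(1, n_iters):
        if mean_loss[t] < best_loss:
            best_iter, best_loss = t, mean_loss[t]
        elif t - best_iter >= n_iter_no_change:
            break  # patience exhausted
    return best_iter, mean_loss

# Made-up validation losses per boosting iteration, one list per fold
fold_losses = [
    [1.00, 0.80, 0.70, 0.72, 0.75, 0.90],
    [1.00, 0.90, 0.60, 0.65, 0.70, 0.80],
]
best_iter, mean_loss = early_stop_iteration(fold_losses, n_iter_no_change=2)
print(best_iter)  # 2
```

Because every sample appears in exactly one validation fold, the averaged curve reflects all of the data rather than a single holdout split.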

Heterogeneous Ensemble

While most gradient-boosting software packages exclusively use decision trees with a predetermined set of hyperparameters as the base learner in all boosting iterations, ogboost offers significantly more flexibility.

Users can pass in a base_learner parameter to the class initializer to override the default choice of a DecisionTreeRegressor. This can be any scikit-learn regression algorithm, such as a feed-forward neural network (MLPRegressor) or a K-nearest-neighbor regressor (KNeighborsRegressor).
Rather than a single base learner, users can specify a list (or a generator) of base learners, which are drawn in that order across boosting iterations. This amounts to creating a heterogeneous ensemble as opposed to a homogeneous ensemble.
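As a rough sketch of the second option, a sequence of base learners can be built with a plain Python generator. The helper name and the particular mix of estimators are illustrative choices, not something prescribed by the package.

```python
from itertools import cycle, islice

from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

def alternating_learners():
    """Yield base learners forever: shallow tree, deeper tree, KNN, repeat."""
    templates = [
        lambda: DecisionTreeRegressor(max_depth=2),
        lambda: DecisionTreeRegressor(max_depth=5),
        lambda: KNeighborsRegressor(n_neighbors=10),
    ]
    for make in cycle(templates):
        yield make()  # a fresh, unfitted estimator each time

# A finite list of 6 learners, in the order they would be consumed
learners = list(islice(alternating_learners(), 6))
print([type(m).__name__ for m in learners])
```

A list such as learners could then be supplied as base_learner. Whether an unbounded generator is accepted as-is is an assumption worth checking against the package documentation.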

Installation

pip install ogboost

To access StatsModelsOrderedModel, which is a wrapper for the OrderedModel class from the statsmodels package to make it compatible with scikit-learn, please run:

pip install ogboost[linear]

Package Vignette

For a more detailed introduction to OGBoost, including the underlying math, see the package vignette, available on arXiv.

Quick Start

Load the Wine Quality Dataset

The package includes a utility to load the wine quality dataset (red and white) from the UCI repository. Note that load_wine_quality shifts the target variable (quality) to start from 0. (This is required by the GradientBoostingOrdinal class.)

from ogboost import load_wine_quality
X, y, _, _ = load_wine_quality(return_X_y=True)
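For other datasets, the same zero-based convention can be applied by hand. The sketch below uses made-up labels and shows two options: a plain shift, and a remapping to consecutive integers. Whether GradientBoostingOrdinal also requires the labels to be consecutive is not stated here, so the second option is a precaution rather than a documented requirement.

```python
import numpy as np

# Hypothetical raw ordinal labels, e.g. quality scores that start at 3
y_raw = np.array([5, 3, 6, 5, 8, 6, 3])

# Simple shift so the smallest label becomes 0 (gaps between levels are preserved)
y_shifted = y_raw - y_raw.min()

# Alternative: map the observed levels to consecutive integers 0..K-1
levels, y_consecutive = np.unique(y_raw, return_inverse=True)

print(y_shifted)      # [2 0 3 2 5 3 0]
print(y_consecutive)  # [1 0 2 1 3 2 0]
```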

Training, Prediction and Evaluation

Latent scores perform better than class labels on discriminative tasks, as their higher resolution carries more information:

from ogboost import GradientBoostingOrdinal

## training ##
model = GradientBoostingOrdinal(n_estimators=100, link_function='logit', verbose=1)
model.fit(X, y)

## prediction ##
# class labels
predicted_labels = model.predict(X)
# class probabilities
predicted_probabilities = model.predict_proba(X)
# latent score
predicted_latent = model.decision_function(X)

## evaluation ##
concordance_latent = model.score(X, y) # concordance using latent scores
concordance_label = model.score(X, y, pred_type='labels') # concordance using class labels
print(f"Concordance - class labels: {concordance_label:.3f}")
print(f"Concordance - latent scores: {concordance_latent:.3f}")

Early-Stopping using Cross-Validation

Using cross-validation for early stopping can produce more robust results compared to a single holdout set, especially for small and/or imbalanced datasets:

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
import time

n_splits = 10
n_repeats = 10
kf = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats)

# early-stopping using a simple holdout set
model_earlystop_simple = GradientBoostingOrdinal(n_iter_no_change=10, validation_fraction=0.2)
start = time.time()
c_index_simple = cross_val_score(model_earlystop_simple, X, y, cv=kf, n_jobs=-1)
end = time.time()
print(f'Simple early stopping: {c_index_simple.mean():.3f} ({end - start:.1f} seconds)')

# early-stopping using cross-validation
model_earlystop_cv = GradientBoostingOrdinal(n_iter_no_change=10, cv_early_stopping_splits=5)
start = time.time()
c_index_cv = cross_val_score(model_earlystop_cv, X, y, cv=kf, n_jobs=-1)
end = time.time()
print(f'CV early stopping: {c_index_cv.mean():.3f} ({end - start:.1f} seconds)')

Heterogeneous Ensemble

Rather than a single base learner, users can supply a heterogeneous list of base learners to GradientBoostingOrdinal. The utility function generate_heterogeneous_learners can be used to easily generate random samples from hyperparameter spaces of one or more base learners:

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from ogboost import generate_heterogeneous_learners

# Number of samples to generate
n_samples = 100

max_depth_choices = [3, 6, 9, None]
max_leaf_nodes_choices = [10, 20, 30, None]

dt_overrides = {
    "max_depth": lambda rng: rng.choice(max_depth_choices),
    "max_leaf_nodes": lambda rng: rng.choice(max_leaf_nodes_choices)
}

# Create list of DecisionTreeRegressor models
random_learners = generate_heterogeneous_learners(
    [DecisionTreeRegressor()], 
    [dt_overrides], 
    total_samples=n_samples
)
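To make the role of the override functions concrete, the following standalone sketch draws a few hyperparameter sets the way such callables could be used. It assumes each override is called with a NumPy random Generator; the sample_params helper is hypothetical and is not part of ogboost.

```python
import numpy as np

max_depth_choices = [3, 6, 9, None]
max_leaf_nodes_choices = [10, 20, 30, None]

overrides = {
    "max_depth": lambda rng: rng.choice(max_depth_choices),
    "max_leaf_nodes": lambda rng: rng.choice(max_leaf_nodes_choices),
}

def sample_params(overrides, rng):
    """Call every sampler with the shared random generator (hypothetical helper)."""
    return {name: sampler(rng) for name, sampler in overrides.items()}

rng = np.random.default_rng(42)  # seeded so the draws are reproducible
param_sets = [sample_params(overrides, rng) for _ in range(5)]
for params in param_sets:
    print(params)
```

Each dictionary corresponds to one randomly configured base learner, so a list of them describes a heterogeneous set of trees.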

Such heterogeneous boosting ensembles can be a more efficient alternative to hyperparameter tuning (e.g., via grid search):

model_heter = GradientBoostingOrdinal(
    base_learner=random_learners,
    n_estimators=n_samples
)
cv_heter = cross_val_score(model_heter, X, y, cv=kf, n_jobs=-1)
print(f'average cv score of heterogeneous ensemble: {np.mean(cv_heter):.3f}')

Linear Ordinal Regression

The StatsModelsOrderedModel is a scikit-learn wrapper for the OrderedModel class of the statsmodels package:

from ogboost import StatsModelsOrderedModel

cv_linear = cross_val_score(StatsModelsOrderedModel(), X, y, cv=kf, n_jobs=-1)
print(f'average cv score of linear model: {np.mean(cv_linear):.3f}')

This model can be useful for benchmarking against ML models, or as part of an ensemble alongside them.

Release Notes

0.7.0

Added the StatsModelsOrderedModel class, which provides a scikit-learn wrapper for the OrderedModel class from the statsmodels package.
Added support for custom (user-supplied) link functions.

0.6.3

Improved documentation.

0.6.2

Added a link to the package vignette on arXiv to README.md.
Simplified the initialization of fold-level models in _fit_cv.
Fixed a bug in _fit_cv that prevented using CV-based early stopping with heterogeneous base learners.

0.6.1

Debugged _fit_cv and plot_loss methods of GradientBoostingOrdinal to produce correct plots of training/validation loss, and loss improvement after each g and theta update when using cross-validation for early stopping.
Enhanced docstrings for plot_loss.

0.6.0

Improved the logic for detecting random_state as a parameter in the base learners (switching from hasattr to get_params), as the old method was tricked by sklearn's inheritance mechanics into thinking estimators such as SVM included random_state as a modifiable parameter.
Added a utility function, generate_heterogeneous_learners, to stochastically generate a list of base learners to supply to GradientBoostingOrdinal (heterogeneous boosting ensemble).
Edited code examples in README.md to reflect the enhancements to the package.
Enhanced load_wine_quality to add an option for returning X and y, instead of a single dataframe, for the red and white datasets.

0.5.6

Tweaked the default hyperparameters of DecisionTreeRegressor (itself the default base_learner for GradientBoostingOrdinal) to match those in scikit-learn's GradientBoostingClassifier.
Small improvements to the plot_loss method of GradientBoostingOrdinal.
Added the Release Notes section to the README.md file.
Small edits to the text and code in README.md.

0.5.5

First public release.

License

This package is licensed under the MIT License.

Release history

0.8.2: Oct 4, 2025
0.8.1: Oct 2, 2025
0.8.0: Sep 28, 2025
0.7.1: Mar 14, 2025
0.7.0: Mar 7, 2025 (this version)
0.6.3: Feb 22, 2025
0.6.2: Feb 20, 2025
0.6.1: Feb 17, 2025
0.6.0: Feb 14, 2025
0.5.6: Feb 12, 2025
0.5.5: Feb 6, 2025
0.5.1: Oct 2, 2025
0.5.0: Oct 2, 2025

Download files

Source Distribution

ogboost-0.7.0.tar.gz (31.4 kB)

Uploaded Mar 7, 2025 Source

Built Distribution

ogboost-0.7.0-py3-none-any.whl (31.7 kB)

Uploaded Mar 7, 2025 Python 3

File details

Details for the file ogboost-0.7.0.tar.gz.

File metadata

Download URL: ogboost-0.7.0.tar.gz
Upload date: Mar 7, 2025
Size: 31.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.7

File hashes

Hashes for ogboost-0.7.0.tar.gz
Algorithm	Hash digest
SHA256	`18538143e4655eab31cfb2173fcc7976a96a7657421670b3f00ef22a8bf769b1`
MD5	`2d915040a6f8d9b2579063c97420813d`
BLAKE2b-256	`917a776efa8fe59242da6f5dc3e7bcafb304aeaab2943281eafe3a483a679823`

File details

Details for the file ogboost-0.7.0-py3-none-any.whl.

File metadata

Download URL: ogboost-0.7.0-py3-none-any.whl
Upload date: Mar 7, 2025
Size: 31.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.7

File hashes

Hashes for ogboost-0.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a0d6144636049d2b7b29256b1b1830d6c3aa7dddeb429d0cd4ec2a46ef156a4b`
MD5	`83fe51675610dcb4f8dad4bc82005f9f`
BLAKE2b-256	`55201f478db6cd20a9b974f14ad05bc1c613215b5f76562a1e8142f5afbc9d01`
