LP-based ensemble learning with column generation

colboost: Ensemble Boosting with Column Generation

colboost is a Python library for training ensemble classifiers with boosting methods based on mathematical programming, such as LPBoost. In each iteration, it fits a weak learner and solves a mathematical program to determine the optimal ensemble weights. The implementation is compatible with scikit-learn and accepts any scikit-learn-compatible base learner. The library currently supports binary classification only.
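The "fit a weak learner" step in column generation is often called the pricing step: given the current sample weights from the dual solution, look for the weak learner with the largest weighted edge and add it only if it violates a dual constraint. The snippet below is a minimal pure-Python sketch of that idea. It is not taken from the colboost source; the names `edge`, `price_column`, `make_stump`, `beta`, and `tol` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

# A weak learner here is any function mapping a feature vector to -1 or +1.
WeakLearner = Callable[[Sequence[float]], int]


def edge(h: WeakLearner, X, y, u) -> float:
    """Weighted edge sum_i u_i * y_i * h(x_i) of learner h under sample weights u."""
    return sum(u_i * y_i * h(x_i) for x_i, y_i, u_i in zip(X, y, u))


def price_column(candidates, X, y, u, beta, tol=1e-9):
    """Return (best_learner, best_edge, violates).

    `violates` is True when the best edge exceeds beta + tol, meaning the
    matching dual constraint is violated and the column is worth adding.
    """
    best = max(candidates, key=lambda h: edge(h, X, y, u))
    best_edge = edge(best, X, y, u)
    return best, best_edge, best_edge > beta + tol


def make_stump(feature: int, threshold: float) -> WeakLearner:
    """Decision stump: +1 if x[feature] > threshold, else -1."""
    return lambda x: 1 if x[feature] > threshold else -1


# Toy data: one feature, labels in {-1, +1}, uniform sample weights.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, -1, 1, 1]
u = [0.25, 0.25, 0.25, 0.25]
candidates = [make_stump(0, t) for t in (0.5, 1.5, 2.5)]

best, best_edge, violates = price_column(candidates, X, y, u, beta=0.0)
print("best edge:", best_edge, "add column:", violates)
```

In a full column generation loop, the chosen learner would be added to the restricted master problem, which is then re-solved to obtain new ensemble weights and new sample weights.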

Installation

The easiest way to install colboost is using pip:

pip install colboost

This project requires the Gurobi solver. Free academic licenses are available:

https://www.gurobi.com/academia/academic-program-and-licenses/
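Before training, it can help to check that Gurobi's Python bindings are importable. The sketch below assumes the bindings are provided by the `gurobipy` package, which this README does not state explicitly; `gurobi_available` is a hypothetical helper name. It only checks that the package can be found and does not validate the license.

```python
import importlib.util


def gurobi_available() -> bool:
    """Return True if the `gurobipy` package can be located in this environment."""
    return importlib.util.find_spec("gurobipy") is not None


if gurobi_available():
    print("gurobipy found")
else:
    print("gurobipy not found; install it and set up a Gurobi license")
```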

Available Parameters

| Parameter | Default | Description |
|---|---|---|
| solver | "nm_boost" | Which formulation to use. Options: "nm_boost", "cg_boost", "erlp_boost", "lp_boost", "md_boost", "qrlp_boost". |
| base_estimator | None | Optional base estimator (defaults to a CART decision tree if not provided). |
| max_depth | 1 | Maximum depth of individual trees (only relevant when using the default, base_estimator=None). |
| max_iter | 100 | Maximum number of boosting iterations. |
| use_crb | False | Whether to use confidence-rated boosting (soft voting; only applicable with a tree-based base_estimator). |
| check_dual_const | True | Whether to check dual feasibility in each iteration. |
| early_stopping | True | Stop boosting early if no improvement is observed. |
| acc_eps | 1e-4 | Tolerance for accuracy-based stopping criteria. |
| acc_check_interval | 5 | How often (in iterations) to check accuracy for early stopping. |
| gurobi_time_limit | 60 | Time limit (in seconds) for each Gurobi solve. |
| gurobi_num_threads | 1 | Number of threads Gurobi uses. |
| tradeoff_hyperparam | 1e-2 | Trade-off parameter for regularization. |
| seed | 1 | Random seed for reproducibility. |
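To make the interplay of early_stopping, acc_eps, and acc_check_interval more concrete, here is one plausible reading of an accuracy-based stopping rule. This is an assumption for illustration only: the README does not spell out the exact criterion, and `should_stop` is a hypothetical function, not part of the colboost API.

```python
def should_stop(train_accuracies, acc_eps=1e-4, acc_check_interval=5):
    """Hypothetical stopping rule (assumed semantics, not the colboost source).

    The check runs only every `acc_check_interval` iterations and signals a
    stop when training accuracy improved by less than `acc_eps` since the
    previous check.
    """
    n = len(train_accuracies)
    # Need at least two checkpoints, and only act on checkpoint iterations.
    if n < 2 * acc_check_interval or n % acc_check_interval != 0:
        return False
    current = train_accuracies[-1]
    previous = train_accuracies[-1 - acc_check_interval]
    return current - previous < acc_eps


plateau = [0.70, 0.80, 0.85, 0.90, 0.90] + [0.90] * 5
improving = [0.50 + 0.04 * i for i in range(10)]
print(should_stop(plateau))    # accuracy flat between checkpoints
print(should_stop(improving))  # accuracy still rising
```

Under this reading, a larger acc_check_interval makes the rule less sensitive to short plateaus, while a larger acc_eps stops training sooner.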

Example 1: Fitting an ensemble

from sklearn.datasets import make_classification
from colboost.ensemble import EnsembleClassifier

# Create a synthetic binary classification problem
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
y = 2 * y - 1  # Convert labels from {0, 1} to {-1, +1}

# Train an NMBoost-based ensemble
model = EnsembleClassifier(solver="nm_boost", max_iter=50)
model.fit(X, y)
print("Training accuracy:", model.score(X, y))

# Obtain margin values y * f(x)
margins = model.compute_margins(X, y)
print("First 5 margins:", margins[:5])

Example 2: Reweighting an existing ensemble

from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from colboost.ensemble import EnsembleClassifier
import numpy as np

# Generate data
X, y = make_classification(n_samples=200, n_features=20, random_state=42)
y = 2 * y - 1  # Convert labels to {-1, +1}

# Train AdaBoost with sklearn
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada.fit(X, y)

# Reweight AdaBoost base estimators using NMBoost
model = EnsembleClassifier(solver="nm_boost")
model.reweight_ensemble(X, y, learners=ada.estimators_)

print("Training accuracy after reweighting:", model.score(X, y))
print("Number of non-zero weights after reweighting:", np.count_nonzero(model.weights))
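Reweighting typically drives many weights to zero, so the learners with zero weight can be dropped without changing predictions. The sketch below shows that pruning step on plain Python lists. `prune_ensemble` and `tol` are hypothetical names, the weights are assumed to be non-negative, and nothing here is part of the colboost API.

```python
def prune_ensemble(learners, weights, tol=1e-12):
    """Keep learners whose weight exceeds tol and rescale kept weights to sum to 1."""
    kept = [(h, w) for h, w in zip(learners, weights) if w > tol]
    total = sum(w for _, w in kept)
    if total == 0:
        return [], []
    return [h for h, _ in kept], [w / total for _, w in kept]


# Stand-in learners (strings) with a sparse weight vector.
learners = ["h0", "h1", "h2", "h3"]
weights = [0.5, 0.0, 0.25, 0.25]
kept_learners, kept_weights = prune_ensemble(learners, weights)
print(kept_learners, kept_weights)
```

Rescaling by a positive constant does not change the sign of the weighted vote, so the predicted labels stay the same.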

Inspecting model attributes after training

# assuming 'model' is the fitted colboost model
print("Learners:", model.learners) 
print("Weights:", model.weights) 
print("Objective values:", model.objective_values_)
print("Solve times:", model.solve_times_)    
print("Training accuracy per iter:", model.train_accuracies_)
print("Number of iterations:", model.n_iter_)
print("Solver used:", model.model_name_)

# compute margin distribution
margins = model.compute_margins(X, y)
print("First 5 margins (y * f(x)):", margins[:5])
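As a reminder of what a margin y * f(x) means, the following pure-Python sketch computes it by hand for a weighted vote f(x) = sum_j w_j * h_j(x). It is an illustration under stated assumptions, not the implementation behind compute_margins: `ensemble_margins` is a hypothetical helper, and whether colboost normalizes the weights is not stated in this README.

```python
def ensemble_margins(predictions, weights, y):
    """Compute y_i * f(x_i) with f(x_i) = sum_j weights[j] * predictions[j][i].

    predictions[j][i] is learner j's output in {-1, +1} for sample i.
    """
    n_samples = len(y)
    scores = [
        sum(w * preds[i] for w, preds in zip(weights, predictions))
        for i in range(n_samples)
    ]
    return [y_i * s for y_i, s in zip(y, scores)]


predictions = [
    [1, 1, -1, -1],  # outputs of learner 0 on four samples
    [1, -1, -1, 1],  # outputs of learner 1 on the same samples
]
weights = [0.75, 0.25]
y = [1, 1, -1, 1]

margins = ensemble_margins(predictions, weights, y)
print(margins)  # a negative margin marks a misclassified sample
```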

Implemented Formulations

  • NMBoost
    Negative Margin Boosting, emphasizing both accuracy and penalization of negative margins.
    Introduced in our paper (2025)

  • QRLPBoost
    Quadratically Regularized LPBoost with second-order KL-divergence approximation.
    Introduced in our paper (2025)

  • LPBoost
    Linear Programming Boosting with slack variables (soft-margin).
    Demiriz, Bennett, Shawe-Taylor (2002)

  • MDBoost
    Margin Distribution Boosting, optimizing both margin mean and variance.
    Shen & Li (2009)

  • CGBoost
    Column Generation Boosting with L2-regularized margin formulation.
    Bi, Zhang, Bennett (2004)

  • ERLPBoost
    Entropy-Regularized LPBoost using KL-divergence between successive distributions.
    Warmuth, Glocer, Vishwanathan (2008)
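For orientation, the soft-margin LPBoost problem of Demiriz, Bennett, and Shawe-Taylor (2002) is commonly written as the primal/dual pair below. The notation is chosen here for illustration: a_j are the ensemble weights, ξ_i the slack variables, ρ the margin, u_i the sample weights, and D a trade-off constant. How D relates to tradeoff_hyperparam in colboost is not stated in this README, so treat that mapping as unknown.

```latex
% Primal: maximize the soft margin over ensemble weights a
\begin{aligned}
\max_{a,\,\xi,\,\rho}\quad & \rho - D \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i \sum_{j=1}^{m} a_j h_j(x_i) + \xi_i \ge \rho, \quad i = 1,\dots,n, \\
& \sum_{j=1}^{m} a_j = 1, \quad a_j \ge 0, \quad \xi_i \ge 0 .
\end{aligned}

% Dual: one constraint per weak learner (column)
\begin{aligned}
\min_{u,\,\beta}\quad & \beta \\
\text{s.t.}\quad & \sum_{i=1}^{n} u_i\, y_i\, h_j(x_i) \le \beta, \quad j = 1,\dots,m, \\
& \sum_{i=1}^{n} u_i = 1, \quad 0 \le u_i \le D .
\end{aligned}
```

Column generation works on the dual: each weak learner corresponds to one dual constraint, and a new learner is added only when its weighted edge exceeds the current β.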

Installation (developers)

To install in development mode, clone this repo and run:

python3 -m venv env
source env/bin/activate
pip install -e .

To verify the installation, run the test suite from the repository root:

pytest

Note: installation requires recent versions of pip and setuptools. If needed, update both with:

pip install --upgrade pip setuptools

Contributing

If you would like to propose extensions to this codebase, feel free to open a pull request! If you run into problems, please open an issue on GitHub with a clear description.

Citation

When using the code or data in this repo, please cite the following work:

@article{akkerman2025boosting,
    title={Boosting Revisited: Benchmarking and Advancing {LP}-Based Ensemble Methods},
    author={Fabian Akkerman and Julien Ferry and Christian Artigues and Emmanuel Hébrard and Thibaut Vidal},
    journal={Transactions on Machine Learning Research},
    year={2025},
    url={https://openreview.net/forum?id=lscC4PZUE4},
}

Note: This library is a clean reimplementation of the original code from the paper. While we have carefully validated the implementation, results may differ slightly from those reported in the paper. For full reproducibility, we provide a separate repository containing the exact codebase used for the paper, along with all result files, including the tested hyperparameter configurations and results not shown in the paper: https://doi.org/10.4121/f82dcdaa-fc94-43c5-b66d-02579bd3de4f.

  • MIT license
  • Copyright 2025 © Fabian Akkerman, Julien Ferry, Christian Artigues, Emmanuel Hébrard, Thibaut Vidal
