
XGBoost for probabilistic prediction.

Project description


xgboost-distribution

XGBoost for probabilistic prediction. Like NGBoost, but faster, and in the XGBoost scikit-learn API.

XGBDistribution example

Installation

$ pip install xgboost-distribution

Usage

XGBDistribution follows the XGBoost scikit-learn API, with an additional keyword argument specifying the distribution (see the documentation for a full list of available distributions):

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

from xgboost_distribution import XGBDistribution


# load_boston has been removed from scikit-learn; the California
# housing dataset is used here instead
data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

model = XGBDistribution(distribution="normal", n_estimators=500)
model.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],
    early_stopping_rounds=10,
)

After fitting, we can predict the parameters of the distribution:

preds = model.predict(X_test)
mean, std = preds.loc, preds.scale

Note that this returns a namedtuple of numpy arrays, one for each parameter of the distribution. We follow the scipy stats naming conventions for the parameters; see e.g. scipy.stats.norm for the normal distribution.
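Because the parameters follow the scipy.stats naming, they can be plugged straight into scipy.stats.norm, e.g. to form prediction intervals or evaluate the negative log-likelihood. A minimal sketch, using synthetic loc and scale arrays in place of a fitted model's predictions:

```python
import numpy as np
from scipy import stats

# stand-ins for preds.loc and preds.scale from XGBDistribution.predict
mean = np.array([20.0, 25.0, 30.0])
std = np.array([2.0, 3.0, 4.0])

# central 95% prediction interval for each test point
lower = stats.norm.ppf(0.025, loc=mean, scale=std)
upper = stats.norm.ppf(0.975, loc=mean, scale=std)

# mean negative log-likelihood of the observed targets
y_test = np.array([19.0, 26.0, 28.0])
nll = -stats.norm.logpdf(y_test, loc=mean, scale=std).mean()
```

The same pattern works for any of the supported distributions by swapping in the matching scipy.stats object.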

NGBoost performance comparison

XGBDistribution follows the method shown in the NGBoost library, using natural gradients to estimate the parameters of the distribution.
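Concretely, for a normal distribution parametrized by (mean, log-scale), the natural gradient is the ordinary gradient of the negative log-likelihood preconditioned with the inverse Fisher information, which is diagonal in this parametrization. A numpy sketch of that calculation (an illustration of the idea, not the package's internal code):

```python
import numpy as np

def natural_gradient_normal(y, mean, log_scale):
    """Natural gradient of the normal NLL w.r.t. (mean, log_scale)."""
    var = np.exp(2 * log_scale)
    # ordinary gradients of the negative log-likelihood
    grad_mean = (mean - y) / var
    grad_log_scale = 1.0 - (y - mean) ** 2 / var
    # the Fisher information is diag(1/var, 2) in this parametrization,
    # so preconditioning with its inverse simply rescales each component
    return np.stack([grad_mean * var, grad_log_scale / 2.0], axis=1)
```

These rescaled gradients are what get handed to the gradient-boosting step, which makes the optimization invariant to how the distribution is parametrized.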

Below, we show a performance comparison of XGBDistribution with the NGBoost NGBRegressor, estimating normal distributions on the Boston Housing dataset. While the performance of the two models is essentially identical (measured by the negative log-likelihood of a normal distribution and by RMSE), XGBDistribution is 30x faster (timed over both the fit and predict steps):

XGBDistribution vs NGBoost

Please see the experiments page in the documentation for detailed results across various datasets.

Full XGBoost features

XGBDistribution offers the full set of XGBoost features available in the XGBoost scikit-learn API, allowing, for example, probabilistic regression with monotonic constraints:

XGBDistribution monotonic constraints

Acknowledgements

This package would not exist without the excellent work from:

  • NGBoost - Which demonstrated how gradient boosting with natural gradients can be used to estimate parameters of distributions. Much of the gradient calculation code was adapted from there.

  • XGBoost - Which provides the gradient boosting algorithms used here; in particular, the sklearn APIs were taken as a blueprint.

Note

This project has been set up using PyScaffold 4.0.1. For details and usage information on PyScaffold see https://pyscaffold.org/.

Project details


Download files

Download the file for your platform.

Source Distribution

xgboost-distribution-0.2.0.tar.gz (205.5 kB)

Uploaded: Source

Built Distribution


xgboost_distribution-0.2.0-py2.py3-none-any.whl (16.8 kB)

Uploaded: Python 2, Python 3

File details

Details for the file xgboost-distribution-0.2.0.tar.gz.

File metadata

  • Download URL: xgboost-distribution-0.2.0.tar.gz
  • Upload date:
  • Size: 205.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.8.11

File hashes

Hashes for xgboost-distribution-0.2.0.tar.gz
  • SHA256: 7bc39b3d9407074926d424bebd689641f86cc8cb7357b4700cecb14cdff9cf01
  • MD5: cf906cf3b9248ad8d9f67621d0e60eab
  • BLAKE2b-256: 5542736bd8c761930c291cf69a96572e0e0a9dcc85f628be8fb5c87cf81cc7cb


File details

Details for the file xgboost_distribution-0.2.0-py2.py3-none-any.whl.

File metadata

  • Download URL: xgboost_distribution-0.2.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 16.8 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.8.11

File hashes

Hashes for xgboost_distribution-0.2.0-py2.py3-none-any.whl
  • SHA256: 019884df4551aae6221547708d95a1d67dfc74928629843b0c1101e7203025d5
  • MD5: 2675ddef1654a4afe70bb4417a23f364
  • BLAKE2b-256: 2e2e16a413ed4ec3d757d52cc13463101f774dbab86c60d455e566b8c521ab82

