
xgboost-distribution

XGBoost for probabilistic prediction. Like NGBoost, but faster, and in the XGBoost scikit-learn API.

[Image: XGBDistribution example]

Installation

$ pip install --upgrade xgboost-distribution

Usage

XGBDistribution follows the XGBoost scikit-learn API, with an additional keyword in the constructor for specifying the distribution (see the documentation for a full list of available distributions):

from sklearn.datasets import load_boston  # note: removed in scikit-learn >= 1.2
from sklearn.model_selection import train_test_split

from xgboost_distribution import XGBDistribution


data = load_boston()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

model = XGBDistribution(
    distribution="normal",
    n_estimators=500
)
model.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],
    early_stopping_rounds=10
)

After fitting, we can predict the parameters of the distribution:

preds = model.predict(X_test)
mean, std = preds.loc, preds.scale

Note that this returns a namedtuple of numpy arrays, one for each parameter of the distribution (we follow the scipy naming conventions; see e.g. scipy.stats.norm).
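
Since these are just the distribution's parameters, they can be used directly with scipy.stats. For example, a minimal sketch (reusing preds from above) of computing a 95% prediction interval and the mean negative log-likelihood on the test set:

import numpy as np
from scipy import stats

# 95% central prediction interval for each test point
lower, upper = stats.norm.interval(0.95, loc=preds.loc, scale=preds.scale)

# mean negative log-likelihood of the observed test targets
nll = -np.mean(stats.norm.logpdf(y_test, loc=preds.loc, scale=preds.scale))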

NGBoost performance comparison

XGBDistribution follows the approach taken in the NGBoost library, using natural gradients to estimate the parameters of the distribution.
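
To make this concrete: for a normal distribution parameterized as (loc, log_scale), the natural gradient is the ordinary gradient of the negative log-likelihood preconditioned by the inverse Fisher information matrix. Below is an illustrative numpy sketch of that computation (our own example, not the package's internal code). For this parameterization the Fisher information is diagonal, so the linear solve reduces to elementwise division:

import numpy as np

def natural_gradient_normal(y, loc, log_scale):
    """Natural gradient of the normal NLL w.r.t. (loc, log_scale).

    Illustrative sketch only, not the package's internal code.
    """
    var = np.exp(2 * log_scale)

    # ordinary gradients of the negative log-likelihood
    grad_loc = -(y - loc) / var
    grad_log_scale = 1.0 - (y - loc) ** 2 / var

    # Fisher information is diag(1 / var, 2) in this parameterization,
    # so preconditioning by its inverse is just elementwise division
    return grad_loc * var, grad_log_scale / 2.0

For distributions whose Fisher information is not diagonal, this preconditioning becomes a general linear solve (e.g. via numpy.linalg.solve, which calls LAPACK gesv), which is the bottleneck mentioned below.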

Below, we show a performance comparison of NGBoost's NGBRegressor and XGBDistribution, using the Boston Housing dataset and a normal distribution (with comparable hyperparameters). While the predictive performance of the two models is essentially identical, XGBDistribution is 50x faster (timed over both the fit and predict steps):

[Image: XGBDistribution vs NGBoost performance comparison]

Note that the speed-up decreases with dataset size, as it is ultimately limited by the natural gradient computation (a linear solve via LAPACK gesv). However, with 1 million rows of data, XGBDistribution is still 10x faster than NGBRegressor.
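
A rough sketch for reproducing such a timing comparison yourself (assuming ngboost is installed; hyperparameters here are illustrative and reuse the train/test split from above):

import time

from ngboost import NGBRegressor

def timed_fit_predict(model, X_train, y_train, X_test):
    start = time.perf_counter()
    model.fit(X_train, y_train)
    model.predict(X_test)
    return time.perf_counter() - start

t_xgbd = timed_fit_predict(XGBDistribution(distribution="normal"), X_train, y_train, X_test)
t_ngb = timed_fit_predict(NGBRegressor(), X_train, y_train, X_test)
print(f"XGBDistribution: {t_xgbd:.2f}s, NGBRegressor: {t_ngb:.2f}s")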

Full XGBoost features

XGBDistribution offers the full set of XGBoost features available in the XGBoost scikit-learn API, allowing, for example, probabilistic regression with monotonic constraints:

[Image: XGBDistribution with monotonic constraints]
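
For instance, a minimal sketch on synthetic single-feature data (assuming the monotone_constraints keyword is forwarded to XGBoost exactly as in the standard scikit-learn API):

import numpy as np

from xgboost_distribution import XGBDistribution

# toy data: y increases with the single feature, plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 1))
y = 2 * X[:, 0] + rng.normal(scale=1.0, size=1000)

model = XGBDistribution(
    distribution="normal",
    monotone_constraints="(1)",  # enforce a monotonically increasing mean
    n_estimators=200,
)
model.fit(X, y)
preds = model.predict(X)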

Acknowledgements

This package would not exist without the excellent work from:

  • NGBoost - Which demonstrated how gradient boosting with natural gradients can be used to estimate parameters of distributions. Much of the gradient calculation code was adapted from there.

  • XGBoost - Which provides the gradient boosting algorithms used here; in particular, the sklearn APIs were taken as a blueprint.

Note

This project has been set up using PyScaffold 4.0.1. For details and usage information on PyScaffold see https://pyscaffold.org/.
