Probabilistic Gradient Boosting Machines in PyTorch
PGBM
Probabilistic Gradient Boosting Machines (PGBM) is a probabilistic gradient boosting framework in Python based on PyTorch/Numba, developed by Airlab in Amsterdam. It provides the following advantages over existing frameworks:
- Probabilistic regression estimates instead of only point estimates. (example)
- Auto-differentiation of custom loss functions. (example, example)
- Native GPU-acceleration. (example)
- Distributed training for CPU and GPU, across multiple nodes. (examples)
- Ability to optimize probabilistic estimates after training for a set of common distributions, without retraining the model. (example)
In addition, we support the following features:
- Feature subsampling by tree
- Sample subsampling ('bagging') by tree
- Saving, loading and predicting with a trained model (example, example)
- Checkpointing (continuing training of a model after saving) (example, example)
- Feature importance by gain and permutation (example, example)
- Monotone constraints (example, example)
- Scikit-learn compatible via the PGBMRegressor class.
It is aimed at users interested in solving large-scale tabular probabilistic regression problems, such as probabilistic time series forecasting. For more details, read our paper or check out the examples.
Below is a simple example using our scikit-learn wrapper:
from pgbm import PGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
model = PGBMRegressor().fit(X_train, y_train)
yhat_point = model.predict(X_test)      # point estimates
yhat_dist = model.predict_dist(X_test)  # probabilistic estimates
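The probabilistic estimates can be turned into prediction intervals. A minimal sketch, assuming predict_dist yields an array of sampled forecasts with one row per drawn sample and one column per test observation (this shape is an assumption; check the function reference) — the stand-in array below replaces an actual model call:

```python
import numpy as np

# Stand-in for yhat_dist = model.predict_dist(X_test): 100 samples x 5 observations.
rng = np.random.default_rng(0)
yhat_dist = rng.normal(2.0, 0.5, size=(100, 5))

# Empirical 90% prediction interval per test observation.
lower = np.quantile(yhat_dist, 0.05, axis=0)
upper = np.quantile(yhat_dist, 0.95, axis=0)
```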
Installation
Dependencies
We offer PGBM using two backends: PyTorch (import pgbm) and Numba (import pgbm_nb).
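Which of the two backends is installed can be checked without importing either. A hedged sketch — pick_backend is a hypothetical helper, not part of PGBM:

```python
import importlib.util

def pick_backend():
    """Return the name of the first installed PGBM backend, or None (hypothetical helper)."""
    for name in ("pgbm", "pgbm_nb"):  # prefer the Torch backend when both are present
        if importlib.util.find_spec(name) is not None:
            return name
    return None

print(pick_backend())
```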
Torch backend
- torch>=1.8.0, with CUDA Toolkit >= 10.2 for GPU acceleration (https://pytorch.org/get-started/locally/). Verify that PyTorch can find a CUDA device on your machine by checking whether torch.cuda.is_available() returns True after installing PyTorch.
- ninja>=1.10.2.2 for compiling the custom C++ extensions.
- GPU training: the CUDA device should have CUDA compute capability 6.x or higher.
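The GPU requirements above can be checked programmatically. A minimal sketch — torch_gpu_ready is a hypothetical helper, not part of PGBM:

```python
def torch_gpu_ready():
    """Return True if the Torch backend can use GPU acceleration (hypothetical helper)."""
    try:
        import torch  # torch>=1.8.0 required
    except ImportError:
        return False  # PyTorch not installed
    if not torch.cuda.is_available():
        return False  # no visible CUDA device; CPU training still works
    major, _minor = torch.cuda.get_device_capability(0)
    return major >= 6  # GPU training needs compute capability 6.x or higher

print(torch_gpu_ready())
```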
Numba backend
numba>=0.53.1 (https://numba.readthedocs.io/en/stable/user/installing.html). The Numba backend does not support differentiable loss functions or GPU training.
Installation via pip
We recommend installing PGBM via pip.
- Without dependencies: pip install pgbm. Use this if you have already installed the above dependencies separately.
- With dependencies:
  - Torch CPU+GPU: pip install pgbm[torch-gpu] --find-links https://download.pytorch.org/whl/cu102/torch_stable.html
  - Torch CPU-only: pip install pgbm[torch-cpu]
  - Numba: pip install pgbm[numba]
  - All versions (Torch CPU+GPU and Numba): pip install pgbm[all] --find-links https://download.pytorch.org/whl/cu102/torch_stable.html
Verification
Both backends use JIT compilation, so you will incur additional compilation time the first time you use PGBM.
To verify the installation, download and run an example from the examples folder:
- Run this example to verify ability to train & predict on CPU with Torch backend.
- Run this example to verify ability to train & predict on GPU with Torch backend.
- Run this example to verify ability to train & predict on CPU with Numba backend.
- Run this example to verify ability to perform distributed CPU, GPU, multi-CPU and/or multi-GPU training.
Support
See the examples folder for examples, an overview of hyperparameters and a function reference. In general, PGBM works similarly to existing gradient boosting packages such as LightGBM or XGBoost (and it should be possible to use it more or less as a drop-in replacement), except that you are required to explicitly define a loss function and loss metric.
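For illustration, a squared-error objective in the gradient/hessian style used by such packages can look like this. This is a sketch only: the (yhat, y, sample_weight) signature is an assumption modeled on common gradient boosting APIs, so check PGBM's function reference and examples for the exact form it expects:

```python
def mseloss_objective(yhat, y, sample_weight=None):
    """Gradient and hessian of L = 0.5 * (yhat - y)**2 with respect to yhat.

    Hypothetical signature; PGBM's Torch backend can also derive these
    via auto-differentiation for custom losses.
    """
    gradient = yhat - y          # dL/dyhat
    hessian = yhat * 0.0 + 1.0   # d2L/dyhat2 (works for scalars and arrays)
    return gradient, hessian
```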
In case further support is required, open an issue.
Reference
Olivier Sprangers, Sebastian Schelter, Maarten de Rijke. Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 21), August 14–18, 2021, Virtual Event, Singapore.
The experiments from our paper can be replicated by running the scripts in the experiments folder. Datasets are downloaded when needed in the experiments except for higgs and m5, which should be pre-downloaded and saved to the datasets folder (Higgs) and to datasets/m5 (m5).
License
This project is licensed under the terms of the Apache 2.0 license.
Acknowledgements
This project was developed by Airlab Amsterdam.