
Probabilistic Gradient Boosting Machines in PyTorch

Project description


Probabilistic Gradient Boosting Machines (PGBM) is a probabilistic gradient boosting framework in Python, based on PyTorch/Numba and developed by Airlab Amsterdam. It provides the following advantages over existing frameworks:

  • Probabilistic regression estimates instead of only point estimates. (example)
  • Auto-differentiation of custom loss functions. (example, example)
  • Native GPU-acceleration. (example)
  • Distributed training for CPU and GPU, across multiple nodes. (examples)
  • Ability to optimize probabilistic estimates after training for a set of common distributions, without retraining the model. (example)

In addition, we support the following features:

  • Feature subsampling by tree
  • Sample subsampling ('bagging') by tree
  • Saving, loading and predicting with a trained model (example, example)
  • Checkpointing (continuing training of a model after saving) (example, example)
  • Feature importance by gain and permutation (example, example)
  • Monotone constraints (example, example)
  • Scikit-learn compatibility via the PGBMRegressor class.

It is aimed at users interested in solving large-scale tabular probabilistic regression problems, such as probabilistic time series forecasting. For more details, read our paper or check out the examples.

Below is a simple example using our scikit-learn wrapper:

from pgbm import PGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston  # note: removed in scikit-learn >= 1.2

# Load the data and create a train/test split
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

# Train with default hyperparameters
model = PGBMRegressor().fit(X_train, y_train)

# Point predictions and samples from the predictive distribution
yhat_point = model.predict(X_test)
yhat_dist = model.predict_dist(X_test)
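
The samples returned by predict_dist can be turned into prediction intervals. Below is a minimal sketch, assuming yhat_dist holds one row per sample draw and one column per test observation (check the function reference for the exact shape):

import numpy as np

# Empirical 10%/90% quantiles across the draws give an 80% prediction
# interval per test observation. If the draws come back as a torch
# tensor, convert first, e.g. yhat_dist = yhat_dist.cpu().numpy()
lower, upper = np.quantile(yhat_dist, [0.1, 0.9], axis=0)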

Installation

Run pip install pgbm from a terminal within a Python (virtual) environment of your choice.

Verification

  • Download & run an example from the examples folder to verify the installation is correct:
    • Run this example to verify ability to train & predict on CPU with Torch backend.
    • Run this example to verify ability to train & predict on GPU with Torch backend.
    • Run this example to verify ability to train & predict on CPU with Numba backend.
    • Run this example to verify ability to perform distributed CPU, GPU, multi-CPU and/or multi-GPU training.
  • Note that when training on the GPU, the custom CUDA kernel will be JIT-compiled when initializing a model. Hence, the first time you train a model on the GPU it can take a bit longer, as PGBM needs to compile the CUDA kernel.
  • When using the Numba-backend, several functions need to be JIT-compiled. Hence, the first time you train a model using this backend it can take a bit longer.
  • To run the examples, some additional packages such as scikit-learn or matplotlib are required; these should be installed separately via pip or conda. A quick import check is sketched below.
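
As a quick sanity check before running the examples, the following snippet (a minimal sketch, using the import path from the example above) verifies that the package and PyTorch import correctly and whether a CUDA device is visible:

import torch
from pgbm import PGBMRegressor  # import path as used in the example above

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # True enables GPU training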

Dependencies

The core package has the following dependencies, which should be installed separately (installing the core package via pip will not automatically install them).

Torch backend
  • CUDA Toolkit matching your PyTorch distribution (https://developer.nvidia.com/cuda-toolkit)
  • PyTorch >= 1.8.0, with CUDA 10.2 for GPU acceleration (https://pytorch.org/get-started/locally/). Verify that PyTorch can find a CUDA device by checking that torch.cuda.is_available() returns True after installing PyTorch.
  • PGBM uses a custom CUDA kernel that needs to be compiled, which may require installing a suitable compiler. Installing PyTorch and the full CUDA Toolkit should be sufficient, but open an issue if it still does not work after installing these dependencies.
  • The CUDA device should have CUDA compute capability 6.x or higher.
  • Scikit-learn in case you want to use our sklearn wrapper PGBMRegressor. (https://scikit-learn.org/stable/)
Numba backend
  • Numba (https://numba.pydata.org/)

The Numba backend does not support differentiable loss functions, and GPU training is not supported with this backend.

Support

See the examples folder for worked examples, an overview of hyperparameters and a function reference. In general, PGBM works similarly to existing gradient boosting packages such as LightGBM or XGBoost (it should be possible to use it more or less as a drop-in replacement), except that PGBM requires you to explicitly define a loss function and loss metric; a sketch of what that can look like is given below.
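
To give a sense of what explicitly defining a loss can look like, here is a minimal sketch of a squared-error objective (returning the gradient and hessian with respect to the predictions) and an RMSE metric, in the style of the repository's Torch-backend examples; the exact signatures expected by the training routine may differ, so consult the examples and the function reference:

import torch

# Squared-error objective: gradient and hessian of 0.5 * (yhat - y)^2
# with respect to the predictions yhat.
def mseloss_objective(yhat, y, sample_weight=None):
    gradient = yhat - y
    hessian = torch.ones_like(yhat)
    return gradient, hessian

# RMSE metric, used to monitor fit quality during training.
def rmseloss_metric(yhat, y, sample_weight=None):
    return (yhat - y).pow(2).mean().sqrt()

These callables are then supplied to the training routine; see the examples for how to pass them.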

In case further support is required, open an issue.

Reference

Olivier Sprangers, Sebastian Schelter, Maarten de Rijke. Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 21), August 14–18, 2021, Virtual Event, Singapore.

The experiments from our paper can be replicated by running the scripts in the experiments folder. Datasets are downloaded when needed, except for higgs and m5, which should be downloaded in advance and saved to the datasets folder (Higgs) and to datasets/m5 (m5).

License

This project is licensed under the terms of the Apache 2.0 license.

Acknowledgements

This project was developed by Airlab Amsterdam.

Download files

Download the file for your platform.

Source Distribution

pgbm-1.2.tar.gz (40.9 kB)


Built Distribution


pgbm-1.2-py3-none-any.whl (43.6 kB)


File details

Details for the file pgbm-1.2.tar.gz.

File metadata

  • Download URL: pgbm-1.2.tar.gz
  • Size: 40.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5

File hashes

Hashes for pgbm-1.2.tar.gz:

  • SHA256: 5798f2732f53bd8176127a0e6f56943d2b5439efb50fd757c3f3f416a9121614
  • MD5: 61bd2f495221b968214627c0bebc7e45
  • BLAKE2b-256: c689daa37302873d4421730dbdf3246971a1a4a70422d7db1427305d709ae8dc

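To verify a downloaded file against the digests above, a short Python check such as the following can be used (a minimal sketch; adjust the filename to the path where you saved the file):

import hashlib

# Compare the local file's SHA256 digest to the published digest above.
expected = "5798f2732f53bd8176127a0e6f56943d2b5439efb50fd757c3f3f416a9121614"
with open("pgbm-1.2.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print("match:", digest == expected)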

File details

Details for the file pgbm-1.2-py3-none-any.whl.

File metadata

  • Download URL: pgbm-1.2-py3-none-any.whl
  • Size: 43.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5

File hashes

Hashes for pgbm-1.2-py3-none-any.whl:

  • SHA256: f1b588133d8903e8661966a5b26ce73ca322bcbcff85e170770eba66ef83c75f
  • MD5: 85c2fb5b3b57dfdcfd99e88a96ff870b
  • BLAKE2b-256: aabaf8c4570ecd5b7a14265cf34b6fba4c231a6eba82966bf6cddf4d69f79713

