High performance implementation of the GBDT family of algorithms
-----------
GBDT is a high-performance, full-featured C++ implementation of [Jerome H. Friedman's Gradient Boosting Decision Trees Algorithm](http://statweb.stanford.edu/~jhf/ftp/stobst.pdf) and its modern offspring. It features high efficiency, a low memory footprint, a collection of loss functions, and built-in mechanisms for handling categorical features and missing values.
When is GBDT good for you?
-----------
* **You are looking beyond linear models.**
  * The gradient boosting decision trees algorithm is one of the best off-the-shelf ML algorithms, with built-in capabilities for non-linear transformation and feature crossing.
* **Your data is too big to load into memory with existing ML packages.**
  * GBDT reduces the memory footprint dramatically via feature bucketization (see the sketch after this list). On some tested datasets it used 1/7 the memory of a comparable package and trained in half the time. See [docs/PERFORMANCE_BENCHMARK.md](https://github.com/yarny/gbdt/blob/master/docs/PERFORMANCE_BENCHMARK.md) for more details.
* **You want better handling of categorical features and missing values.**
  * GBDT has built-in mechanisms to decide how to split categorical features and where to place missing values in the trees (a sketch of one common approach follows this list).
* **You want to try different loss functions.**
  * GBDT implements various pointwise, pairwise, and listwise loss functions, including MSE, log loss, huberized hinge loss, pairwise log loss, [GBRank](http://www.cc.gatech.edu/~zha/papers/fp086-zheng.pdf), and [LambdaMART](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/MSR-TR-2010-82.pdf). It also makes it easy to add your own custom loss functions (see the gradient/Hessian sketch below).
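
The memory savings above come from bucketization: each floating-point feature is quantized into a small number of buckets, so a one-byte bucket id replaces a 4- or 8-byte float. The sketch below is a minimal Python illustration of the idea, not the library's actual code; `bucketize` and its quantile-based boundaries are assumptions for demonstration.

```python
import numpy as np

def bucketize(feature, num_buckets=256):
    """Quantize a float feature column into uint8 bucket ids.

    Boundaries are placed at evenly spaced quantiles of the data, so
    each value is stored as one byte instead of eight -- roughly an
    8x memory reduction for a float64 column.
    """
    # num_buckets - 1 interior boundaries, taken from the data itself.
    percentiles = np.linspace(0.0, 100.0, num_buckets + 1)[1:-1]
    boundaries = np.percentile(feature, percentiles)
    # searchsorted maps every value to the index of its bucket.
    return np.searchsorted(boundaries, feature).astype(np.uint8), boundaries

# 1M float64 values (8 MB) become 1M uint8 ids (1 MB).
values = np.random.randn(1_000_000)
bucket_ids, boundaries = bucketize(values)
print(values.nbytes // bucket_ids.nbytes)  # -> 8
```

Tree training can then accumulate split statistics per bucket instead of sorting raw feature values, which also helps explain the faster training times reported in the benchmark.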
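This README does not spell out the exact categorical-split mechanism, but a common technique in GBDT implementations is to order categories by the mean gradient of their examples, after which the best split can be found with a single linear scan, just as for a numerical feature. The sketch below (a hypothetical helper, not this package's API) also shows the usual trick for missing values: give them their own pseudo-category and let the tree decide which side of the split suits them.

```python
from collections import defaultdict

def order_categories_by_gradient(categories, gradients):
    """Order a categorical feature's values by mean gradient.

    Once categories are sorted this way, the best subset split can
    be found with one linear scan over the ordering. Missing values
    (None) get their own pseudo-category, so the tree learns where
    to place them.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, grad in zip(categories, gradients):
        key = cat if cat is not None else "__missing__"
        sums[key] += grad
        counts[key] += 1
    return sorted(sums, key=lambda c: sums[c] / counts[c])

# Example: categories ordered by how strongly they push predictions.
cats = ["red", "blue", "red", None, "green", "blue"]
grads = [0.9, -0.2, 0.7, 0.1, -0.5, -0.3]
print(order_categories_by_gradient(cats, grads))
# -> ['green', 'blue', '__missing__', 'red']
```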
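Adding a custom loss to a gradient-boosting trainer typically amounts to supplying the loss's gradient (and often its Hessian) with respect to the raw score; each boosting round fits the next tree to these per-example values. Here is a sketch for the logistic loss; the function name is hypothetical and not part of this package's API.

```python
import math

def logloss_gradient_hessian(label, prediction):
    """Gradient and Hessian of the logistic loss w.r.t. a raw score.

    label is in {0, 1}; prediction is the raw (pre-sigmoid) score.
    A boosting round fits the next tree to these per-example values.
    """
    p = 1.0 / (1.0 + math.exp(-prediction))  # sigmoid
    gradient = p - label          # d(loss)/d(score)
    hessian = p * (1.0 - p)       # d^2(loss)/d(score)^2
    return gradient, hessian

print(logloss_gradient_hessian(1, 0.0))  # -> (-0.5, 0.25)
```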
Download files
-----------
Source Distribution: gbdt-0.3.1.2.tar.gz (3.1 MB)
File details
-----------
Details for the file gbdt-0.3.1.2.tar.gz.
File metadata
- Download URL: gbdt-0.3.1.2.tar.gz
- Upload date:
- Size: 3.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | e2c43cbd4bf7eb41431d6a01993893772b96637cae4f450de201efa7c2f2e7d9 |
| MD5 | 254dd85f71d6f6c982b141d0265d1873 |
| BLAKE2b-256 | 3348ad5e3c0356d85dd3c85006fe512e822e595a333f355dcb60be720cada9e1 |