LightGBM Classifier with integrated Bayesian hyperparameter optimization
(100)gecs
Bayesian hyperparameter tuning for LGBMClassifier, LGBMRegressor, CatBoostClassifier and CatBoostRegressor with a scikit-learn API
Project Overview
gecs is a tool that automates hyperparameter tuning for boosting classifiers and regressors, which can save significant time and computational resources during model building and optimization. GEC stands for Good Enough Classifier: it lets you focus on other tasks such as feature engineering. If you deploy 100 of them, you get 100GECs.
Introduction
The primary class in this package is LightGEC, which is derived from LGBMClassifier. Like its parent, LightGEC can be used to build and train gradient boosting models, but with the added feature of automated Bayesian hyperparameter optimization. It can be imported from gecs.lightgec and then used in place of LGBMClassifier, with the same API.
By default, LightGEC optimizes num_leaves, boosting_type, learning_rate, reg_alpha, reg_lambda, min_child_samples, min_child_weight, colsample_bytree, subsample_freq, subsample and, optionally, n_estimators. Which hyperparameters are tuned is fully customizable.
Installation
The installation requires cmake, which can be installed using apt on Linux or brew on macOS. Then you can install (100)gecs using pip:
pip install gecs
Usage
The LightGEC class provides the same API as the LGBMClassifier class of lightgbm, and additionally:
- two additional parameters to the fit method:
  - n_iter: Defines the number of hyperparameter combinations that the model should try. More iterations can lead to better model performance, but at the expense of computational resources.
  - fixed_hyperparameters: Allows the user to specify hyperparameters that the GEC should not optimize. By default, only n_estimators is fixed. Any of the LGBMClassifier init arguments can be fixed, and so can subsample_freq and subsample, but only jointly. This is done by passing the value bagging.
- the methods serialize and deserialize, which store and restore the LightGEC state of the hyperparameter optimization process, but not the fitted LGBMClassifier parameters, in a JSON file. To store the boosted tree model itself, you have to provide your own serialization or use pickle.
- the methods freeze and unfreeze, which turn the LightGEC functionally into an LGBMClassifier and back.
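The two extra fit arguments described above could be prepared along these lines. This is only a sketch: build_fit_kwargs is a hypothetical helper, not part of gecs, and it assumes that fixed_hyperparameters accepts a list of hyperparameter names, which the description above suggests but does not state. The guard against passing subsample or subsample_freq individually reflects the "only jointly, via bagging" rule.

```python
# Hypothetical helper (not part of gecs): builds the extra keyword arguments
# for LightGEC.fit as described in the Usage section.
# Assumption: fixed_hyperparameters is a list of hyperparameter names.

# These two can only be fixed together, by passing the alias "bagging".
BAGGING_PAIR = {"subsample", "subsample_freq"}


def build_fit_kwargs(n_iter, fixed=("n_estimators",)):
    """Return a dict with n_iter and fixed_hyperparameters for fit()."""
    if not isinstance(n_iter, int) or n_iter < 1:
        raise ValueError("n_iter must be a positive integer")
    fixed = list(fixed)
    misused = sorted(BAGGING_PAIR.intersection(fixed))
    if misused:
        raise ValueError(
            f"{misused} can only be fixed jointly; pass 'bagging' instead"
        )
    return {"n_iter": n_iter, "fixed_hyperparameters": fixed}


# Try 30 combinations; keep n_estimators and the bagging pair untouched.
fit_kwargs = build_fit_kwargs(n_iter=30, fixed=["n_estimators", "bagging"])
# With gec = LightGEC() and training data X, y this would be used as:
# gec.fit(X, y, **fit_kwargs)
```

Keeping the arguments in a plain dict makes it easy to log or reuse the same tuning budget across several models.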
Example
The default use of LightGEC
would look like this:
from sklearn.datasets import load_iris
from gecs.lightgec import LightGEC # LGBMClassifier with hyperparameter optimization
from gecs.lightger import LightGER # LGBMRegressor with hyperparameter optimization
from gecs.catgec import CatGEC # CatBoostClassifier with hyperparameter optimization
from gecs.catger import CatGER # CatBoostRegressor with hyperparameter optimization
X, y = load_iris(return_X_y=True)
# fit and infer GEC
gec = LightGEC()
gec.fit(X, y)
yhat = gec.predict(X)
# manage GEC state
path = "./gec.json"
gec.serialize(path) # stores gec data and settings, but not underlying LGBMClassifier attributes
gec2 = LightGEC.deserialize(path, X, y) # X and y are necessary to fit the underlying LGBMClassifier
gec.freeze() # freeze GEC so that it behaves like an LGBMClassifier
gec.unfreeze() # unfreeze to enable GEC hyperparameter optimisation
# benchmark against LGBMClassifier
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
clf = LGBMClassifier()
lgbm_score = np.mean(cross_val_score(clf, X, y))
gec.freeze()
gec_score = np.mean(cross_val_score(gec, X, y))
print(f"{gec_score = }, {lgbm_score = }")
assert gec_score > lgbm_score, "GEC doesn't outperform LGBMClassifier"
# check what hyperparameter combinations were tried
gec.tried_hyperparameters()
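In the example above, serialize only stores the optimization state, so the fitted model has to be persisted separately, for instance with pickle. A minimal sketch of such a round trip follows; save_model and load_model are hypothetical helper names, and the sketch assumes that a fitted LightGEC can be pickled like other scikit-learn style estimators.

```python
import pickle
from pathlib import Path


def save_model(model, path):
    """Pickle any Python object (for example a fitted estimator) to path."""
    with Path(path).open("wb") as fh:
        pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a pickled object from path; only unpickle files you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Assumed usage with the gec object from the example above:
# save_model(gec, "./gec.pkl")
# gec_restored = load_model("./gec.pkl")
# yhat = gec_restored.predict(X)
```

Pickle files are tied to the library versions they were written with, so it is worth recording the installed gecs and lightgbm versions next to the file.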
Contributing
If you want to contribute, please reach out and I'll design a process around it.
License
MIT
Contact Information
You can find my contact information on my website: https://leonluithlen.eu