A selective ensemble for predictive models that tests new additions to prevent downgrades in performance.
clique-ml
A selective ensemble for predictive time-series models that tests new additions to prevent downgrades in performance.
This code was written and tested against a CUDA 12.2 environment; if you run into compatibility issues, try setting up a venv using cuda-venv.sh.
Usage
Setup
pip install -U clique-ml
import clique
Training
Create a list of models to train. Any class that implements fit(), predict(), and get_params() is supported:
import xgboost as xgb
import lightgbm as lgb
import catboost as cat
import tensorflow as tf
models = [
    xgb.XGBRegressor(...),
    lgb.LGBMRegressor(...),
    cat.CatBoostRegressor(...),
    tf.keras.Sequential(...),
]
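To make the interface requirement concrete, here is a minimal sketch of a custom model exposing fit(), predict(), and get_params(). The class SimpleLinearModel is hypothetical and not part of clique-ml; it uses plain Python lists for brevity, and whether clique-ml needs anything beyond these three methods (for example NumPy inputs) is an assumption not confirmed by this README.

```python
class SimpleLinearModel:
    """Hypothetical one-feature least-squares regressor (illustration only)."""

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept
        self.slope_ = 0.0
        self.intercept_ = 0.0

    def fit(self, X, y):
        # Use only the first feature column of each row.
        xs = [row[0] for row in X]
        n = len(xs)
        x_mean = sum(xs) / n if self.fit_intercept else 0.0
        y_mean = sum(y) / n if self.fit_intercept else 0.0
        sxx = sum((x - x_mean) ** 2 for x in xs)
        sxy = sum((x - x_mean) * (t - y_mean) for x, t in zip(xs, y))
        # Fall back to a flat line if the feature has no variance.
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = y_mean - self.slope_ * x_mean
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[0] for row in X]

    def get_params(self, deep=True):
        # Mirrors the scikit-learn convention of returning constructor arguments.
        return {"fit_intercept": self.fit_intercept}


model = SimpleLinearModel().fit([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0])
print(model.predict([[4.0]]))
print(model.get_params())
```

An object shaped like this could sit in the models list alongside the library regressors, provided it also copes with whatever array types the ensemble passes in.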
Data is automatically split for training, testing, and validation, so simply pass models, inputs (X), and targets (y) to train_ensemble():
X, y = ... # preprocessed data; 20% is set aside for validation, and the rest is trained on using k-folds
ensemble = clique.train_ensemble(models, X, y, folds=5, limit=3) # instance of clique.SelectiveEnsemble
folds sets n_splits for scikit-learn's TimeSeriesSplit class, which is used to implement k-folds here. For a single split, pass folds=1.
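For intuition about what folds controls, the sketch below reproduces the expanding-window index layout that TimeSeriesSplit uses with its default settings, written in plain Python to stay dependency-free. The function name expanding_window_splits is hypothetical, and taking the validation holdout from the final 20% of the rows is an assumption; this README does not say which rows clique-ml sets aside.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with an ever-growing training window."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        # Train on everything before the test window; never on later rows.
        yield list(range(start)), list(range(start, start + test_size))


n_total = 15
n_validation = n_total // 5           # assumed: last 20% held out for validation
n_trainable = n_total - n_validation  # 12 rows left for the folds

for fold, (train_idx, test_idx) in enumerate(expanding_window_splits(n_trainable, 5), 1):
    print(f"Fold {fold}: train={train_idx} test={test_idx}")
```

Each fold trains only on rows that precede its test window, which is why this scheme suits time-ordered data better than shuffled k-folds.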
limit sets a soft target for how many models to include in the ensemble. Once the ensemble exceeds that size, it rejects new models that would raise its mean score.
By default, the ensemble trains using 5 folds and no size limit.
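The limit rule described above can be sketched as a small acceptance check. This is an illustration under stated assumptions, not the library's implementation: the function should_accept is hypothetical, "exceeded" is read as the ensemble holding more than limit models, and an unset limit is treated as accepting everything. Scores are treated as errors (lower is better), which is inferred from the rule that models raising the mean are rejected.

```python
def should_accept(scores, new_score, limit=None):
    """Decide whether a candidate model's score would be admitted (assumed rule)."""
    # Assumption: with no limit, or while at or below the limit, accept freely.
    if limit is None or len(scores) <= limit:
        return True
    current_mean = sum(scores) / len(scores)
    new_mean = (sum(scores) + new_score) / (len(scores) + 1)
    # Past the limit, admit only candidates that do not raise the mean.
    return new_mean <= current_mean


accepted = [0.032, 0.023, 0.018, 0.020]
print(should_accept(accepted, 0.970, limit=3))  # candidate far worse than the mean
print(should_accept(accepted, 0.010, limit=3))  # candidate better than the mean
```

Because the limit is a soft target, an ensemble under this rule can keep growing past it as long as each newcomer improves on the current mean.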
Evaluation
train_ensemble() will output the results of each sub-model's training on every fold:
Pre-training setup...Complete (0.0s)
Model 1/5: Fold 1/5: Stopped: PredictionError: Model is guessing a constant value. -- 3
Model 2/5: Fold 1/5: Stopped: PredictionError: Model is guessing a constant value. -- 3
Model 3/5: Fold 1/5: Accepted with score: 0.03233311 (0.1s) (CatBoostRegressor_1731893049_0)
Model 3/5: Fold 2/5: Accepted with score: 0.02314115 (0.0s) (CatBoostRegressor_1731893050_1)
Model 3/5: Fold 3/5: Accepted with score: 0.01777214 (0.0s) (CatBoostRegressor_1731893050_2)
...
Model 5/5: Fold 2/5: Rejected with score: 0.97019375 (0.3s)
Model 5/5: Fold 3/5: Rejected with score: 0.41385662 (1.4s)
Model 5/5: Fold 4/5: Rejected with score: 0.41153231 (0.8s)
Model 5/5: Fold 5/5: Rejected with score: 0.40335007 (1.6s)
Once trained, details of the final ensemble can be reviewed with:
print(ensemble) # <SelectiveEnsemble (5 model(s); mean: 0.03389993; best: 0.03321487; limit: 3)>
Or:
print(len(ensemble)) # 5
print(ensemble.mean_score) # 0.033899934449981864
print(ensemble.best_score) # 0.033214874389494775
Pruning
Since SelectiveEnsemble has to accept the first N models to establish a mean, frontloading with weaker models may cause oversaturation, even when limit is set.
To remedy this, call SelectiveEnsemble.prune():
pruned = ensemble.prune()
This returns a copy of the ensemble with all sub-models scoring above the mean removed.
If a limit is passed in, the removal of all models above the mean will recurse until that limit is reached:
triumvirate = ensemble.prune(3) # prune() returns a copy; the original ensemble is unchanged
print(len(triumvirate)) # 3 (or fewer)
This recursion is automatic for instances where SelectiveEnsemble.limit is set manually or by train_ensemble().
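The pruning behaviour described above can be sketched on a bare list of scores. The function prune_scores is a hypothetical stand-in, not SelectiveEnsemble.prune() itself; it assumes lower scores are better and stops early when a pass removes nothing (for example when all remaining scores are equal), a case the README does not cover.

```python
def prune_scores(scores, limit=None):
    """Return a new list with above-mean scores removed, recursing toward limit."""
    if not scores:
        return []
    mean = sum(scores) / len(scores)
    kept = [s for s in scores if s <= mean]
    # Stop when there is no limit, the limit is met, or nothing more can be removed.
    if limit is None or len(kept) <= limit or len(kept) == len(scores):
        return kept
    return prune_scores(kept, limit)


scores = [5.0, 4.0, 3.0, 2.0, 1.0]
print(prune_scores(scores))           # single pass: drops scores above the mean of 3.0
print(prune_scores(scores, limit=2))  # second pass: drops scores above the mean of 2.0
print(scores)                         # the input list is left untouched
```

Removing everything above the mean can overshoot, which is why the result may hold fewer models than the limit asked for.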
Deployment
To make predictions, simply call:
predictions = ensemble.predict(...) # with a new set of inputs
This will use the mean score across all sub-models for each prediction.
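As a rough picture of mean-based prediction, the sketch below averages the outputs of several sub-models element by element. It assumes that "mean" refers to the per-row average of each sub-model's predict() output; the helper mean_prediction and the ScaledModel stub are hypothetical and exist only for this illustration.

```python
class ScaledModel:
    """Hypothetical stub that predicts factor * first feature."""

    def __init__(self, factor):
        self.factor = factor

    def predict(self, X):
        return [self.factor * row[0] for row in X]


def mean_prediction(models, X):
    """Average the predictions of all models, row by row."""
    all_predictions = [model.predict(X) for model in models]
    # zip(*...) groups the predictions made for the same input row.
    return [sum(row_preds) / len(row_preds) for row_preds in zip(*all_predictions)]


sub_models = [ScaledModel(1.0), ScaledModel(3.0)]
print(mean_prediction(sub_models, [[1.0], [2.0]]))
```

Equal weighting keeps the combination simple; a weighted average would be a different design and is not something this README describes.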
If you wish to continue training on an existing ensemble, use:
existing = clique.load_ensemble(X_test=X, y_test=y) # test data must be passed in for new model evaluation
updated = clique.train_ensemble(models, X, y, ensemble=existing)
Note that if a limit is set on the existing ensemble, it carries over to the updated one and is enforced there.