
Project description

clique-ml

A selective ensemble for predictive time-series models that tests new additions to prevent downgrades in performance.

This code was written and tested against a CUDA 12.2 environment; if you run into compatibility issues, try setting up a venv using cuda-venv.sh.

Usage

Setup

pip install -U clique-ml
import clique

Training

Create a list of models to train. Any class that implements fit(), predict(), and get_params() is supported:

import xgboost as xgb
import lightgbm as lgb
import catboost as cat
import tensorflow as tf

models = [
    xgb.XGBRegressor(...),
    lgb.LGBMRegressor(...),
    cat.CatBoostRegressor(...),
    tf.keras.Sequential(...),
]
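
Any object exposing those three methods can go in the list, not just models from the libraries above. As an illustration (a minimal sketch, not something clique-ml ships), a hand-rolled wrapper like the following would qualify:

import numpy as np

class LeastSquaresBaseline:
    """Minimal example of the fit()/predict()/get_params() interface (illustrative only)."""
    def __init__(self, **params):
        self._params = params
        self._coef = None

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # plain ordinary-least-squares fit via numpy
        self._coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self._coef

    def get_params(self, deep=True):
        return dict(self._params)

models.append(LeastSquaresBaseline())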

Data is automatically split for training, testing, and validation, so simply pass models, inputs (X), and targets (y) to train_ensemble():

X, y = ... # preprocessed data; 20% is set aside for validation, and the rest is trained on using k-folds

ensemble = clique.train_ensemble(models, X, y, folds=5, limit=3) # instance of clique.SelectiveEnsemble

folds sets n_splits for scikit-learn's TimeSeriesSplit class, which is used to implement k-folds here. For a single split, pass folds=1.
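
For reference, the split that folds=5 produces corresponds to the following scikit-learn usage (a sketch of the underlying splitter, not clique-ml's own code):

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)      # what folds=5 maps to
demo = np.arange(12).reshape(-1, 1)     # stand-in for time-ordered inputs
for train_idx, test_idx in tscv.split(demo):
    # each fold trains on an expanding window of earlier samples
    # and evaluates on the block that immediately follows it
    print(train_idx, test_idx)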

limit sets a soft target for how many models to include in the ensemble. Once that limit is exceeded, the ensemble rejects any new model that would raise its mean score.
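
That rejection test can be pictured as a simple mean comparison. The sketch below is only illustrative (clique-ml's internal bookkeeping may differ); it shows why a candidate is turned away exactly when its score is worse than the current mean, given that scores behave like errors (lower is better):

def raises_mean_score(member_scores, candidate_score):
    # Illustrative only, not library source. Lower scores are better.
    current_mean = sum(member_scores) / len(member_scores)
    new_mean = (sum(member_scores) + candidate_score) / (len(member_scores) + 1)
    return new_mean > current_mean  # true exactly when candidate_score > current_mean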

By default, the ensemble trains using 5 folds and no size limit.

Evaluation

train_ensemble() will output the results of each sub-model's training on every fold:

Pre-training setup...Complete (0.0s)
Model 1/5: Fold 1/5: Stopped: PredictionError: Model is guessing a constant value. -- 3      
Model 2/5: Fold 1/5: Stopped: PredictionError: Model is guessing a constant value. -- 3       
Model 3/5: Fold 1/5: Accepted with score: 0.03233311 (0.1s) (CatBoostRegressor_1731893049_0)          
Model 3/5: Fold 2/5: Accepted with score: 0.02314115 (0.0s) (CatBoostRegressor_1731893050_1)          
Model 3/5: Fold 3/5: Accepted with score: 0.01777214 (0.0s) (CatBoostRegressor_1731893050_2)
...      
Model 5/5: Fold 2/5: Rejected with score: 0.97019375 (0.3s)                            
Model 5/5: Fold 3/5: Rejected with score: 0.41385662 (1.4s)                         
Model 5/5: Fold 4/5: Rejected with score: 0.41153231 (0.8s)          
Model 5/5: Fold 5/5: Rejected with score: 0.40335007 (1.6s) 

Once trained, details of the final ensemble can be reviewed with:

print(ensemble)

Or:

print(len(ensemble))
print(ensemble.mean_score)
print(ensemble.best_score)

Pruning

Since SelectiveEnsemble has to accept its first several models in order to establish a mean, front-loading the list with weaker models can leave the ensemble oversaturated with them, even when limit is set.

To remedy this, call SelectiveEnsemble.prune():

pruned = ensemble.prune()

This returns a copy of the ensemble with every sub-model scoring above the mean removed.

If a limit is passed in, the removal of all models above the mean will recurse until that limit is reached:

triumvirate = ensemble.prune(3)
print(len(triumvirate)) # 3 (or fewer)

This recursion is automatic for instances where SelectiveEnsemble.limit is set manually or by train_ensemble().
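
Conceptually, the recursion operates on the members' scores roughly as sketched below (illustrative only, not library source; it assumes lower scores are better, as in the training log above):

def prune_scores(scores, limit=None):
    # Drop every member scoring above the current mean, then repeat if still over the limit.
    mean = sum(scores) / len(scores)
    kept = [s for s in scores if s <= mean]
    if limit is not None and len(kept) > limit and len(kept) < len(scores):
        return prune_scores(kept, limit)
    return kept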

Deployment

To make predictions, simply call:

predictions = ensemble.predict(...) # with a new set of inputs

This returns the mean across all sub-models for each prediction.
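
As a rough sketch of that aggregation (illustrative only; sub_models and X_new are placeholders, not clique-ml attributes), the averaging amounts to:

import numpy as np

def mean_prediction(sub_models, X_new):
    per_model = np.stack([m.predict(X_new) for m in sub_models])  # shape: (n_models, n_samples)
    return per_model.mean(axis=0)                                 # one averaged value per sample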

If you wish to continue training on an existing ensemble, use:

existing = clique.load_ensemble(X_test=X, y_test=y) # test data must be passed in for new model evaluation
updated = clique.train_ensemble(models, X, y, ensemble=existing)

Note that if a limit is set on the existing ensemble, it will also be set and enforced on the updated one.
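
Putting the pieces together, a typical update cycle using only the calls shown above might look like this (new_models and X_new are placeholders for freshly built models and unseen inputs):

existing = clique.load_ensemble(X_test=X, y_test=y)   # reload the previously trained ensemble
updated = clique.train_ensemble(new_models, X, y, ensemble=existing)
updated = updated.prune()                             # drop sub-models scoring above the mean
print(len(updated), updated.mean_score)
predictions = updated.predict(X_new)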

Download files

Download the file for your platform.

Source Distribution

clique_ml-0.0.3.tar.gz (7.5 kB)

Uploaded Source

Built Distribution

clique_ml-0.0.3-py3-none-any.whl (6.8 kB)

Uploaded Python 3

File details

Details for the file clique_ml-0.0.3.tar.gz.

File metadata

  • Download URL: clique_ml-0.0.3.tar.gz
  • Upload date:
  • Size: 7.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for clique_ml-0.0.3.tar.gz

  • SHA256: cd70aa8833f0d00a724a457559dfe5b7024ad7606b11446b1e0fecb390efcd86
  • MD5: 3b3376f7f81f497220f360c4b8d7dd13
  • BLAKE2b-256: 8e4abbb8dbfda0248eba1345877b44b391c1110d8b5ed4ad6a1c73992a733bbb


File details

Details for the file clique_ml-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: clique_ml-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 6.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for clique_ml-0.0.3-py3-none-any.whl

  • SHA256: 7d65594adbcd547b63a069e4e05ea6b300646e87651af7c7fa9f9551f381e910
  • MD5: ac0677dc858d4abd229381acebf23e20
  • BLAKE2b-256: 36a25cc72f5c8b0ecd4020421b5ed90acead09ac36441e32a5423193f203ca3a

