Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
HyperparameterHunter
HyperparameterHunter provides a wrapper for machine learning algorithms that automatically saves all the important data in a unified format. Simplify the experimentation and hyperparameter tuning process by letting HyperparameterHunter do the hard work of recording, organizing, and learning from your tests, all while using the same libraries you already do, with no need to provide extra information. Don't let any of your experiments go to waste, and start doing hyperparameter optimization the way it was meant to be.
- Installation: pip install hyperparameter-hunter
- Source: https://github.com/HunterMcGushion/hyperparameter_hunter
- Documentation: https://hyperparameter-hunter.readthedocs.io
Features
- Automatically record Experiment results
- Truly informed hyperparameter optimization that automatically uses past Experiments
- Eliminate boilerplate code for cross-validation loops, predicting, and scoring
- Stop worrying about keeping track of hyperparameters, scores, or re-running the same Experiments
- Use the libraries and utilities you already love
Getting Started
1) Environment:
Set up an Environment to organize Experiments and Optimization results.
Any Experiments or Optimization rounds we perform will use our active Environment.
from hyperparameter_hunter import Environment, CrossValidationExperiment
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
data = load_breast_cancer()
df = pd.DataFrame(data=data.data, columns=data.feature_names)
df['target'] = data.target
env = Environment(
    train_dataset=df,  # Add holdout/test dataframes, too
    root_results_path='path/to/results/directory',  # Where your result files will go
    metrics_map=['roc_auc_score'],  # Callables, or strings referring to `sklearn.metrics`
    cross_validation_type=StratifiedKFold,  # Class, or string in `sklearn.model_selection`
    cross_validation_params=dict(n_splits=5, shuffle=True, random_state=32)
)
2) Individual Experimentation:
Perform Experiments with your favorite libraries simply by providing model initializers and hyperparameters.
Keras
from keras.callbacks import ReduceLROnPlateau
from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier

# Same format used by `keras.wrappers.scikit_learn`. Nothing new to learn
def build_fn(input_shape):  # `input_shape` calculated for you
    model = Sequential([
        Dense(100, kernel_initializer='uniform', input_shape=input_shape, activation='relu'),
        Dropout(0.5),
        Dense(1, kernel_initializer='uniform', activation='sigmoid')
    ])  # All layer arguments saved (whether explicit or Keras default) for future use
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

experiment = CrossValidationExperiment(
    model_initializer=KerasClassifier,
    model_init_params=build_fn,  # We interpret your build_fn to save hyperparameters in a useful, readable format
    model_extra_params=dict(
        callbacks=[ReduceLROnPlateau(patience=5)],  # Use Keras callbacks
        batch_size=32, epochs=10, verbose=0  # Fit/predict arguments
    )
)
SKLearn
from sklearn.svm import LinearSVC

experiment = CrossValidationExperiment(
    model_initializer=LinearSVC,  # (Or any of the dozens of other SK-Learn algorithms)
    model_init_params=dict(penalty='l1', C=0.9)  # Default values used and recorded for kwargs not given
)
XGBoost
from xgboost import XGBClassifier

experiment = CrossValidationExperiment(
    model_initializer=XGBClassifier,
    model_init_params=dict(objective='reg:linear', max_depth=3, n_estimators=100, subsample=0.5)
)
LightGBM
from lightgbm import LGBMClassifier

experiment = CrossValidationExperiment(
    model_initializer=LGBMClassifier,
    model_init_params=dict(boosting_type='gbdt', num_leaves=31, max_depth=-1, min_child_samples=5, subsample=0.5)
)
CatBoost
from catboost import CatBoostClassifier

experiment = CrossValidationExperiment(
    model_initializer=CatBoostClassifier,
    model_init_params=dict(iterations=500, learning_rate=0.01, depth=7, allow_writing_files=False),
    model_extra_params=dict(fit=dict(verbose=True))  # Send kwargs to `fit` and other extra methods
)
RGF
from rgf.sklearn import RGFClassifier

experiment = CrossValidationExperiment(
    model_initializer=RGFClassifier,
    model_init_params=dict(max_leaf=1000, algorithm='RGF', min_samples_leaf=10)
)
3) Hyperparameter Optimization:
Hyperparameter Optimization works just like an Experiment, except that for any hyperparameter you want to optimize, you pass one of the classes imported below instead of a concrete value.
from hyperparameter_hunter import Real, Integer, Categorical
from hyperparameter_hunter import optimization as opt
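Each of these classes marks a single hyperparameter as a search dimension rather than a fixed value. As a quick orientation, here is a minimal sketch (the variable names are illustrative, not part of the library):

learning_rate_space = Real(0.001, 0.5)           # continuous range between the two bounds
n_estimators_space = Integer(100, 1000)          # integer range
booster_space = Categorical(['gbtree', 'dart'])  # discrete set of choices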
Keras
def build_fn(input_shape):
    model = Sequential([
        Dense(Integer(50, 150), input_shape=input_shape, activation='relu'),
        Dropout(Real(0.2, 0.7)),
        Dense(1, activation=Categorical(['sigmoid', 'softmax']))
    ])
    model.compile(
        optimizer=Categorical(['adam', 'rmsprop', 'sgd', 'adadelta']),
        loss='binary_crossentropy', metrics=['accuracy']
    )
    return model

optimizer = opt.RandomForestOptimization(iterations=7)
optimizer.set_experiment_guidelines(
    model_initializer=KerasClassifier,
    model_init_params=build_fn,
    model_extra_params=dict(
        callbacks=[ReduceLROnPlateau(patience=Integer(5, 10))],
        batch_size=Categorical([32, 64]),
        epochs=10, verbose=0
    )
)
optimizer.go()
SKLearn
from sklearn.ensemble import AdaBoostClassifier

optimizer = opt.DummySearch(iterations=42)
optimizer.set_experiment_guidelines(
    model_initializer=AdaBoostClassifier,  # (Or any of the dozens of other SKLearn algorithms)
    model_init_params=dict(
        n_estimators=Integer(75, 150),
        learning_rate=Real(0.8, 1.3),
        algorithm='SAMME.R'
    )
)
optimizer.go()
XGBoost
optimizer = opt.BayesianOptimization(iterations=10)
optimizer.set_experiment_guidelines(
    model_initializer=XGBClassifier,
    model_init_params=dict(
        max_depth=Integer(low=2, high=20),
        learning_rate=Real(0.0001, 0.5),
        n_estimators=200,
        subsample=0.5,
        booster=Categorical(['gbtree', 'gblinear', 'dart'])
    )
)
optimizer.go()
LightGBM
optimizer = opt.BayesianOptimization(iterations=100)
optimizer.set_experiment_guidelines(
    model_initializer=LGBMClassifier,
    model_init_params=dict(
        boosting_type=Categorical(['gbdt', 'dart']),
        num_leaves=Integer(5, 20),
        max_depth=-1,
        min_child_samples=5,
        subsample=0.5
    )
)
optimizer.go()
CatBoost
optimizer = opt.GradientBoostedRegressionTreeOptimization(iterations=32)
optimizer.set_experiment_guidelines(
    model_initializer=CatBoostClassifier,
    model_init_params=dict(
        iterations=100,
        eval_metric=Categorical(['Logloss', 'Accuracy', 'AUC']),
        learning_rate=Real(low=0.0001, high=0.5),
        depth=Integer(4, 7),
        allow_writing_files=False
    )
)
optimizer.go()
RGF
optimizer = opt.ExtraTreesOptimization(iterations=10)
optimizer.set_experiment_guidelines(
    model_initializer=RGFClassifier,
    model_init_params=dict(
        max_leaf=1000,
        algorithm=Categorical(['RGF', 'RGF_Opt', 'RGF_Sib']),
        l2=Real(0.01, 0.3),
        normalize=Categorical([True, False]),
        learning_rate=Real(0.3, 0.7),
        loss=Categorical(['LS', 'Expo', 'Log', 'Abs'])
    )
)
optimizer.go()
Output File Structure
This is a simple illustration of the file structure you can expect your Experiments to generate. For an in-depth description of the directory structure and the contents of the various files, see the File Structure Overview section in the documentation. However, the essentials are as follows:
- An Experiment adds a file to each HyperparameterHunterAssets/Experiments subdirectory, named by experiment_id
- Each Experiment also adds an entry to HyperparameterHunterAssets/Leaderboards/GlobalLeaderboard.csv (a minimal way to read it is sketched after the directory tree below)
- Customize which files are created via Environment's file_blacklist and do_full_save kwargs (documented here)
HyperparameterHunterAssets
|   Heartbeat.log
|
└───Experiments
|   |
|   └───Descriptions
|   |   |   <Files describing Experiment results, conditions, etc.>.json
|   |
|   └───Predictions<OOF/Holdout/Test>
|   |   |   <Files containing Experiment predictions for the indicated dataset>.csv
|   |
|   └───Heartbeats
|   |   |   <Files containing the log produced by the Experiment>.log
|   |
|   └───ScriptBackups
|   |       <Files containing a copy of the script that created the Experiment>.py
|
└───Leaderboards
|   |   GlobalLeaderboard.csv
|   |   <Other leaderboards>.csv
|
└───TestedKeys
|   |   <Files named by Environment key, containing hyperparameter keys>.json
|
└───KeyAttributeLookup
|       <Files linking complex objects used in Experiments to their hashes>
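Since every Experiment appends a row to GlobalLeaderboard.csv, a quick way to review past results is to read that file directly. A minimal sketch, assuming HyperparameterHunterAssets was created under the root_results_path passed to Environment above (the exact columns depend on your metrics):

import pandas as pd

leaderboard = pd.read_csv(
    'path/to/results/directory/HyperparameterHunterAssets/Leaderboards/GlobalLeaderboard.csv'
)
print(leaderboard.head())  # One row per Experiment, identified by its experiment_id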
Installation
pip install hyperparameter-hunter
If you like being on the cutting edge and want all the latest developments, run:
pip install git+https://github.com/HunterMcGushion/hyperparameter_hunter.git
Tested Libraries
- Keras
- scikit-learn
- LightGBM
- CatBoost
- XGBoost
- rgf_python
- ... More on the way
Download files
Source Distribution: hyperparameter_hunter-1.0.0.tar.gz
Built Distribution: hyperparameter_hunter-1.0.0-py3-none-any.whl
Hashes for hyperparameter_hunter-1.0.0.tar.gz

Algorithm | Hash digest
---|---
SHA256 | d1022759fe2c543bb7ad606851569ca8f0c97bd009170f5ce2d5b1692bfc39ba
MD5 | 11c4282f67627963cf5ce2c29edf2e34
BLAKE2b-256 | f4046a77551f43357a138d0a9e712eef31bc0daad82956e59982093b2a242e1b

Hashes for hyperparameter_hunter-1.0.0-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 937070cc893fc4b735137a82e973d90f5256c518b309fa4f613516bf7f270518
MD5 | 8936e091109f1075e4d67e0478989b26
BLAKE2b-256 | 27fa0a2bed1b0e1a7c28356020da18e39252ba10b09a82ce2131e5bc92bbde00