Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries

HyperparameterHunter

HyperparameterHunter provides a wrapper for machine learning algorithms that automatically saves all the important data in a unified format. It simplifies experimentation and hyperparameter tuning by doing the hard work of recording, organizing, and learning from your tests, all while you keep using the libraries you already use, with no need to provide extra information. Don't let any of your experiments go to waste, and start doing hyperparameter optimization the way it was meant to be.

Installation: pip install hyperparameter-hunter
Source: https://github.com/HunterMcGushion/hyperparameter_hunter
Documentation: https://hyperparameter-hunter.readthedocs.io

Features

Automatically record Experiment results
Truly informed hyperparameter optimization that automatically uses past Experiments
Eliminate boilerplate code for cross-validation loops, predicting, and scoring
Stop worrying about keeping track of hyperparameters, scores, or re-running the same Experiments
Use the libraries and utilities you already love

Getting Started

1) Environment:

Set up an Environment to organize Experiments and Optimization results.
Any Experiments or Optimization rounds we perform will use our active Environment.

from hyperparameter_hunter import Environment, CrossValidationExperiment
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold

data = load_breast_cancer()
df = pd.DataFrame(data=data.data, columns=data.feature_names)
df['target'] = data.target

env = Environment(
    train_dataset=df,  # Add holdout/test dataframes, too
    root_results_path='path/to/results/directory',  # Where your result files will go
    metrics_map=['roc_auc_score'],  # Callables, or strings referring to `sklearn.metrics`
    cross_validation_type=StratifiedKFold,  # Class, or string in `sklearn.model_selection`
    cross_validation_params=dict(n_splits=5, shuffle=True, random_state=32)
)
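
To make the cross_validation_type and cross_validation_params settings more concrete, here is a minimal pure-Python sketch of what a shuffled, stratified 5-fold split does. It is not the implementation used by HyperparameterHunter or scikit-learn; `stratified_kfold` and `toy_labels` are hypothetical names used only for illustration.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=32):
    """Yield (train_idx, valid_idx) pairs that keep class ratios in each fold.

    Illustrative stand-in for a shuffled, stratified K-fold splitter.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets its share
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)

    for k in range(n_splits):
        valid_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, valid_idx


toy_labels = [0] * 30 + [1] * 20  # hypothetical binary target
splits = list(stratified_kfold(toy_labels, n_splits=5, seed=32))
```

Each of the five validation folds here holds 10 samples with the same 6:4 class mix as the full toy target, and every sample is used for validation exactly once.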

2) Individual Experimentation:

Perform Experiments with your favorite libraries simply by providing model initializers and hyperparameters.

Keras

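from keras.callbacks import ReduceLROnPlateau
from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier

# Same format used by `keras.wrappers.scikit_learn`. Nothing new to learn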
def build_fn(input_shape):  # `input_shape` calculated for you
    model = Sequential([
        Dense(100, kernel_initializer='uniform', input_shape=input_shape, activation='relu'),
        Dropout(0.5),
        Dense(1, kernel_initializer='uniform', activation='sigmoid')
    ])  # All layer arguments saved (whether explicit or Keras default) for future use
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

experiment = CrossValidationExperiment(
    model_initializer=KerasClassifier,
    model_init_params=build_fn,  # We interpret your build_fn to save hyperparameters in a useful, readable format
    model_extra_params=dict(
        callbacks=[ReduceLROnPlateau(patience=5)],  # Use Keras callbacks
        batch_size=32, epochs=10, verbose=0  # Fit/predict arguments
    )
)

SKLearn

from sklearn.svm import LinearSVC

experiment = CrossValidationExperiment(
    model_initializer=LinearSVC,  # (Or any of the dozens of other SK-Learn algorithms)
    # `penalty='l1'` requires `dual=False`. Default values used and recorded for kwargs not given
    model_init_params=dict(penalty='l1', C=0.9, dual=False)
)
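
The note about default values being used and recorded can be illustrated with a small sketch. This is an assumption about the general idea, not HyperparameterHunter's actual code: `ToyClassifier` and `full_init_params` are hypothetical names, and the sketch simply merges the kwargs you pass with the defaults declared in the initializer's signature.

```python
import inspect


class ToyClassifier:
    """Hypothetical model class standing in for any estimator."""

    def __init__(self, penalty="l2", C=1.0, max_iter=1000):
        self.penalty = penalty
        self.C = C
        self.max_iter = max_iter


def full_init_params(model_initializer, model_init_params):
    """Merge explicitly given kwargs with the initializer's declared defaults."""
    signature = inspect.signature(model_initializer)
    recorded = {
        name: parameter.default
        for name, parameter in signature.parameters.items()
        if parameter.default is not inspect.Parameter.empty
    }
    unknown = set(model_init_params) - set(signature.parameters)
    if unknown:
        raise TypeError(f"Unexpected init params: {sorted(unknown)}")
    recorded.update(model_init_params)  # explicit values override defaults
    return recorded


recorded = full_init_params(ToyClassifier, dict(penalty="l1", C=0.9))
print(recorded)  # {'penalty': 'l1', 'C': 0.9, 'max_iter': 1000}
```

Recording the full set, rather than only the explicit kwargs, is what lets two runs be compared even when one of them relied on a default.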

XGBoost

from xgboost import XGBClassifier

experiment = CrossValidationExperiment(
    model_initializer=XGBClassifier,
    # The target is binary, so use a classification objective
    model_init_params=dict(objective='binary:logistic', max_depth=3, n_estimators=100, subsample=0.5)
)

LightGBM

from lightgbm import LGBMClassifier

experiment = CrossValidationExperiment(
    model_initializer=LGBMClassifier,
    model_init_params=dict(boosting_type='gbdt', num_leaves=31, max_depth=-1, min_child_samples=5, subsample=0.5)
)

CatBoost

from catboost import CatBoostClassifier

experiment = CrossValidationExperiment(
    model_initializer=CatBoostClassifier,
    model_init_params=dict(iterations=500, learning_rate=0.01, depth=7, allow_writing_files=False),
    model_extra_params=dict(fit=dict(verbose=True))  # Send kwargs to `fit` and other extra methods
)
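
One way to read the nested model_extra_params dict in the CatBoost example is as a mapping from method name to the kwargs for that method. The sketch below shows that reading under stated assumptions: `ToyModel` and `call_method` are hypothetical, and this is not the library's actual dispatch code.

```python
class ToyModel:
    """Hypothetical model that records the kwargs each method receives."""

    def __init__(self):
        self.calls = {}

    def fit(self, X, y, **kwargs):
        self.calls["fit"] = kwargs
        return self

    def predict(self, X, **kwargs):
        self.calls["predict"] = kwargs
        return [0 for _ in X]


def call_method(model, method_name, model_extra_params, *args):
    """Call `model.<method_name>`, adding any kwargs filed under that method's name."""
    method_kwargs = model_extra_params.get(method_name, {})
    return getattr(model, method_name)(*args, **method_kwargs)


extra = dict(fit=dict(verbose=True))
model = ToyModel()
call_method(model, "fit", extra, [[1.0], [2.0]], [0, 1])
predictions = call_method(model, "predict", extra, [[3.0]])
```

Here `fit` receives `verbose=True` while `predict` receives no extra kwargs, because only `fit` has an entry in the dict.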

RGF

from rgf.sklearn import RGFClassifier

experiment = CrossValidationExperiment(
    model_initializer=RGFClassifier,
    model_init_params=dict(max_leaf=1000, algorithm='RGF', min_samples_leaf=10)
)

3) Hyperparameter Optimization:

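Optimization is set up just like an Experiment. For each hyperparameter you want to optimize, replace its fixed value with one of the search-space classes imported below (Real, Integer, or Categorical).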

from hyperparameter_hunter import Real, Integer, Categorical
from hyperparameter_hunter import optimization as opt
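
As a rough mental model of how search-space placeholders behave, the following self-contained sketch defines toy stand-ins and draws random candidates from them. `ToyReal`, `ToyInteger`, `ToyCategorical`, and `draw` are hypothetical names, not the library's classes, and uniform random sampling is only the simplest possible strategy; the optimizers in the examples choose values in their own ways.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyReal:
    low: float
    high: float

    def sample(self, rng):
        return rng.uniform(self.low, self.high)


@dataclass(frozen=True)
class ToyInteger:
    low: int
    high: int

    def sample(self, rng):
        return rng.randint(self.low, self.high)  # both bounds inclusive


@dataclass(frozen=True)
class ToyCategorical:
    categories: tuple

    def sample(self, rng):
        return rng.choice(self.categories)


def draw(params, rng):
    """Replace every dimension placeholder with a concrete value; keep constants."""
    return {
        name: value.sample(rng) if hasattr(value, "sample") else value
        for name, value in params.items()
    }


space = dict(
    max_depth=ToyInteger(2, 20),
    learning_rate=ToyReal(0.0001, 0.5),
    booster=ToyCategorical(("gbtree", "gblinear", "dart")),
    n_estimators=200,  # fixed value, not searched
)
rng = random.Random(0)
candidates = [draw(space, rng) for _ in range(3)]
```

Fixed values such as `n_estimators=200` pass through unchanged, which mirrors how the examples mix searched and constant hyperparameters in one dict.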

Keras

def build_fn(input_shape):
    model = Sequential([
        Dense(Integer(50, 150), input_shape=input_shape, activation='relu'),
        Dropout(Real(0.2, 0.7)),
        Dense(1, activation=Categorical(['sigmoid', 'softmax']))
    ])
    model.compile(
        optimizer=Categorical(['adam', 'rmsprop', 'sgd', 'adadelta']),
        loss='binary_crossentropy', metrics=['accuracy']
    )
    return model

optimizer = opt.RandomForestOptimization(iterations=7)
optimizer.set_experiment_guidelines(
    model_initializer=KerasClassifier,
    model_init_params=build_fn,
    model_extra_params=dict(
        callbacks=[ReduceLROnPlateau(patience=Integer(5, 10))],
        batch_size=Categorical([32, 64]),
        epochs=10, verbose=0
    )
)
optimizer.go()

SKLearn

from sklearn.ensemble import AdaBoostClassifier

optimizer = opt.DummySearch(iterations=42)
optimizer.set_experiment_guidelines(
    model_initializer=AdaBoostClassifier,  # (Or any of the dozens of other SKLearn algorithms)
    model_init_params=dict(
        n_estimators=Integer(75, 150),
        learning_rate=Real(0.8, 1.3),
        algorithm='SAMME.R'
    )
)
optimizer.go()

XGBoost

optimizer = opt.BayesianOptimization(iterations=10)
optimizer.set_experiment_guidelines(
    model_initializer=XGBClassifier,
    model_init_params=dict(
        max_depth=Integer(low=2, high=20),
        learning_rate=Real(0.0001, 0.5),
        n_estimators=200,
        subsample=0.5,
        booster=Categorical(['gbtree', 'gblinear', 'dart']),
    )
)
optimizer.go()

LightGBM

optimizer = opt.BayesianOptimization(iterations=100)
optimizer.set_experiment_guidelines(
    model_initializer=LGBMClassifier,
    model_init_params=dict(
        boosting_type=Categorical(['gbdt', 'dart']),
        num_leaves=Integer(5, 20),
        max_depth=-1,
        min_child_samples=5,
        subsample=0.5
    )
)
optimizer.go()

CatBoost

optimizer = opt.GradientBoostedRegressionTreeOptimization(iterations=32)
optimizer.set_experiment_guidelines(
    model_initializer=CatBoostClassifier,
    model_init_params=dict(
        iterations=100,
        eval_metric=Categorical(['Logloss', 'Accuracy', 'AUC']),
        learning_rate=Real(low=0.0001, high=0.5),
        depth=Integer(4, 7),
        allow_writing_files=False
    )
)
optimizer.go()

RGF

optimizer = opt.ExtraTreesOptimization(iterations=10)
optimizer.set_experiment_guidelines(
    model_initializer=RGFClassifier,
    model_init_params=dict(
        max_leaf=1000,
        algorithm=Categorical(['RGF', 'RGF_Opt', 'RGF_Sib']),
        l2=Real(0.01, 0.3),
        normalize=Categorical([True, False]),
        learning_rate=Real(0.3, 0.7),
        loss=Categorical(['LS', 'Expo', 'Log', 'Abs'])
    )
)
optimizer.go()

Output File Structure

This is a simple illustration of the file structure you can expect your Experiments to generate. For an in-depth description of the directory structure and the contents of the various files, see the File Structure Overview section in the documentation. However, the essentials are as follows:

An Experiment adds a file to each HyperparameterHunterAssets/Experiments subdirectory, named by experiment_id
Each Experiment also adds an entry to HyperparameterHunterAssets/Leaderboards/GlobalLeaderboard.csv
Customize which files are created via Environment's file_blacklist and do_full_save kwargs (see the documentation)

HyperparameterHunterAssets
|   Heartbeat.log
|
└───Experiments
|   |
|   └───Descriptions
|   |   |   <Files describing Experiment results, conditions, etc.>.json
|   |
|   └───Predictions<OOF/Holdout/Test>
|   |   |   <Files containing Experiment predictions for the indicated dataset>.csv
|   |
|   └───Heartbeats
|   |   |   <Files containing the log produced by the Experiment>.log
|   |
|   └───ScriptBackups
|       |   <Files containing a copy of the script that created the Experiment>.py
|
└───Leaderboards
|   |   GlobalLeaderboard.csv
|   |   <Other leaderboards>.csv
|
└───TestedKeys
|   |   <Files named by Environment key, containing hyperparameter keys>.json
|
└───KeyAttributeLookup
    |   <Files linking complex objects used in Experiments to their hashes>
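
Given this layout, saved Experiment descriptions can be read back with the standard library alone. The sketch below is a hypothetical helper (`load_descriptions` is not part of the package) and assumes that HyperparameterHunterAssets sits directly inside the directory passed as root_results_path. It makes no assumption about the keys inside each JSON file.

```python
import json
from pathlib import Path


def load_descriptions(root_results_path):
    """Return {file stem: parsed JSON} for every Experiment description file found."""
    descriptions_dir = (
        Path(root_results_path) / "HyperparameterHunterAssets" / "Experiments" / "Descriptions"
    )
    descriptions = {}
    for path in sorted(descriptions_dir.glob("*.json")):
        with path.open() as handle:
            descriptions[path.stem] = json.load(handle)
    return descriptions
```

Because description files are named by experiment_id, the keys of the returned dict are the experiment IDs, for example `load_descriptions('path/to/results/directory')`.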

Installation

pip install hyperparameter-hunter

If you like being on the cutting-edge, and you want all the latest developments, run:

pip install git+https://github.com/HunterMcGushion/hyperparameter_hunter.git
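
After either command, a quick way to confirm which version ended up installed is to query the package metadata. This is a small standard-library sketch (Python 3.8+); `installed_version` is a hypothetical helper, not something the package provides.

```python
from importlib import metadata


def installed_version(distribution="hyperparameter-hunter"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version else "hyperparameter-hunter is not installed")
```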

Tested Libraries

Keras
scikit-learn
XGBoost
LightGBM
CatBoost
RGF (rgf_python)

Release history

3.0.0: Aug 6, 2019
3.0.0b1 (pre-release): Aug 6, 2019
3.0.0b0 (pre-release): Jul 14, 2019
3.0.0a2 (pre-release): Jun 12, 2019
3.0.0a1 (pre-release): Jun 8, 2019
3.0.0a0 (pre-release): Jun 7, 2019
2.2.0: Feb 11, 2019
2.1.1: Jan 16, 2019
2.1.0: Jan 16, 2019
2.0.1: Nov 25, 2018
2.0.0: Nov 17, 2018
1.1.0: Oct 5, 2018
1.0.2: Aug 27, 2018
1.0.1: Aug 20, 2018
1.0.0 (this version): Aug 20, 2018
0.0.1: Jun 14, 2018

Download files

Download the file for your platform.

Source Distribution

hyperparameter_hunter-1.0.0.tar.gz (114.6 kB), uploaded Aug 20, 2018

Built Distribution

hyperparameter_hunter-1.0.0-py3-none-any.whl (134.6 kB, Python 3), uploaded Aug 20, 2018

Hashes for hyperparameter_hunter-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`d1022759fe2c543bb7ad606851569ca8f0c97bd009170f5ce2d5b1692bfc39ba`
MD5	`11c4282f67627963cf5ce2c29edf2e34`
BLAKE2b-256	`f4046a77551f43357a138d0a9e712eef31bc0daad82956e59982093b2a242e1b`

Hashes for hyperparameter_hunter-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`937070cc893fc4b735137a82e973d90f5256c518b309fa4f613516bf7f270518`
MD5	`8936e091109f1075e4d67e0478989b26`
BLAKE2b-256	`27fa0a2bed1b0e1a7c28356020da18e39252ba10b09a82ce2131e5bc92bbde00`
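
The published digests can be used to check a downloaded file before installing it. Below is a minimal standard-library sketch; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the file name in the usage comment assumes the wheel was saved to the working directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest to the published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Published SHA256 for the 1.0.0 wheel, copied from the hash listing
WHEEL_SHA256 = "937070cc893fc4b735137a82e973d90f5256c518b309fa4f613516bf7f270518"
# Usage, once the wheel has been downloaded:
# matches_published_hash("hyperparameter_hunter-1.0.0-py3-none-any.whl", WHEEL_SHA256)
```

Reading the file in chunks keeps memory use flat regardless of file size, which matters little for a 134.6 kB wheel but makes the helper reusable for larger downloads.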