Project description
This package is an automatic machine learning module whose purpose is to optimize the hyper-parameters of a machine learning model. Source code: https://github.com/JeremieGince/AutoMLpy
In this package you will find a grid search method, a random search algorithm and a Gaussian process search method. Everything is implemented to be compatible with the TensorFlow, PyTorch and scikit-learn libraries.
Installation
Latest stable version:
pip install AutoMLpy
Latest unstable version:
- Download the latest .whl file from the GitHub repository;
- Copy the path of the downloaded file;
- Install it with:
pip install [path].whl
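As a quick sanity check (not part of the original instructions), you can confirm that the package and its main entry points import correctly:
# Hypothetical post-install check: verifies that AutoMLpy is importable.
from AutoMLpy import HpOptimizer, RandomHpSearch
print("AutoMLpy imported successfully")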
Example - MNIST optimization with Tensorflow & Keras
Here is an example of how to optimize a model built with TensorFlow and Keras on the popular MNIST dataset.
Imports
We start by importing some useful packages.
# Some useful packages
from typing import Union, Tuple
import time
import numpy as np
import pandas as pd
import pprint
# Tensorflow
import tensorflow as tf
import tensorflow_datasets as tfds
# Importing the HPOptimizer and the RandomHpSearch from the AutoMLpy package.
from AutoMLpy import HpOptimizer, RandomHpSearch
Dataset
Now we load the MNIST dataset the TensorFlow way.
def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label

def get_tf_mnist_dataset(**kwargs):
    # https://www.tensorflow.org/datasets/keras_example
    (ds_train, ds_test), ds_info = tfds.load(
        'mnist',
        split=['train', 'test'],
        shuffle_files=True,
        as_supervised=True,
        with_info=True,
    )

    # Build training pipeline
    ds_train = ds_train.map(normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds_train = ds_train.cache()
    ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
    ds_train = ds_train.batch(128)
    ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

    # Build evaluation pipeline
    ds_test = ds_test.map(normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds_test = ds_test.batch(128)
    ds_test = ds_test.cache()
    ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

    return ds_train, ds_test
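As an optional sanity check (not part of the original example), you can pull a single batch from the training pipeline and verify its shape:
# Fetch one batch from the training pipeline and print its shape.
ds_train, ds_test = get_tf_mnist_dataset()
for images, labels in ds_train.take(1):
    print(images.shape, labels.shape)  # expected: (128, 28, 28, 1) (128,)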
Keras Model
Now we write a function that returns a Keras model given a set of hyper-parameters (hp).
def get_tf_mnist_model(**hp):
    if hp.get("use_conv", False):
        model = tf.keras.models.Sequential([
            # Convolution layers
            tf.keras.layers.Conv2D(10, 3, padding="same", input_shape=(28, 28, 1)),
            tf.keras.layers.MaxPool2D((2, 2)),
            tf.keras.layers.Conv2D(50, 3, padding="same"),
            tf.keras.layers.MaxPool2D((2, 2)),
            # Dense layers
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(120, activation='relu'),
            tf.keras.layers.Dense(84, activation='relu'),
            tf.keras.layers.Dense(10)
        ])
    else:
        model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(120, activation='relu'),
            tf.keras.layers.Dense(84, activation='relu'),
            tf.keras.layers.Dense(10)
        ])
    return model
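If you are curious about the two architectures, you can instantiate each variant and print its summary; this is only a quick inspection sketch, not part of the original example.
# Inspect both variants returned by get_tf_mnist_model.
conv_model = get_tf_mnist_model(use_conv=True)
conv_model.summary()
dense_model = get_tf_mnist_model(use_conv=False)
dense_model.summary()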
The Optimizer Model
It's time to implement the optimizer model. You only have to implement the following methods: "build_model", "fit_dataset_model_" and "score_on_dataset". These methods must respect their signatures and output types. The objective is to make the building, training and scoring phases depend on the hyper-parameters, so the optimizer can use them to find the best set of hp.
class KerasMNISTHpOptimizer(HpOptimizer):
    def build_model(self, **hp) -> tf.keras.Model:
        model = get_tf_mnist_model(**hp)
        model.compile(
            optimizer=tf.keras.optimizers.SGD(
                learning_rate=hp.get("learning_rate", 1e-3),
                nesterov=hp.get("nesterov", True),
                momentum=hp.get("momentum", 0.99),
            ),
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
        )
        return model

    def fit_dataset_model_(
            self,
            model: tf.keras.Model,
            dataset,
            **hp
    ) -> tf.keras.Model:
        history = model.fit(
            dataset,
            epochs=hp.get("epochs", 1),
            verbose=False,
        )
        return model

    def score_on_dataset(
            self,
            model: tf.keras.Model,
            dataset,
            **hp
    ) -> float:
        test_loss, test_acc = model.evaluate(dataset, verbose=0)
        return test_acc
Execution & Optimization
The first thing to do after creating our classes is to load the dataset into memory.
mnist_train, mnist_test = get_tf_mnist_dataset()
mnist_hp_optimizer = KerasMNISTHpOptimizer()
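Before launching the full search, you may want to sanity-check the three methods on a single hand-picked set of hyper-parameters. This is a minimal sketch, assuming a quick one-epoch fit is enough for the check:
# Hypothetical single-run check with fixed hyper-parameters (not part of the original example).
single_hp = dict(learning_rate=1e-2, momentum=0.9, nesterov=True, use_conv=False, epochs=1)
single_model = mnist_hp_optimizer.build_model(**single_hp)
single_model = mnist_hp_optimizer.fit_dataset_model_(single_model, mnist_train, **single_hp)
print("single-run accuracy:", mnist_hp_optimizer.score_on_dataset(single_model, mnist_test, **single_hp))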
Next, you define your hyper-parameter space with a dictionary like this.
hp_space = dict(
    epochs=list(range(1, 16)),
    learning_rate=np.linspace(1e-4, 1e-1, 50),
    nesterov=[True, False],
    momentum=np.linspace(0.01, 0.99, 50),
    use_conv=[True, False],
)
It's time to define your hp search algorithm and give it a budget in time and iterations. Here we will search for at most 10 minutes and 100 iterations.
param_gen = RandomHpSearch(hp_space, max_seconds=60*10, max_itr=100)
Finally, you start the optimization by giving your parameter generator to the optimize method. Note that the "stop_criterion" argument stops the optimization once the given score is reached; it's really useful for saving time.
save_kwargs = dict(
    save_name="tf_mnist_hp_opt",
    title="Random search: MNIST",
)

param_gen = mnist_hp_optimizer.optimize_on_dataset(
    param_gen, mnist_train, save_kwargs=save_kwargs,
    stop_criterion=1.0,
)
Testing
Now you can test the optimized hyper-parameters by fitting again on the full training dataset. Yes, the full dataset, because during the optimization phase a cross-validation split is made, which cuts your training dataset in half. It's also time to evaluate the fitted model on the test dataset.
opt_hp = param_gen.get_best_param()

model = mnist_hp_optimizer.build_model(**opt_hp)
mnist_hp_optimizer.fit_dataset_model_(
    model, mnist_train, **opt_hp
)

test_acc = mnist_hp_optimizer.score_on_dataset(
    model, mnist_test, **opt_hp
)

print(f"test_acc: {test_acc*100:.3f}%")
The optimized hyper-parameters:
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(opt_hp)
Visualization
You can visualize the optimization with an interactive HTML file.
fig = param_gen.write_optimization_to_html(show=True, dark_mode=True, **save_kwargs)
Optimization table
opt_table = param_gen.get_optimization_table()
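You can then inspect or export this table. The snippet below assumes get_optimization_table returns a pandas DataFrame (which would explain the pandas import at the top); treat it as an unverified sketch:
# Assumption: opt_table is a pandas DataFrame; print the first rows and export to CSV.
print(opt_table.head())
opt_table.to_csv("tf_mnist_hp_opt_table.csv", index=False)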
Saving ParameterGenerator
param_gen.save_history(**save_kwargs)
save_path = param_gen.save_obj(**save_kwargs)
Loading ParameterGenerator
param_gen = RandomHpSearch.load_obj(save_path)
Re-launch the optimization with the loaded ParameterGenerator
# Change the budget to be able to optimize again
param_gen.max_itr = param_gen.max_itr + 100
param_gen.max_seconds = param_gen.max_seconds + 60

param_gen = mnist_hp_optimizer.optimize_on_dataset(
    param_gen, mnist_train, save_kwargs=save_kwargs,
    stop_criterion=1.0, reset_gen=False,
)
opt_hp = param_gen.get_best_param()
print(param_gen.get_optimization_table())
pp.pprint(param_gen.history)
pp.pprint(opt_hp)
Other examples
Examples of how to use this package are in the ./examples folder. There you can find the previous example with TensorFlow and an example with PyTorch.
License
Citation
@article{Gince,
  title={Implémentation du module AutoMLpy, un outil d’apprentissage machine automatique},
  author={Jérémie Gince},
  year={2021},
  publisher={ULaval},
  url={https://github.com/JeremieGince/AutoMLpy},
}