A hyperparameter optimization toolbox for convenient and fast prototyping
An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.
Hyperactive:

is very easy to learn but extremely versatile

provides intelligent optimization algorithms, support for all major machine-learning frameworks and many interesting applications

makes optimization data collection simple

saves your computation time

supports parallel computing
As its name suggests, Hyperactive started as a hyperparameter optimization package, but it has been generalized to solve expensive gradient-free optimization problems. It uses the GradientFreeOptimizers package as an optimization backend and expands on it with additional features and tools.
What's new?

27.08.2023 v4.5.0 add early stopping for Optimization Strategies

01.03.2023 v4.4.0 add new feature: "Optimization Strategies"

18.11.2022 v4.3.0 with three new optimization algorithms (Spiral Optimization, Lipschitz Optimizer, DIRECT Optimizer)
Overview
Hyperactive features a collection of optimization algorithms that can be used for a variety of optimization problems. The following table shows examples of its capabilities:
Optimization Techniques: Local Search, Global Search, Population Methods, Sequential Methods
Tested and Supported Packages: Machine Learning, Deep Learning, Parallel Computing
Optimization Applications: Feature Engineering

The examples above are not necessarily done with realistic datasets or training procedures. The purpose is fast execution of the solution proposal and giving the user ideas for interesting use cases.
Side Projects and Tools
The following packages are designed to support Hyperactive and expand its use cases.
Package  Description 

SearchDataCollector  Simple tool to save search data during or after the optimization run into CSV files. 
SearchDataExplorer  Visualize search data with plotly inside a streamlit dashboard. 
If you want news about Hyperactive and related projects you can follow me on twitter.
Notebooks and Tutorials
Installation
The most recent version of Hyperactive is available on PyPI:
pip install hyperactive
Example
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes
from hyperactive import Hyperactive

data = load_diabetes()
X, y = data.data, data.target


# define the model in a function
def model(opt):
    # pass the suggested parameter to the machine learning model
    gbr = GradientBoostingRegressor(
        n_estimators=opt["n_estimators"], max_depth=opt["max_depth"]
    )
    scores = cross_val_score(gbr, X, y, cv=4)

    # return a single numerical value
    return scores.mean()


# search space determines the ranges of parameters you want the optimizer to search through
search_space = {
    "n_estimators": list(range(10, 150, 5)),
    "max_depth": list(range(2, 12)),
}

# start the optimization run
hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50)
hyper.run()
Hyperactive API reference
Basic Usage
Hyperactive(verbosity, distribution, n_processes)

verbosity = ["progress_bar", "print_results", "print_times"]
 Possible parameter types: (list, False)
 The verbosity list determines what part of the optimization information will be printed in the command line.

distribution = "multiprocessing"
 Possible parameter types: ("multiprocessing", "joblib", "pathos")
 Determines which distribution service to use. Each library uses a different package to pickle objects:
 multiprocessing uses pickle
 joblib uses cloudpickle
 pathos uses dill

n_processes = "auto"
 Possible parameter types: (str, int)
 The maximum number of processes that are allowed to run simultaneously. If n_processes is of int type, only n_processes jobs run simultaneously instead of all at once. So if n_processes=10 and n_jobs_total=35, the schedule would look like this: 10, 10, 10, 5. This saves computational resources if there is a large number of n_jobs. If "auto", then n_processes is the sum of all n_jobs (from .add_search(...)).
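The schedule described above can be reproduced with a few lines of arithmetic. This is only an illustrative sketch of the batching idea, not Hyperactive's scheduler; the helper name schedule_batches is hypothetical.

```python
def schedule_batches(n_jobs_total, n_processes):
    """Split n_jobs_total jobs into consecutive batches of at most n_processes."""
    if n_processes == "auto":
        # "auto": all jobs are allowed to run at once
        n_processes = n_jobs_total
    batches = []
    remaining = n_jobs_total
    while remaining > 0:
        batch = min(n_processes, remaining)
        batches.append(batch)
        remaining -= batch
    return batches


print(schedule_batches(35, 10))      # [10, 10, 10, 5]
print(schedule_batches(35, "auto"))  # [35]
```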
.add_search(objective_function, search_space, n_iter, optimizer, n_jobs, initialize, pass_through, callbacks, catch, max_score, early_stopping, random_state, memory, memory_warm_start)

objective_function
 Possible parameter types: (callable)
 The objective function defines the optimization problem. The optimization algorithm will try to maximize the numerical value that is returned by the objective function by trying out different parameters from the search space.

search_space
 Possible parameter types: (dict)
 Defines the space where the optimization algorithm can search for the best parameters for the given objective function.

n_iter
 Possible parameter types: (int)
 The number of iterations that will be performed during the optimization run. Each iteration consists of the optimization step, which decides the next parameter set to evaluate, and the evaluation step, which runs the objective function with the chosen parameter set and returns the score.
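To make the two steps of an iteration concrete, here is a minimal random-search loop in plain Python. It is a sketch under the assumption that the optimizer simply draws random values; the function random_search is hypothetical and does not reflect how Hyperactive is implemented internally.

```python
import random


def random_search(objective_function, search_space, n_iter, seed=0):
    rng = random.Random(seed)
    best_para, best_score = None, float("-inf")
    for _ in range(n_iter):
        # optimization step: decide which parameter set to evaluate next
        para = {name: rng.choice(values) for name, values in search_space.items()}
        # evaluation step: run the objective function and get the score
        score = objective_function(para)
        # the score is maximized, so keep the highest one seen so far
        if score > best_score:
            best_para, best_score = para, score
    return best_para, best_score


def objective_function(para):
    # maximizing the negative square moves x towards 0
    return -(para["x"] ** 2)


search_space = {"x": list(range(-10, 11))}
best_para, best_score = random_search(objective_function, search_space, n_iter=50)
print(best_para, best_score)
```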

optimizer = "default"

Possible parameter types: ("default", initialized optimizer object)

Instance of an optimization class that can be imported from Hyperactive. "default" corresponds to the random search optimizer. The optimization classes imported from Hyperactive are different from those in GFO: they only accept optimizer-specific parameters. The following classes can be imported and used:
 HillClimbingOptimizer
 StochasticHillClimbingOptimizer
 RepulsingHillClimbingOptimizer
 SimulatedAnnealingOptimizer
 DownhillSimplexOptimizer
 RandomSearchOptimizer
 GridSearchOptimizer
 RandomRestartHillClimbingOptimizer
 RandomAnnealingOptimizer
 PowellsMethod
 PatternSearch
 ParallelTemperingOptimizer
 ParticleSwarmOptimizer
 SpiralOptimization
 EvolutionStrategyOptimizer
 BayesianOptimizer
 LipschitzOptimizer
 DirectAlgorithm
 TreeStructuredParzenEstimators
 ForestOptimizer

Example:
...
opt_hco = HillClimbingOptimizer(epsilon=0.08)
hyper = Hyperactive()
hyper.add_search(..., optimizer=opt_hco)
hyper.run()
...


n_jobs = 1
 Possible parameter types: (int)
 Number of jobs to run in parallel. Those jobs are optimization runs that work independently of one another (no information sharing). If n_jobs == -1 the maximum available number of CPU cores is used.

initialize = {"grid": 4, "random": 2, "vertices": 4}

Possible parameter types: (dict)

The initialization dictionary automatically determines a number of parameters that will be evaluated in the first n iterations (n is the sum of the values in initialize). The initialize keywords are the following:

grid
 Initializes positions in a grid-like pattern. Positions that cannot be put into a grid are randomly positioned. For very high-dimensional search spaces (>30) this pattern becomes random.

vertices
 Initializes positions at the vertices of the search space. Positions that cannot be put into a new vertex are randomly positioned.

random
 Number of random initialized positions

warm_start
 List of parameter dictionaries that marks additional start points for the optimization run.
Example:
...
search_space = {
    "x1": list(range(10, 150, 5)),
    "x2": list(range(2, 12)),
}

ws1 = {"x1": 10, "x2": 2}
ws2 = {"x1": 15, "x2": 10}

hyper = Hyperactive()
hyper.add_search(
    model,
    search_space,
    n_iter=30,
    initialize={"grid": 4, "random": 10, "vertices": 4, "warm_start": [ws1, ws2]},
)
hyper.run()
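As an illustration of what "vertices" could mean for a search space, the sketch below enumerates the corner positions of a search space dictionary. It assumes that a vertex is a combination of the first and last value of each dimension; the helper vertex_positions is hypothetical and not part of Hyperactive.

```python
from itertools import product


def vertex_positions(search_space):
    """Return every combination of the first and last value of each dimension."""
    corners = [(values[0], values[-1]) for values in search_space.values()]
    return [dict(zip(search_space.keys(), combo)) for combo in product(*corners)]


search_space = {
    "x1": list(range(10, 150, 5)),
    "x2": list(range(2, 12)),
}

for position in vertex_positions(search_space):
    print(position)
```

A two-dimensional search space has four such corners, which matches the default of "vertices": 4.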



pass_through = {}

Possible parameter types: (dict)

The pass_through parameter accepts a dictionary that contains information that will be passed to the objective function argument. This information will not change during the optimization run, unless the user changes it themselves (within the objective function).
Example:
...
def objective_function(para):
    para.pass_through["stuff1"]  # <-- this variable is 1
    para.pass_through["stuff2"]  # <-- this variable is 2

    score = -para["x1"] * para["x1"]
    return score


pass_through = {
    "stuff1": 1,
    "stuff2": 2,
}

hyper = Hyperactive()
hyper.add_search(
    objective_function,
    search_space,
    n_iter=30,
    pass_through=pass_through,
)
hyper.run()


callbacks = {}

Possible parameter types: (dict)

The callbacks parameter enables you to pass functions to Hyperactive that are called in every iteration of the optimization run. Each function has access to the same argument as the objective function. You can decide whether the functions are called before or after the objective function is evaluated via the keys of the callbacks dictionary. The values of the dictionary are lists of the callback functions. The following example shows the way to use callbacks:
Example:
...
def callback_1(access):
    # do some stuff
    pass

def callback_2(access):
    # do some stuff
    pass

def callback_3(access):
    # do some stuff
    pass

hyper = Hyperactive()
hyper.add_search(
    objective_function,
    search_space,
    n_iter=100,
    callbacks={
        "after": [callback_1, callback_2],
        "before": [callback_3]
    },
)
hyper.run()
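The order in which "before" and "after" callbacks run around a single evaluation can be pictured with the following plain-Python sketch. The function evaluate_with_callbacks is a hypothetical stand-in for one iteration and is not taken from Hyperactive's source.

```python
def evaluate_with_callbacks(objective_function, access, callbacks):
    # run the "before" callbacks, then the objective function, then "after"
    for callback in callbacks.get("before", []):
        callback(access)
    score = objective_function(access)
    for callback in callbacks.get("after", []):
        callback(access)
    return score


log = []


def callback_before(access):
    log.append("before")


def callback_after(access):
    log.append("after")


def objective_function(access):
    log.append("objective")
    return -(access["x1"] ** 2)


score = evaluate_with_callbacks(
    objective_function,
    {"x1": 3},
    callbacks={"before": [callback_before], "after": [callback_after]},
)
print(log, score)  # ['before', 'objective', 'after'] -9
```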


catch = {}

Possible parameter types: (dict)

The catch parameter provides a way to handle exceptions that occur during the evaluation of the objective function or the callbacks. It is a dictionary that accepts the exception class as a key and the score that is returned instead as the value. This way you can handle multiple types of exceptions and return different scores for each. In the case of an exception it often makes sense to return np.nan as a score. You can see an example of this in the following code snippet:
Example:
...
hyper = Hyperactive()
hyper.add_search(
    objective_function,
    search_space,
    n_iter=100,
    catch={
        ValueError: np.nan,
    },
)
hyper.run()
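The idea behind the catch dictionary, mapping an exception class to a fallback score, can be sketched without Hyperactive as follows. The wrapper evaluate_with_catch is hypothetical and only illustrates the concept; float("nan") stands in for np.nan so that the sketch has no dependencies.

```python
import math


def evaluate_with_catch(objective_function, para, catch):
    try:
        return objective_function(para)
    except tuple(catch) as error:
        # look up the fallback score of the first matching exception class
        for exception_class, fallback_score in catch.items():
            if isinstance(error, exception_class):
                return fallback_score
        raise


def objective_function(para):
    if para["x1"] < 0:
        raise ValueError("x1 must not be negative")
    return math.sqrt(para["x1"])


catch = {ValueError: float("nan")}

print(evaluate_with_catch(objective_function, {"x1": 4}, catch))   # 2.0
print(evaluate_with_catch(objective_function, {"x1": -1}, catch))  # nan
```

Exceptions that are not listed in the dictionary still propagate, so unexpected errors stay visible.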


max_score = None
 Possible parameter types: (float, None)
 Maximum score until the optimization stops. The score will be checked after each completed iteration.

early_stopping = None

 Possible parameter types: (dict, None)

 Stops the optimization run early if it did not achieve any score improvement within the last iterations. The early_stopping dictionary accepts three keys:
 n_iter_no_change: Non-optional int parameter. This marks the last n iterations to look for an improvement over the iterations that came before n. If the best score of the entire run is within those last n iterations the run will continue (until other stopping criteria are met), otherwise the run will stop.
 tol_abs: Optional float parameter. The score must have improved by at least this absolute tolerance in the last n iterations over the best score in the iterations before n. This is an absolute value, so 0.1 means an improvement of 0.8 -> 0.9 is acceptable but 0.81 -> 0.9 would stop the run.
 tol_rel: Optional float parameter. The score must have improved by at least this relative tolerance (in percent) in the last n iterations over the best score in the iterations before n. This is a relative value, so 10 means an improvement of 0.8 -> 0.88 is acceptable but 0.8 -> 0.87 would stop the run.
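The stopping rule described above can be written down as a small function. This is a sketch of my reading of the three keys, not Hyperactive's implementation; the name should_stop is hypothetical, and values that sit exactly on a tolerance boundary may be affected by floating-point rounding.

```python
def should_stop(scores, n_iter_no_change, tol_abs=None, tol_rel=None):
    """Decide whether to stop, given the scores of all iterations so far."""
    if len(scores) <= n_iter_no_change:
        # not enough history to compare against
        return False
    best_before = max(scores[:-n_iter_no_change])
    best_recent = max(scores[-n_iter_no_change:])
    improvement = best_recent - best_before
    if improvement <= 0:
        # the best score of the run is not within the last n iterations
        return True
    if tol_abs is not None and improvement < tol_abs:
        return True
    if tol_rel is not None and best_before != 0:
        # relative improvement in percent
        if improvement / abs(best_before) * 100 < tol_rel:
            return True
    return False


print(should_stop([0.5, 0.8, 0.7, 0.6], n_iter_no_change=2))   # True
print(should_stop([0.5, 0.6, 0.7, 0.9], n_iter_no_change=2))   # False
print(should_stop([0.8, 0.85], n_iter_no_change=1, tol_abs=0.1))  # True
print(should_stop([0.8, 1.0], n_iter_no_change=1, tol_rel=10))    # False
```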

random_state = None

Possible parameter types: (int, None)

Random state for random processes in the random, numpy and scipy module.


memory = "share"
 Possible parameter types: (bool, "share")
 Whether or not to use the memory feature. The memory is a dictionary that gets filled with parameters and scores during the optimization run. If the optimizer encounters a parameter set that is already in the dictionary it just looks up the score instead of reevaluating the objective function (which can take a long time). If memory is set to "share" and there are multiple jobs for the same objective function, the memory dictionary is automatically shared between the different processes.
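The memory feature is essentially a lookup table from parameter sets to scores. The following sketch shows that idea in plain Python within a single process; make_memorized is a hypothetical helper, it assumes all parameter values are hashable, and it does not cover the sharing between processes.

```python
def make_memorized(objective_function):
    memory = {}  # maps a parameter tuple to its score

    def wrapper(para):
        key = tuple(sorted(para.items()))
        if key not in memory:
            # only evaluate parameter sets that have not been seen before
            memory[key] = objective_function(para)
        return memory[key]

    return wrapper, memory


calls = []


def expensive_objective(para):
    calls.append(dict(para))
    return -(para["x1"] ** 2 + para["x2"] ** 2)


objective, memory = make_memorized(expensive_objective)
objective({"x1": 1, "x2": 2})
objective({"x1": 1, "x2": 2})  # served from memory, no new evaluation
objective({"x1": 0, "x2": 2})

print(len(calls), len(memory))  # 2 2
```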

memory_warm_start = None

Possible parameter types: (pandas dataframe, None)

Pandas dataframe that contains score and parameter information that will be automatically loaded into the memory dictionary.
example:
score  x1   x2   x...
0.756  0.1  0.2  ...
0.823  0.3  0.1  ...
...    ...  ...  ...
...    ...  ...  ...

.run(max_time)
 max_time = None
 Possible parameter types: (float, None)
 Maximum number of seconds until the optimization stops. The time will be checked after each completed iteration.
Special Parameters
Objective Function
Each iteration consists of two steps:
 The optimization step: decides what position in the search space (parameter set) to evaluate next
 The evaluation step: calls the objective function, which returns the score for the given position in the search space
The objective function has one argument that is often called "para", "params", "opt" or "access". This argument is your access to the parameter set that the optimizer has selected in the corresponding iteration.
def objective_function(opt):
    # get x1 and x2 from the argument "opt"
    x1 = opt["x1"]
    x2 = opt["x2"]

    # calculate the score with the parameter set
    score = -(x1 * x1 + x2 * x2)

    # return the score
    return score
The objective function always needs a score, which shows how "good" or "bad" the current parameter set is. But you can also return some additional information with a dictionary:
def objective_function(opt):
    x1 = opt["x1"]
    x2 = opt["x2"]

    score = -(x1 * x1 + x2 * x2)

    other_info = {
        "x1 squared": x1**2,
        "x2 squared": x2**2,
    }

    return score, other_info
When you take a look at the results (a pandas dataframe with all iteration information) after the run has ended you will see the additional information in it. A dictionary is required because Hyperactive needs to know the names of the additional parameters. The score does not need that, because it is always called "score" in the results. You can run this example script if you want to give it a try.
Search Space Dictionary
The search space defines what values the optimizer can select during the search. These selected values will be inside the objective function argument and can be accessed like in a dictionary. The values in each search space dimension should always be in a list. If you use np.arange you should put it in a list afterwards:
import numpy as np

search_space = {
    "x1": list(np.arange(-100, 101, 1)),
    "x2": list(np.arange(-100, 101, 1)),
}
A special feature of Hyperactive is shown in the next example. You can put not just numeric values into the search space dimensions, but also strings and functions. This enables a very high flexibility in how you can create your studies.
def func1():
    # do stuff
    stuff = "result of func1"  # placeholder
    return stuff


def func2():
    # do stuff
    stuff = "result of func2"  # placeholder
    return stuff


search_space = {
    "x": list(np.arange(-100, 101, 1)),
    "str": ["a string", "another string"],
    "function": [func1, func2],
}
If you want to put other types of variables (like numpy arrays, pandas dataframes, lists, ...) into the search space you can do that via functions:
def array1():
    return np.array([1, 2, 3])


def array2():
    return np.array([3, 2, 1])


search_space = {
    "x": list(np.arange(-100, 101, 1)),
    "str": ["a string", "another string"],
    "numpy_array": [array1, array2],
}
The functions contain the numpy arrays and return them. This way you can use them inside the objective function.
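Inside the objective function the selected entry is still a function, so it has to be called to obtain the wrapped object. The sketch below uses plain lists instead of numpy arrays and a regular dictionary as a stand-in for the argument that an optimizer would pass in; list1, list2 and the "factor" dimension are made up for illustration.

```python
def list1():
    return [1, 2, 3]


def list2():
    return [3, 2, 1]


search_space = {
    "data": [list1, list2],
    "factor": [1, 2, 3],
}


def objective_function(opt):
    # the search space holds the function, so call it to get the actual object
    values = opt["data"]()
    return opt["factor"] * values[0]


# stand-in for the parameter set an optimizer would pass in
opt = {"data": list2, "factor": 2}
print(objective_function(opt))  # 6
```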
Optimizer Classes
Each of the following optimizer classes can be initialized and passed to the "add_search" method via the "optimizer" argument. During this initialization the optimizer class accepts only optimizer-specific parameters (no random_state, initialize, ...):
optimizer = HillClimbingOptimizer(epsilon=0.1, distribution="laplace", n_neighbours=4)
For the default parameters you can just write:
optimizer = HillClimbingOptimizer()
and pass it to Hyperactive:
hyper = Hyperactive()
hyper.add_search(model, search_space, optimizer=optimizer, n_iter=100)
hyper.run()
So the optimizer classes are different from those in GradientFreeOptimizers. A more detailed explanation of the optimization algorithms and the optimizer-specific parameters can be found in the Optimization Tutorial.
 HillClimbingOptimizer
 RepulsingHillClimbingOptimizer
 SimulatedAnnealingOptimizer
 DownhillSimplexOptimizer
 RandomSearchOptimizer
 GridSearchOptimizer
 RandomRestartHillClimbingOptimizer
 RandomAnnealingOptimizer
 PowellsMethod
 PatternSearch
 ParallelTemperingOptimizer
 ParticleSwarmOptimizer
 EvolutionStrategyOptimizer
 BayesianOptimizer
 TreeStructuredParzenEstimators
 ForestOptimizer
Result Attributes
.best_para(objective_function)

objective_function
 (callable)

returns: dictionary

Parameter dictionary of the best score of the given objective_function found in the previous optimization run.
example:
{ 'x1': 0.2, 'x2': 0.3, }
.best_score(objective_function)
 objective_function
 (callable)
 returns: int or float
 Numerical value of the best score of the given objective_function found in the previous optimization run.
.search_data(objective_function, times=False)

objective_function
 (callable)

returns: Pandas dataframe

The dataframe contains score and parameter information of the given objective_function found in the optimization run. If the parameter times is set to True the evaluation and iteration times are added to the dataframe.
example:
score  x1   x2   x...
0.756  0.1  0.2  ...
0.823  0.3  0.1  ...
...    ...  ...  ...
...    ...  ...  ...
Roadmap
v2.0.0 :heavy_check_mark:
 Change API
v2.1.0 :heavy_check_mark:
 Save memory of evaluations for later runs (long term memory)
 Warm start sequence based optimizers with long term memory
 Gaussian process regressors from various packages (gpy, sklearn, GPflow, ...) via wrapper
v2.2.0 :heavy_check_mark:
 Add basic dataset meta-features to long term memory
 Add helper functions for memory
 connect two different model/dataset hashes
 split two different model/dataset hashes
 delete memory of model/dataset
 return best known model for dataset
 return search space for best model
 return best parameter for best model
v2.3.0 :heavy_check_mark:
 Tree-structured Parzen Estimator
 Decision Tree Optimizer
 add "max_sample_size" and "skip_retrain" parameter for SMBO to decrease optimization time
v3.0.0 :heavy_check_mark:
 New API
 expand usage of objective function
 No passing of training data into Hyperactive
 Removing "long term memory" support (better to do in separate package)
 More intuitive selection of optimization strategies and parameters
 Separate optimization algorithms into other package
 expand api so that optimizer parameter can be changed at runtime
 add extensive testing procedure (similar to GradientFreeOptimizers)
v3.1.0 :heavy_check_mark:
 Decouple number of runs from active processes (Thanks to PartiallyTyped)
v3.2.0 :heavy_check_mark:
 Dashboard for visualization of search data at runtime via streamlit (ProgressBoard)
v3.3.0 :heavy_check_mark:
 Early stopping
 Shared memory dictionary between processes with the same objective function
v4.0.0 :heavy_check_mark:
 small adjustments to API
 move optimization strategies into submodule "optimizers"
 preparation for future add-ons (long-term-memory, meta-learn, ...) from separate repositories
 separate progress board into separate repository
v4.1.0 :heavy_check_mark:
 add python 3.9 to testing
 add pass_through parameter
 add v1 GFO optimization algorithms
v4.2.0 :heavy_check_mark:
 add callbacks parameter
 add catch parameter
 add option to add eval and iter times to search data
v4.3.0 :heavy_check_mark:
 add new features from GFO
 add Spiral Optimization
 add Lipschitz Optimizer
 add DIRECT Optimizer
 print the random seed for reproducibility
v4.4.0 :heavy_check_mark:
 add OptimizationStrategies
 redesign progress bar
v4.5.0 :heavy_check_mark:
 add early stopping feature to custom optimization strategies
 display additional outputs from objective function in results in command line
 add type hints to hyperactive API
v4.6.0
 add support for constrained optimization
v4.7.0
 add "prune_search_space" method to custom optimization strategy class
FAQ
Known Errors + Solutions
Read this before opening a bug issue

Are you sure the bug is located in Hyperactive?
The error might be located in the optimization backend. Look at the error message from the command line. If one of the last messages looks like this:
 File "/.../gradient_free_optimizers/...", line ...
Then you should post the bug report in the GradientFreeOptimizers repository.
Otherwise you can post the bug report in Hyperactive.
Do you have the correct Hyperactive version?
With every major version update (e.g. v2.2 -> v3.0) the API of Hyperactive changes. Check which version of Hyperactive you have. If your major version is older you have two options:
Recommended: You could just update your Hyperactive version with:
pip install hyperactive --upgrade
This way you can use all the new documentation and examples from the current repository.
Or you could continue using the old version and use an old repository branch as documentation. You can do that by selecting the corresponding branch (top right of the repository; the default is "master" or "main"). So if your major version is older (e.g. v2.1.0) you can select the 2.x.x branch to get the old repository for that version.

Provide example code for error reproduction
To understand and fix the issue I need example code that reproduces the error. I must be able to just copy the code into a .py file and execute it to reproduce the error.
MemoryError: Unable to allocate ... for an array with shape (...)
This is expected with the current implementation of the sequential model-based optimizers (SMBO). For all sequential model-based algorithms you have to keep an eye on the search space size:
search_space_size = 1
for value_ in search_space.values():
    search_space_size *= len(value_)

print("search_space_size", search_space_size)
Reduce the search space size to resolve this error.
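One way to reduce the size is to use a coarser step in each dimension. The sketch below compares two made-up search spaces; the helper search_space_size and the example ranges are assumptions for illustration, and which size is small enough depends on your available memory.

```python
import math


def search_space_size(search_space):
    # the size is the product of the number of values in each dimension
    return math.prod(len(values) for values in search_space.values())


fine = {"x1": list(range(0, 100)), "x2": list(range(0, 100))}
coarse = {"x1": list(range(0, 100, 10)), "x2": list(range(0, 100, 10))}

print(search_space_size(fine))    # 10000
print(search_space_size(coarse))  # 100
```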
TypeError: cannot pickle '_thread.RLock' object
This is because you have classes and/or non-top-level objects in the search space. Pickle (used by multiprocessing) cannot serialize them. Setting distribution to "joblib" or "pathos" may fix this problem:
hyper = Hyperactive(distribution="joblib")
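To find out which entries of a search space cause the problem, you can try to pickle each value yourself with the standard library. This is a small diagnostic sketch; the helper unpicklable_entries is hypothetical and not part of Hyperactive.

```python
import pickle
import threading


def unpicklable_entries(search_space):
    """Return (dimension, value) pairs that the standard pickle module rejects."""
    problems = []
    for name, values in search_space.items():
        for value in values:
            try:
                pickle.dumps(value)
            except (pickle.PicklingError, AttributeError, TypeError):
                problems.append((name, value))
    return problems


lock = threading.RLock()
search_space = {
    "x": [1, 2, 3],
    "resource": [lock],  # an RLock cannot be pickled
}

print(unpicklable_entries(search_space))
```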
Command line full of warnings
Very often these are warnings from sklearn or numpy. Those warnings do not correlate with bad performance from Hyperactive. Your code will most likely run fine. Those warnings are very difficult to silence.
It should help to put this at the very top of your script:
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn
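A less invasive alternative, which does not replace warnings.warn globally, is to suppress warnings only around a specific call with the standard library's warnings.catch_warnings context manager. This is a sketch that assumes the warnings are raised in the current process; the helper run_quietly is hypothetical, and warnings emitted by separate worker processes may still show up.

```python
import warnings


def run_quietly(func, *args, **kwargs):
    # temporarily ignore all warnings while func runs, then restore the filters
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func(*args, **kwargs)


def noisy():
    warnings.warn("this would normally be printed", UserWarning)
    return 42


print(run_quietly(noisy))  # 42
```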
Warning: Not enough initial positions for population size
This warning occurs because Hyperactive needs more initial positions to choose from to generate a population for the optimization algorithm:
The number of initial positions is determined by the initialize parameter in the add_search method.
# This is how it looks per default
initialize = {"grid": 4, "random": 2, "vertices": 4}
# You could set it to this for a maximum population of 20
initialize = {"grid": 4, "random": 12, "vertices": 4}
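To check in advance whether an initialize dictionary provides enough positions, you can add up its entries. In the sketch below, counting a warm_start list by its length is an assumption, and both n_initial_positions and population_size are hypothetical names used only for illustration.

```python
def n_initial_positions(initialize):
    total = 0
    for value in initialize.values():
        # assumption: a warm_start list contributes one position per entry
        total += len(value) if isinstance(value, list) else value
    return total


default_init = {"grid": 4, "random": 2, "vertices": 4}
larger_init = {"grid": 4, "random": 12, "vertices": 4}
population_size = 20  # hypothetical population of the chosen optimizer

print(n_initial_positions(default_init))  # 10
print(n_initial_positions(larger_init))   # 20
print(n_initial_positions(larger_init) >= population_size)  # True
```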
Citing Hyperactive
@Misc{hyperactive2021,
author = {{Simon Blanke}},
title = {{Hyperactive}: An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.},
howpublished = {\url{https://github.com/SimonBlanke}},
year = {since 2019}
}
License