hyperactive

A hyperparameter optimization toolbox for convenient and fast prototyping


Project description

An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.

Hyperactive:

is very easy to learn but extremely versatile
provides intelligent optimization algorithms, support for all major machine-learning frameworks and many interesting applications
makes optimization data collection simple
visualizes your collected data (> v3.1.0)
saves your computation time
supports parallel computing

Overview • Installation • API reference • Roadmap • Citation • License

What's new?

Overview

Hyperactive features a collection of optimization algorithms that can be used for a variety of optimization problems. The following table lists the capabilities of Hyperactive, where each item links to an example:

Optimization Techniques
- Local Search
- Global Search
- Population Methods
  - Evolution Strategy
- Sequential Methods

Tested and Supported Packages
- Machine Learning
- Deep Learning
- Parallel Computing

Optimization Applications
- Feature Engineering
- Machine Learning
- Deep Learning
- Data Collection
- Visualization
  - Optimization progress visualization
- Miscellaneous

The examples above are not necessarily done with realistic datasets or training procedures. Their purpose is to execute quickly and to give you ideas for interesting use cases.

Hyperactive is very easy to use:


Regular training:

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_boston

data = load_boston()
X, y = data.data, data.target

gbr = DecisionTreeRegressor(max_depth=10)
score = cross_val_score(gbr, X, y, cv=3).mean()

Hyperactive:

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_boston
from hyperactive import Hyperactive

data = load_boston()
X, y = data.data, data.target

def model(opt):
    gbr = DecisionTreeRegressor(max_depth=opt["max_depth"])
    return cross_val_score(gbr, X, y, cv=3).mean()

search_space = {"max_depth": list(range(3, 25))}

hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50)
hyper.run()

Installation

The most recent version of Hyperactive is available on PyPI:

pip install hyperactive

Example

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston
from hyperactive import Hyperactive

data = load_boston()
X, y = data.data, data.target

# define the model in a function
def model(opt):
    # pass the suggested parameter to the machine learning model
    gbr = GradientBoostingRegressor(
        n_estimators=opt["n_estimators"]
    )
    scores = cross_val_score(gbr, X, y, cv=3)

    # return a single numerical value, which gets maximized
    return scores.mean()


# search space determines the ranges of parameters you want the optimizer to search through
search_space = {"n_estimators": list(range(10, 200, 5))}

# start the optimization run
hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50)
hyper.run()

Hyperactive API reference

Basic Usage

Hyperactive(verbosity, distribution, n_processes)

verbosity = ["progress_bar", "print_results", "print_times"]
- (list, False)
- The verbosity list determines what part of the optimization information will be printed in the command line.

distribution = {"multiprocessing": {"initializer": tqdm.set_lock, "initargs": (tqdm.get_lock(),),}}

- (str, dict, callable)
- Access the parallel processing in one of three ways:
  - Via a str, "multiprocessing" or "joblib", to choose one of the two.
  - Via a dictionary with one key, "multiprocessing" or "joblib", whose value holds the keyword arguments for Pool or Parallel. The default argument is a good example of this.
  - Via your own parallel processing function, which will be used instead of those for multiprocessing and joblib. The wrapper function must work similarly to the following two functions:

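Multiprocessing:

from multiprocessing import Pool


def multiprocessing_wrapper(process_func, search_processes_paras, **kwargs):
    n_jobs = len(search_processes_paras)

    # one worker per search process; the pool is shut down when the block exits
    with Pool(n_jobs, **kwargs) as pool:
        results = pool.map(process_func, search_processes_paras)

    return results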

Joblib:

from joblib import Parallel, delayed


def joblib_wrapper(process_func, search_processes_paras, **kwargs):
    n_jobs = len(search_processes_paras)

    # each parameter dictionary is unpacked into keyword arguments
    jobs = [
        delayed(process_func)(**info_dict)
        for info_dict in search_processes_paras
    ]
    results = Parallel(n_jobs=n_jobs, **kwargs)(jobs)

    return results
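For the third option, here is a minimal sketch of a custom wrapper built on the standard library's `concurrent.futures`. It assumes the calling convention of the joblib wrapper above (each entry of `search_processes_paras` is a dictionary of keyword arguments for `process_func`); `thread_pool_wrapper` and `dummy_process` are hypothetical names for illustration, not part of Hyperactive.

```python
from concurrent.futures import ThreadPoolExecutor


def thread_pool_wrapper(process_func, search_processes_paras, **kwargs):
    # one worker per search process, like the wrappers above
    n_jobs = len(search_processes_paras)

    with ThreadPoolExecutor(max_workers=n_jobs, **kwargs) as executor:
        # unpack each parameter dictionary into keyword arguments
        futures = [
            executor.submit(process_func, **info_dict)
            for info_dict in search_processes_paras
        ]
        # collect the results in the same order as the inputs
        results = [future.result() for future in futures]

    return results


# hypothetical process function, only used to exercise the wrapper
def dummy_process(nth_process, n_iter):
    return nth_process * n_iter


paras = [{"nth_process": 0, "n_iter": 5}, {"nth_process": 1, "n_iter": 5}]
print(thread_pool_wrapper(dummy_process, paras))  # [0, 5]
```

Threads are a reasonable choice when the objective function releases the GIL (for example inside NumPy or scikit-learn code) or waits on I/O. For pure-Python, CPU-bound objective functions, `ProcessPoolExecutor` from the same module can be swapped in, provided `process_func` and its arguments can be pickled.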

n_processes = "auto"
- (str, int)
- The maximum number of processes that are allowed to run simultaneously. If n_processes is an int, at most n_processes jobs run at the same time instead of all at once. So if n_processes=10 and n_jobs_total=35, the schedule looks like this: 10 - 10 - 10 - 5. This saves computational resources if there is a large number of jobs. If "auto", n_processes is the sum of all n_jobs (from .add_search(...)).
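The arithmetic behind the 10 - 10 - 10 - 5 schedule can be sketched as follows. `resolve_n_jobs` and `batch_schedule` are hypothetical helpers that only reproduce the batch counts described above; Hyperactive's own scheduler is not shown here and may, for example, start a new job as soon as a slot becomes free rather than in strict batches.

```python
import os


def resolve_n_jobs(n_jobs):
    # n_jobs == -1 stands for "all available CPU cores"
    return (os.cpu_count() or 1) if n_jobs == -1 else n_jobs


def batch_schedule(n_jobs_list, n_processes="auto"):
    # total number of jobs over all .add_search(...) calls
    n_jobs_total = sum(resolve_n_jobs(n) for n in n_jobs_list)

    # "auto": as many processes as there are jobs, so everything runs at once
    if n_processes == "auto":
        n_processes = n_jobs_total

    # full batches of n_processes jobs, the last batch holds the remainder
    n_full, remainder = divmod(n_jobs_total, n_processes)
    schedule = [n_processes] * n_full
    if remainder:
        schedule.append(remainder)
    return schedule


print(batch_schedule([35], n_processes=10))  # [10, 10, 10, 5]
print(batch_schedule([4, 3]))  # [7]
```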

.add_search(objective_function, search_space, n_iter, optimizer, n_jobs, initialize, max_score, random_state, memory, memory_warm_start)

objective_function
- (callable)
- The objective function defines the optimization problem. The optimization algorithm will try to maximize the numerical value that is returned by the objective function by trying out different parameters from the search space.
search_space
- (dict)
- Defines the space where the optimization algorithm can search for the best parameters for the given objective function.
n_iter
- (int)
- The number of iterations that will be performed during the optimization run. Each iteration consists of the optimization step, which decides the next parameter set to evaluate, and the evaluation step, which runs the objective function with the chosen parameters and returns the score.
optimizer = "default"
- (object)
- An instance of an optimizer class that can be imported from Hyperactive. "default" corresponds to the random search optimizer. The following classes can be imported and used:
  - HillClimbingOptimizer
  - StochasticHillClimbingOptimizer
  - RepulsingHillClimbingOptimizer
  - RandomSearchOptimizer
  - RandomRestartHillClimbingOptimizer
  - RandomAnnealingOptimizer
  - SimulatedAnnealingOptimizer
  - ParallelTemperingOptimizer
  - ParticleSwarmOptimizer
  - EvolutionStrategyOptimizer
  - BayesianOptimizer
  - TreeStructuredParzenEstimators
  - DecisionTreeOptimizer
  - EnsembleOptimizer
- Example:
```
...

opt_hco = HillClimbingOptimizer(epsilon=0.08)
hyper = Hyperactive()
hyper.add_search(..., optimizer=opt_hco)
hyper.run()

...
```
n_jobs = 1
- (int)
- Number of jobs to run in parallel. These jobs are optimization runs that work independently of one another (no information sharing). If n_jobs == -1, the maximum number of available CPU cores is used.
initialize = {"grid": 4, "random": 2, "vertices": 4}
- (dict)
- The initialization dictionary automatically determines a number of parameter sets that will be evaluated in the first n iterations (n is the sum of the values in initialize). The initialize keywords are the following:
  - grid
    - Initializes positions in a grid-like pattern. Positions that cannot be put into a grid are randomly positioned.
  - vertices
    - Initializes positions at the vertices of the search space. Positions that cannot be put into a new vertex are randomly positioned.
  - random
    - Number of randomly initialized positions.
  - warm_start
    - List of parameter dictionaries that mark additional start points for the optimization run.
  Example:
```
... 
search_space = {
    "x1": list(range(10, 150, 5)),
    "x2": list(range(2, 12)),
}

ws1 = {"x1": 10, "x2": 2}
ws2 = {"x1": 15, "x2": 10}

hyper = Hyperactive()
hyper.add_search(
    model,
    search_space,
    n_iter=30,
    initialize={"grid": 4, "random": 10, "vertices": 4, "warm_start": [ws1, ws2]},
)
hyper.run()
```
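To make the `vertices`, `random` and `warm_start` keywords more concrete, the sketch below generates starting positions for the search space from the example. It is an illustration under stated assumptions, not Hyperactive's implementation: a vertex is taken to be a combination of the first or last value of each dimension, surplus vertex requests fall back to random positions as described above, `grid` is left out, and `initial_positions` is a hypothetical helper.

```python
import itertools
import random


def initial_positions(search_space, n_vertices=0, n_random=0, warm_start=(), seed=None):
    rng = random.Random(seed)
    names = list(search_space)

    # vertices: every combination of the first and last value of each dimension
    ends = [(search_space[name][0], search_space[name][-1]) for name in names]
    vertices = [dict(zip(names, combo)) for combo in itertools.product(*ends)]
    positions = vertices[:n_vertices]

    # requests that do not fit on a vertex are placed randomly, like the "random" ones
    n_extra = max(0, n_vertices - len(vertices))
    for _ in range(n_random + n_extra):
        positions.append({name: rng.choice(search_space[name]) for name in names})

    # warm_start: user-supplied parameter dictionaries are used as they are
    positions += list(warm_start)
    return positions


search_space = {
    "x1": list(range(10, 150, 5)),
    "x2": list(range(2, 12)),
}
ws1 = {"x1": 10, "x2": 2}
ws2 = {"x1": 15, "x2": 10}

init = initial_positions(search_space, n_vertices=4, n_random=2, warm_start=[ws1, ws2], seed=0)
print(len(init))  # 8
print(init[:4])  # the four corners: (10, 2), (10, 11), (145, 2), (145, 11)
```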
max_score = None
- (float, None)
- The optimization stops once this score is reached. The score is checked after each completed iteration.
random_state = None
- (int, None)
- Random state for random processes in the random, numpy and scipy modules.
memory = True
- (bool)
- Whether or not to use the "memory"-feature. The memory is a dictionary that gets filled with parameters and scores during the optimization run. If the optimizer encounters a parameter set that is already in the dictionary, it simply reads the score from it instead of re-evaluating the objective function (which can take a long time).

memory_warm_start = None
- (pandas dataframe, None)
- Pandas dataframe that contains score and parameter information that will be automatically loaded into the memory-dictionary.
- example:

  score   x1    x2    x...
  0.756   0.1   0.2   ...
  0.823   0.3   0.1   ...
  ...     ...   ...   ...
  ...     ...   ...   ...
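The memory feature and memory_warm_start amount to a lookup table in front of the objective function. Below is a minimal sketch, assuming parameter values are hashable and using a plain list of row dictionaries in place of the dataframe; `make_memoized` is a hypothetical name, not a Hyperactive function.

```python
def make_memoized(objective_function, memory_warm_start=None):
    # memory: parameter set (as a hashable, order-independent key) -> score
    memory = {}

    # warm start: pre-fill the memory with previously collected rows
    for row in memory_warm_start or []:
        para = {name: value for name, value in row.items() if name != "score"}
        memory[tuple(sorted(para.items()))] = row["score"]

    def memoized_objective(para):
        key = tuple(sorted(para.items()))
        if key not in memory:
            # cache miss: evaluate the (possibly expensive) objective function once
            memory[key] = objective_function(para)
        return memory[key]

    return memoized_objective, memory


calls = []


def objective_function(opt):
    calls.append(opt)  # records every real evaluation
    return -(opt["x1"] ** 2 + opt["x2"] ** 2)


warm_rows = [{"score": 0.756, "x1": 0.1, "x2": 0.2}]
memoized, memory = make_memoized(objective_function, warm_rows)

print(memoized({"x1": 0.1, "x2": 0.2}))  # 0.756 (taken from the warm-start rows)
print(memoized({"x1": 1, "x2": 2}))  # -5 (evaluated)
print(memoized({"x1": 1, "x2": 2}))  # -5 (read from memory)
print(len(calls))  # 1
```

If your warm-start data is a pandas dataframe like the example above, `DataFrame.to_dict("records")` converts it into such a list of row dictionaries.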

.run(max_time)

max_time = None
- (float, None)
- Maximum number of seconds until the optimization stops. The time will be checked after each completed iteration.
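Both stopping criteria, max_score from .add_search(...) and max_time from .run(...), are checked after each completed iteration. The sketch below shows that check with a hypothetical `should_stop` helper and a list of stand-in scores instead of real evaluations; treating a score equal to max_score as "reached" is an assumption.

```python
import time


def should_stop(best_score, start_time, max_score=None, max_time=None):
    # max_score: stop once the best score so far reaches the threshold
    if max_score is not None and best_score >= max_score:
        return True
    # max_time: stop once the elapsed wall-clock time (seconds) reaches the limit
    if max_time is not None and time.time() - start_time >= max_time:
        return True
    return False


start_time = time.time()
best_score = float("-inf")
n_done = 0
for score in [0.61, 0.74, 0.83, 0.79, 0.90]:  # stand-in for evaluated scores
    n_done += 1
    best_score = max(best_score, score)
    # checked after each completed iteration
    if should_stop(best_score, start_time, max_score=0.8, max_time=60):
        break

print(n_done, best_score)  # 3 0.83
```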

Special Parameters

Objective Function

Each iteration consists of two steps:

The optimization step: decides what position in the search space (parameter set) to evaluate next
The evaluation step: calls the objective function, which returns the score for the given position in the search space
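The two steps can be sketched as a loop. Here plain random search stands in for the optimization step (Hyperactive's optimizers choose positions with their own strategies), and `random_search` is a hypothetical function, not part of Hyperactive's API.

```python
import random


def random_search(objective_function, search_space, n_iter, seed=None):
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        # optimization step: choose the next position (here: uniformly at random)
        para = {name: rng.choice(values) for name, values in search_space.items()}
        # evaluation step: call the objective function at that position
        score = objective_function(para)
        results.append((score, para))
    # return the (score, parameter set) pair with the highest score
    return max(results, key=lambda result: result[0])


def objective_function(opt):
    return -(opt["x1"] * opt["x1"] + opt["x2"] * opt["x2"])


search_space = {"x1": list(range(-10, 11)), "x2": list(range(-10, 11))}
best_score, best_para = random_search(objective_function, search_space, n_iter=200, seed=1)
print(best_score, best_para)
```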

The objective function has one argument that is often called "para", "params" or "opt". This argument is your access to the parameter set that the optimizer has selected in the corresponding iteration.

def objective_function(opt):
    # get x1 and x2 from the argument "opt"
    x1 = opt["x1"]
    x2 = opt["x2"]

    # calculate the score with the parameter set
    score = -(x1 * x1 + x2 * x2)

    # return the score
    return score

The objective function must always return a score, which shows how "good" or "bad" the current parameter set is. You can also return additional information in a dictionary:

def objective_function(opt):
    x1 = opt["x1"]
    x2 = opt["x2"]

    score = -(x1 * x1 + x2 * x2)

    other_info = {
        "x1 squared": x1**2,
        "x2 squared": x2**2,
    }

    return score, other_info

When you look at the results (a pandas dataframe with all iteration information) after the run has ended, you will see the additional information in it. A dictionary is needed here because Hyperactive has to know the names of the additional parameters. The score does not need a name, because it is always called "score" in the results. You can run this example script if you want to give it a try.
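To see why the additional information must come as a dictionary, here is a sketch of how one row of search data could be assembled from the score, the parameters and the extra values: the dictionary keys become the column names. `collect_row` is a hypothetical helper; it is not how Hyperactive builds its results internally.

```python
def objective_function(opt):
    x1 = opt["x1"]
    x2 = opt["x2"]
    score = -(x1 * x1 + x2 * x2)
    other_info = {
        "x1 squared": x1**2,
        "x2 squared": x2**2,
    }
    return score, other_info


def collect_row(objective_function, para):
    result = objective_function(para)
    # the objective function returns either a score or (score, dict)
    if isinstance(result, tuple):
        score, other_info = result
    else:
        score, other_info = result, {}
    # one row of search data: score, parameters, then the additional information
    return {"score": score, **para, **other_info}


rows = [collect_row(objective_function, p) for p in [{"x1": 1, "x2": 2}, {"x1": -3, "x2": 0}]]
print(rows[0])  # {'score': -5, 'x1': 1, 'x2': 2, 'x1 squared': 1, 'x2 squared': 4}
```

Passing a list of such rows to `pandas.DataFrame` yields a table with one column per key.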

Search Space Dictionary

The search space defines what values the optimizer can select during the search. These selected values will be inside the objective function argument and can be accessed like in a dictionary. The values in each search space dimension should always be in a list. If you use np.arange, convert its result to a list:

import numpy as np

search_space = {
    "x1": list(np.arange(-100, 101, 1)),
    "x2": list(np.arange(-100, 101, 1)),
}

A special feature of Hyperactive is shown in the next example. You can put not just numeric values into the search space dimensions, but also strings and functions. This gives you a lot of flexibility in how you design your studies.

def func1():
    # do stuff and return the result
    ...


def func2():
    # do stuff and return the result
    ...


search_space = {
    "x": list(np.arange(-100, 101, 1)),
    "str": ["a string", "another string"],
    "function" : [func1, func2],
}

If you want to put other types of variables (like numpy arrays, pandas dataframes, lists, ...) into the search space you can do that via functions:

def array1():
    return np.array([0, 1, 2])


def array2():
    return np.array([0, 1, 2])


search_space = {
    "x": list(np.arange(-100, 101, 1)),
    "str": ["a string", "another string"],
    "numpy_array" : [array1, array2],
}

Each function creates and returns its numpy array, so you can use the array inside the objective function.
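Inside the objective function, the selected value for "numpy_array" is the function itself, so you call it to get the array. A minimal sketch; the score formula and the contents of `array2` are made up for illustration.

```python
import numpy as np


def array1():
    return np.array([0, 1, 2])


def array2():
    return np.array([3, 4, 5])


search_space = {
    "x": list(np.arange(-100, 101, 1)),
    "numpy_array": [array1, array2],
}


def objective_function(opt):
    # the search space holds the function, so call it to get the array
    arr = opt["numpy_array"]()
    # illustrative score: how close x is to the sum of the array
    return -abs(opt["x"] - arr.sum())


print(objective_function({"x": 3, "numpy_array": array1}))  # 0
```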

Optimizer Classes

Each of the following optimizer classes can be initialized and passed to the "add_search"-method via the "optimizer"-argument. During this initialization the optimizer class accepts additional parameters. You can read more about each optimization strategy and its parameters in the Optimization Tutorial.

HillClimbingOptimizer
RepulsingHillClimbingOptimizer
SimulatedAnnealingOptimizer
RandomSearchOptimizer
RandomRestartHillClimbingOptimizer
RandomAnnealingOptimizer
ParallelTemperingOptimizer
ParticleSwarmOptimizer
EvolutionStrategyOptimizer
BayesianOptimizer
TreeStructuredParzenEstimators
DecisionTreeOptimizer

Progress Board

The progress board enables the visualization of search data during the optimization run. This will help you to understand what is happening during the optimization and give an overview of the explored parameter sets and scores.

filter_file
- (None, True)
- If filter_file is True, Hyperactive creates a file in the current directory that lets you filter parameters or the score by setting an upper or lower bound.

The following script provides an example:

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston

from hyperactive import Hyperactive
# import the ProgressBoard
from hyperactive.dashboards import ProgressBoard

data = load_boston()
X, y = data.data, data.target


def model(opt):
    gbr = GradientBoostingRegressor(
        n_estimators=opt["n_estimators"],
        max_depth=opt["max_depth"],
        min_samples_split=opt["min_samples_split"],
    )
    scores = cross_val_score(gbr, X, y, cv=3)

    return scores.mean()


search_space = {
    "n_estimators": list(range(50, 150, 5)),
    "max_depth": list(range(2, 12)),
    "min_samples_split": list(range(2, 22)),
}

# create an instance of the ProgressBoard
progress_board = ProgressBoard()

hyper = Hyperactive()

# pass the instance of the ProgressBoard to .add_search(...)
hyper.add_search(
    model,
    search_space,
    n_iter=120,
    progress_board=progress_board,
)

# a terminal will open, which opens a dashboard in your browser
hyper.run()

Result Attributes

.best_para(objective_function)

objective_function
- (callable)
returns: dictionary
Parameter dictionary of the best score of the given objective_function found in the previous optimization run.

example:
```
{
  'x1': 0.2, 
  'x2': 0.3,
}
```

.best_score(objective_function)

objective_function
- (callable)
returns: int or float
Numerical value of the best score of the given objective_function found in the previous optimization run.

.results(objective_function)

objective_function
- (callable)
returns: Pandas dataframe

The dataframe contains score, parameter information, iteration times and evaluation times of the given objective_function found in the previous optimization run.

example:

score   x1    x2    x...   eval_times   iter_times
0.756   0.1   0.2   ...    0.953        1.123
0.823   0.3   0.1   ...    0.948        1.101
...     ...   ...   ...    ...          ...
...     ...   ...   ...    ...          ...
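The three result attributes are related: the best score is the maximum of the score column, and the best parameters are the parameter columns of that row. The sketch below shows this relationship on the example rows above, stored as a list of dictionaries; `best_from_results` is a hypothetical helper, and it assumes the results contain no additional-information columns from the objective function.

```python
results = [
    {"score": 0.756, "x1": 0.1, "x2": 0.2, "eval_times": 0.953, "iter_times": 1.123},
    {"score": 0.823, "x1": 0.3, "x2": 0.1, "eval_times": 0.948, "iter_times": 1.101},
]

# columns that hold timing information rather than parameters
TIME_COLUMNS = {"eval_times", "iter_times"}


def best_from_results(rows):
    # the row with the highest score
    best_row = max(rows, key=lambda row: row["score"])
    # everything except the score and timing columns is a parameter
    best_para = {k: v for k, v in best_row.items() if k != "score" and k not in TIME_COLUMNS}
    return best_para, best_row["score"]


best_para, best_score = best_from_results(results)
print(best_para, best_score)  # {'x1': 0.3, 'x2': 0.1} 0.823
```

With the dataframe returned by .results(...), `df.loc[df["score"].idxmax()]` selects the same row.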

Roadmap

v2.0.0 :heavy_check_mark:

Change API

v2.1.0 :heavy_check_mark:

Save memory of evaluations for later runs (long term memory)
Warm start sequence based optimizers with long term memory
Gaussian process regressors from various packages (gpy, sklearn, GPflow, ...) via wrapper

v2.2.0 :heavy_check_mark:

Add basic dataset meta-features to long term memory
Add helper-functions for memory
- connect two different model/dataset hashes
- split two different model/dataset hashes
- delete memory of model/dataset
- return best known model for dataset
- return search space for best model
- return best parameter for best model

v2.3.0 :heavy_check_mark:

Tree-structured Parzen Estimator
Decision Tree Optimizer
add "max_sample_size" and "skip_retrain" parameters for SMBO to decrease optimization time

v3.0.0 :heavy_check_mark:

New API
- expand usage of objective-function
- No passing of training data into Hyperactive
- Removing "long term memory"-support (better to do in separate package)
- More intuitive selection of optimization strategies and parameters
- Separate optimization algorithms into other package
- expand api so that optimizer parameter can be changed at runtime
- add extensive testing procedure (similar to Gradient-Free-Optimizers)

v3.1.0 :heavy_check_mark:

Decouple number of runs from active processes (Thanks to PartiallyTyped)

Next Features

"long term memory" for search-data storage and usage
Data collector tool to use inside the objective function
Dashboard for visualization of stored search-data
Dashboard for visualization of search-data at runtime (progress-bar)

Experimental algorithms

The following algorithms are of my own design and, to my knowledge, do not yet exist in the technical literature. If any of these algorithms already exist I would like you to share it with me in an issue.

Random Annealing

A combination of simulated annealing and random search.
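The description above is brief, so the following is only one possible reading of it, not the author's implementation: a hill climber whose step width is tied to a temperature that decays over time, so that early steps jump around almost like random search and later steps become small local moves. The one-dimensional setting and the parameter names `epsilon`, `start_temp` and `annealing_rate` are assumptions for illustration.

```python
import random


def random_annealing(objective_function, values, n_iter, epsilon=0.03,
                     start_temp=10.0, annealing_rate=0.98, seed=None):
    rng = random.Random(seed)
    n = len(values)

    # start at a random index into the (ordered) list of values
    current = rng.randrange(n)
    current_score = objective_function(values[current])
    temp = start_temp

    for _ in range(n_iter):
        # step width scales with the temperature: wide, random-search-like jumps
        # at the start, small hill-climbing steps once the temperature has dropped
        sigma = max(1.0, epsilon * temp * n)
        candidate = round(rng.gauss(current, sigma))
        candidate = min(max(candidate, 0), n - 1)  # stay inside the search space

        score = objective_function(values[candidate])
        # hill-climbing acceptance: only move to better positions
        if score > current_score:
            current, current_score = candidate, score

        # cool down, which shrinks the step width
        temp *= annealing_rate

    return values[current], current_score


def objective_function(x):
    return -((x - 17) ** 2)


print(random_annealing(objective_function, list(range(-100, 101)), n_iter=300, seed=0))
```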

FAQ

Known Errors + Solutions

Read this before opening a bug-issue

Are you sure the bug is located in Hyperactive?

Look at the error message from the command line. If one of the last messages looks like this:

File "/.../gradient_free_optimizers/...", line ...

Then you should post the bug report in:

https://github.com/SimonBlanke/Gradient-Free-Optimizers

Otherwise you can post the bug report in Hyperactive

MemoryError: Unable to allocate ... for an array with shape (...)

This is expected with the current implementation of the sequential model-based (SMB) optimizers. For all sequential model-based algorithms, you have to keep an eye on the search space size:

search_space_size = 1
for value_ in search_space.values():
    search_space_size *= len(value_)

print("search_space_size", search_space_size)

Reduce the search space size to resolve this error.

TypeError: cannot pickle '_thread.RLock' object

Setting distribution to "joblib" may fix this problem:

hyper = Hyperactive(distribution="joblib")

Command line full of warnings

These are very often warnings from sklearn or numpy. They are not a sign of bad performance of Hyperactive, and your code will most likely run fine. These warnings are very difficult to silence.

Put this at the very top of your script:

def warn(*args, **kwargs):
    pass


import warnings

warnings.warn = warn
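If you prefer not to replace `warnings.warn`, the standard library's filter functions are an alternative. The sketch shows a process-wide filter (commented out) and a scoped one; `noisy` is a hypothetical stand-in for library code that emits a warning. Like the override above, these filters only affect warnings issued through the `warnings` module, and they apply only to the process in which they are set.

```python
import warnings


def noisy():
    # hypothetical stand-in for library code that emits a warning
    warnings.warn("this option is deprecated", DeprecationWarning)
    return 42


# Option 1, process-wide: put this at the very top of your script
# warnings.filterwarnings("ignore")

# Option 2, scoped: only silences warnings raised inside the with-block
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy()

print(result)  # 42
```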


Citing Hyperactive

@Misc{hyperactive2019,
  author =   {{Simon Blanke}},
  title =    {{Hyperactive}: An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.},
  howpublished = {\url{https://github.com/SimonBlanke}},
  year = {since 2019}
}

License


Release history

4.6.0

Oct 24, 2023

4.5.0

Aug 27, 2023

4.4.3

May 23, 2023

4.4.2

May 22, 2023

4.4.1

Apr 11, 2023

4.4.0

Mar 1, 2023

4.3.1

Jan 4, 2023

4.3.0

Nov 18, 2022

4.2.0

May 4, 2022

4.1.1

Mar 16, 2022

4.1.0

Mar 16, 2022

4.0.2

Dec 5, 2021

4.0.1

Dec 2, 2021

4.0.0

Dec 1, 2021

3.3.3

Nov 28, 2021

3.3.2

Oct 2, 2021

3.3.1

Aug 30, 2021

3.3.0

Aug 29, 2021

This version

3.2.3

Jul 2, 2021

3.2.2

Jul 1, 2021

3.2.1

Jun 23, 2021

3.1.0

May 19, 2021

3.1.0a2 pre-release

Jan 31, 2021

3.1.0a1 pre-release

Jan 30, 2021

3.0.6

Mar 23, 2021

3.0.5.1

Mar 1, 2021

3.0.5

Feb 26, 2021

3.0.4

Feb 7, 2021

3.0.3

Jan 18, 2021

3.0.2

Jan 14, 2021

3.0.1

Jan 6, 2021

3.0.0

Dec 30, 2020

2.3.1

Oct 21, 2020

2.3.0

Mar 22, 2020

2.2.3

Feb 26, 2020

2.2.2

Feb 22, 2020

2.2.1

Feb 22, 2020

2.2.0

Feb 21, 2020

2.1.0

Feb 4, 2020

2.0.1

Dec 16, 2019

2.0.0

Dec 15, 2019

2.0.0b0 pre-release

Dec 7, 2019

1.1.4

Nov 27, 2019

1.1.3

Nov 8, 2019

1.1.2

Nov 6, 2019

1.1.1

Oct 9, 2019

1.1.0

Oct 8, 2019

1.0.0

Sep 25, 2019

0.5.0

Sep 23, 2019

0.4.2

Sep 9, 2019

0.4.1.8

Sep 8, 2019

0.4.1.7

Sep 8, 2019

0.4.1.6

Aug 28, 2019

0.4.1.5

Aug 20, 2019

0.4.1.4

Aug 6, 2019

0.4.1.3

Aug 1, 2019

0.4.1.2

Jul 31, 2019

0.4.1.1

Jul 31, 2019

0.4.1

Jul 31, 2019

0.4.0

Jul 23, 2019

0.3.5

Jul 18, 2019

0.3.4

Jul 10, 2019

0.3.3

Jul 6, 2019

0.3.2

Jul 4, 2019

0.3.1

Jul 2, 2019

0.3.0

Jun 30, 2019

0.2.3

Jun 29, 2019

0.2.2

Jun 22, 2019

0.2.1

Jun 21, 2019

0.2.0

Jun 15, 2019

0.1.4

Jun 9, 2019

0.1.3

Jun 8, 2019

0.1.2

Jun 5, 2019

0.1.1

Jun 4, 2019

0.1

May 28, 2019

Download files


Source Distributions

No source distribution files available for this release.

Built Distribution

hyperactive-3.2.3-py3-none-any.whl (49.6 kB)

Uploaded Jul 2, 2021 Python 3

Hashes for hyperactive-3.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`59ebbc22804f83c41c6fcc136a64bd161e55b1566341761745864818e309f36c`
MD5	`78d21206a4677abff60bf5aa0821e0a5`
BLAKE2b-256	`aa86ba0c466c059386b2f7be2897c48ed6442bc7da8223371d45b3af1ee1d5ec`
