Dask-parallelized Bayesian optimization toolbox
Project description
Scikit-Optimize-Adapter (Adapter: “A DAsk Parallel TunER”) is an efficient, lightweight library built on top of Scikit-Optimize and Dask that lets you run Bayesian-optimization hyperparameter tuning with several parallelized cross-validation schemes.
Install
pip install --index-url https://test.pypi.org/simple/ --no-deps scikit-optimize-adapter --upgrade
Getting started
This is a very quick and basic tutorial; more detailed tutorials will be written soon!
Let’s start with the dummy training data below:
import pandas as pd
import numpy as np
group_col = np.asarray([1]*10 + [2]*10 + [3]*10).reshape(-1, 1)
data = np.arange(30*4).reshape(30, 4)
data = np.hstack((data, group_col))
df = pd.DataFrame(data=data, columns=['target', 'f1', 'f2', 'f3', 'groups'])
features = ['f1', 'f2', 'f3']
groupby = 'groups'
target = 'target'
K = 5
orderby = None
num_partition = None
window_size = None
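The `K` and `cross_validation_scheme` arguments control how the training rows are partitioned. As a rough sketch only (not Adapter's actual implementation), a random-shuffle K-fold split over the 30 rows above could look like this:

```python
import numpy as np

def random_shuffle_kfold(n_rows, k, seed=0):
    """Shuffle the row indices, then split them into k roughly equal folds.
    Each fold serves once as the validation set; the rest train the model."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_rows)
    folds = np.array_split(indices, k)
    for i in range(k):
        valid_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, valid_idx

splits = list(random_shuffle_kfold(30, 5))  # K = 5 folds over 30 rows
```

Adapter runs the K fits in parallel via Dask rather than sequentially as above.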
from skopt.space import Integer, Real
space = [Real(0.5, 10),    # learning rate (learn_rate)
         Real(0, 1),       # gamma (min_split_improvement)
         Integer(3, 4),    # max_depth (max_depth)
         Integer(11, 13),  # n_estimators (ntrees)
         Integer(2, 4),    # min_child_weight (min_rows)
         Real(0, 1),       # colsample_bytree (col_sample_rate_per_tree)
         Real(0, 1)]       # subsample (sample_rate)
Adapter ships with an XGBoost regressor and classifier, but you can pass in a callable estimator of your own design if you wish to customize it.
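The exact interface a custom estimator must expose is not spelled out here, but a scikit-learn-style `fit`/`predict` object is the usual convention for libraries like this. A minimal, purely hypothetical example:

```python
import numpy as np

class MeanRegressor:
    """Toy estimator: predicts the training-target mean for every row.
    Illustrates the fit/predict shape a custom estimator typically has."""
    def __init__(self, **params):
        self.params = params  # hyperparameters proposed by the optimizer
        self.mean_ = None

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

model = MeanRegressor().fit(np.zeros((4, 3)), [1.0, 2.0, 3.0, 4.0])
preds = model.predict(np.zeros((2, 3)))
```

Check Adapter's source for the signature it actually calls before relying on this shape.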
from adapter import Adapter
adapt = Adapter(df, features, target, K, groupby=groupby,
                cross_validation_scheme='random_shuffle',
                search_method="bayesian_optimization",
                estimator="xgboost_regression")
Try copying this link into your web browser to check out the Dask dashboard: http://127.0.0.1:8789/status.
You can visualize the Dask delayed computation graph:
delayed_graph = adapt.construct_delayed_graph(num_iter=3, search_space=space)  # set num_iter to 3 to keep the graph small enough to visualize
delayed_graph.visualize()
Let’s run the code:
res = adapt.run(num_initial=5, num_iter=15, search_space=space)
While it runs, check out the dashboard again and click on the Graph tab: you will see the computation graph above being worked on in real time!
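Conceptually, `num_initial` random evaluations seed the surrogate model, after which `num_iter` further points are chosen by the optimizer. A toy stand-in (plain random proposals, not a real Gaussian-process surrogate) makes that bookkeeping concrete:

```python
import random

def toy_optimize(objective, low, high, num_initial=5, num_iter=15, seed=0):
    """Evaluate num_initial random points, then num_iter more proposals.
    A real Bayesian optimizer would pick the later points via a surrogate
    model's acquisition function instead of sampling uniformly."""
    rng = random.Random(seed)
    evaluations = []
    for _ in range(num_initial + num_iter):
        x = rng.uniform(low, high)  # stand-in for the acquisition step
        evaluations.append((objective(x), x))
    return min(evaluations)  # (best_score, best_x)

best_score, best_x = toy_optimize(lambda x: (x - 3.0) ** 2, 0.0, 10.0)
```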
Now you can retrieve the results:
adapt.plot_improvements() # to show the improvements
optimal_params = adapt.get_optimal_params() # which you can use to train your final model
If you are running this on a local machine, you are responsible for removing the temporary directory:
adapt.cleanup()
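If you want the temporary directory removed even when a run raises, the usual pattern is a `try`/`finally`. The sketch below uses a stand-in directory and `shutil.rmtree`, since it cannot assume Adapter is installed; in real code the `finally` block would call `adapt.cleanup()`:

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp(prefix="adapter_")  # stand-in for Adapter's temp dir
try:
    pass  # adapt.run(...) would go here
finally:
    shutil.rmtree(workdir, ignore_errors=True)  # equivalent of adapt.cleanup()

cleaned = not os.path.exists(workdir)
```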
Todo:
implement the remaining cross-validation schemes
test the hard-thresholded submit process (and test speed without it)
supervised encodings
add unit tests
continuous integration setup
random search method
multi-GPU environment
documentation
~~getting the results of the optimization~~
~~visualization of optimizations~~
early-stopping criterion using callbacks
~~beta readme.rst for install and tutorial~~
full readme.rst for install and tutorial
periodic training
Bayesian warm-start training
dependency management
active per-worker threadpool management
Download files
Hashes for scikit-optimize-adapter-0.0b0.tar.gz (source distribution)
Algorithm | Hash digest
---|---
SHA256 | 603067d77b9085b4efb3bffb9a8842d20037c0add9989a6df667ba030a326f5b
MD5 | ef638fdc8aa7990a15fe7fe12786ae35
BLAKE2b-256 | 18f0f6c99440de47ace27f047d408c763ec1fb99593567cd6077067fdf31e28b
Hashes for scikit_optimize_adapter-0.0b0-py3-none-any.whl (built distribution)
Algorithm | Hash digest
---|---
SHA256 | ba1c6d898068a6224144048396f2d2707e98f7b3db8c06ceb904f566150a8213
MD5 | b8623a60e819a095cf653c0ba0d2b296
BLAKE2b-256 | b72a93ba5b51cffb2e3c5299ec20825c4458d006935955657e4e36a64c21593c