Dask-parallelized Bayesian optimization toolbox
ScikitOptimizeAdapter (Adapter: “A DAsk Parallel TunER”) is an efficient, lightweight library built on top of Scikit-Optimize and Dask that lets you do Bayesian hyperparameter tuning with different schemes of parallelized cross-validation.
Install
```
pip install --index-url https://test.pypi.org/simple/ --no-deps scikit-optimize-adapter --upgrade
```
Getting started
Let’s start with the dummy training data below:
```python
import pandas as pd
import numpy as np

data = np.arange(30*4).reshape(30, 4)
df = pd.DataFrame(data=data, columns=['target', 'f1', 'f2', 'f3'])

features = ['f1', 'f2', 'f3']
target = 'target'
K = 5
orderby = None
num_partition = None
window_size = None

from skopt.space import Space, Categorical, Integer, Real, Dimension

space = [Real(0.5, 10),   # learning rate (learn_rate)
         Real(0, 1),      # gamma (min_split_improvement)
         Integer(3, 4),   # max_depth (max_depth)
         Integer(11, 13), # n_estimators (ntrees)
         Integer(2, 4),   # min_child_weight (min_rows)
         Real(0, 1),      # colsample_bytree (col_sample_rate_per_tree)
         Real(0, 1)]      # subsample (sample_rate)
```
Adapter is shipped with XGBoost regressor and classifier, but you can pass in a callable estimator of your design if you wish to customize it.
```python
from adapter import Adapter

adapt = Adapter(df, features, target, K, groupby=None,
                cross_validation_scheme='random_shuffle',
                search_method="bayesian_optimization",
                estimator="xgboost_regression")
# estimator: "xgboost_regression", "xgboost_classification", or a callable estimator (more on this later)
```
Try copying this link into your web browser to check out the Dask dashboard: http://127.0.0.1:8789/status.
You can visualize the Dask delayed computation graph:
```python
delayed_graph = adapt.construct_delayed_graph(num_iter=3, search_space=space)  # num_iter=3 keeps the graph small enough to visualize
delayed_graph.visualize()
```
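If you are working outside a notebook, you can write the rendering to a file instead. A minimal sketch, assuming construct_delayed_graph returns a standard Dask object whose visualize method accepts a filename argument (rendering also requires the graphviz package):

```python
# Sketch: save the graph rendering to disk instead of displaying it inline.
# Assumes the standard Dask .visualize() API; 'adapter_graph.png' is an
# arbitrary output filename.
delayed_graph.visualize(filename='adapter_graph.png')
```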
Let’s run the code. num_initial is the number of random initial search points and num_iter is the total number of search steps taken, including the num_initial steps. (Example: num_initial=5, num_iter=15 means 5 random searches followed by 10 Bayesian searches.)
```python
res = adapt.run(num_initial=5, num_iter=15, search_space=space)
```
While it runs, check out the dashboard again and click on the Graph tab. You will see the computation graph above being worked on in real time!
Now you can retrieve the results:
```python
adapt.plot_improvements()                    # show the improvements over iterations
optimal_params = adapt.get_optimal_params()  # use these to train your final model
```
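As a minimal sketch of that final training step, assuming get_optimal_params() returns the tuned values in the same order as the search space defined above (the return format is not documented here, and the direct use of xgboost is likewise an assumption):

```python
import xgboost as xgb  # hypothetical final-model step; not part of the Adapter API

# Assumption: optimal_params follows the search-space order defined earlier.
lr, gamma, max_depth, n_estimators, min_child_weight, colsample, subsample = optimal_params

final_model = xgb.XGBRegressor(
    learning_rate=lr,
    gamma=gamma,
    max_depth=int(max_depth),
    n_estimators=int(n_estimators),
    min_child_weight=int(min_child_weight),
    colsample_bytree=colsample,
    subsample=subsample,
)
final_model.fit(df[features], df[target])
```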
If you are running this on a local machine, you are responsible for removing the temporary directory:
```python
adapt.cleanup()
```
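One way to guarantee that cleanup() runs even when the search raises is to wrap the whole run in try/finally (a sketch; nothing in this README suggests the Adapter is a context manager):

```python
# Sketch: make sure the temporary directory is removed even on failure.
try:
    res = adapt.run(num_initial=5, num_iter=15, search_space=space)
    optimal_params = adapt.get_optimal_params()
finally:
    adapt.cleanup()
```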
Cross-validation schemes
There are 5 different cross-validation schemes supported by the adapter:

- random_shuffle: create K cross-validation folds from randomly shuffled rows
  - Default mode for most regression tasks.
- ordered: create K cross-validation folds after sorting the training data by a certain column
  - Used for regression tasks where the data has a time series nature with high temporal autocorrelation.
  - Must supply the orderby argument (see the sketch after this list).
- binary_classification: create K cross-validation folds where the positive/negative label proportion is preserved
  - Used for classification tasks.
  - This mode preserves the positive and negative label proportions in each fold.
- stratified_sampling: create K cross-validation folds such that the skewed distribution of the response is preserved
  - Used for regression tasks where the continuous response variable is highly skewed.
  - This mode preserves the skewed distribution of the response values by sampling from strata.
  - Must supply the num_partition argument (see the sketch after this list).
- expanding_window: mainly for time series modeling
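Here is a sketch of how the scheme-specific arguments might be wired up. It is an assumption that orderby and num_partition are Adapter keyword arguments (the setup code defines variables with those names but never passes them), and the 'date' column and num_partition=10 are hypothetical:

```python
from adapter import Adapter

# Time-series-flavored folds: sort by a column before splitting.
# 'date' is a hypothetical column name, and passing orderby to the
# constructor is an assumption based on the variables set up earlier.
adapt_ordered = Adapter(df, features, target, K,
                        cross_validation_scheme='ordered',
                        orderby='date',
                        search_method="bayesian_optimization",
                        estimator="xgboost_regression")

# Skewed-response folds: stratify the continuous target before sampling.
# num_partition=10 is a hypothetical value.
adapt_stratified = Adapter(df, features, target, K,
                           cross_validation_scheme='stratified_sampling',
                           num_partition=10,
                           search_method="bayesian_optimization",
                           estimator="xgboost_regression")
```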
Tuning multiple models in parallel
Again, let’s take a look at a specific example dataset:
```python
import pandas as pd
import numpy as np

# This time we have a column specifying the group of each row.
group_col = np.asarray([1]*10 + [2]*10 + [3]*10 + [4]*10 + [5]*10 + [6]*10).reshape(60, 1)

data = np.arange(60*4).reshape(60, 4)
data = np.hstack((data, group_col))
df = pd.DataFrame(data=data, columns=['target', 'f1', 'f2', 'f3', 'groups'])

features = ['f1', 'f2', 'f3']
target = 'target'
K = 5
orderby = None
num_partition = None
window_size = None

from skopt.space import Space, Categorical, Integer, Real, Dimension

space = [Real(0.5, 10),   # learning rate (learn_rate)
         Real(0, 1),      # gamma (min_split_improvement)
         Integer(3, 4),   # max_depth (max_depth)
         Integer(11, 13), # n_estimators (ntrees)
         Integer(2, 4),   # min_child_weight (min_rows)
         Real(0, 1),      # colsample_bytree (col_sample_rate_per_tree)
         Real(0, 1)]      # subsample (sample_rate)
```
We can tune a model for each group by passing the groupby argument.
```python
from adapter import Adapter

adapt = Adapter(df, features, target, K, groupby='groups',
                cross_validation_scheme='random_shuffle',
                search_method="bayesian_optimization",
                estimator="xgboost_regression")
```
Run the adapter the same way:
```python
res = adapt.run(num_initial=5, num_iter=15, search_space=space)
```
You can visualize the Dask delayed computation graph the same way as before.
Passing in an arbitrary callable estimator
You can pass in an arbitrary callable estimator as long as it implements the standard scikit-learn-style estimator API:
```python
from abc import ABCMeta, abstractmethod

class BaseEstimator(object, metaclass=ABCMeta):
    """Base class for all Algorithm classes."""

    def __init__(self, **kwargs):
        pass

    @abstractmethod
    def fit(self, X, y, params):
        pass

    @abstractmethod
    def score(self, X, y):
        pass

    @abstractmethod
    def predict(self, X):
        pass
```
For example, we can even do something like:
```python
from adapter import BaseEstimator  # import BaseEstimator!

class DummyEstimator(BaseEstimator):
    def __init__(self):
        pass

    def fit(self, train_X, train_y, params):
        # Burn some CPU proportional to the training set size.
        a = len(train_X) / 10.
        for i in range(int(a * 5000000)):
            i + 1
        print(len(train_X), len(train_y))

    def score(self, validation_X, validation_y):
        print(len(validation_X), len(validation_y))
        return 1.5

    def predict(self, test_X):
        return len(test_X)

my_estimator = DummyEstimator()
```
Then you can use it with the Adapter:
```python
from adapter import Adapter

adapt = Adapter(df, features, target, K, groupby='groups',
                cross_validation_scheme='random_shuffle',
                search_method="bayesian_optimization",
                estimator=my_estimator)  # your own estimator
```
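For a less artificial example, here is a sketch that wraps scikit-learn’s Ridge regressor in the same API. How the Adapter passes params into fit is not documented in this README, so treating it as a dict of Ridge keyword arguments is an assumption:

```python
from adapter import BaseEstimator
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

class RidgeEstimator(BaseEstimator):
    """Sketch: a scikit-learn Ridge regressor behind the estimator API."""

    def __init__(self):
        self.model = None

    def fit(self, train_X, train_y, params):
        # Assumption: params arrives as a dict of Ridge keyword arguments.
        self.model = Ridge(**params)
        self.model.fit(train_X, train_y)

    def score(self, validation_X, validation_y):
        # Return validation MSE. Whether the Adapter minimizes or maximizes
        # this value is not documented here.
        return mean_squared_error(validation_y, self.model.predict(validation_X))

    def predict(self, test_X):
        return self.model.predict(test_X)

my_estimator = RidgeEstimator()
```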
Tuning multiple models with highly skewed training data sizes
When the data size for each group is highly skewed, suboptimal resource allocation can occur. In this case, it is more advantageous to throttle the feeding of delayed graphs to the Dask client by using multiple thread instances. Let’s again look at an example case:
```python
import pandas as pd
import numpy as np
import time

# One large group (100 rows) and many tiny groups (2 rows each).
group_col = np.asarray([1]*100 + [2]*2 + [3]*2 + [4]*2 + [5]*2 + [6]*2 +
                       [7]*2 + [8]*2 + [9]*2 + [16]*2 + [26]*2 + [17]*2 +
                       [18]*2 + [19]*2 + [116]*2 + [126]*2).reshape(130, 1)

data = np.arange(130*4).reshape(130, 4)
data = np.hstack((data, group_col))
df = pd.DataFrame(data=data, columns=['target', 'f1', 'f2', 'f3', 'groups'])

features = ['f1', 'f2', 'f3']
groupby = 'groups'
target = 'target'
K = 5

from adapter import BaseEstimator  # import BaseEstimator!

class DummyEstimator(BaseEstimator):
    def __init__(self):
        pass

    def fit(self, train_X, train_y, params):
        # Burn some CPU proportional to the training set size.
        a = len(train_X) / 10.
        for i in range(int(a * 5000000)):
            i + 1
        print(len(train_X), len(train_y))

    def score(self, validation_X, validation_y):
        print(len(validation_X), len(validation_y))
        return 1.5

    def predict(self, test_X):
        return len(test_X)

my_estimator = DummyEstimator()

orderby = None
num_partition = None
window_size = None

from skopt.space import Space, Categorical, Integer, Real, Dimension

space = [Real(0.5, 10),   # learning rate (learn_rate)
         Real(0, 1),      # gamma (min_split_improvement)
         Integer(3, 4),   # max_depth (max_depth)
         Integer(11, 13), # n_estimators (ntrees)
         Integer(2, 4),   # min_child_weight (min_rows)
         Real(0, 1),      # colsample_bytree (col_sample_rate_per_tree)
         Real(0, 1)]      # subsample (sample_rate)
```
In such a case, run with multiple threads by passing an additional num_threads argument:
```python
from adapter import Adapter

adapt = Adapter(df, features, target, K, groupby='groups',
                cross_validation_scheme='random_shuffle',
                search_method="bayesian_optimization",
                estimator=my_estimator)  # your own estimator

res = adapt.run(num_initial=5, num_iter=15, search_space=space, num_threads=2)  # throttle to 2 concurrent graphs
```
You can check from the Dask dashboard that only two delayed computation graphs are worked on at the same time, achieving dynamic resource allocation in effect.
Todo:
- rest of the cross-validation schemes
- testing hard-thresholded submit process (and testing speed without it)
- supervised encodings
- add unit tests
- continuous integration setup
- random search method
- multi-GPU environment
- documentation
- ~~getting the results of the optimization~~
- ~~visualization of optimizations~~
- early stopping criterion using callbacks
- ~~beta readme.rst for install and tutorial~~
- full readme.rst for install and tutorial
- periodic training
- Bayesian warm-start training
- dependency management
- active per-worker threadpool management