A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV with cutting-edge hyperparameter tuning techniques.

Project description

tune-sklearn

Tune-sklearn is a drop-in replacement for Scikit-Learn’s model selection module (GridSearchCV, RandomizedSearchCV) with cutting-edge hyperparameter tuning techniques.

Features

Here’s what tune-sklearn has to offer:

Consistency with Scikit-Learn API: Change fewer than 5 lines in a standard Scikit-Learn script to use the API [example].
Modern tuning techniques: tune-sklearn lets you use Bayesian Optimization, HyperBand, BOHB, and other optimization techniques by toggling a few parameters.
Framework support: tune-sklearn is used primarily for tuning Scikit-Learn models, but it also supports and provides examples for many other frameworks with Scikit-Learn wrappers, such as Skorch (PyTorch) [example], KerasClassifier (Keras) [example], and XGBClassifier (XGBoost) [example].
Scale up: tune-sklearn leverages Ray Tune, a library for distributed hyperparameter tuning, to parallelize cross validation on multiple cores and even multiple machines without changing your code.

Check out our API Documentation and Walkthrough (for master branch).

Installation

Dependencies

numpy (>=1.16)
ray
scikit-learn (>=0.23)

User Installation

pip install tune-sklearn 'ray[tune]'

Or, to install the latest version from the GitHub repository:

pip install -U git+https://github.com/ray-project/tune-sklearn.git && pip install 'ray[tune]'

Tune-sklearn Early Stopping

For certain estimators, tune-sklearn can also immediately enable incremental training and early stopping. Such estimators include:

Estimators that implement 'warm_start' (except for ensemble classifiers and decision trees)
Estimators that implement partial_fit
XGBoost, LightGBM and CatBoost models (via incremental learning)

To read more about compatible scikit-learn models, see scikit-learn's documentation at section 8.1.1.3.

Early stopping algorithms that can be enabled include HyperBand and Median Stopping (see below for examples).
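
As a rough illustration of what a median-based early stopping rule does, the sketch below stops a trial when its best score so far is strictly below the median of the other trials' running averages at the same iteration. This is a simplified, standalone approximation; the function name `should_stop`, the `grace_period` argument, and the data layout are hypothetical, and this is not Ray Tune's implementation.

```python
from statistics import mean, median


def should_stop(trial_scores, other_trials_scores, grace_period=1):
    """Return True if a trial looks worse than the median of its peers.

    trial_scores: scores reported so far by the trial under test (higher is better).
    other_trials_scores: list of score histories from the other trials.
    grace_period: minimum number of reports before a trial may be stopped.
    """
    step = len(trial_scores)
    if step < max(grace_period, 1):
        return False

    # Running average of every peer that has reported at least `step` results.
    peer_averages = [
        mean(history[:step])
        for history in other_trials_scores
        if len(history) >= step
    ]
    if not peer_averages:
        return False  # nothing to compare against yet

    # Stop when the trial's best score is strictly below the peers' median.
    return max(trial_scores) < median(peer_averages)


peers = [[0.60, 0.70, 0.80], [0.50, 0.65, 0.75], [0.55, 0.60, 0.70]]
print(should_stop([0.30, 0.35], peers))  # True: weak trial is stopped
print(should_stop([0.70, 0.80], peers))  # False: strong trial continues
```

The grace period keeps a trial from being stopped before it has reported enough results to be compared fairly.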

If the estimator does not support partial_fit, tune-sklearn shows a warning that early stopping cannot be performed and simply runs the cross-validation on Ray's parallel backend.

Apart from early stopping scheduling algorithms, tune-sklearn also supports passing custom stoppers to Ray Tune. These can be passed via the stopper argument when instantiating TuneSearchCV or TuneGridSearchCV. See the Ray documentation for an overview of available stoppers.
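
To give a feel for what a custom stopper looks like, here is a minimal plateau-style stopper. It mirrors the two-method interface of Ray Tune's `Stopper` class (`__call__(trial_id, result)` and `stop_all()`) but deliberately does not import Ray, so it runs standalone; in real use the class would subclass `ray.tune.Stopper` and be passed via the `stopper` argument. The class name `PlateauStopper`, the metric key `"score"`, and the thresholds are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class PlateauStopper:
    """Stop a trial once its metric has not improved for `patience` reports."""

    def __init__(self, metric="score", patience=3, min_delta=1e-3):
        self.metric = metric          # hypothetical key in the result dict
        self.patience = patience
        self.min_delta = min_delta
        self._best = {}               # best metric value seen per trial
        self._stale = defaultdict(int)  # reports since the last improvement

    def __call__(self, trial_id, result):
        """Return True to stop the given trial after this result."""
        value = result[self.metric]
        best = self._best.get(trial_id)
        if best is None or value > best + self.min_delta:
            self._best[trial_id] = value
            self._stale[trial_id] = 0
        else:
            self._stale[trial_id] += 1
        return self._stale[trial_id] >= self.patience

    def stop_all(self):
        """Return True to stop the whole experiment; never triggered here."""
        return False


stopper = PlateauStopper(metric="score", patience=2)
for score in [0.70, 0.75, 0.75, 0.75]:
    decision = stopper("trial_1", {"score": score})
print(decision)  # True: two consecutive reports without improvement
```

Tracking state per `trial_id` matters because a single stopper instance is consulted for every trial in the experiment.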

Examples

TuneGridSearchCV

To start out, it’s as easy as changing our import statement to get Tune’s grid search cross validation interface, and the rest is almost identical!

TuneGridSearchCV accepts dictionaries in the format { param_name: str : distribution: list } or a list of such dictionaries, just like scikit-learn's GridSearchCV. The distribution can also be the output of Ray Tune's tune.grid_search.
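
To make the dictionary format concrete, the sketch below expands a parameter grid into the individual candidate settings that a grid search evaluates. It uses only the standard library; the helper name `expand_grid` is hypothetical, and this illustrates the grid semantics rather than tune-sklearn's internal code (scikit-learn's `ParameterGrid` provides equivalent behavior).

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination of values in a grid (or list of grids)."""
    grids = [param_grid] if isinstance(param_grid, dict) else param_grid
    for grid in grids:
        names = sorted(grid)
        for values in product(*(grid[name] for name in names)):
            yield dict(zip(names, values))


parameters = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}
candidates = list(expand_grid(parameters))
print(len(candidates))  # 6 combinations: 3 alphas x 2 epsilons
print(candidates[0])    # {'alpha': 0.0001, 'epsilon': 0.01}
```

Each candidate is cross-validated separately, so the number of fits grows multiplicatively with every parameter added to the grid.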

# from sklearn.model_selection import GridSearchCV
from tune_sklearn import TuneGridSearchCV

# Other imports
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier

# Set training and validation sets
X, y = make_classification(n_samples=11000, n_features=1000, n_informative=50, n_redundant=0, n_classes=10, class_sep=2.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000)

# Example parameters to tune from SGDClassifier
parameters = {
    'alpha': [1e-4, 1e-1, 1],
    'epsilon': [0.01, 0.1]
}

tune_search = TuneGridSearchCV(
    SGDClassifier(),
    parameters,
    early_stopping="MedianStoppingRule",
    max_iters=10
)

import time # Just to compare fit times
start = time.time()
tune_search.fit(X_train, y_train)
end = time.time()
print("Tune Fit Time:", end - start)
pred = tune_search.predict(X_test)
accuracy = np.count_nonzero(np.array(pred) == np.array(y_test)) / len(pred)
print("Tune Accuracy:", accuracy)

If you'd like to compare fit times with sklearn's GridSearchCV, run the following block of code:

from sklearn.model_selection import GridSearchCV
# n_jobs=-1 enables use of all cores like Tune does
sklearn_search = GridSearchCV(
    SGDClassifier(),
    parameters,
    n_jobs=-1
)

start = time.time()
sklearn_search.fit(X_train, y_train)
end = time.time()
print("Sklearn Fit Time:", end - start)
pred = sklearn_search.predict(X_test)
accuracy = np.count_nonzero(np.array(pred) == np.array(y_test)) / len(pred)
print("Sklearn Accuracy:", accuracy)

TuneSearchCV

TuneSearchCV is an upgraded version of scikit-learn's RandomizedSearchCV.

It also provides a wrapper for several search optimization algorithms from Ray Tune's tune.suggest, which in turn are wrappers for other libraries. The search algorithm is selected with the search_optimization parameter. To use an algorithm other than random search, install the libraries it depends on (listed in the `pip install` column). The search algorithms are as follows:

| Algorithm | `search_optimization` value | Summary | Website | `pip install` |
|---|---|---|---|---|
| (Random Search) | `"random"` | Randomized Search | | built-in |
| SkoptSearch | `"bayesian"` | Bayesian Optimization | [Scikit-Optimize] | `scikit-optimize` |
| HyperOptSearch | `"hyperopt"` | Tree-Parzen Estimators | [HyperOpt] | `hyperopt` |
| TuneBOHB | `"bohb"` | Bayesian Opt/HyperBand | [BOHB] | `hpbandster ConfigSpace` |
| Optuna | `"optuna"` | Tree-Parzen Estimators | [Optuna] | `optuna` |

All algorithms other than random search (RandomListSearcher) accept parameter distributions as dictionaries in the format { param_name: str : distribution: tuple or list }.

Tuples represent real distributions and should have two or three elements, in the format (lower_bound: float, upper_bound: float, Optional: "uniform" (default) or "log-uniform"). Lists represent categorical distributions. Ray Tune Search Spaces are also supported and provide a rich set of potential distributions; they let users specify complex, potentially nested search spaces and parameter distributions. Furthermore, each algorithm also accepts parameters in its own specific format. See the Tune documentation for more information.
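
The tuple and list conventions described above can be illustrated with a small sampler. The sketch below interprets a two- or three-element tuple as a real-valued range and a list as a categorical choice. The function `sample_param` is hypothetical, and this is not how tune-sklearn or the underlying search libraries actually draw values.

```python
import math
import random


def sample_param(spec, rng=random):
    """Draw one value from a tuple (real range) or list (categorical) spec."""
    if isinstance(spec, list):
        return rng.choice(spec)
    if isinstance(spec, tuple) and len(spec) in (2, 3):
        low, high = spec[0], spec[1]
        prior = spec[2] if len(spec) == 3 else "uniform"
        if prior == "uniform":
            return rng.uniform(low, high)
        if prior == "log-uniform":
            # Sample uniformly in log space, then map back.
            return math.exp(rng.uniform(math.log(low), math.log(high)))
        raise ValueError(f"unknown prior: {prior!r}")
    raise TypeError(f"unsupported spec: {spec!r}")


param_dists = {
    "loss": ["squared_hinge", "hinge"],
    "alpha": (1e-4, 1e-1, "log-uniform"),
    "epsilon": (1e-2, 1e-1),
}
rng = random.Random(0)  # seeded so repeated runs draw the same values
config = {name: sample_param(spec, rng) for name, spec in param_dists.items()}
print(config)
```

A log-uniform prior spreads samples evenly across orders of magnitude, which suits parameters such as `alpha` whose useful values span several decades.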

Random Search (default) accepts dictionaries in the format { param_name: str : distribution: list } or a list of such dictionaries, just like scikit-learn's RandomizedSearchCV.

from tune_sklearn import TuneSearchCV

# Other imports
import scipy
from ray import tune
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier

# Set training and validation sets
X, y = make_classification(n_samples=11000, n_features=1000, n_informative=50, n_redundant=0, n_classes=10, class_sep=2.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000)

# Example parameter distributions to tune from SGDClassifier
# Note the use of tuples for real-valued ranges when non-random optimization is desired
param_dists = {
    'loss': ['squared_hinge', 'hinge'],
    'alpha': (1e-4, 1e-1, 'log-uniform'),
    'epsilon': (1e-2, 1e-1)
}

bohb_tune_search = TuneSearchCV(SGDClassifier(),
    param_distributions=param_dists,
    n_trials=2,
    max_iters=10,
    search_optimization="bohb"
)

bohb_tune_search.fit(X_train, y_train)

# Define `param_dists` using the SearchSpace API
# This allows the specification of sampling from discrete and
# categorical distributions (below, for the `loss` parameter)
param_dists = {
    'loss': tune.choice(['squared_hinge', 'hinge']),
    'alpha': tune.loguniform(1e-4, 1e-1),
    'epsilon': tune.uniform(1e-2, 1e-1),
}


hyperopt_tune_search = TuneSearchCV(SGDClassifier(),
    param_distributions=param_dists,
    n_trials=2,
    early_stopping=True, # uses Async HyperBand if set to True
    max_iters=10,
    search_optimization="hyperopt"
)

hyperopt_tune_search.fit(X_train, y_train)
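
The `early_stopping=True` setting in the example above selects Async HyperBand. The budgeting idea that HyperBand-style schedulers build on, successive halving, can be sketched in a few lines: evaluate all candidates with a small budget, keep the best fraction, and repeat with a larger budget. The code below is a synchronous toy version under stated assumptions (the `evaluate` callback, `eta`, the budget schedule, and the toy objective are all hypothetical); it is not Ray Tune's asynchronous scheduler.

```python
def successive_halving(configs, evaluate, min_budget=1, max_budget=9, eta=3):
    """Return the best surviving config after repeatedly keeping the top 1/eta.

    evaluate(config, budget) must return a score where higher is better.
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1 and budget <= max_budget:
        # Rank the current survivors at this budget, best first.
        scored = sorted(
            survivors, key=lambda cfg: evaluate(cfg, budget), reverse=True
        )
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        budget *= eta  # survivors earn a larger budget in the next round
    return survivors[0]


# Toy objective: the score improves with budget and peaks at alpha = 0.01.
def toy_score(config, budget):
    return -abs(config["alpha"] - 0.01) * (1 + 1 / budget)


alphas = (1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0, 0.05, 0.5, 5.0)
candidates = [{"alpha": a} for a in alphas]
best = successive_halving(candidates, toy_score)
print(best)  # {'alpha': 0.01}
```

With nine candidates and `eta=3`, the first round keeps three configurations and the second keeps one, so most of the total budget is spent on the most promising settings.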

Other Machine Learning Libraries and Examples

Tune-sklearn also supports the use of other machine learning libraries such as PyTorch (using Skorch) and Keras. You can find these examples here:

More information

Ray Tune

Project details

This version

0.3.0

May 12, 2021
