Hyperparameter optimization for multiple machine learning algorithms using Optuna, with Scikit-learn API
OptuML: Hyperparameter Optimization for Multiple Machine Learning Algorithms using Optuna
⣰⡁ ⡀⣀ ⢀⡀ ⣀⣀ ⡎⢱ ⣀⡀ ⣰⡀ ⡀⢀ ⡷⢾ ⡇ ⠄ ⣀⣀ ⣀⡀ ⢀⡀ ⡀⣀ ⣰⡀ ⡎⢱ ⣀⡀ ⣰⡀ ⠄ ⣀⣀ ⠄ ⣀⣀ ⢀⡀ ⡀⣀
⢸ ⠏ ⠣⠜ ⠇⠇⠇ ⠣⠜ ⡧⠜ ⠘⠤ ⠣⠼ ⠇⠸ ⠧⠤ ⠇ ⠇⠇⠇ ⡧⠜ ⠣⠜ ⠏ ⠘⠤ ⠣⠜ ⡧⠜ ⠘⠤ ⠇ ⠇⠇⠇ ⠇ ⠴⠥ ⠣⠭ ⠏
OptuML (for Optuna and ML) is a Python module that provides hyperparameter optimization for several machine learning algorithms using the Optuna framework. The module supports a variety of algorithms and allows easy hyperparameter tuning through a scikit-learn-like API.
       Input                       OptuML train                           Predict
┌─────────────────┐    ┌──────────────────────────────────┐   ┌─────────────────────────────┐
│X_train, y_train ┼────► clf = Optimizer(algorithm="SVC") ├───► y_pred = clf.predict(X_test)│
└─────────────────┘    │ clf.fit(X_train, y_train)        │   │                             │
┌─────────────────┐    └─▲────────────────────────────────┘   └─────────────────────────▲───┘
│ML algorithm     ├──────┘                                                              │
└─────────────────┘                                                            X_test───┘
Features
- Multiple Algorithms: Supports hyperparameter optimization for the following algorithms:
  - Scikit-learn zoo, plus:
  - CatBoost
  - XGBoost
- Optuna Framework: Leverages Optuna for powerful hyperparameter search.
- Maximize or Minimize: Allows setting the optimization direction (maximize or minimize).
- Scikit-learn API: Provides a consistent interface for fit(), predict(), predict_proba(), and score() methods.
- Control Output: Optionally run Optuna with granular verbosity settings (verbose as bool or int).
- Cross-validation: Easily integrate cross-validation with custom scoring metrics (e.g., accuracy, ROC AUC).
Installation
a) pip
pip install optuml
or upgrade with:
pip install optuml --upgrade
b) Manual way
You can install the required packages via pip:
pip install optuna scikit-learn catboost xgboost numpy wrapt_timeout_decorator
Next, fetch the optuml.py file from the repo and put it in the same directory as your script.
Usage
Basic Example
Here’s how to use the Optimizer class to optimize the hyperparameters of an SVC on the Iris dataset:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from optuml import Optimizer  # module name matches the optuml.py file; imports are case-sensitive
# Load the Iris dataset
X, y = load_iris(return_X_y=True)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Instantiate the optimizer for SVC
optimizer = Optimizer(algorithm="SVC", n_trials=50, cv=3, scoring="accuracy", verbose=True)
# Fit the optimizer to the training data
optimizer.fit(X_train, y_train)
# Predict on the test set
y_pred = optimizer.predict(X_test)
# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
# Print the best hyperparameters found during optimization
print(f"Best Hyperparameters: {optimizer.best_params_}")
Available Algorithms
The Optimizer class supports the following algorithms. Set the algorithm parameter to choose which one to use:
Classifiers
Algorithm | Type |
---|---|
AdaBoostClassifier | AdaBoost Classifier |
CatBoostClassifier | CatBoost Classifier |
GaussianNB | Gaussian Naive Bayes |
KNeighborsClassifier | k-Nearest Neighbors Classifier |
MLPClassifier | Multi-layer Perceptron Classifier |
RandomForestClassifier | Random Forest Classifier |
SVC | Support Vector Classifier |
XGBClassifier | XGBoost Classifier |
QDA | Quadratic Discriminant Analysis |
Regressors
Algorithm | Type |
---|---|
AdaBoostRegressor | AdaBoost Regressor |
CatBoostRegressor | CatBoost Regressor |
KNeighborsRegressor | k-Nearest Neighbors Regressor |
MLPRegressor | Multi-layer Perceptron Regressor |
RandomForestRegressor | Random Forest Regressor |
SVR | Support Vector Regressor |
XGBRegressor | XGBoost Regressor |
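Because the algorithm is selected by name, a typo or wrong capitalization is an easy mistake to make. The sketch below shows one way to check a name against the two tables above before starting a long optimization run. The task_for helper and the two sets are hypothetical and are not part of OptuML's API; only the algorithm names themselves come from the tables.

```python
# Names copied from the tables above; the helper itself is hypothetical
# and is not part of OptuML's API.
CLASSIFIERS = {
    "AdaBoostClassifier", "CatBoostClassifier", "GaussianNB",
    "KNeighborsClassifier", "MLPClassifier", "RandomForestClassifier",
    "SVC", "XGBClassifier", "QDA",
}
REGRESSORS = {
    "AdaBoostRegressor", "CatBoostRegressor", "KNeighborsRegressor",
    "MLPRegressor", "RandomForestRegressor", "SVR", "XGBRegressor",
}


def task_for(algorithm: str) -> str:
    """Return "classification" or "regression" for a supported name."""
    if algorithm in CLASSIFIERS:
        return "classification"
    if algorithm in REGRESSORS:
        return "regression"
    supported = ", ".join(sorted(CLASSIFIERS | REGRESSORS))
    raise ValueError(
        f"Unknown algorithm {algorithm!r}; expected one of: {supported}"
    )


print(task_for("SVC"))
print(task_for("XGBRegressor"))
```

Knowing the task type up front also helps when choosing a scoring metric, since classification metrics such as accuracy do not apply to regressors.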
Controlling Verbosity
You can control the verbosity of Optuna's output by using the verbose parameter:
- Set verbose=True for standard logging.
- Use an int value to specify more granular verbosity levels (e.g., optuna.logging.DEBUG).
optimizer = Optimizer(algorithm="SVC", n_trials=50, cv=3, scoring="accuracy", verbose=True)
You can also show a progress bar, on its own or together with verbose:
optimizer = Optimizer(algorithm="SVC", n_trials=50, cv=3, scoring="accuracy", show_progress_bar=True)
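Optuna's logging levels reuse the integer values of Python's standard logging module (optuna.logging.DEBUG equals logging.DEBUG, for example), which is why an int can be passed for verbose. The sketch below shows one plausible way a bool-or-int value could be translated into a logging level. The resolve_verbosity helper is hypothetical, and the mapping of True to INFO and False to WARNING is an assumption; the source does not document how OptuML resolves these values internally.

```python
import logging


def resolve_verbosity(verbose) -> int:
    """Translate a bool-or-int `verbose` value into a logging level."""
    # bool is a subclass of int, so it must be checked first.
    if isinstance(verbose, bool):
        # Assumed mapping: True shows per-trial messages, False only warnings.
        return logging.INFO if verbose else logging.WARNING
    if isinstance(verbose, int):
        # An int is taken as an explicit logging level.
        return verbose
    raise TypeError(f"verbose must be bool or int, got {type(verbose).__name__}")


print(resolve_verbosity(True))
print(resolve_verbosity(False))
print(resolve_verbosity(logging.DEBUG))
```

A level obtained this way could then be handed to optuna.logging.set_verbosity().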
API Reference
Optimizer
Parameters
- algorithm (str): The machine learning algorithm to optimize.
- direction (str, default "maximize"): Direction of optimization. Can be "maximize" or "minimize".
- verbose (bool or int, default False): Controls Optuna's verbosity.
- n_trials (int, default 100): Number of optimization trials to run.
- timeout (float, optional): Maximum time (in seconds) for the optimization process.
- cv (int, default 5): Number of cross-validation folds.
- scoring (str, default "accuracy"): Scoring metric to use during cross-validation.
- random_state (int, optional): Seed for random number generation.
- cv_timeout (int, default 120): Timeout for a single cross-validation run within a trial.
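The scoring and direction parameters interact, so a short illustration may help. scikit-learn scorers follow a greater-is-better convention, which means error metrics are exposed in negated form (for example, "neg_mean_squared_error"). The sketch below uses scikit-learn directly rather than OptuML. It suggests that a neg_* scorer pairs naturally with the default direction="maximize"; how OptuML treats other combinations is not specified in this document.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
model = KNeighborsRegressor()

# R^2: higher is better, with 1.0 as the upper bound.
r2 = cross_val_score(model, X, y, cv=5, scoring="r2")

# Negated MSE: values are <= 0, and closer to zero is better.
neg_mse = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")

print(f"Mean R^2 over 5 folds: {r2.mean():.3f}")
print(f"Mean negated MSE over 5 folds: {neg_mse.mean():.1f}")
```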
Methods
- fit(X, y): Fit the model using hyperparameter optimization.
- predict(X): Make predictions using the best model found during optimization.
- predict_proba(X): Predict class probabilities (if supported by the model).
- score(X, y): Score the model using the test data.
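The "if supported by the model" caveat on predict_proba reflects how the underlying scikit-learn estimators behave: not every classifier exposes that method. The sketch below checks two plain scikit-learn models directly. Whether Optimizer raises an error in the unsupported case or configures the model to produce probabilities is not stated in the source, so treat this only as an illustration of the underlying behavior.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

models = {
    "GaussianNB": GaussianNB().fit(X, y),
    # SVC with default settings does not expose predict_proba.
    "SVC (default)": SVC().fit(X, y),
}

for name, model in models.items():
    if hasattr(model, "predict_proba"):
        proba = model.predict_proba(X[:1])
        print(f"{name}: probabilities for first sample sum to {proba.sum():.2f}")
    else:
        print(f"{name}: predict_proba is not available")
```

A hasattr check like this is a cheap guard before calling predict_proba on a model whose type is chosen at runtime.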