
OptuML: Hyperparameter Optimization for Multiple Machine Learning Algorithms using Optuna

  ⣰⡁ ⡀⣀ ⢀⡀ ⣀⣀    ⡎⢱ ⣀⡀ ⣰⡀ ⡀⢀ ⡷⢾ ⡇    ⠄ ⣀⣀  ⣀⡀ ⢀⡀ ⡀⣀ ⣰⡀   ⡎⢱ ⣀⡀ ⣰⡀ ⠄ ⣀⣀  ⠄ ⣀⣀ ⢀⡀ ⡀⣀
  ⢸  ⠏  ⠣⠜ ⠇⠇⠇   ⠣⠜ ⡧⠜ ⠘⠤ ⠣⠼ ⠇⠸ ⠧⠤   ⠇ ⠇⠇⠇ ⡧⠜ ⠣⠜ ⠏  ⠘⠤   ⠣⠜ ⡧⠜ ⠘⠤ ⠇ ⠇⠇⠇ ⠇ ⠴⠥ ⠣⠭ ⠏ 

OptuML is a Python module that provides hyperparameter optimization for several machine learning algorithms using the Optuna framework. The module supports a variety of algorithms and allows easy hyperparameter tuning through a scikit-learn-like API.

Quick start

A minimal example (assuming X_train, y_train, and X_test are already defined, as in the full example under Usage):

from OptuML import Optimizer

# Instantiate the optimizer for SVM
optimizer = Optimizer(algorithm="SVM", n_trials=50, cv=3, scoring="accuracy", verbose=True)

# Fit the optimizer to the training data
optimizer.fit(X_train, y_train)

# Predict on the test set
y_pred = optimizer.predict(X_test)

Features

  • Multiple Algorithms: Supports hyperparameter optimization for the following algorithms:
    • Support Vector Machine (SVM)
    • k-Nearest Neighbors (kNN)
    • Random Forest
    • CatBoost
    • XGBoost
    • Logistic Regression
    • Decision Tree
  • Optuna Framework: Leverages Optuna for powerful hyperparameter search.
  • Maximize or Minimize: Allows setting the optimization direction (maximize or minimize).
  • Scikit-learn API: Provides a consistent interface for the fit(), predict(), predict_proba(), and score() methods (see the sketch after this list).
  • Control Output: Optionally run Optuna with verbose logging (verbose=True) or in silent mode (verbose=False).
  • Cross-validation: Easily integrate cross-validation with custom scoring metrics (e.g., accuracy, ROC AUC).
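
For example, the optimization direction, verbosity, and the estimator-style methods can all be combined in one call. A minimal sketch, assuming X_train, y_train, and X_test are defined as in the usage examples below:

optimizer = Optimizer(algorithm="RandomForest", n_trials=30, cv=3, scoring="accuracy", direction="maximize", verbose=False)
optimizer.fit(X_train, y_train)
proba = optimizer.predict_proba(X_test)  # class probabilities, where the underlying model supports them
acc = optimizer.score(X_test, y_test)    # evaluate the best model on held-out data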

Installation

Prerequisites

Ensure that the following Python packages are installed:

  • optuna
  • scikit-learn
  • catboost
  • xgboost
  • numpy

You can install the required packages via pip:

pip install optuna scikit-learn catboost xgboost numpy

Installing OptuML

Fetch the optuml.py file from the repository and place it in the same directory as your script. The module is also published on PyPI as optuml, so installing it with pip install optuml should work as well.

Usage

Basic Example

Here’s how you can use the Optimizer class to optimize hyperparameters for different machine learning algorithms using the Iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from OptuML import Optimizer

# Load the Iris dataset
X, y = load_iris(return_X_y=True)

# Convert to a binary classification problem
X = X[y != 2]
y = y[y != 2]

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Instantiate the optimizer for SVM
optimizer = Optimizer(algorithm="SVM", n_trials=50, cv=3, scoring="accuracy", verbose=True, random_state=42)

# Fit the optimizer to the training data
optimizer.fit(X_train, y_train)

# Predict on the test set
y_pred = optimizer.predict(X_test)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# Print the best hyperparameters found during optimization
print(f"Best Hyperparameters: {optimizer.best_params_}")

Available Algorithms

The Optimizer class supports the following algorithms. You can specify the algorithm parameter to choose which one to use:

  • "SVM" (Support Vector Machine)
  • "kNN" (k-Nearest Neighbors)
  • "RandomForest" (Random Forest)
  • "CatBoost" (CatBoost Classifier)
  • "XGBoost" (XGBoost Classifier)
  • "LogisticRegression" (Logistic Regression)
  • "DecisionTree" (Decision Tree Classifier)

Example of Optimizing Different Algorithms

You can iterate over different algorithms and optimize hyperparameters for each one:

algorithms = ['SVM', 'kNN', 'RandomForest', 'CatBoost', 'XGBoost', 'LogisticRegression', 'DecisionTree']

for algorithm in algorithms:
    print(f"Optimizing {algorithm}")
    optimizer = Optimizer(algorithm=algorithm, n_trials=50, cv=3, scoring="accuracy", verbose=False, random_state=42)
    optimizer.fit(X_train, y_train)
    
    # Predict and calculate accuracy
    y_pred = optimizer.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy for {algorithm}: {accuracy}")
    print(f"Best Hyperparameters for {algorithm}: {optimizer.best_params_}")
    print("="*60)

Custom Scoring and Direction

You can also optimize for different scoring metrics like ROC AUC or use the direction parameter to minimize or maximize the objective:

from sklearn.metrics import roc_auc_score

# Instantiate the optimizer for RandomForest with ROC AUC optimization
optimizer = Optimizer(algorithm="RandomForest", n_trials=50, cv=3, scoring="roc_auc", direction="maximize", verbose=True, random_state=42)

# Fit the optimizer to the training data
optimizer.fit(X_train, y_train)

# Predict probabilities and calculate ROC AUC
y_proba = optimizer.predict_proba(X_test)
roc_auc = roc_auc_score(y_test, y_proba[:, 1])
print(f"ROC AUC: {roc_auc}")

# Print the best hyperparameters
print(f"Best Hyperparameters: {optimizer.best_params_}")

Controlling Verbosity

You can control the verbosity of Optuna's output by using the verbose parameter:

  • Set verbose=True to enable detailed Optuna logging.
  • Set verbose=False to suppress logging and run the optimizer silently.

optimizer = Optimizer(algorithm="SVM", n_trials=50, cv=3, scoring="accuracy", verbose=True, random_state=42)
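
# The same call in silent mode, suppressing Optuna's per-trial logging
optimizer = Optimizer(algorithm="SVM", n_trials=50, cv=3, scoring="accuracy", verbose=False, random_state=42)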

API Reference

Optimizer

Parameters:

  • algorithm (str): The machine learning algorithm to optimize. Options are 'SVM', 'kNN', 'RandomForest', 'CatBoost', 'XGBoost', 'LogisticRegression', 'DecisionTree'.
  • direction (str, default "maximize"): Direction of optimization. Can be "maximize" or "minimize".
  • verbose (bool, default False): If True, Optuna logging will be enabled. If False, the optimizer will run silently.
  • n_trials (int, default 100): Number of optimization trials to run.
  • timeout (float, optional): Maximum time (in seconds) for the optimization process.
  • cv (int, default 5): Number of cross-validation folds.
  • scoring (str, default "accuracy"): Scoring metric to use during cross-validation.
  • random_state (int, optional): Seed for random number generation.
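
All constructor parameters together, with illustrative values (a sketch):

optimizer = Optimizer(
    algorithm="XGBoost",
    direction="maximize",
    verbose=False,
    n_trials=200,
    timeout=600.0,  # stop the search after 10 minutes, even if n_trials has not been reached
    cv=5,
    scoring="roc_auc",
    random_state=42,
)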

Methods:

  • fit(X, y): Fit the model using hyperparameter optimization.
  • predict(X): Make predictions using the best model found during optimization.
  • predict_proba(X): Predict class probabilities (if supported by the model).
  • score(X, y): Score the model using the test data.
  • optimization_time(): Get the total time taken for optimization.
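
After fitting, the remaining methods can be used as follows. A minimal sketch, assuming a fitted optimizer from the examples above and that optimization_time() returns elapsed seconds:

test_score = optimizer.score(X_test, y_test)
elapsed = optimizer.optimization_time()
print(f"Test score: {test_score:.3f} (optimized in {elapsed:.1f} s)")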

Contributing

Contributions to the project are welcome! Feel free to submit issues or pull requests on GitHub, or fork the repository and build on it.

Running Tests

You can run unit tests using pytest:

pip install pytest
pytest

Contact

If you have any questions or feedback, feel free to open an issue on GitHub.
