
Multi-fidelity optimization

Project description



focus_opt: Multi-Fidelity Hyperparameter Optimization


Introduction

focus-opt is a Python package for performing multi-fidelity hyperparameter optimization on machine learning models. It implements optimization algorithms such as Hill Climbing and Genetic Algorithms with support for multi-fidelity evaluations. This allows for efficient exploration of hyperparameter spaces by evaluating configurations at varying levels of fidelity, balancing computational cost and optimization accuracy.

The package is designed to be flexible and extensible, enabling users to define custom hyperparameter spaces and evaluation functions for different machine learning models. In this guide, we'll demonstrate how to install and use focus-opt with a Decision Tree Classifier on the Breast Cancer Wisconsin dataset.

Installation

You can install focus-opt directly from PyPI:

pip install focus-opt

Alternatively, if you want to work with the latest version from the repository:

git clone https://github.com/eliottkalfon/focus_opt.git
cd focus_opt
pip install .

It's recommended to use a virtual environment to manage dependencies.

Creating a Virtual Environment (Optional)

Create a virtual environment using venv:

python -m venv venv

Activate the virtual environment:

  • On Windows:

    venv\Scripts\activate
    
  • On macOS or Linux:

    source venv/bin/activate
    

Using focus-opt with a Decision Tree Classifier

Below is an example of how to use focus-opt to perform hyperparameter optimization on a Decision Tree Classifier using both Hill Climbing and Genetic Algorithms.

import logging
from typing import Dict, Any

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Import classes from focus_opt package
from focus_opt.hp_space import (
    HyperParameterSpace,
    CategoricalHyperParameter,
    OrdinalHyperParameter,
    ContinuousHyperParameter,
)

from focus_opt.optimizers import HillClimbingOptimizer, GeneticAlgorithmOptimizer

# Set up logging
logging.basicConfig(level=logging.INFO)

# Define the hyperparameter space for the Decision Tree Classifier
hp_space = HyperParameterSpace("Decision Tree Hyperparameter Space")

hp_space.add_hp(CategoricalHyperParameter(name="criterion", values=["gini", "entropy"]))
hp_space.add_hp(CategoricalHyperParameter(name="splitter", values=["best", "random"]))
hp_space.add_hp(
    OrdinalHyperParameter(name="max_depth", values=[None] + list(range(1, 21)))
)
hp_space.add_hp(
    ContinuousHyperParameter(
        name="min_samples_split", min_value=2, max_value=20, is_int=True
    )
)
hp_space.add_hp(
    ContinuousHyperParameter(
        name="min_samples_leaf", min_value=1, max_value=20, is_int=True
    )
)
# sklearn requires a float max_features to lie in (0.0, 1.0], so keep the minimum above zero
hp_space.add_hp(
    ContinuousHyperParameter(name="max_features", min_value=0.1, max_value=1.0)
)

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Define the evaluation function
def dt_evaluation(config: Dict[str, Any], fidelity: int) -> float:
    """
    Evaluation function for a Decision Tree Classifier with cross-validation.

    Args:
        config (Dict[str, Any]): Hyperparameter configuration.
        fidelity (int): Fidelity level (index of the cross-validation fold).

    Returns:
        float: Accuracy for the specified cross-validation fold.
    """
    logging.info(f"Evaluating config: {config} at fidelity level: {fidelity}")

    # Initialize the classifier with the given hyperparameters
    clf = DecisionTreeClassifier(random_state=42, **config)

    # Stratified K-Fold Cross-Validation
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

    # Get the train and test indices for the specified fold
    for fold_index, (train_index, test_index) in enumerate(skf.split(X, y)):
        if fold_index + 1 == fidelity:
            X_train, X_test = X[train_index], X[test_index]
            y_train, y_test = y[train_index], y[test_index]
            clf.fit(X_train, y_train)
            score = clf.score(X_test, y_test)
            logging.info(f"Score for config {config} at fold {fidelity}: {score}")
            return score

    raise ValueError(f"Invalid fidelity level: {fidelity}")

# Instantiate the Hill Climbing Optimizer
hill_climbing_optimizer = HillClimbingOptimizer(
    hp_space=hp_space,
    evaluation_function=dt_evaluation,
    max_fidelity=5,       # Number of cross-validation folds
    maximize=True,        # We aim to maximize accuracy
    log_results=True,
    warm_start=20,        # Number of initial configurations to explore
    random_restarts=5,    # Number of random restarts to avoid local optima
)

# Run the Hill Climbing optimization
best_candidate_hill_climbing = hill_climbing_optimizer.optimize(budget=500)
print(
    f"Best candidate from Hill Climbing: {best_candidate_hill_climbing.config} "
    f"with score: {best_candidate_hill_climbing.evaluation_score}"
)

# Instantiate the Genetic Algorithm Optimizer
ga_optimizer = GeneticAlgorithmOptimizer(
    hp_space=hp_space,
    evaluation_function=dt_evaluation,
    max_fidelity=5,           # Number of cross-validation folds
    maximize=True,            # We aim to maximize accuracy
    population_size=20,       # Size of the population in each generation
    crossover_rate=0.8,       # Probability of crossover between parents
    mutation_rate=0.1,        # Probability of mutation in offspring
    elitism=1,                # Number of top individuals to carry over to the next generation
    tournament_size=3,        # Number of individuals competing in tournament selection
    min_population_size=5,    # Minimum population size to maintain diversity
    log_results=True,
)

# Run the Genetic Algorithm optimization
best_candidate_ga = ga_optimizer.optimize(budget=500)
print(
    f"Best candidate from Genetic Algorithm: {best_candidate_ga.config} "
    f"with score: {best_candidate_ga.evaluation_score}"
)

Explanation

  • Importing focus_opt: We import the necessary classes from the focus_opt package.
  • Hyperparameter Space Definition: We define a hyperparameter space that includes parameters such as criterion, splitter, max_depth, min_samples_split, min_samples_leaf, and max_features.
  • Evaluation Function: The dt_evaluation function evaluates a given hyperparameter configuration using cross-validation. The fidelity parameter corresponds to the cross-validation fold index, enabling multi-fidelity optimization.
  • Optimizers: We use both HillClimbingOptimizer and GeneticAlgorithmOptimizer from focus_opt.optimizers to search for the best hyperparameter configuration within the defined budget.
  • Running the Optimization: We specify a computational budget (e.g., budget=500), which limits the total number of evaluations performed during the optimization process.

Usage Guide

Defining Hyperparameter Spaces

focus_opt allows you to define a hyperparameter space by creating instances of different hyperparameter types:

  • CategoricalHyperParameter: For hyperparameters that take on a set of discrete categories.
  • OrdinalHyperParameter: For hyperparameters that have an inherent order.
  • ContinuousHyperParameter: For hyperparameters with continuous values, including integers and floats.

Example:

from focus_opt.hp_space import (
    HyperParameterSpace,
    CategoricalHyperParameter,
    ContinuousHyperParameter
)

hp_space = HyperParameterSpace("Model Hyperparameters")

hp_space.add_hp(
    CategoricalHyperParameter(
        name="activation_function", 
        values=["relu", "tanh", "sigmoid"]
    )
)

hp_space.add_hp(
    ContinuousHyperParameter(
        name="learning_rate", 
        min_value=0.0001, 
        max_value=0.1
    )
)
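
The third type listed above, OrdinalHyperParameter, follows the same pattern as in the Decision Tree example; the parameter name below is purely illustrative:

from focus_opt.hp_space import OrdinalHyperParameter

hp_space.add_hp(
    OrdinalHyperParameter(
        name="num_layers",   # hypothetical parameter, for illustration only
        values=[1, 2, 3, 4]  # discrete values with an inherent order
    )
)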

Implementing Custom Evaluation Functions

Your evaluation function should accept a hyperparameter configuration and a fidelity level, then return a performance score. Here's a template:

from typing import Dict, Any

def evaluation_function(config: Dict[str, Any], fidelity: int) -> float:
    """
    Custom evaluation function.

    Args:
        config (Dict[str, Any]): Hyperparameter configuration.
        fidelity (int): Fidelity level (e.g., amount of data or number of epochs).

    Returns:
        float: Performance score.
    """
    # Implement your model training and evaluation logic here
    pass
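
For instance, fidelity can control the number of training iterations, in the spirit of the "number of epochs" mentioned in the docstring. The sketch below is one possible implementation, not part of the package: the dataset, the train/test split, and the 25-rounds-per-fidelity-level scaling are arbitrary illustrative choices.

from typing import Dict, Any

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hold out a test set once so that every evaluation is scored consistently
X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), test_size=0.2, random_state=42
)

def gb_evaluation(config: Dict[str, Any], fidelity: int) -> float:
    # Higher fidelity -> more boosting rounds -> slower but more reliable score
    clf = GradientBoostingClassifier(
        n_estimators=25 * fidelity, random_state=42, **config
    )
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)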

Using the Hill Climbing Optimizer

from focus_opt.optimizers import HillClimbingOptimizer

optimizer = HillClimbingOptimizer(
    hp_space=hp_space,
    evaluation_function=evaluation_function,
    max_fidelity=10,       # Adjust based on your fidelity levels
    maximize=True,         # Set to False if minimizing
    log_results=True,
    warm_start=10,         # Initial random configurations
    random_restarts=3,     # Number of restarts to avoid local optima
)

best_candidate = optimizer.optimize(budget=100)
print(f"Best configuration: {best_candidate.config}")
print(f"Best score: {best_candidate.evaluation_score}")

Using the Genetic Algorithm Optimizer

from focus_opt.optimizers import GeneticAlgorithmOptimizer

optimizer = GeneticAlgorithmOptimizer(
    hp_space=hp_space,
    evaluation_function=evaluation_function,
    max_fidelity=10,
    maximize=True,
    population_size=50,
    crossover_rate=0.7,
    mutation_rate=0.1,
    elitism=2,
    tournament_size=5,
    log_results=True,
)

best_candidate = optimizer.optimize(budget=500)
print(f"Best configuration: {best_candidate.config}")
print(f"Best score: {best_candidate.evaluation_score}")

Customizing the Optimizers

You can adjust various parameters of the optimizers to suit your needs; a short sketch contrasting two settings follows the list below:

  • For HillClimbingOptimizer:

    • warm_start: Number of random initial configurations.
    • random_restarts: Number of times the optimizer restarts from a new random position.
    • neighbor_selection: Strategy for selecting neighboring configurations.
  • For GeneticAlgorithmOptimizer:

    • population_size: Number of configurations in each generation.
    • crossover_rate: Probability of crossover between parent configurations.
    • mutation_rate: Probability of mutation in offspring configurations.
    • elitism: Number of top configurations to carry over to the next generation.
    • tournament_size: Number of configurations competing during selection.
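
To see how these knobs trade exploration against exploitation, the sketch below contrasts two Genetic Algorithm configurations; the specific values are illustrative, not recommendations.

from focus_opt.optimizers import GeneticAlgorithmOptimizer

# Exploration-heavy: a large, frequently mutating population with weak selection pressure
explorer = GeneticAlgorithmOptimizer(
    hp_space=hp_space,
    evaluation_function=evaluation_function,
    max_fidelity=10,
    maximize=True,
    population_size=100,  # many candidates per generation
    mutation_rate=0.3,    # frequent random perturbations
    crossover_rate=0.6,
    elitism=1,            # carry over only the single best individual
    tournament_size=2,    # small tournaments keep selection pressure low
)

# Exploitation-heavy: a small population that converges quickly
exploiter = GeneticAlgorithmOptimizer(
    hp_space=hp_space,
    evaluation_function=evaluation_function,
    max_fidelity=10,
    maximize=True,
    population_size=20,
    mutation_rate=0.05,   # rare perturbations
    crossover_rate=0.9,
    elitism=4,            # preserve several top performers each generation
    tournament_size=6,    # large tournaments favour the current best
)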

Multi-Fidelity Optimization

focus_opt enables multi-fidelity optimization by allowing you to specify varying levels of fidelity in your evaluation function. This can help reduce computational costs by evaluating more configurations at lower fidelities and fewer configurations at higher fidelities.

Fidelity Scheduling

You can define your own fidelity scheduling within your evaluation function or rely on the built-in mechanisms:

def evaluation_function(config: Dict[str, Any], fidelity: int) -> float:
    # Use 'fidelity' to adjust the evaluation, such as training epochs or data size
    pass
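
To make this concrete, here is a minimal sketch (again, not part of the package API) that treats fidelity as the fraction of training data used, so that low-fidelity calls act as cheap screens and only promising configurations are evaluated on the full dataset. The dataset, split, and MAX_FIDELITY constant are assumptions for illustration:

import numpy as np
from typing import Dict, Any

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), test_size=0.2, random_state=42
)

MAX_FIDELITY = 5  # assumed to equal the max_fidelity passed to the optimizer

def subsampling_evaluation(config: Dict[str, Any], fidelity: int) -> float:
    # Train on a growing fraction of the data as fidelity increases
    fraction = fidelity / MAX_FIDELITY
    n_samples = max(int(len(X_train) * fraction), 1)
    rng = np.random.default_rng(0)
    idx = rng.choice(len(X_train), size=n_samples, replace=False)
    clf = DecisionTreeClassifier(random_state=42, **config)
    clf.fit(X_train[idx], y_train[idx])
    return clf.score(X_test, y_test)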

Requirements

focus-opt requires:

  • Python: 3.8 or higher
  • scikit-learn
  • numpy
  • scipy

These dependencies are automatically installed when you install focus_opt using pip.

Contributing

Contributions are welcome! If you find a bug or have an idea for a new feature, please open an issue or submit a pull request.

To contribute:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/YourFeature).
  3. Commit your changes (git commit -am 'Add YourFeature').
  4. Push to the branch (git push origin feature/YourFeature).
  5. Open a Pull Request.

Please ensure your code adheres to the existing style standards and includes appropriate tests.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.


Author: Eliott Kalfon



Additional Notes

  • Documentation: Comprehensive documentation is available on Read the Docs.
  • Continuous Integration: The project uses GitHub Actions for automated testing and code quality checks.
  • Code Style: The codebase follows the Black code style for consistency.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

focus_opt-0.1.0.tar.gz (21.0 kB)

Built Distribution

focus_opt-0.1.0-py3-none-any.whl (18.9 kB)

File details

Details for the file focus_opt-0.1.0.tar.gz.

File metadata

  • Download URL: focus_opt-0.1.0.tar.gz
  • Size: 21.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for focus_opt-0.1.0.tar.gz

  • SHA256: 8c6df1aad2e357d62c6e28b77440884c3119fb5913b06eba247924856b4f0a0d
  • MD5: 292f2370d1b9c6abb150ed8c8fe57246
  • BLAKE2b-256: fa4126b3245d524d6bd0b8aefe02885ce17bc39e16e2c74edf407e5826bccf29

File details

Details for the file focus_opt-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: focus_opt-0.1.0-py3-none-any.whl
  • Size: 18.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for focus_opt-0.1.0-py3-none-any.whl

  • SHA256: 313d4c51c63e6bef38093775f7cf11342c7298ba02a732177890d0539162204a
  • MD5: 09a9b030676134ab076594f6539c5965
  • BLAKE2b-256: 3fb529024ab6bfe83a64e1281af651b2879b7c922234ecf5f522a7db378c0f65
