
MetaSklearn: A Metaheuristic-Powered Hyperparameter Optimization Framework for Scikit-Learn Models



🌟 Overview

MetaSklearn is a flexible and extensible Python library that brings metaheuristic optimization to hyperparameter tuning of scikit-learn models. It provides a seamless interface to optimize hyperparameters using nature-inspired algorithms from the Mealpy library. It is designed to be user-friendly and efficient, making it easy to integrate into your machine learning workflow.

🚀 Features

  • ✅ Hyperparameter optimization using metaheuristic algorithms from Mealpy
  • ✅ Compatible with any scikit-learn-compatible estimator (SVM, RandomForest, XGBoost, etc.)
  • ✅ Supports classification and regression tasks
  • ✅ Custom and scikit-learn scoring support
  • ✅ Integration with PerMetrics for rich evaluation metrics
  • ✅ Scikit-learn compatible API: .fit(), .predict(), .score()

📦 Installation

Install the latest version using pip:

pip install metasklearn

After that, check the version to ensure successful installation:

$ python
>>> import metasklearn
>>> metasklearn.__version__

🧠 How It Works

MetaSklearn defines a custom MetaSearchCV class that wraps your model and tunes its hyperparameters with any optimizer supported by Mealpy. Each candidate set of hyperparameters is scored using either scikit-learn's metrics or the additional metrics provided by the PerMetrics library.
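The search idea can be illustrated without the library. The sketch below is not MetaSklearn's or Mealpy's implementation: it assumes a toy objective in place of a cross-validated score and a plain perturbation search in place of a real metaheuristic. All names in it (`toy_cv_score`, `BOUNDS`, `search`) are hypothetical.

```python
import random

# Hypothetical search space: name -> (lower bound, upper bound).
BOUNDS = {"C": (0.1, 100.0), "gamma": (1e-4, 1.0)}


def toy_cv_score(params):
    """Stand-in for a cross-validated error; lower is better.
    The minimum at C=10, gamma=0.1 is chosen arbitrarily."""
    return (params["C"] - 10.0) ** 2 + 100.0 * (params["gamma"] - 0.1) ** 2


def clip(value, lower, upper):
    return max(lower, min(upper, value))


def random_candidate(rng):
    return {name: rng.uniform(lb, ub) for name, (lb, ub) in BOUNDS.items()}


def perturb(params, rng, scale=0.1):
    """Shift each value by Gaussian noise scaled to its range, then clip."""
    return {
        name: clip(params[name] + rng.gauss(0.0, scale * (ub - lb)), lb, ub)
        for name, (lb, ub) in BOUNDS.items()
    }


def search(epochs=50, pop_size=20, seed=42):
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    best = min(population, key=toy_cv_score)
    for _ in range(epochs):
        # Sample new candidates around the best one found so far.
        population = [perturb(best, rng) for _ in range(pop_size)]
        challenger = min(population, key=toy_cv_score)
        if toy_cv_score(challenger) < toy_cv_score(best):
            best = challenger
    return best, toy_cv_score(best)


best_params, best_score = search()
print(best_params, best_score)
```

In this sketch, `epochs` and `pop_size` play the same role as the `epoch` and `pop_size` entries of `optim_params` in the Quick Start examples: they set how many candidate evaluations the search is allowed.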

🚀 Quick Start

📘 Example with SVM model for regression task

from sklearn.svm import SVR
from sklearn.datasets import load_diabetes
from metasklearn import MetaSearchCV, FloatVar, StringVar, Data

## Load data object
X, y = load_diabetes(return_X_y=True)
data = Data(X, y)

## Split train and test
data.split_train_test(test_size=0.2, random_state=42, inplace=True)
print(data.X_train.shape, data.X_test.shape)

## Scaling dataset
data.X_train, scaler_X = data.scale(data.X_train, scaling_methods=("standard", "minmax"))
data.X_test = scaler_X.transform(data.X_test)

data.y_train, scaler_y = data.scale(data.y_train, scaling_methods=("standard", "minmax"))
data.y_train = data.y_train.ravel()
data.y_test = scaler_y.transform(data.y_test.reshape(-1, 1)).ravel()

# Define param bounds for SVR

# GridSearchCV-style grid, shown for comparison with the MetaSearchCV bounds below:
# param_bounds = {
#     "C": [0.1, 100],
#     "gamma": [1e-4, 1],
#     "kernel": ["linear", "rbf", "poly"]
# }

param_bounds = [
    FloatVar(lb=0.1, ub=100., name="C"),  # C must be strictly positive
    FloatVar(lb=1e-4, ub=1., name="gamma"),
    StringVar(valid_sets=("linear", "rbf", "poly"), name="kernel")
]

# Initialize and fit MetaSearchCV
searcher = MetaSearchCV(
    estimator=SVR(),
    param_bounds=param_bounds,
    task_type="regression",
    optim="BaseGA",
    optim_params={"epoch": 20, "pop_size": 30, "name": "GA"},
    cv=3,
    scoring="MSE",  # or another regression metric, e.g. "RMSE"
    seed=42,
    n_jobs=2,
    verbose=True,
    mode='single', n_workers=None, termination=None
)

searcher.fit(data.X_train, data.y_train)
print("Best parameters (Regression):", searcher.best_params)
print("Best model: ", searcher.best_estimator)
print("Best score during searching: ", searcher.best_score)

# Make prediction after re-fit
y_pred = searcher.predict(data.X_test)
print("Test score:", searcher.score(data.X_test, data.y_test))
print("Test metrics: ", searcher.scores(data.X_test, data.y_test, list_metrics=("RMSE", "R", "KGE", "NNSE")))

📘 Example with SVM model for classification task

from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from metasklearn import MetaSearchCV, FloatVar, StringVar

# Load dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define param bounds for SVC

# GridSearchCV-style grid, shown for comparison with the MetaSearchCV bounds below:
# param_bounds = {
#     "C": [0.1, 100],
#     "gamma": [1e-4, 1],
#     "kernel": ["linear", "rbf", "poly"]
# }

param_bounds = [
    FloatVar(lb=0.1, ub=100., name="C"),  # C must be strictly positive
    FloatVar(lb=1e-4, ub=1., name="gamma"),
    StringVar(valid_sets=("linear", "rbf", "poly"), name="kernel")
]

# Initialize and fit MetaSearchCV
searcher = MetaSearchCV(
    estimator=SVC(),
    param_bounds=param_bounds,
    task_type="classification",
    optim="BaseGA",
    optim_params={"epoch": 20, "pop_size": 30, "name": "GA"},
    cv=3,
    scoring="AS",  # or any custom scoring like "F1_macro"
    seed=42,
    n_jobs=2,
    verbose=True,
    mode='single', n_workers=None, termination=None
)

searcher.fit(X_train, y_train)
print("Best parameters (Classification):", searcher.best_params)
print("Best model: ", searcher.best_estimator)
print("Best score during searching: ", searcher.best_score)

# Make prediction after re-fit
y_pred = searcher.predict(X_test)
print("Test Accuracy:", searcher.score(X_test, y_test))
print("Test Score: ", searcher.scores(X_test, y_test, list_metrics=("AS", "RS", "PS", "F1S")))

As the examples show, the workflow is the same for any other scikit-learn-compatible model, such as Random Forest, Decision Tree, or XGBoost.

📋 Parameters - Variable Types in MetaSearchCV. How to choose them?

This section explains how to use the variable types provided by MetaSklearn when defining a hyperparameter search space for MetaSearchCV. Each variable type suits a different kind of hyperparameter.

1. IntegerVar – Integer Variable

from metasklearn import IntegerVar

var = IntegerVar(lb=1, ub=100, name="n_estimators")

Used for discrete numerical parameters like number of neighbors in KNN, number of estimators in ensembles, etc.

2. FloatVar – Float/Continuous Variable

from metasklearn import FloatVar

var = FloatVar(lb=0.001, ub=1.0, name="learning_rate")

Used for continuous numerical parameters such as learning_rate, C, gamma, etc.

3. StringVar – Categorical/String Variable

from metasklearn import StringVar

var = StringVar(valid_sets=("linear", "poly", "rbf"), name="kernel")

Used for string parameters with a limited number of choices, e.g., kernel in SVM. None can also be included as one of the valid values.

4. BinaryVar – Binary Variable (0 or 1)

from metasklearn import BinaryVar

var = BinaryVar(n_vars=1, name="feature_selected")

Used in binary feature selection problems or any 0/1-based decision.

5. BoolVar – Boolean Variable (True or False)

from metasklearn import BoolVar

var = BoolVar(n_vars=1, name="use_bias")

Used for Boolean-type arguments such as fit_intercept, use_bias, etc.

6. CategoricalVar – A set of mixed discrete values such as int, float, string, None

from metasklearn import CategoricalVar

var = CategoricalVar(valid_sets=((3., None, "alpha"), (5, 12, 32), ("auto", "exp", "sin")), name="categorical")

This type of variable is useful when a hyperparameter can take values from a predefined set of mixed types (int, float, string, bool, None, ...).

7. SequenceVar – Variables as tuple, list, or set

from metasklearn import SequenceVar

var = SequenceVar(valid_sets=((10, ), (20, 15), (30, 10, 5)), return_type=list, name="hidden_layer_sizes")

This type of variable is useful for defining hyperparameters that represent sequences, such as the sizes of hidden layers in a neural network.

8. PermutationVar – Permutation Variable

from metasklearn import PermutationVar

var = PermutationVar(valid_set=(1, 2, 5, 10), name="job_order")

Used for optimization problems involving permutations, like scheduling or routing.
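A common way for a continuous optimizer to search over orderings is random-key decoding: each item gets a real-valued key, and sorting the keys yields a permutation. The sketch below illustrates that general technique only; whether PermutationVar decodes this way internally is an assumption, and `decode_permutation` is a hypothetical helper.

```python
def decode_permutation(keys, valid_set):
    """Order the items of valid_set by ascending key value (random-key decoding)."""
    if len(keys) != len(valid_set):
        raise ValueError("keys and valid_set must have the same length")
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    return [valid_set[i] for i in order]


jobs = (1, 2, 5, 10)
# Smallest key is at index 1, then index 3, index 0, index 2.
print(decode_permutation([0.7, 0.1, 0.9, 0.4], jobs))  # [2, 10, 1, 5]
```

Any key vector decodes to a valid ordering, so the optimizer never has to repair a candidate that repeats or drops an item.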

9. TransferBinaryVar – Transfer Binary Variable

from metasklearn import TransferBinaryVar

var = TransferBinaryVar(n_vars=1, tf_func="vstf_01", lb=-8., ub=8., all_zeros=True, name="transfer_binary")

Used in binary search spaces where a continuous value in [lb, ub] is converted to 0 or 1 through a transfer function (tf_func).

10. TransferBoolVar – Transfer Boolean Variable

from metasklearn import TransferBoolVar

var = TransferBoolVar(n_vars=1, tf_func="vstf_01", lb=-8., ub=8., name="transfer_bool")

Used in Boolean search spaces where a continuous value in [lb, ub] is converted to True or False through a transfer function (tf_func).
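The two transfer variables rely on a transfer function that maps a real number to a value in [0, 1], which is then turned into a binary or Boolean decision. The sketch below uses a well-known V-shaped function, |erf(√π/2 · x)|. Whether this is exactly what `"vstf_01"` computes, and whether the library draws the decision as shown, are assumptions; `v_shaped` and `to_bool` are hypothetical names.

```python
import math
import random


def v_shaped(x):
    """V-shaped transfer function |erf(sqrt(pi)/2 * x)|, mapping any real x into [0, 1]."""
    return abs(math.erf(math.sqrt(math.pi) / 2.0 * x))


def to_bool(x, rng, lb=-8.0, ub=8.0):
    """Clip x to [lb, ub], then return True with probability v_shaped(x)."""
    x = max(lb, min(ub, x))
    return rng.random() < v_shaped(x)


rng = random.Random(0)
for x in (-8.0, -0.5, 0.0, 0.5, 8.0):
    print(x, round(v_shaped(x), 3), to_bool(x, rng))
```

The function is symmetric around zero, so values far from zero in either direction almost always yield True, while values near zero almost always yield False.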

🔧 Example: Define a Mixed Search Space

from metasklearn import (IntegerVar, FloatVar, StringVar, BinaryVar, BoolVar, 
        PermutationVar, CategoricalVar, SequenceVar, TransferBinaryVar, TransferBoolVar)

param_bounds = [
    IntegerVar(lb=1, ub=20, name="n_neighbors"),
    FloatVar(lb=0.001, ub=1.0, name="alpha"),
    StringVar(valid_sets=["uniform", "distance"], name="weights"),
    BinaryVar(name="use_feature"),
    BoolVar(name="fit_bias"),
    PermutationVar(valid_set=(1, 2, 5, 10), name="job_order"),
    CategoricalVar(valid_sets=[0.1, "relu", False, None, 3], name="activation_choice"),
    SequenceVar(valid_sets=((10,), (20, 10), (30, 50, 5)), name="mixed_choice"),
    TransferBinaryVar(name="bin_transfer"),
    TransferBoolVar(name="bool_transfer")
]

Use this format when designing hyperparameter spaces for advanced models in MetaSearchCV.
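To see why each hyperparameter needs a declared type, it helps to picture the optimizer as working on a flat vector of numbers that must be decoded into model arguments. The sketch below shows one simple decoding scheme under that assumption. It is not the library's code: the `space` encoding and the `decode` helper are hypothetical.

```python
def decode(vector, space):
    """Turn a flat list of floats into a hyperparameter dict.

    `space` maps each name to ("int", lb, ub), ("float", lb, ub),
    or ("choice", options). This encoding is hypothetical.
    """
    if len(vector) != len(space):
        raise ValueError("vector and space must have the same length")
    params = {}
    for value, (name, spec) in zip(vector, space.items()):
        kind = spec[0]
        if kind == "int":
            _, lb, ub = spec
            # Round to the nearest integer, then clip to the bounds.
            params[name] = int(max(lb, min(ub, round(value))))
        elif kind == "float":
            _, lb, ub = spec
            params[name] = float(max(lb, min(ub, value)))
        elif kind == "choice":
            options = spec[1]
            # Treat the value as an index into the options.
            index = int(max(0, min(len(options) - 1, round(value))))
            params[name] = options[index]
        else:
            raise ValueError(f"unknown variable kind: {kind}")
    return params


space = {
    "n_neighbors": ("int", 1, 20),
    "alpha": ("float", 0.001, 1.0),
    "weights": ("choice", ("uniform", "distance")),
}
print(decode([7.6, 0.25, 0.8], space))
# {'n_neighbors': 8, 'alpha': 0.25, 'weights': 'distance'}
```

Out-of-range values are clipped rather than rejected, so every vector the optimizer proposes decodes to a usable set of arguments.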

⚙ Supported Optimizers

MetaSklearn integrates all metaheuristic algorithms from Mealpy, including:

  • AOA (Arithmetic Optimization Algorithm)
  • GWO (Grey Wolf Optimizer)
  • PSO (Particle Swarm Optimization)
  • DE (Differential Evolution)
  • WOA, SSA, MVO, and many more...

You can pass any optimizer name or an instantiated optimizer object to MetaSearchCV. For more details, please refer to the Mealpy documentation.

📊 Custom Metrics

You can use custom scoring functions from:

  • sklearn.metrics.get_scorer_names()

  • permetrics.RegressionMetric and ClassificationMetric

For details on the PerMetrics library, please refer to the PerMetrics documentation.

📚 Documentation

Documentation is available at: 👉 https://metasklearn.readthedocs.io

You can build the documentation locally:

cd docs
make html

🧪 Testing

You can run unit tests using:

pytest tests/

🤝 Contributing

We welcome contributions to MetaSklearn! If you have suggestions, improvements, or bug fixes, feel free to fork the repository, create a pull request, or open an issue.

📄 License

This project is licensed under the GPLv3 License. See the LICENSE file for more details.

Citation Request

Please include these citations if you plan to use this library:

@software{thieu20250510MetaSklearn,
  author       = {Nguyen Van Thieu},
  title        = {MetaSklearn: A Metaheuristic-Powered Hyperparameter Optimization Framework for Scikit-Learn Models},
  month        = {June},
  year         = 2025,
  doi         = {10.6084/m9.figshare.28978805},
  url          = {https://github.com/thieu1995/MetaSklearn}
}


Developed by: Thieu @ 2025
