sklearn-genetic-opt

Hyperparameter tuning and feature selection for scikit-learn models using evolutionary algorithms

sklearn-genetic-opt adds evolutionary optimization tools to the scikit-learn workflow. It can tune hyperparameters with GASearchCV and select feature subsets with GAFeatureSelectionCV using algorithms powered by DEAP.

The project is useful when a search space is mixed, irregular, expensive, or not well served by an exhaustive grid. It follows familiar scikit-learn patterns: define an estimator, define a search space, call fit, inspect best_params_ or support_, and use the fitted object for prediction.

Documentation is available at rodrigo-arenas.github.io/Sklearn-genetic-opt.

Highlights

GASearchCV for hyperparameter search across classification, regression, and supported outlier-detection estimators.
GAFeatureSelectionCV for wrapper-based feature selection with cross-validation.
Search spaces for integer, continuous, and categorical parameters.
Smart initial populations with PopulationConfig(initializer="smart"), including warm-start seeds, estimator defaults, Latin-hypercube numeric coverage, stratified categorical coverage, and duplicate avoidance.
Adaptive mutation and crossover schedules.
Optional local search, diversity control, random immigrants, and fitness sharing to improve exploration, avoid premature convergence, and refine good solutions.
Parallel candidate evaluation with n_jobs and parallel_backend.
Evaluation caching, optimizer telemetry through history, and fit-cost counters through fit_stats_.
Callbacks for early stopping, progress reporting, checkpoints, TensorBoard, and custom logic.
Plotting helpers plus MLflow 3 logging support.

Installation

Install the core package with pip:

pip install sklearn-genetic-opt

Or with conda from the conda-forge channel:

conda install -c conda-forge sklearn-genetic-opt

Install optional plotting, MLflow, and TensorBoard integrations with pip:

pip install "sklearn-genetic-opt[all]"

The conda package ships only the core dependencies. To use the optional integrations in a conda environment, install the extras you need alongside it, for example:

conda install -c conda-forge sklearn-genetic-opt seaborn mlflow

Requirements

Core requirements:

Python >= 3.12
scikit-learn >= 1.5.0
NumPy >= 1.26.0
DEAP >= 1.4.0
tqdm >= 4.60.0

Optional extras:

Seaborn >= 0.13.2 for plots
MLflow >= 3.14.0 for experiment logging
TensorFlow >= 2.21.0 and TensorBoard >= 2.20.0,<2.21.0 for TensorBoard logging on Python < 3.14
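
The core minimums listed above can be checked against the current environment before running a search. The following is a minimal sketch using only the standard library; `release_tuple`, `check_requirements`, and `MINIMUMS` are hypothetical names for illustration, and the simplified version parsing (leading numeric components only) is an assumption that ignores pre-release ordering.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Core minimums copied from the Requirements list.
MINIMUMS = {
    "scikit-learn": "1.5.0",
    "numpy": "1.26.0",
    "deap": "1.4.0",
    "tqdm": "4.60.0",
}


def release_tuple(text):
    """Return the leading (major, minor, patch) integers, padding with zeros."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    return tuple(int(group or 0) for group in match.groups())


def check_requirements(minimums=MINIMUMS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 12):
        problems.append("Python >= 3.12 is required")
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if release_tuple(installed) < release_tuple(minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


for problem in check_requirements():
    print(problem)
```

For strict comparisons that respect pre-release and post-release ordering, a dedicated version parser is a better fit than this three-integer shortcut.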

Quick Start: Hyperparameter Search

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split

from sklearn_genetic import EvolutionConfig, GASearchCV, PopulationConfig, RuntimeConfig
from sklearn_genetic.space import Categorical, Continuous, Integer

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.25,
    stratify=y,
    random_state=42,
)

param_grid = {
    "n_estimators": Integer(50, 200),
    "max_depth": Integer(2, 12),
    "max_features": Continuous(0.3, 1.0),
    "criterion": Categorical(["gini", "entropy", "log_loss"]),
}

search = GASearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
    scoring="accuracy",
    evolution_config=EvolutionConfig(population_size=12, generations=8),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(
        n_jobs=-1,
        parallel_backend="auto",
        use_cache=True,
        verbose=True,
    ),
)

search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)
print(search.score(X_test, y_test))
print(search.fit_stats_)

Quick Start: Feature Selection

import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

from sklearn_genetic import (
    EvolutionConfig,
    GAFeatureSelectionCV,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.schedules import ExponentialAdapter

X, y = load_iris(return_X_y=True)
noise = np.random.default_rng(42).uniform(0, 1, size=(X.shape[0], 8))
X = np.hstack([X, noise])

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.25,
    stratify=y,
    random_state=42,
)

selector = GAFeatureSelectionCV(
    estimator=SVC(gamma="auto"),
    cv=3,
    scoring="accuracy",
    evolution_config=EvolutionConfig(
        population_size=12,
        generations=8,
        mutation_probability=ExponentialAdapter(0.8, 0.2, 0.05),
        crossover_probability=ExponentialAdapter(0.2, 0.8, 0.05),
    ),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(n_jobs=-1, verbose=True),
)

selector.fit(X_train, y_train)

print(selector.support_)
print(accuracy_score(y_test, selector.predict(X_test)))
print(selector.transform(X_test).shape)
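
The feature-selection example above passes `ExponentialAdapter(0.8, 0.2, 0.05)` for mutation and `ExponentialAdapter(0.2, 0.8, 0.05)` for crossover. To get a feel for what such a schedule does, here is a standalone sketch. It assumes the three arguments are the initial value, the end value, and an adaptive rate, and that the value at generation t follows (initial - end) * exp(-rate * t) + end; this page does not confirm either assumption, and `exponential_schedule` is a hypothetical helper, not part of the library.

```python
import math


def exponential_schedule(initial, end, rate, generation):
    """Move from `initial` toward `end` exponentially as generations pass."""
    return (initial - end) * math.exp(-rate * generation) + end


# Mutation starts high and decays; crossover starts low and grows.
for generation in range(8):
    mutation = exponential_schedule(0.8, 0.2, 0.05, generation)
    crossover = exponential_schedule(0.2, 0.8, 0.05, generation)
    print(f"gen {generation}: mutation={mutation:.3f} crossover={crossover:.3f}")
```

Under these assumptions the two probabilities mirror each other and sum to 1 at every generation, so the search shifts gradually from exploration (mutation) to recombination (crossover).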

Improving Search Quality

PopulationConfig(initializer="smart") is the default and is recommended for most runs. It improves early search coverage, and you do not need to cut the population size or the number of generations to use it.
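
One ingredient of the smart initializer named in the Highlights is Latin-hypercube coverage of numeric parameters. The sketch below illustrates the general technique in pure Python: each dimension is split into as many equal-width strata as there are samples, and every stratum receives exactly one point. It is an illustration of the idea only, not the library's implementation; `latin_hypercube` is a hypothetical name.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points with one point per equal-width stratum in each dimension.

    `bounds` is a list of (low, high) pairs, one per numeric parameter.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniformly placed value inside each of the n_samples strata.
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-dimension columns into a list of points.
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


# Twelve starting points over two numeric ranges taken from the quick start.
population = latin_hypercube(12, [(50, 200), (0.3, 1.0)], seed=42)
for point in population:
    print(point)
```

Compared with plain uniform sampling, this guarantees that a small population still spans the full range of every numeric parameter.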

For harder spaces, combine it with optimizer controls:

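from sklearn_genetic import (
    EvolutionConfig,
    GASearchCV,
    OptimizationConfig,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter

# `estimator` and `param_grid` are assumed to be defined already; the warm-start
# entry below implies an estimator that accepts `C` and `class_weight`.
search = GASearchCV(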
    estimator=estimator,
    param_grid=param_grid,
    scoring="roc_auc",
    cv=3,
    evolution_config=EvolutionConfig(
        population_size=16,
        generations=12,
        crossover_probability=ExponentialAdapter(0.85, 0.45, 0.08),
        mutation_probability=InverseAdapter(0.18, 0.55, 0.12),
    ),
    population_config=PopulationConfig(
        initializer="smart",
        warm_start_configs=[{"C": 1.0, "class_weight": None}],
    ),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", use_cache=True),
    optimization_config=OptimizationConfig(
        local_search=True,
        local_search_top_k=2,
        local_search_steps=2,
        diversity_control=True,
        random_immigrants_fraction=0.15,
        fitness_sharing=True,
    ),
)

Use:

PopulationConfig.warm_start_configs when you already know useful candidate settings.
OptimizationConfig(local_search=True) to refine the best candidates near the end of the run.
OptimizationConfig(diversity_control=True) to react when the population collapses too early.
OptimizationConfig(fitness_sharing=True) to keep multiple promising regions alive.
fit_stats_ to understand evaluation cost, cache hits, and skipped work.
history to inspect fitness, diversity, stagnation, and optimizer telemetry by generation.
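
To make the cache-hit counters concrete, here is a small sketch of evaluation caching in general: identical parameter sets are scored once and later requests are served from a dictionary. This is an illustration under assumptions, not the library's code; `make_cached_evaluator` and the `evaluations` and `cache_hits` keys are hypothetical and are not the actual `fit_stats_` field names.

```python
def make_cached_evaluator(evaluate):
    """Wrap `evaluate` so repeated parameter sets are scored only once."""
    cache = {}
    stats = {"evaluations": 0, "cache_hits": 0}

    def cached(params):
        # Dicts are unhashable, so use a sorted tuple of items as the key.
        key = tuple(sorted(params.items()))
        if key in cache:
            stats["cache_hits"] += 1
        else:
            stats["evaluations"] += 1
            cache[key] = evaluate(params)
        return cache[key]

    return cached, stats


# Toy objective standing in for a cross-validated score.
cached, stats = make_cached_evaluator(lambda p: -(p["max_depth"] - 6) ** 2)
for depth in [4, 6, 4, 6, 8]:
    cached({"max_depth": depth})
print(stats)  # {'evaluations': 3, 'cache_hits': 2}
```

Evolutionary searches tend to revisit candidates as the population converges, which is why a cache like this can save a noticeable share of cross-validation work.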

Parallelism

RuntimeConfig.n_jobs controls parallel execution:

RuntimeConfig(n_jobs=1) runs sequentially.
RuntimeConfig(n_jobs=-1) uses all available CPU cores.
RuntimeConfig(n_jobs=k) uses k workers.

With RuntimeConfig(parallel_backend="auto") or RuntimeConfig(parallel_backend="population"), unique candidates in the same generation are evaluated in parallel, and each candidate runs its cross-validation sequentially to avoid nested parallelism. With RuntimeConfig(parallel_backend="cv"), candidates are evaluated serially and n_jobs is passed to each candidate’s cross-validation call instead.
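
The difference between the two strategies can be sketched with the standard library. The example below is a toy model, not the library's implementation: `score_fold` stands in for fitting and scoring one fold, all function names are hypothetical, and threads are used only to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor


def score_fold(candidate, fold):
    """Toy stand-in for fitting and scoring one cross-validation fold."""
    return -(candidate["x"] - 3) ** 2 - 0.1 * fold


def evaluate_population_parallel(candidates, folds, workers):
    """'population' style: candidates in parallel, folds sequential per candidate."""
    def evaluate(candidate):
        scores = [score_fold(candidate, fold) for fold in folds]
        return sum(scores) / len(scores)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, candidates))


def evaluate_cv_parallel(candidates, folds, workers):
    """'cv' style: candidates one at a time, folds in parallel per candidate."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for candidate in candidates:
            scores = list(pool.map(lambda fold: score_fold(candidate, fold), folds))
            results.append(sum(scores) / len(scores))
    return results


candidates = [{"x": x} for x in range(6)]
folds = [0, 1, 2]
by_population = evaluate_population_parallel(candidates, folds, workers=4)
by_cv = evaluate_cv_parallel(candidates, folds, workers=4)
print(by_population == by_cv)  # same scores, different scheduling
```

Both strategies produce the same scores; they differ in where the workers are spent. Parallelizing over candidates tends to pay off when a generation has many unique candidates, while parallelizing over folds may suit small populations with many folds.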

Persistence and Checkpointing

Use ModelCheckpoint to write progress during long searches:

from sklearn_genetic.callbacks import ModelCheckpoint

search.fit(X_train, y_train, callbacks=[ModelCheckpoint("checkpoint.pkl")])

Use save and load when you want to persist a fitted search object:

search.save("ga_search.pkl")
restored = GASearchCV(estimator=estimator, param_grid=param_grid)
restored.load("ga_search.pkl")

Benchmarks

The repository includes benchmark scripts for optimizer mechanics, model metrics, and comparisons against scikit-learn search methods:

python benchmarks/benchmark_fit.py --quick
python benchmarks/benchmark_fit.py --parallel-backends auto cv --runs 3
python benchmarks/benchmark_search_methods.py --quick
python benchmarks/benchmark_search_methods.py --methods gasearch randomized grid --runs 3

The reports include runtime, evaluated candidates, cross-validation effort, cache/duplicate counts, optimizer telemetry, holdout metrics, and best parameters.

Documentation and Examples

Useful links:

Documentation: https://rodrigo-arenas.github.io/Sklearn-genetic-opt/
Release notes: https://rodrigo-arenas.github.io/Sklearn-genetic-opt/versions/0.13/release-notes
PyPI: https://pypi.org/project/sklearn-genetic-opt/
Source code: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/
Issues: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/issues

The documentation includes tutorials and executed notebooks for:

comparing GASearchCV with sklearn search methods
pipeline tuning and prediction
feature selection
multi-metric optimization
MLflow 3 logging
checkpointing and persistence
advanced optimizer controls

Troubleshooting

TypeError: param_grid values must be instances of Integer, Continuous or Categorical

Use sklearn_genetic.space objects instead of plain lists.

from sklearn_genetic.space import Integer

param_grid = {"max_depth": Integer(2, 10)}

Missing optional dependencies

Install the optional extras:

pip install "sklearn-genetic-opt[all]"

TensorFlow, TensorBoard, or MLflow dependency conflicts

Install only the extras you need, or use a clean virtual environment. TensorFlow/TensorBoard support is only available on Python versions supported by those projects.

Contributing

Contributions are welcome. Please read the contribution guide, open issues for bugs or proposals, and include tests and documentation when changing behavior.

For local development:

git clone https://github.com/rodrigo-arenas/Sklearn-genetic-opt.git
cd Sklearn-genetic-opt
pip install -r dev-requirements.txt
pytest sklearn_genetic

Big thanks to everyone helping improve the project.

Release history

Version	Release date
0.13.0 (this version)	Jun 22, 2026
0.12.0	Jul 23, 2025
0.11.1	Sep 17, 2024
0.11.0	Sep 12, 2024
0.11.0.dev0 (pre-release)	Sep 12, 2024
0.10.1	Mar 15, 2023
0.10.1.dev0 (pre-release)	Mar 15, 2023
0.10.0	Feb 15, 2023
0.10.0.dev0 (pre-release)	Feb 15, 2023
0.9.0	Jun 6, 2022
0.9.0.dev0 (pre-release)	Jun 6, 2022
0.8.1	Mar 9, 2022
0.8.1.dev0 (pre-release)	Mar 9, 2022
0.8.0	Jan 5, 2022
0.8.0.dev0 (pre-release)	Jan 5, 2022
0.7.0	Nov 17, 2021
0.7.0.dev0 (pre-release)	Nov 17, 2021
0.6.1	Aug 4, 2021
0.6.1.dev0 (pre-release)	Aug 4, 2021
0.6.0	Jul 5, 2021
0.5.0	Jun 22, 2021
0.5.0.dev0 (pre-release)	Jun 22, 2021
0.4.1	Jun 2, 2021
0.4.1.dev0 (pre-release)	Jun 2, 2021
0.4.0	May 31, 2021
0.3.0	May 28, 2021
0.3.0.dev1 (pre-release)	May 28, 2021
0.3.0.dev0 (pre-release)	May 28, 2021
0.2.1	May 26, 2021
0.2.1.dev0 (pre-release)	May 26, 2021
0.2.0	May 25, 2021
0.2.0.dev0 (pre-release)	May 25, 2021
0.1.1	Apr 28, 2021
0.1.0	Apr 27, 2021
0.1.0.dev0 (pre-release)	Apr 27, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sklearn_genetic_opt-0.13.0.tar.gz (60.7 kB)

Uploaded Jun 22, 2026 Source

Built Distribution

sklearn_genetic_opt-0.13.0-py3-none-any.whl (63.6 kB)

Uploaded Jun 22, 2026 Python 3

File details

Details for the file sklearn_genetic_opt-0.13.0.tar.gz.

File metadata

Download URL: sklearn_genetic_opt-0.13.0.tar.gz
Upload date: Jun 22, 2026
Size: 60.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.14

File hashes

Hashes for sklearn_genetic_opt-0.13.0.tar.gz
Algorithm	Hash digest
SHA256	`2a6200c4f3e808cc6869bf5c4e6c1f6b69c6fc88340c359590e08fffad70eaee`
MD5	`413819f9d754fce695dd82eb803e19d6`
BLAKE2b-256	`b6037be4aaa641a7396d673bfdc42791f2c48273e129649df11bde265977ff72`

File details

Details for the file sklearn_genetic_opt-0.13.0-py3-none-any.whl.

File metadata

Download URL: sklearn_genetic_opt-0.13.0-py3-none-any.whl
Upload date: Jun 22, 2026
Size: 63.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.14

File hashes

Hashes for sklearn_genetic_opt-0.13.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cc4d1d39c1cea67d429a5fba778719dc9bc897b3243b7303a26a8888c6e125ea`
MD5	`2edec0c76f64887bc551dc6398e64706`
BLAKE2b-256	`3e12aed378f9a38ae8bb56df4b046871b9d06aa8d05981c749331b1227e66fd2`
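
The SHA256 digests listed under File hashes can be used to check a downloaded file before installing it. The following is a minimal sketch with the standard library; `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the commented usage assumes the wheel was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True when the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes the wheel is in the current directory):
# matches_published_hash(
#     "sklearn_genetic_opt-0.13.0-py3-none-any.whl",
#     "cc4d1d39c1cea67d429a5fba778719dc9bc897b3243b7303a26a8888c6e125ea",
# )
```

pip can perform the same check itself when a requirements file pins hashes and `--require-hashes` is used.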
