Stepwise hyperparameter search for scikit-learn estimators

sk-stepwise

sk-stepwise is a small Python library for staged hyperparameter optimization of scikit-learn-compatible estimators.

The main API is StepwiseOptunaSearchCV, which runs Optuna search one step at a time. Each step optimizes a subset of parameters while carrying forward the best settings found in earlier steps.

Why stepwise search

A flat search space is often larger than it needs to be. Many workflows are easier to reason about in stages:

  • tune structural parameters first
  • tune regularization or sampling parameters next
  • tune learning-rate style parameters later

sk-stepwise is built around this staged workflow.
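
The staged idea can be sketched in plain Python. This is a simplified illustration, not the library's implementation: it uses random sampling instead of Optuna, and `stepwise_search` and `toy_objective` are hypothetical names introduced here.

```python
import random


def stepwise_search(steps, objective, n_trials_per_step, seed=0):
    """Tune one group of parameters at a time, carrying earlier winners forward.

    steps: list of dicts mapping a parameter name to inclusive (low, high) integer bounds.
    objective: callable taking a parameter dict and returning a score to maximize.
    """
    rng = random.Random(seed)
    best_params, best_score = {}, float("-inf")
    for step in steps:
        step_best, step_score = None, float("-inf")
        for _ in range(n_trials_per_step):
            # Start from the settings fixed by earlier steps.
            candidate = dict(best_params)
            for name, (low, high) in step.items():
                candidate[name] = rng.randint(low, high)
            score = objective(candidate)
            if score > step_score:
                step_best, step_score = candidate, score
        # The winner of this step becomes the baseline for the next one.
        best_params, best_score = step_best, step_score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical score that peaks at n_estimators=120 and max_depth=6.
    n = params.get("n_estimators", 100)
    d = params.get("max_depth", 5)
    return -abs(n - 120) - 10 * abs(d - 6)


steps = [{"n_estimators": (50, 150)}, {"max_depth": (3, 10)}]
params, score = stepwise_search(steps, toy_objective, n_trials_per_step=20)
print(params, score)
```

Each step explores only its own parameters, so the number of trials grows with the number of steps rather than with the size of the combined space.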

Installation

uv add sk-stepwise

For development:

uv sync
uv run pytest
uv run pytest -q tests/test_readme_doctest.py

Quickstart

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sk_stepwise import Float, Int, StepwiseOptunaSearchCV
>>>
>>> rng = np.random.default_rng(42)
>>> X = pd.DataFrame(rng.random((100, 5)), columns=[f"feature_{i}" for i in range(5)])
>>> y = pd.Series(rng.random(100))
>>>
>>> estimator = RandomForestRegressor(random_state=0)
>>> param_distributions = [
...     {"n_estimators": Int(50, 150)},
...     {"max_depth": Int(3, 10)},
...     {"min_samples_split": Float(0.1, 1.0)},
... ]
>>>
>>> search = StepwiseOptunaSearchCV(
...     estimator=estimator,
...     param_distributions=param_distributions,
...     n_trials_per_step=2,
...     random_state=0,
... )
>>> search.fit(X, y)  # doctest: +ELLIPSIS
StepwiseOptunaSearchCV(...)
>>> predictions = search.predict(X)
>>> len(predictions)
100
>>> sorted(search.best_params_.keys())
['max_depth', 'min_samples_split', 'n_estimators']
>>> isinstance(search.best_score_, float)
True

Build a real model from the search results

You can use best_params_ directly with a fresh estimator instance.

>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> best_params = search.best_params_
>>> sorted(best_params)
['max_depth', 'min_samples_split', 'n_estimators']
>>> final_model = RandomForestRegressor(random_state=0, **best_params)
>>> final_model.fit(X, y)  # doctest: +ELLIPSIS
RandomForestRegressor(...)
>>> isinstance(final_model.get_params()["n_estimators"], int)
True
>>> tuned_predictions = final_model.predict(X)
>>> len(tuned_predictions)
100

Search-space types

Use the backend-neutral dimension helpers:

  • Int(low, high, log=False) for ordered integer values like n_estimators, max_depth, depth, min_samples_leaf
  • Float(low, high, log=False) for continuous values like learning_rate, subsample, regularization strengths
  • Categorical(choices) for unordered values like criterion, solver, bootstrap

Examples:

>>> from sk_stepwise import Categorical, Float, Int
>>>
>>> space = [
...     {"n_estimators": Int(50, 300)},
...     {"max_depth": Int(2, 12)},
...     {"learning_rate": Float(1e-3, 1e-1, log=True)},
...     {"criterion": Categorical(["squared_error", "absolute_error"])},
... ]
>>> len(space)
4
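
To see why log=True suits a range like 1e-3 to 1e-1, here is a small sketch that assumes log=True has its usual Optuna meaning of sampling uniformly in log space. `sample_float` is a hypothetical helper for illustration, not part of sk-stepwise.

```python
import math
import random


def sample_float(rng, low, high, log=False):
    """Draw one value between low and high, uniformly in log space when log=True."""
    if log:
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    return rng.uniform(low, high)


rng = random.Random(0)
linear = [sample_float(rng, 1e-3, 1e-1) for _ in range(1000)]
logged = [sample_float(rng, 1e-3, 1e-1, log=True) for _ in range(1000)]

# Share of draws below 1e-2, the geometric midpoint of the range.
share_linear = sum(v < 1e-2 for v in linear) / len(linear)
share_logged = sum(v < 1e-2 for v in logged) / len(logged)
print(f"linear: {share_linear:.2f}, log: {share_logged:.2f}")
```

Linear sampling spends most of its draws in the top decade of the range, while log sampling gives each order of magnitude roughly equal attention.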

Numeric categorical warning

Writing Categorical([10, 20, 30]) emits a warning. For ordered numeric values, Int(...) or Float(...) is usually a better fit because the optimizer can use the numeric ordering, which a categorical dimension discards.
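
A check of this kind could look roughly like the sketch below. The function name, the message text, and the choice of UserWarning are assumptions made for illustration; the library's actual check may differ.

```python
import warnings
from numbers import Real


def check_categorical_choices(choices):
    """Warn when every choice is a plain number, then return the choices as a list."""
    choices = list(choices)
    # bool is a subclass of int, so exclude it: [True, False] is a valid categorical.
    all_numeric = bool(choices) and all(
        isinstance(c, Real) and not isinstance(c, bool) for c in choices
    )
    if all_numeric:
        warnings.warn(
            "All categorical choices are numeric; consider Int(...) or Float(...) "
            "so the optimizer can use their ordering.",
            UserWarning,
            stacklevel=2,
        )
    return choices


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_categorical_choices([10, 20, 30])                           # warns
    check_categorical_choices([True, False])                          # silent
    check_categorical_choices(["squared_error", "absolute_error"])    # silent
print(len(caught))
```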

Progress logging

Set verbose=1 to print step-by-step progress:

  • Optimizing step 1/3
  • Best parameters after step 1: ...
  • Best score after step 1: ...
  • Improvement: ...

Progress output is opt-in; these messages appear only when verbose is set.

scikit-learn behavior

StepwiseOptunaSearchCV is designed to behave like a sklearn-style search estimator:

  • supports fit, predict, and score
  • exposes best_params_, best_score_, best_estimator_, study_, studies_, and step_results_
  • works with pipelines and namespaced params like regressor__max_depth
  • supports scorer strings and scorer callables
  • supports cv as an int, splitter object, or iterable of splits
  • passes fit metadata such as sample_weight through sklearn evaluation

Optional methods are delegated when supported by the fitted best estimator:

  • predict_proba
  • decision_function
  • transform
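
Conditional delegation of this kind can be sketched with `__getattr__`. This is a toy illustration under stated assumptions, not the library's code; scikit-learn's own search classes use the `available_if` decorator for the same purpose. `DelegatingSearch`, `WithProba`, and `Plain` are hypothetical names.

```python
class DelegatingSearch:
    """Toy wrapper that forwards optional methods to a fitted inner estimator."""

    _optional = ("predict_proba", "decision_function", "transform")

    def __init__(self, best_estimator):
        self.best_estimator_ = best_estimator

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails.
        if name in self._optional:
            # Raises AttributeError when the inner estimator lacks the method,
            # so hasattr() on the wrapper mirrors the inner estimator.
            return getattr(self.best_estimator_, name)
        raise AttributeError(name)


class WithProba:
    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


class Plain:
    pass


print(hasattr(DelegatingSearch(WithProba()), "predict_proba"))
print(hasattr(DelegatingSearch(Plain()), "predict_proba"))
```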

Migration from Hyperopt

The old StepwiseHyperoptOptimizer name is deprecated.

Current behavior:

  • StepwiseHyperoptOptimizer(...) still works as a compatibility shim
  • it emits DeprecationWarning
  • it maps old constructor names onto StepwiseOptunaSearchCV

Example migration:

>>> import warnings
>>> from sk_stepwise import StepwiseHyperoptOptimizer, StepwiseOptunaSearchCV
>>> warnings.simplefilter("ignore", DeprecationWarning)
>>> # old
>>> search = StepwiseHyperoptOptimizer(
...     model=estimator,
...     param_space_sequence=space,
...     max_evals_per_step=20,
... )
>>> # new
>>> search = StepwiseOptunaSearchCV(
...     estimator=estimator,
...     param_distributions=space,
...     n_trials_per_step=20,
... )
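
The renames in the example above could be handled by a shim along these lines. The mapping comes from the example itself; the helper name and the warning text are hypothetical, and the real shim may be structured differently.

```python
import warnings

# Old keyword -> new keyword, as they appear in the migration example.
RENAMED_KWARGS = {
    "model": "estimator",
    "param_space_sequence": "param_distributions",
    "max_evals_per_step": "n_trials_per_step",
}


def translate_legacy_kwargs(**kwargs):
    """Warn about the deprecated name and return kwargs under their new names."""
    warnings.warn(
        "StepwiseHyperoptOptimizer is deprecated; use StepwiseOptunaSearchCV.",
        DeprecationWarning,
        stacklevel=2,
    )
    # Keywords without an entry in the mapping pass through unchanged.
    return {RENAMED_KWARGS.get(name, name): value for name, value in kwargs.items()}


with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    new_kwargs = translate_legacy_kwargs(model="estimator object", max_evals_per_step=20)
print(new_kwargs)
```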

Important:

  • backend-neutral dimensions such as Int, Float, and Categorical are the supported path
  • old Hyperopt space objects are not part of the new mainline API

Example: pipeline usage

>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sk_stepwise import Int, StepwiseOptunaSearchCV
>>>
>>> pipeline = Pipeline(
...     [
...         ("scale", StandardScaler()),
...         ("regressor", RandomForestRegressor(random_state=0)),
...     ]
... )
>>> space = [
...     {"regressor__n_estimators": Int(50, 150)},
...     {"regressor__max_depth": Int(2, 8)},
... ]
>>> search = StepwiseOptunaSearchCV(
...     estimator=pipeline,
...     param_distributions=space,
...     n_trials_per_step=2,
...     random_state=0,
... )
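
Namespaced keys follow scikit-learn's double-underscore convention: the part before the first `__` names the pipeline step and the rest names the parameter on that step. The sketch below shows how such keys break down; `group_by_step` is a hypothetical helper and the parameter values are made up for illustration.

```python
def group_by_step(params):
    """Group 'step__param' keys by the part before the first double underscore."""
    grouped = {}
    for key, value in params.items():
        step, sep, sub_key = key.partition("__")
        if sep:
            grouped.setdefault(step, {})[sub_key] = value
        else:
            # No prefix: the parameter belongs to the outer estimator itself.
            grouped.setdefault(None, {})[key] = value
    return grouped


best = {"regressor__n_estimators": 100, "regressor__max_depth": 4}
grouped = group_by_step(best)
print(grouped)
```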

Example: sample weights

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from sk_stepwise import Categorical, StepwiseOptunaSearchCV
>>>
>>> sample_weight = np.linspace(1.0, 2.0, len(y))
>>> search = StepwiseOptunaSearchCV(
...     estimator=LinearRegression(),
...     param_distributions=[{"fit_intercept": Categorical([True, False])}],
...     n_trials_per_step=2,
...     random_state=0,
... )
>>> search.fit(X, y, sample_weight=sample_weight)  # doctest: +ELLIPSIS
StepwiseOptunaSearchCV(...)

Status

The core Optuna path is implemented and covered by tests for:

  • NumPy, pandas, and plain list inputs
  • regression and classification
  • sklearn pipelines
  • XGBoost and CatBoost integration
  • deprecated Hyperopt shim behavior

License

MIT
