cluster-tuner

CI Python 3.10+ License: BSD-3-Clause PyPI

A GridSearchCV-like hyperparameter tuner for clustering algorithms.

Installation

pip install cluster-tuner

Requirements: Python >= 3.10, scikit-learn >= 1.6

Purpose

This project provides a simple, scikit-learn-compatible hyperparameter tuning tool for clustering. It's intended for situations where predicting clusters for new data points is a low priority. Many clustering algorithms in scikit-learn are transductive, meaning they are not designed to be applied to new observations. Even when using an inductive algorithm like KMeans, you might not need to predict clusters for new data—or prediction might be a lower priority than finding the best clusters.
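The transductive/inductive distinction shows up directly in the scikit-learn API — a quick sketch using only scikit-learn:

```python
from sklearn.cluster import DBSCAN, KMeans

# Transductive: DBSCAN labels only the data it was fit on; no predict method
print(hasattr(DBSCAN(), 'predict'))   # False

# Inductive: KMeans can assign new points to its learned centroids
print(hasattr(KMeans(), 'predict'))   # True
```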

Since scikit-learn's GridSearchCV uses cross-validation and is designed for inductive models, an alternative tool is necessary.

ClusterTuner

The ClusterTuner class is a hyperparameter search tool for clustering algorithms. It fits one model per hyperparameter combination and selects the best. The implementation is derived from scikit-learn's GridSearchCV, but without cross-validation. It works with clustering-specific scorers and doesn't always require a target variable, since metrics like silhouette, Calinski-Harabasz, and Davies-Bouldin are designed for unsupervised evaluation.

The interface is largely the same as GridSearchCV. Results are stored in the results_ attribute (cv_results_ also works as an alias for compatibility).

Basic Usage

from sklearn.cluster import DBSCAN
from cluster_tuner import ClusterTuner

tuner = ClusterTuner(
    DBSCAN(),
    param_grid={'eps': [0.3, 0.5, 0.7], 'min_samples': [5, 10]},
    scoring='silhouette',
)
tuner.fit(X)  # X: feature matrix of shape (n_samples, n_features)

print(tuner.best_params_)
print(tuner.best_score_)
labels = tuner.labels_

# Access detailed results (single-metric uses 'test_score')
print(tuner.results_['test_score'])

Key Parameters

  • scoring: Metric name (string), callable, or list/dict for multi-metric evaluation.
  • refit (default=True): Whether to refit the best estimator on the full dataset. For multi-metric, must be a string specifying which metric to use.
  • max_noise (default=0.1): Maximum allowed ratio of noise points (label=-1). Fits exceeding this threshold receive error_score.
  • min_cluster_size (default=3): Minimum allowed size for the smallest cluster. Fits with smaller clusters receive error_score.
  • error_score (default=np.nan): Value to assign when a fit fails or violates constraints. Use 'raise' to raise exceptions instead.
  • n_jobs: Number of parallel jobs (-1 for all CPUs).
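To make the max_noise and min_cluster_size constraints concrete, here is roughly the check they describe, written out with plain NumPy on a DBSCAN fit (an illustrative sketch, not the library's internal code):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

# Ratio of noise points (label == -1); compared against max_noise
noise_ratio = np.mean(labels == -1)

# Size of the smallest real cluster; compared against min_cluster_size
cluster_sizes = np.bincount(labels[labels >= 0])
smallest = cluster_sizes.min() if cluster_sizes.size else 0

# A fit violating either threshold would be assigned error_score
print(noise_ratio, smallest)
```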

Multi-Metric Scoring

Evaluate multiple metrics simultaneously using a list, tuple, or dict:

tuner = ClusterTuner(
    DBSCAN(),
    param_grid={'eps': [0.3, 0.5, 0.7]},
    scoring=['silhouette', 'calinski_harabasz', 'neg_davies_bouldin'],
    refit='silhouette',  # Required: which metric to use for selecting best
)
tuner.fit(X)

# Results use 'test_' prefix for each metric
print(tuner.results_['test_silhouette'])
print(tuner.results_['test_calinski_harabasz'])
print(tuner.results_['test_neg_davies_bouldin'])

Supervised Scoring

When ground truth labels are available, use supervised metrics:

from sklearn.cluster import KMeans

tuner = ClusterTuner(
    KMeans(n_init='auto'),
    param_grid={'n_clusters': [2, 3, 4, 5]},
    scoring='adjusted_rand',
)
tuner.fit(X, y=y_true)  # Pass ground truth labels

print(tuner.best_score_)  # Adjusted Rand Index

Pipeline Support

ClusterTuner works with scikit-learn pipelines:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    KMeans(n_init='auto'),
)

tuner = ClusterTuner(
    pipe,
    param_grid={'kmeans__n_clusters': [2, 3, 4, 5]},
    scoring='silhouette',
)
tuner.fit(X)

Scorers

The scoring parameter accepts the string name of a clustering metric, e.g., 'silhouette', 'calinski_harabasz', or 'adjusted_rand' (the _score suffix is optional).

Recognized Scorer Names

Unsupervised metrics (no ground truth required):

  • 'silhouette' / 'silhouette_score'
  • 'silhouette_euclidean' / 'silhouette_score_euclidean'
  • 'silhouette_cosine' / 'silhouette_score_cosine'
  • 'neg_davies_bouldin' / 'neg_davies_bouldin_score'
  • 'calinski_harabasz' / 'calinski_harabasz_score'

Supervised metrics (require ground truth labels y):

  • 'mutual_info' / 'mutual_info_score'
  • 'normalized_mutual_info' / 'normalized_mutual_info_score'
  • 'adjusted_mutual_info' / 'adjusted_mutual_info_score'
  • 'rand' / 'rand_score'
  • 'adjusted_rand' / 'adjusted_rand_score'
  • 'completeness' / 'completeness_score'
  • 'fowlkes_mallows' / 'fowlkes_mallows_score'
  • 'homogeneity' / 'homogeneity_score'
  • 'v_measure' / 'v_measure_score'

Naming Convention

Following sklearn's convention, metrics where lower is better use a neg_ prefix. The score is negated internally so that higher values always indicate better clustering:

  • 'neg_davies_bouldin' — Davies-Bouldin index (lower raw values = better separation)
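For example, the raw Davies-Bouldin index decreases as clustering improves, so the 'neg_davies_bouldin' scorer reports its negation (a sketch calling scikit-learn's metric directly):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

raw = davies_bouldin_score(X, labels)  # lower raw value = better separation
score = -raw                           # negated so that higher = better
print(raw > 0, score < 0)
```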

Custom Scorers

Create custom scorers using make_scorer:

from cluster_tuner import make_scorer

# Unsupervised scorer: score_func(X, labels)
def my_metric(X, labels):
    # Example metric: negative mean distance to each point's cluster
    # centroid (negated so that higher = better)
    import numpy as np
    centroids = {k: X[labels == k].mean(axis=0) for k in set(labels)}
    return -np.mean([np.linalg.norm(x - centroids[k])
                     for x, k in zip(X, labels)])

scorer = make_scorer(my_metric, ground_truth=False)

# Supervised scorer: score_func(y_true, labels)
def my_supervised_metric(y_true, labels):
    # Example: wrap any label-agreement metric from sklearn.metrics
    from sklearn.metrics import fowlkes_mallows_score
    return fowlkes_mallows_score(y_true, labels)

scorer = make_scorer(my_supervised_metric, ground_truth=True)

tuner = ClusterTuner(estimator, param_grid, scoring=scorer)

Caveats

Comparing Clustering Algorithms

Consider your dataset and goals before comparing clustering algorithms. A higher score doesn't necessarily mean a better choice—different algorithms have different benefits, drawbacks, and use cases.

Credits

Most of the credit goes to the scikit-learn developers for the engineering behind the search estimators.
