
Hyperparameter optimization for multiple clustering algorithms using Optuna, with Scikit-learn API


optuclust

optuclust is a Python module for optimizing clustering algorithms using the Optuna framework. It provides a scikit-learn compatible API with support for a variety of clustering methods and offers additional capabilities such as the calculation of centroids, medoids, and modes for clusters.


Features

  • Parameter Optimization: Optimize clustering parameters for various algorithms using Optuna.
  • Supported Clustering Methods:
    • Algorithms from scikit-learn, such as KMeans, DBSCAN, and Agglomerative Clustering.
    • Advanced methods like HDBSCAN, Self-Organizing Maps (SOM), and kMedoids.
  • Metrics and Scoring:
    • silhouette_score
    • calinski_harabasz_score
    • davies_bouldin_score (automatically minimized)
    • Noise points (label=-1) are filtered out before score computation for density-based algorithms.
  • Clustering Insights: Provides centroids (arithmetic mean), medoids (Euclidean), and modes (KDE with Scott's bandwidth) for clusters, even if the algorithm does not natively support these features. All descriptors are computed eagerly during fit() and work in any number of dimensions.
  • Scikit-learn Compatible: Inherits from BaseEstimator and ClusterMixin. Works with clone(), check_is_fitted(), and scikit-learn pipelines.
  • ClustGridSearch Class: A utility to test all clustering algorithms and identify the best one.
  • Timeout Management: Separate timeouts for optimization runs (timeout) and individual trials (trial_timeout).
  • Storage and Resume: Store optimization results in a SQLite database for future analysis, and resume the optimization process later.
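For intuition, the three cluster descriptors follow standard definitions (arithmetic mean, Euclidean medoid, KDE mode with Scott's rule), with noise points (label -1) filtered out first. A minimal NumPy/SciPy sketch of those definitions — illustrative only, not optuclust's internals:

```python
import numpy as np
from scipy.stats import gaussian_kde

def cluster_descriptors(X, labels):
    """Per-cluster centroid, medoid, and mode, ignoring noise (label == -1)."""
    descriptors = {}
    for k in sorted(set(labels) - {-1}):
        pts = X[labels == k]
        centroid = pts.mean(axis=0)                       # arithmetic mean
        dists = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
        medoid = pts[dists.sum(axis=1).argmin()]          # most central point
        kde = gaussian_kde(pts.T)                         # Scott's rule is the default
        mode = pts[kde(pts.T).argmax()]                   # highest-density point
        descriptors[k] = (centroid, medoid, mode)
    return descriptors

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1],
              [9.0, 9.0]])
labels = np.array([0, 0, 0, 1, 1, 1, -1])  # last point is noise
desc = cluster_descriptors(X, labels)
```

Because the medoid and mode are selected from the actual samples, both are guaranteed to be real data points, which is what makes them useful for algorithms that have no native notion of a cluster center.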

Installation

  1. Clone this repository:

    git clone git@github.com:filipsPL/optuclust.git
    
  2. Navigate to the cloned directory and install the required dependencies:

    cd optuclust
    pip install -r requirements.txt
    
  3. Install optuclust:

    pip install .
    
Alternatively, install the released package directly from PyPI:

    pip install optuclust
    

Requires: Python >= 3.8, scikit-learn >= 1.1

Usage

1. Optimizing a Clustering Algorithm

from optuclust import Optimizer
from sklearn.datasets import make_blobs

# Generate synthetic data
X, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=42)

# Instantiate and fit the optimizer for KMeans
optimizer = Optimizer(algorithm="kmeans", n_trials=50, scoring="silhouette_score", verbose=True)
optimizer.fit(X)

# Access cluster details
print("Cluster Labels:", optimizer.labels_)
print("Centroids:", optimizer.centroids_)
print("Medoids:", optimizer.medoids_)
print("Modes:", optimizer.modes_)
print("Cluster Centers (native):", optimizer.cluster_centers_)
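Under the hood, each Optuna trial fits the algorithm with sampled hyperparameters and scores the resulting labels with the chosen metric. A stripped-down, hand-rolled equivalent for KMeans — plain grid instead of Optuna's sampler, scikit-learn only — looks like this:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=42)

best_score, best_k = -1.0, None
for k in range(2, 9):                       # the "search space" for n_clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    score = silhouette_score(X, labels)     # metric the optimizer maximizes
    if score > best_score:
        best_score, best_k = score, k

print(f"best n_clusters={best_k}, silhouette={best_score:.3f}")
```

Optimizer automates the same loop, but replaces exhaustive enumeration with Optuna's adaptive sampler and handles multi-parameter spaces, timeouts, and persistence.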

2. ClustGridSearch

from optuclust import ClustGridSearch
from sklearn.datasets import make_blobs

# Generate synthetic data
X, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=42)

# Initialize ClustGridSearch to test all algorithms
grid_search = ClustGridSearch(mode="full", scoring="silhouette_score", verbose=True)

# Fit and get the best method
grid_search.fit(X)
print("Best Algorithm:", grid_search.best_estimator_.algorithm)
print("Best Score:", grid_search.best_score_)
print("Best Parameters:", grid_search.best_params_)
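Conceptually, ClustGridSearch runs one Optimizer per algorithm and keeps the winner. A minimal sklearn-only analogue — with fixed parameters instead of per-algorithm optimization, and with DBSCAN noise dropped before scoring, as described under Features — is sketched below:

```python
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=42)

candidates = {
    "kmeans": KMeans(n_clusters=4, n_init=10, random_state=42),
    "agglomerativeclustering": AgglomerativeClustering(n_clusters=4),
    "dbscan": DBSCAN(eps=1.0, min_samples=5),
}

best_name, best_score = None, -1.0
for name, est in candidates.items():
    labels = est.fit_predict(X)
    mask = labels != -1                     # drop noise before scoring
    if len(set(labels[mask])) < 2:
        continue                            # silhouette needs >= 2 clusters
    score = silhouette_score(X[mask], labels[mask])
    if score > best_score:
        best_name, best_score = name, score
```

The real class additionally tunes each algorithm's hyperparameters with Optuna before comparing, so the comparison is between each algorithm's best configuration rather than an arbitrary fixed one.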

3. Benchmark Example

To benchmark different clustering algorithms, you can use the provided example script:

python example-loop.py

The benchmark will evaluate different clustering methods on various datasets and save the performance metrics and plots.

Supported Algorithms

algorithms = [
    'kmeans', 'kmedoids', 'minibatchkmeans', 'dbscan', 'agglomerativeclustering',
    'meanshift', 'spectralclustering', 'gaussianmixture', 'hdbscan',
    'affinitypropagation', 'birch', 'optics', 'som'
]

Note: Not all algorithms support predict() on new data. Algorithms with inductive prediction: kmeans, minibatchkmeans, meanshift, birch, gaussianmixture, kmedoids, som. Calling predict() on other algorithms (e.g. dbscan, hdbscan) will raise a TypeError.
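This mirrors scikit-learn itself, where only inductive estimators expose predict() while transductive ones (DBSCAN, OPTICS, etc.) only provide labels_ for the training data. A quick check, independent of optuclust:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, n_features=2, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.predict([[0.0, 0.0]]))     # inductive: can label unseen points

db = DBSCAN(eps=0.5).fit(X)
print(hasattr(db, "predict"))       # False: transductive, labels_ only
```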

Parameters

Optimizer Class

  • algorithm: The clustering algorithm to optimize. Options include those listed in Supported Algorithms.
  • n_trials: Number of Optuna trials for optimization. Default is 50.
  • scoring: The metric to optimize. Options are silhouette_score, calinski_harabasz_score, and davies_bouldin_score.
  • verbose: Enable additional logging if set to True. Can also be an int to set Optuna's verbosity level directly.
  • show_progress_bar: Display a progress bar during optimization. Default is True.
  • timeout: Maximum duration (in seconds) for all trials in the optimization process.
  • trial_timeout: Maximum duration (in seconds) for each individual trial (Unix only, uses SIGALRM).
  • storage: Optuna storage URI, e.g. sqlite:///optimization.db. When provided, enables resuming a previous optimization run.
  • logfile: Reserved for future use.

Fitted Attributes

After calling fit(X):

  • labels_: Cluster labels for each sample.
  • best_params_: Dictionary of the best hyperparameters found.
  • model_: The fitted clustering model with the best parameters.
  • study_: The Optuna Study object with full trial history.
  • centroids_: Arithmetic mean of each cluster (excludes noise points).
  • medoids_: Most central data point in each cluster (Euclidean distance).
  • modes_: Highest density point in each cluster (KDE with Scott's rule bandwidth).
  • cluster_centers_: Native cluster centers from the model (if available), otherwise None.

ClustGridSearch Class

  • mode:
    • full: Test all algorithms.
    • fast: Test a subset of algorithms (kmeans and hdbscan).
  • n_trials: Number of Optuna trials for each algorithm. Default is 20.
  • scoring: Metric to select the best clustering algorithm. Options are silhouette_score, calinski_harabasz_score, and davies_bouldin_score.
  • verbose: Enable detailed logging if set to True.
  • show_progress_bar: Display a progress bar for each algorithm.

Running Tests

We use pytest for testing. To run the test suite:

pytest -v
