A robust clustering evaluation framework that combines micro- and macro-averaged silhouette scores into a composite metric using statistical weighting.
Composite Silhouette
Composite Silhouette is a Python package for robust clustering evaluation. It introduces a composite metric that combines micro-averaged and macro-averaged silhouette scores using statistical weighting. This provides a more nuanced assessment of clustering quality, helping identify the optimal number of clusters and compare performance across different clustering scenarios with greater confidence. The framework is especially useful for data scientists, ML engineers, and researchers who want reliable metrics for centroid-based clustering.
Overview and Methodology
In standard clustering evaluation, the silhouette coefficient is widely used to measure how well each data point fits its assigned cluster, in terms of intra-cluster cohesion and inter-cluster separation. The per-point scores can be aggregated in two ways:
- Micro-average: the mean silhouette score over all data points, so larger clusters carry proportionally more weight.
- Macro-average: the mean of the per-cluster average silhouette scores, so every cluster counts equally regardless of its size.
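For reference, these are the standard definitions behind the two aggregations. Here a(i) is the mean distance from point i to the other points of its own cluster, b(i) is the smallest mean distance from i to the points of any other cluster, N is the number of points, and C_1 to C_K are the K clusters.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad s(i) \in [-1, 1]

S_{\text{micro}} = \frac{1}{N} \sum_{i=1}^{N} s(i)

S_{\text{macro}} = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|C_k|} \sum_{i \in C_k} s(i)
```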
(A detailed implementation can be found at ipavlopoulos/revisiting-silhouette-aggregation.)
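Below is a minimal sketch of the two aggregations. It assumes scikit-learn's `silhouette_samples` for the per-point scores. `micro_macro_silhouette` is a hypothetical helper written for illustration and is not part of the Composite Silhouette API. The toy data shows why the two views can disagree. It has one large, loose cluster and one small, tight cluster. The micro-average is pulled toward the large cluster's lower scores, while the macro-average gives the small cluster an equal say.

```python
import numpy as np
from sklearn.metrics import silhouette_samples


def micro_macro_silhouette(X, labels):
    """Return the (micro, macro) aggregations of per-point silhouette scores."""
    labels = np.asarray(labels)
    s = silhouette_samples(X, labels)  # one silhouette value per data point

    # Micro-average: every point counts equally, so large clusters dominate.
    micro = float(s.mean())

    # Macro-average: average within each cluster first, then average the
    # cluster means, so every cluster counts equally regardless of its size.
    macro = float(np.mean([s[labels == c].mean() for c in np.unique(labels)]))
    return micro, macro


# Usage: one large, loose cluster (label 0) and one small, tight cluster (label 1).
X = np.array([[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0], [10.0, 10.0], [10.0, 10.5]])
labels = np.array([0, 0, 0, 0, 1, 1])
micro, macro = micro_macro_silhouette(X, labels)
print(f"micro={micro:.3f}, macro={macro:.3f}")
```

When all clusters have the same size, the two aggregations coincide. They diverge only when cluster sizes and cluster quality are uneven.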
Composite Silhouette merges these two perspectives using a statistically driven weighting strategy. The method repeatedly clusters random subsamples of the data and computes both the micro- and macro-averaged silhouette scores on each subsample. A Wilcoxon signed-rank test is then applied to the paired differences between the two scores across subsamples. This determines whether one is consistently and significantly higher than the other.

The final score is a weighted combination of the subsample averages of the two scores, w · S_micro + (1 − w) · S_macro with 0 ≤ w ≤ 1. Here S_micro and S_macro are the means of the micro- and macro-averaged scores over all subsamples. Because the weights are non-negative and sum to one, this convex combination always lies between S_micro and S_macro, so the result stays directly tied to both.

When the test finds a statistically significant difference, the dominant metric receives at least 75% of the total weight. The exact share increases with the mean difference across subsamples: the larger the difference, the further the weighting shifts toward the dominant metric, while the other metric still contributes the remaining weight. If no significant difference is found, the two metrics are weighted equally (w = 0.5).
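The weighting step can be sketched as follows, using `scipy.stats.wilcoxon` for the paired test. Several details are assumptions for illustration because the description above does not specify them: the function name `composite_silhouette`, the significance level `alpha=0.05`, and the exact rule that raises the dominant weight above 0.75 as the mean difference grows. Only the overall structure follows the description. That structure is a paired signed-rank test, at least 75% weight for the dominant metric when the difference is significant, equal weights otherwise, and a convex combination of the subsample means.

```python
import numpy as np
from scipy.stats import wilcoxon


def composite_silhouette(micro_scores, macro_scores, alpha=0.05):
    """Combine paired per-subsample micro/macro scores into one composite value.

    Sketch only: `alpha` and the weight mapping below are assumptions,
    not the package's documented rule. Returns (score, weight_on_micro).
    """
    micro = np.asarray(micro_scores, dtype=float)
    macro = np.asarray(macro_scores, dtype=float)
    diffs = micro - macro
    s_micro, s_macro = micro.mean(), macro.mean()

    w = 0.5  # default: equal weights when no significant difference is found
    if not np.allclose(diffs, 0.0):  # the test is undefined if every difference is zero
        # Paired, two-sided Wilcoxon signed-rank test on the per-subsample differences.
        p_value = wilcoxon(micro, macro).pvalue
        if p_value < alpha:
            mean_diff = diffs.mean()
            # HYPOTHETICAL mapping: the dominant metric gets 0.75 plus a share of
            # the remaining 0.25 that grows with |mean difference| (capped at 1).
            w_dominant = 0.75 + 0.25 * min(1.0, abs(mean_diff))
            w = w_dominant if mean_diff > 0 else 1.0 - w_dominant

    # Convex combination: the result always lies between s_micro and s_macro.
    return w * s_micro + (1.0 - w) * s_macro, w


# Usage with synthetic paired scores where micro is consistently about 0.10 higher.
rng = np.random.default_rng(0)
micro_scores = rng.normal(0.60, 0.02, size=200)
macro_scores = micro_scores - 0.10 + rng.normal(0.0, 0.01, size=200)
score, w = composite_silhouette(micro_scores, macro_scores)
print(f"composite={score:.3f}, weight on micro={w:.3f}")
```

Because the dominant weight never drops below 0.75 once the test is significant, a small but consistent gap is enough to tilt the composite decisively. The other metric is never discarded entirely.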
Note: The current implementation uses K-Means for clustering, which pairs well with silhouette-based evaluation and repeated subsampling. While the method can be adapted to other clustering algorithms, it already offers meaningful, statistically grounded insights for centroid-based clustering tasks.
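The repeated subsampling with K-Means can be sketched like this, assuming scikit-learn's `KMeans` and `silhouette_samples`. `subsampled_silhouettes` is a hypothetical helper and not the package's internal code. Its `num_samples` and `sample_size` arguments simply mirror the `CompSil` parameters of the same names. Each subsample is clustered independently and yields one paired (micro, macro) observation, which is the kind of paired data a signed-rank test requires.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples


def subsampled_silhouettes(X, k, num_samples=20, sample_size=100, random_state=0):
    """Return paired arrays of micro- and macro-averaged silhouette scores,
    one pair per random subsample clustered with K-Means (hypothetical helper)."""
    X = np.asarray(X)
    rng = np.random.default_rng(random_state)
    micro, macro = [], []
    for _ in range(num_samples):
        # Draw a subsample without replacement and cluster it from scratch.
        idx = rng.choice(len(X), size=sample_size, replace=False)
        Xs = X[idx]
        seed = int(rng.integers(0, 2**31 - 1))
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Xs)

        # Aggregate the per-point silhouette values both ways.
        s = silhouette_samples(Xs, labels)
        micro.append(s.mean())
        macro.append(np.mean([s[labels == c].mean() for c in np.unique(labels)]))
    return np.array(micro), np.array(macro)


# Usage on synthetic data similar to the Quick Start example.
X, _ = make_blobs(n_samples=2000, centers=4, cluster_std=1.1, random_state=42)
micro_scores, macro_scores = subsampled_silhouettes(X, k=4, num_samples=20, sample_size=100)
print(micro_scores.mean(), macro_scores.mean())
```

Fixing `random_state` makes both the subsample draws and the K-Means seeds reproducible.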
Installation
You can install Composite Silhouette from PyPI:
```bash
pip install composite-silhouette
```
or directly from the GitHub repository:
```bash
pip install git+https://github.com/semoglou/composite_silhouette.git
```
Quick Start
```python
from composite_silhouette import CompSil
```
Evaluate a Range of Cluster Counts
```python
from sklearn.datasets import make_blobs

# Generate synthetic 2D data
X, y_true = make_blobs(n_samples=2000, centers=4, cluster_std=1.1, random_state=42)

# Initialize the Composite Silhouette evaluation
cs = CompSil(
    data=X,                         # ndarray or DataFrame
    ground_truth=len(set(y_true)),  # (Optional) true number of clusters, for visual reference in plots
    k_values=range(2, 11),          # Evaluate cluster counts from 2 to 10
    num_samples=500,                # Number of random subsamples per k
    sample_size=100,                # Number of points in each subsample
    random_state=42,                # Ensures reproducibility
    n_jobs=-1,                      # Use all available CPU cores for parallel computation
)

# Run the evaluation for all specified cluster counts
cs.evaluate()

# Retrieve a DataFrame summarizing the results for each k
results_df = cs.get_results_dataframe()

# Get the k with the highest composite silhouette score
best_k = cs.get_optimal_k()

# Plot the silhouette scores and highlight the best k
cs.plot_results()
```
Evaluate a Single Cluster Count
```python
cs = CompSil(
    data=X,
    k_values=4,
)
cs.evaluate()

# Access the final composite silhouette score directly
score = cs.score_

# Optionally, still access the full results DataFrame
results = cs.get_results_dataframe()
```
Examples and Notebooks
Additional usage examples and experimental results can be found in the results/ folder:
- example.ipynb: Basic usage of Composite Silhouette on synthetic data.
- performance.ipynb: Composite Silhouette evaluation results on both synthetic and real-world datasets.
These notebooks provide insight into the method's behavior and demonstrate how to apply it in practical settings.
License
This project is licensed under the MIT License.
Composite Silhouette · v0.1.0 · Last updated: 04/2025 · MIT License
Download files
File details
Details for the file composite_silhouette-0.1.2.tar.gz.
File metadata
- Download URL: composite_silhouette-0.1.2.tar.gz
- Upload date:
- Size: 8.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be82db80a6187780e447704d0e4ec4766e6e4a490e3165eb2a69f12d632ebeba |
| MD5 | 5795ad2c761be87d430bd98bd191931a |
| BLAKE2b-256 | 0d9abc6d5aefea4427f1fbb0ffc98b3e81e28618c404cfdf9f1c8573040dadfd |
Provenance
The following attestation bundles were made for composite_silhouette-0.1.2.tar.gz:
- Publisher: python-publish.yml on semoglou/composite_silhouette
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: composite_silhouette-0.1.2.tar.gz
- Subject digest: be82db80a6187780e447704d0e4ec4766e6e4a490e3165eb2a69f12d632ebeba
- Sigstore transparency entry: 245667204
- Sigstore integration time:
- Permalink: semoglou/composite_silhouette@aeaf8190b622a8c0c89bf9a65fbf5bde1da1f09e
- Branch / Tag: refs/tags/0.1.2
- Owner: https://github.com/semoglou
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@aeaf8190b622a8c0c89bf9a65fbf5bde1da1f09e
- Trigger Event: release
File details
Details for the file composite_silhouette-0.1.2-py3-none-any.whl.
File metadata
- Download URL: composite_silhouette-0.1.2-py3-none-any.whl
- Upload date:
- Size: 8.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 60449fa79201713e78498178602840eed8bf04e04a2c7cc0b04357746c4eb3c9 |
| MD5 | 879f55620d63c5f1f43ae835655400b0 |
| BLAKE2b-256 | 0a6ba00e3b243524a77ce7082c7506c773899bc9c8d772c3603dd8952c49a0cd |
Provenance
The following attestation bundles were made for composite_silhouette-0.1.2-py3-none-any.whl:
- Publisher: python-publish.yml on semoglou/composite_silhouette
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: composite_silhouette-0.1.2-py3-none-any.whl
- Subject digest: 60449fa79201713e78498178602840eed8bf04e04a2c7cc0b04357746c4eb3c9
- Sigstore transparency entry: 245667206
- Sigstore integration time:
- Permalink: semoglou/composite_silhouette@aeaf8190b622a8c0c89bf9a65fbf5bde1da1f09e
- Branch / Tag: refs/tags/0.1.2
- Owner: https://github.com/semoglou
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@aeaf8190b622a8c0c89bf9a65fbf5bde1da1f09e
- Trigger Event: release