A robust clustering evaluation framework that combines micro- and macro-averaged silhouette scores into a composite metric using statistical weighting.

Composite Silhouette

Composite Silhouette is a Python package for robust clustering evaluation. It introduces a composite metric that combines micro-averaged and macro-averaged silhouette scores using statistical weighting. This gives a more nuanced assessment of clustering quality, helping to identify the optimal number of clusters and to compare performance across clustering scenarios with greater confidence. The framework is especially useful for data scientists, ML engineers, and researchers who need reliable evaluation metrics for centroid-based clustering.

Overview and Methodology

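In standard clustering evaluation, the silhouette coefficient is widely used to measure how well each data point fits its assigned cluster, in terms of intra-cluster cohesion and inter-cluster separation. The per-point coefficients can be aggregated in two ways:

  • Micro-averaged silhouette: the mean coefficient over all data points, so every point counts equally and large clusters dominate the score.

  • Macro-averaged silhouette: the mean of the per-cluster average coefficients, so every cluster counts equally regardless of its size.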
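
The two aggregations can be illustrated with a minimal sketch. It assumes scikit-learn's `silhouette_samples` for the per-point coefficients s(i) = (b(i) − a(i)) / max(a(i), b(i)); the helper `micro_macro_silhouette` is hypothetical and not part of the Composite Silhouette API.

```python
import numpy as np
from sklearn.metrics import silhouette_samples


def micro_macro_silhouette(X, labels):
    """Hypothetical helper: return (micro, macro) silhouette for one clustering."""
    labels = np.asarray(labels)
    s = silhouette_samples(X, labels)  # per-point silhouette coefficients s(i)
    micro = float(np.mean(s))  # every point weighs equally
    # Average within each cluster first, then across clusters (every cluster weighs equally)
    per_cluster = [s[labels == c].mean() for c in np.unique(labels)]
    macro = float(np.mean(per_cluster))
    return micro, macro


# Toy data: one large, tight cluster and one small, loose cluster
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(90, 2)),
    rng.normal(5.0, 1.5, size=(10, 2)),
])
labels = np.array([0] * 90 + [1] * 10)

micro, macro = micro_macro_silhouette(X, labels)
print(f"micro={micro:.3f}, macro={macro:.3f}")
```

With equal-sized clusters the two averages coincide. In the toy data above, the small, loose cluster carries little weight in the micro average but half the weight in the macro average, so the macro score should come out lower.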

Composite Silhouette merges these two perspectives using a statistically driven weighting strategy. The method performs repeated subsampled clustering and computes both the micro- and the macro-averaged silhouette score on each subsample. A Wilcoxon signed-rank test is then applied to the paired differences across subsamples to determine whether one score consistently and significantly exceeds the other.

The final score is a weighted combination of the two scores averaged over all subsamples:

S_composite = w · S_micro + (1 − w) · S_macro

Because this is a convex combination, the result always lies between the two averaged scores and stays meaningfully tied to both. When a statistically significant difference is found, the dominant score receives at least 75% of the total weight, and the exact proportion is adjusted according to the mean difference across subsamples: the larger the difference, the further the weighting shifts in its favor, while the other score still contributes proportionally, reflecting the relative strength of both perspectives. If no significant difference is found, the two averaged scores are weighted equally (w = 0.5).
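
The weighting step can be sketched as follows, assuming paired arrays of per-subsample micro and macro scores are already available. The function name `composite_silhouette`, the significance level `alpha=0.05`, and especially the tanh-based mapping from the mean difference to the dominant weight are hypothetical choices for illustration; the text above only fixes the 75% floor and equal weights in the non-significant case, not the package's exact formula.

```python
import numpy as np
from scipy.stats import wilcoxon


def composite_silhouette(micro_scores, macro_scores, alpha=0.05):
    """Hypothetical sketch: combine paired micro/macro scores into one value.

    Returns (composite, w_micro, p_value).
    """
    micro = np.asarray(micro_scores, dtype=float)
    macro = np.asarray(macro_scores, dtype=float)
    mean_micro, mean_macro = micro.mean(), macro.mean()
    mean_diff = float(np.mean(micro - macro))

    # Two-sided Wilcoxon signed-rank test on the paired differences
    p_value = float(wilcoxon(micro, macro).pvalue)

    if p_value < alpha:
        # HYPOTHETICAL mapping: the dominant score gets 0.75 plus a share of the
        # remaining 0.25 that grows with |mean_diff| (tanh keeps the total below 1.0)
        dominant = 0.75 + 0.25 * np.tanh(abs(mean_diff))
        w_micro = dominant if mean_diff >= 0 else 1.0 - dominant
    else:
        w_micro = 0.5  # no significant difference: equal weights

    # Convex combination of the subsample averages
    composite = w_micro * mean_micro + (1.0 - w_micro) * mean_macro
    return float(composite), float(w_micro), p_value


# Synthetic, illustrative inputs where micro is consistently higher than macro
rng = np.random.default_rng(1)
micro = 0.6 + 0.02 * rng.standard_normal(50)
macro = 0.4 + 0.02 * rng.standard_normal(50)
score, w, p = composite_silhouette(micro, macro)
print(f"composite={score:.3f}, w_micro={w:.3f}, p={p:.2g}")
```

Whatever mapping is used, keeping w in [0, 1] guarantees the composite stays between the two averaged scores, which is the property the description relies on.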

Note: The current implementation uses K-Means for clustering, which pairs well with silhouette-based evaluation and repeated subsampling. While the method can be adapted to other clustering algorithms, it already offers meaningful, statistically grounded insights for centroid-based clustering tasks.
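
The repeated subsampling with K-Means can be sketched as below. This is an illustration under stated assumptions, not the package's implementation: the function `subsampled_silhouettes` is hypothetical, its `num_samples` and `sample_size` parameters merely mirror the `CompSil` arguments shown in the Quick Start, and subsampling without replacement is an assumption. Its two output arrays are the paired scores a Wilcoxon signed-rank test would compare.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples


def subsampled_silhouettes(X, k, num_samples=20, sample_size=100, random_state=0):
    """Hypothetical sketch: paired micro/macro silhouettes over random subsamples."""
    rng = np.random.default_rng(random_state)
    micro, macro = [], []
    for _ in range(num_samples):
        # Draw a subsample without replacement (assumption)
        idx = rng.choice(len(X), size=sample_size, replace=False)
        Xs = X[idx]
        km = KMeans(n_clusters=k, n_init=10, random_state=int(rng.integers(1_000_000)))
        labels = km.fit_predict(Xs)
        clusters = np.unique(labels)
        if clusters.size < 2:
            continue  # silhouette is undefined for a single cluster
        s = silhouette_samples(Xs, labels)
        micro.append(s.mean())  # point-level average
        macro.append(np.mean([s[labels == c].mean() for c in clusters]))  # cluster-level average
    return np.array(micro), np.array(macro)


X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
micro_s, macro_s = subsampled_silhouettes(X, k=3)
print(micro_s.mean(), macro_s.mean())
```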

Installation

You can install Composite Silhouette from PyPI:

pip install composite-silhouette

or directly from the GitHub repository:

pip install git+https://github.com/semoglou/composite_silhouette.git

Quick Start

from composite_silhouette import CompSil

Evaluate a Range of Cluster Counts

from sklearn.datasets import make_blobs

# Generate synthetic 2D data
X, y_true = make_blobs(n_samples=2000, centers=4, cluster_std=1.1, random_state=42)

# Initialize the Composite Silhouette evaluation
cs = CompSil(
    data=X,                        # ndarray or DataFrame
    ground_truth=len(set(y_true)), # (Optional) for visual reference in plots
    k_values=range(2, 11),         # Evaluate cluster counts from 2 to 10
    num_samples=500,               # Number of random subsamples per k
    sample_size=100,               # Number of points in each subsample
    random_state=42,               # Ensures reproducibility
    n_jobs=-1                      # Use all available CPU cores for parallel computation
)

# Run the evaluation for all specified cluster counts
cs.evaluate()

# Retrieve a DataFrame summarizing the results for each k
results_df = cs.get_results_dataframe()

# Get the k with the highest composite silhouette score
best_k = cs.get_optimal_k()

# Plot the silhouette scores and highlight the best k
cs.plot_results()

[Figure: Composite Silhouette plot]

Evaluate a Single Cluster Count

cs = CompSil(
    data=X,
    k_values=4
)

cs.evaluate()

# Access the final composite silhouette score directly
score = cs.score_

# Optionally, still access the full results DataFrame
results = cs.get_results_dataframe()

Examples and Notebooks

Additional usage examples and experimental results can be found in the results/ folder:

  • example.ipynb
    Basic usage of Composite Silhouette on synthetic data.

  • performance.ipynb
    Composite silhouette evaluation results on both synthetic and real-world datasets.

These notebooks provide insight into the method's behavior and demonstrate how to apply it in practical settings.

License

This project is licensed under the MIT License.

Composite Silhouette · v0.1.0 · Last updated: 04/2025 · MIT License


Download files

Source Distribution

composite_silhouette-0.1.2.tar.gz (8.5 kB)

Uploaded Source

Built Distribution

composite_silhouette-0.1.2-py3-none-any.whl (8.9 kB)

Uploaded Python 3

File details

Details for the file composite_silhouette-0.1.2.tar.gz.

File metadata

  • Download URL: composite_silhouette-0.1.2.tar.gz
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for composite_silhouette-0.1.2.tar.gz
Algorithm Hash digest
SHA256 be82db80a6187780e447704d0e4ec4766e6e4a490e3165eb2a69f12d632ebeba
MD5 5795ad2c761be87d430bd98bd191931a
BLAKE2b-256 0d9abc6d5aefea4427f1fbb0ffc98b3e81e28618c404cfdf9f1c8573040dadfd

Provenance

The following attestation bundles were made for composite_silhouette-0.1.2.tar.gz:

Publisher: python-publish.yml on semoglou/composite_silhouette

File details

Details for the file composite_silhouette-0.1.2-py3-none-any.whl.

File hashes

Hashes for composite_silhouette-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 60449fa79201713e78498178602840eed8bf04e04a2c7cc0b04357746c4eb3c9
MD5 879f55620d63c5f1f43ae835655400b0
BLAKE2b-256 0a6ba00e3b243524a77ce7082c7506c773899bc9c8d772c3603dd8952c49a0cd

Provenance

The following attestation bundles were made for composite_silhouette-0.1.2-py3-none-any.whl:

Publisher: python-publish.yml on semoglou/composite_silhouette
