
Exact and approximate silhouette scoring with micro, macro, and weighted cluster averages.


sil_score

sil-score is a small Python package for exact and fast approximate silhouette scoring.

It extends the usual silhouette workflow with:

  • per-sample silhouette scores
  • micro-averaged silhouette score
  • macro-averaged silhouette score
  • cluster-weighted macro silhouette score
  • exact vs approximate comparison report

The exact mode uses scikit-learn's silhouette_samples.
The approximate mode uses Euclidean distances from each sample to the cluster centroids. This makes it faster, but its scores generally differ from those of the classical silhouette definition.
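For reference, these are the textbook definitions behind the scores listed above, for n samples, clusters C_1 to C_K, a distance d, and k(i) denoting the cluster of sample i. The weighted form is written for weights w_k that sum to one; how sil_score treats weights that do not sum to one is not stated here and is left as an open assumption.

```latex
a(i) = \frac{1}{|C_{k(i)}| - 1} \sum_{j \in C_{k(i)},\, j \neq i} d(x_i, x_j)
\qquad
b(i) = \min_{l \neq k(i)} \frac{1}{|C_l|} \sum_{j \in C_l} d(x_i, x_j)

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
\qquad
\bar{s}_k = \frac{1}{|C_k|} \sum_{i \in C_k} s(i)

\text{micro} = \frac{1}{n} \sum_{i=1}^{n} s(i) = \sum_{k=1}^{K} \frac{|C_k|}{n}\, \bar{s}_k
\qquad
\text{macro} = \frac{1}{K} \sum_{k=1}^{K} \bar{s}_k
\qquad
\text{weighted} = \sum_{k=1}^{K} w_k\, \bar{s}_k
```

The second form of the micro score shows why larger clusters carry more influence there: each cluster mean is weighted by its share of the samples.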


Installation

Install from PyPI:

pip install sil-score

Quick example

import numpy as np
from sil_score import (
    sil_samples,
    micro_sil_score,
    macro_sil_score,
    weighted_macro_sil_score,
    sil_approximation_report,
)

X = np.array([
    [0.0],
    [2.0],
    [10.0],
    [12.0],
])

labels = np.array([0, 0, 1, 1])

samples = sil_samples(X, labels)
micro = micro_sil_score(X, labels)
macro = macro_sil_score(X, labels)

print(samples)
print(micro)
print(macro)

Output:

[0.81818182 0.77777778 0.77777778 0.81818182]
0.797979797979798
0.797979797979798
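Because the exact mode is documented as using scikit-learn's silhouette_samples, the printed values can be cross-checked against scikit-learn directly. This sketch uses only NumPy and scikit-learn; the printed numbers are the fractions 9/11 and 7/9.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

X = np.array([[0.0], [2.0], [10.0], [12.0]])
labels = np.array([0, 0, 1, 1])

# Reference values from scikit-learn's classical (pairwise-distance) silhouette
reference_samples = silhouette_samples(X, labels)
reference_mean = silhouette_score(X, labels)

# Sample 0: a = 2 (distance to 2.0), b = mean(10, 12) = 11, so s = (11 - 2) / 11 = 9/11
# Sample 1: a = 2, b = mean(8, 10) = 9, so s = (9 - 2) / 9 = 7/9
expected = np.array([9 / 11, 7 / 9, 7 / 9, 9 / 11])

print(np.allclose(reference_samples, expected))     # True
print(np.isclose(reference_mean, expected.mean()))  # True
```

Both clusters have the same mean score here, which is why the micro and macro averages coincide in this example.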

Functions

sil_samples

sil_samples(X, labels, approximation=False, centers=None)

Computes the silhouette score for each sample.

By default, it computes the exact silhouette values using scikit-learn.

scores = sil_samples(X, labels)
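To make the exact definition concrete, here is a minimal NumPy sketch of the classical per-sample silhouette. It is not sil_score's implementation (the exact mode delegates to scikit-learn), the helper name exact_silhouette_samples is hypothetical, and it builds the full n × n distance matrix, so it only suits small inputs. Samples in single-member clusters get a score of 0, matching scikit-learn's behaviour.

```python
import numpy as np


def exact_silhouette_samples(X, labels):
    """Hypothetical helper: classical silhouette s(i) = (b - a) / max(a, b)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    unique = np.unique(labels)
    if len(unique) < 2:
        raise ValueError("silhouette needs at least two clusters")

    # Full pairwise Euclidean distance matrix, shape (n, n)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: s(i) stays 0
        # a(i): mean distance to the other members of its own cluster (D[i, i] is 0)
        a = D[i, same].sum() / (n_same - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(D[i, labels == k].mean() for k in unique if k != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores


X = np.array([[0.0], [2.0], [10.0], [12.0]])
labels = np.array([0, 0, 1, 1])
scores = exact_silhouette_samples(X, labels)
```

On the quick example data this reproduces the printed values 9/11 ≈ 0.818 and 7/9 ≈ 0.778.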

For a faster centroid-based approximation:

scores = sil_samples(X, labels, approximation=True)

You can also pass precomputed cluster centers:

scores = sil_samples(
    X,
    labels,
    approximation=True,
    centers=centers,
)
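The README does not specify how the rows of centers must be ordered. A minimal sketch for computing them yourself follows, assuming one row per cluster in sorted label order; both that ordering and the helper name cluster_centroids are assumptions. If you clustered with scikit-learn's KMeans, its cluster_centers_ attribute already holds row k for label k.

```python
import numpy as np


def cluster_centroids(X, labels):
    """Hypothetical helper: per-cluster mean vectors.

    Returns the sorted unique labels and a (k, d) array whose row j is the
    centroid of cluster unique_labels[j]. The row order sil_score expects
    for `centers` is an assumption here.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    unique_labels = np.unique(labels)  # sorted
    centers = np.vstack([X[labels == k].mean(axis=0) for k in unique_labels])
    return unique_labels, centers


X = np.array([[0.0], [2.0], [10.0], [12.0]])
labels = np.array([0, 0, 1, 1])
unique_labels, centers = cluster_centroids(X, labels)
# centers holds [1.0] and [11.0]: the means of {0, 2} and {10, 12}
```

The resulting array could then be passed as centers=centers, which avoids recomputing the centroids on every call.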

micro_sil_score

micro_sil_score(X, labels, approximation=False, centers=None)

Computes the mean of all sample-level silhouette scores. This is the usual average silhouette score; in exact mode it corresponds to scikit-learn's silhouette_score. Because every sample counts equally, larger clusters have proportionally more influence on the result.

# Standard usage
score = micro_sil_score(X, labels)

# Approximate version
score = micro_sil_score(X, labels, approximation=True)
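The size effect can be seen by rewriting the micro average as a size-weighted mean of the cluster means. The per-sample scores below are hypothetical values for an imbalanced clustering, not output from sil_score.

```python
import numpy as np

# Hypothetical per-sample silhouette scores:
# cluster 0 has three samples, cluster 1 has one.
scores = np.array([0.9, 0.8, 0.7, 0.1])
labels = np.array([0, 0, 0, 1])

# Micro average: plain mean over samples
micro = scores.mean()

# Same value, written as cluster means weighted by cluster size
unique, counts = np.unique(labels, return_counts=True)
cluster_means = np.array([scores[labels == k].mean() for k in unique])
size_weighted = np.sum(counts / counts.sum() * cluster_means)

print(round(micro, 3))          # 0.625
print(round(size_weighted, 3))  # 0.625
```

Cluster 0 contributes three quarters of the result simply because it holds three of the four samples.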

macro_sil_score

macro_sil_score(X, labels, approximation=False, centers=None)

Computes the mean silhouette score inside each cluster, then averages the cluster means equally. This gives every cluster the same importance, regardless of its size.

# Standard usage
score = macro_sil_score(X, labels)

# Approximate version
score = macro_sil_score(X, labels, approximation=True)
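A minimal sketch of the macro averaging step, applied to per-sample scores that are already computed; the helper name macro_average and the scores are hypothetical.

```python
import numpy as np


def macro_average(scores, labels):
    """Hypothetical helper: mean of per-cluster mean scores (each cluster counts equally)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    cluster_means = [scores[labels == k].mean() for k in np.unique(labels)]
    return float(np.mean(cluster_means))


# Three samples in cluster 0 (mean 0.8), one in cluster 1 (mean 0.1)
scores = np.array([0.9, 0.8, 0.7, 0.1])
labels = np.array([0, 0, 0, 1])
print(round(macro_average(scores, labels), 3))  # 0.45
```

For these four scores the plain sample mean is 0.625, while the macro score is 0.45: the poorly separated single-sample cluster weighs as much as the three-sample cluster.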

weighted_macro_sil_score

weighted_macro_sil_score(X, labels, cluster_weights, approximation=False, centers=None)

Computes a cluster-weighted macro silhouette score. It first computes the mean silhouette score for each cluster, then combines those cluster means using user-supplied cluster weights. The examples below assume a clustering with three clusters labeled 0, 1, and 2.

Using a dictionary:

weights = {
    0: 0.2,
    1: 0.3,
    2: 0.5,
}

score = weighted_macro_sil_score(X, labels, cluster_weights=weights)

Using a list or array with one weight per cluster:

weights = [0.2, 0.3, 0.5]

score = weighted_macro_sil_score(X, labels, cluster_weights=weights)
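A minimal sketch of the weighting step on precomputed per-sample scores. The helper name weighted_macro_average and the scores are hypothetical, and dividing by the sum of the weights is an assumption; the README does not say whether sil_score normalizes the weights.

```python
import numpy as np


def weighted_macro_average(scores, labels, cluster_weights):
    """Hypothetical helper: weighted average of per-cluster mean scores.

    cluster_weights maps each label to a non-negative weight. The weights are
    divided by their sum here (an assumption about normalization).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    unique = np.unique(labels)
    means = np.array([scores[labels == k].mean() for k in unique])
    weights = np.array([cluster_weights[k] for k in unique], dtype=float)
    return float(np.sum(weights * means) / weights.sum())


# Hypothetical scores with cluster means 0.6, 0.4 and 0.2
scores = np.array([0.7, 0.5, 0.4, 0.4, 0.1, 0.3])
labels = np.array([0, 0, 1, 1, 2, 2])

weights = {0: 0.2, 1: 0.3, 2: 0.5}
score = weighted_macro_average(scores, labels, weights)
# 0.2 * 0.6 + 0.3 * 0.4 + 0.5 * 0.2 = 0.34

# A list of weights can be paired with the sorted labels; whether sil_score
# uses the same ordering for array weights is an assumption.
weights_list = [0.2, 0.3, 0.5]
score_from_list = weighted_macro_average(
    scores, labels, dict(zip(np.unique(labels), weights_list))
)
```

Putting most of the weight on cluster 2, the worst-separated cluster here, pulls the result below the unweighted macro score of 0.4.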

sil_approximation_report

sil_approximation_report(X, labels, centers=None, return_samples=False)

Compares exact silhouette scores with centroid-based approximate scores. It returns the Pearson correlation between the two sets of scores, along with several error metrics:

report = sil_approximation_report(X, labels)
print(report)

Example output (illustrative values for a 300-sample dataset; actual numbers depend on your data):

{
    "correlation": 0.96,
    "mean_absolute_error": 0.03,
    "mean_squared_error": 0.002,
    "root_mean_squared_error": 0.045,
    "max_absolute_error": 0.12,
    "mean_error": 0.01,
    "mean_exact_score": 0.52,
    "mean_approximate_score": 0.53,
    "n_samples": 300,
}

Use return_samples=True to also include the exact scores, approximate scores, and per-sample errors.
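The README lists the report keys but not their formulas. A minimal sketch of how such metrics can be computed from two score arrays follows; the helper name approximation_report is hypothetical, and taking errors as approximate minus exact is an assumption, chosen because it matches the example output, where mean_error 0.01 equals 0.53 minus 0.52.

```python
import numpy as np


def approximation_report(exact, approx):
    """Hypothetical helper mirroring the documented report keys.

    Errors are approx - exact (an assumed sign convention).
    """
    exact = np.asarray(exact, dtype=float)
    approx = np.asarray(approx, dtype=float)
    err = approx - exact
    mse = float(np.mean(err ** 2))
    return {
        "correlation": float(np.corrcoef(exact, approx)[0, 1]),  # Pearson
        "mean_absolute_error": float(np.mean(np.abs(err))),
        "mean_squared_error": mse,
        "root_mean_squared_error": float(np.sqrt(mse)),
        "max_absolute_error": float(np.max(np.abs(err))),
        "mean_error": float(np.mean(err)),
        "mean_exact_score": float(exact.mean()),
        "mean_approximate_score": float(approx.mean()),
        "n_samples": int(exact.size),
    }


# Exact scores from the quick example, and a hypothetical approximation
# that overestimates every score by 0.05
exact = np.array([9 / 11, 7 / 9, 7 / 9, 9 / 11])
approx = exact + 0.05
report = approximation_report(exact, approx)
print(report)
```

A constant offset leaves the correlation at 1 while every error metric equals 0.05, which is why the report combines a correlation with error metrics rather than relying on either alone.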


Exact vs Approximate mode

  • Exact mode: sil_samples(X, labels, approximation=False). Uses the classical silhouette definition, based on pairwise distances between all samples.
  • Approximate mode: sil_samples(X, labels, approximation=True). Uses distances from each sample to the cluster centroids. Because it needs only n × k sample-to-centroid distances instead of all n × n pairwise distances (for n samples and k clusters), it can be significantly faster on larger datasets.
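One common centroid-based approximation is the "simplified silhouette": a(i) becomes the distance to the sample's own centroid and b(i) the distance to the nearest other centroid. Whether sil_score uses exactly this formula is an assumption; the sketch below, with the hypothetical helper name centroid_silhouette_samples, only illustrates the n × k idea and assumes that centers rows follow the sorted unique labels.

```python
import numpy as np


def centroid_silhouette_samples(X, labels, centers=None):
    """Hypothetical sketch of the simplified (centroid-based) silhouette.

    a(i): distance from sample i to its own cluster centroid.
    b(i): distance from sample i to the nearest other centroid.
    s(i) = (b - a) / max(a, b). Assumes `centers` rows follow sorted labels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    unique, idx = np.unique(labels, return_inverse=True)
    if len(unique) < 2:
        raise ValueError("silhouette needs at least two clusters")
    if centers is None:
        centers = np.vstack([X[labels == k].mean(axis=0) for k in unique])
    centers = np.asarray(centers, dtype=float)

    # (n, k) sample-to-centroid distances instead of an (n, n) matrix
    D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    rows = np.arange(len(X))
    a = D[rows, idx]
    D_other = D.copy()
    D_other[rows, idx] = np.inf  # mask each sample's own centroid
    b = D_other.min(axis=1)

    denom = np.maximum(a, b)
    safe = np.where(denom > 0, denom, 1.0)  # avoid division by zero
    return np.where(denom > 0, (b - a) / safe, 0.0)


X = np.array([[0.0], [2.0], [10.0], [12.0]])
labels = np.array([0, 0, 1, 1])
approx = centroid_silhouette_samples(X, labels)
```

With centroids at 1 and 11, this gives 10/11 and 8/9, somewhat higher than the exact values 9/11 and 7/9 from the quick example, which illustrates why the two modes are not interchangeable.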

Requirements

sil-score depends on:

  • NumPy
  • scikit-learn

License

This project is licensed under the MIT License.



Download files

Download the file for your platform.

Source Distribution

sil_score-0.1.5.tar.gz (5.9 kB)

Uploaded Source

Built Distribution


sil_score-0.1.5-py3-none-any.whl (6.3 kB)

Uploaded Python 3

File details

Details for the file sil_score-0.1.5.tar.gz.

File metadata

  • Download URL: sil_score-0.1.5.tar.gz
  • Upload date:
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sil_score-0.1.5.tar.gz
  • SHA256: 244ded5dc80148ce780a7e60b1f70739e01ad5537b5ec04b23c6d97974634b41
  • MD5: d5e218a8c2fd258c74935d44b343bfc2
  • BLAKE2b-256: 9f7e738faa21b6812c1715f6319d6942bb444a6dca90ec7485dada66088b592d


Provenance

The following attestation bundles were made for sil_score-0.1.5.tar.gz:

Publisher: python-publish.yml on semoglou/sil_score


File details

Details for the file sil_score-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: sil_score-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 6.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sil_score-0.1.5-py3-none-any.whl
  • SHA256: bbdf608c42355c99de9db65f49cc54c75b7328f8374fde7bb5217c7df1ff7f6c
  • MD5: 9f6efd3077ab89721d649211f76ff99c
  • BLAKE2b-256: b3efbfb23278db609a09b70e165116a62682d7f18c098679d33d0effa8fd3ed3


Provenance

The following attestation bundles were made for sil_score-0.1.5-py3-none-any.whl:

Publisher: python-publish.yml on semoglou/sil_score

