
Exact and approximate silhouette scoring with micro, macro, and weighted cluster averages.

Project description

sil-score


sil-score is a small Python package for exact and fast approximate silhouette scoring.

It extends the usual silhouette workflow with:

  • per-sample silhouette scores
  • micro-averaged silhouette score
  • macro-averaged silhouette score
  • cluster-weighted macro silhouette score
  • exact vs approximate comparison report

The exact mode uses scikit-learn's silhouette_samples.
The approximate mode uses Euclidean distances to cluster centroids, making it faster but not identical to the classical silhouette definition.


Installation

Install from PyPI:

pip install sil-score

Quick example

import numpy as np
from sil_score import (
    sil_samples,
    micro_sil_score,
    macro_sil_score,
    weighted_macro_sil_score,
    sil_approximation_report,
)

X = np.array([
    [0.0],
    [2.0],
    [10.0],
    [12.0],
])

labels = np.array([0, 0, 1, 1])

samples = sil_samples(X, labels)
micro = micro_sil_score(X, labels)
macro = macro_sil_score(X, labels)

print(samples)
print(micro)
print(macro)

Output:

[0.81818182 0.77777778 0.77777778 0.81818182]
0.797979797979798
0.797979797979798
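Those values can be verified by hand. For sample 0 (value 0.0), the mean distance to the other member of its own cluster is a = |0 − 2| = 2, and the mean distance to the samples of the other cluster is b = (10 + 12) / 2 = 11, so s = (b − a) / max(a, b) = 9/11 ≈ 0.818:

```python
# Hand-check of the exact silhouette values printed above.
# Sample 0 (value 0.0, cluster 0):
a0 = abs(0.0 - 2.0)            # mean intra-cluster distance
b0 = (10.0 + 12.0) / 2         # mean distance to the other cluster
s0 = (b0 - a0) / max(a0, b0)   # 9 / 11 ≈ 0.8182

# Sample 1 (value 2.0, cluster 0):
a1 = abs(2.0 - 0.0)
b1 = (8.0 + 10.0) / 2
s1 = (b1 - a1) / max(a1, b1)   # 7 / 9 ≈ 0.7778
```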

Functions

sil_samples

sil_samples(X, labels, approximation=False, centers=None)

Computes the silhouette score for each sample.

By default, it computes the exact silhouette values using scikit-learn.

scores = sil_samples(X, labels)

For a faster centroid-based approximation:

scores = sil_samples(X, labels, approximation=True)

You can also pass precomputed cluster centers:

scores = sil_samples(
    X,
    labels,
    approximation=True,
    centers=centers,
)

micro_sil_score

micro_sil_score(X, labels, approximation=False, centers=None)

Computes the mean of all sample-level silhouette scores. This is the usual average silhouette score. Larger clusters naturally have more influence because they contain more samples.

# Standard usage
score = micro_sil_score(X, labels)

# Approximate version
score = micro_sil_score(X, labels, approximation=True)
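Since the micro average is simply the unweighted mean of the per-sample scores, it can be reproduced directly from the quick-example output above:

```python
# Per-sample silhouette values from the quick example.
per_sample = [0.81818182, 0.77777778, 0.77777778, 0.81818182]

# Micro average = unweighted mean over all samples.
micro = sum(per_sample) / len(per_sample)   # ≈ 0.79798
```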

macro_sil_score

macro_sil_score(X, labels, approximation=False, centers=None)

Computes the mean silhouette score inside each cluster, then averages the cluster means equally. This gives every cluster the same importance, regardless of its size.

# Standard usage
score = macro_sil_score(X, labels)

# Approximate version
score = macro_sil_score(X, labels, approximation=True)
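In terms of per-sample scores, the macro average can be sketched as follows (a pure-Python illustration, not the package's internal code):

```python
def macro_from_samples(per_sample, labels):
    """Average per-sample scores within each cluster, then average
    the cluster means with equal weight per cluster."""
    clusters = sorted(set(labels))
    cluster_means = []
    for c in clusters:
        members = [s for s, l in zip(per_sample, labels) if l == c]
        cluster_means.append(sum(members) / len(members))
    return sum(cluster_means) / len(cluster_means)

# A cluster with one sample counts as much as a cluster with two:
macro_from_samples([1.0, 0.8, 0.2], [0, 0, 1])   # (0.9 + 0.2) / 2 = 0.55
```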

weighted_macro_sil_score

weighted_macro_sil_score(X, labels, cluster_weights, approximation=False, centers=None)

Computes a cluster-weighted macro silhouette score. First, it computes the mean silhouette score for each cluster, then combines those cluster means using custom cluster weights.

Using a dictionary:

weights = {
    0: 0.2,
    1: 0.3,
    2: 0.5,
}

score = weighted_macro_sil_score(X, labels, cluster_weights=weights)

Using an array:

weights = [0.2, 0.3, 0.5]

score = weighted_macro_sil_score(X, labels, cluster_weights=weights)
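The combination step can be sketched as below. Note that whether the package normalizes weights that do not sum to 1 is not documented here, so this sketch normalizes them explicitly as an assumption:

```python
def weighted_macro_from_samples(per_sample, labels, cluster_weights):
    """Weight each cluster's mean silhouette by a user-supplied weight.

    `cluster_weights` maps cluster label -> weight; weights are normalized
    to sum to 1 (an assumption, not necessarily the package's behavior).
    """
    clusters = sorted(set(labels))
    total = sum(cluster_weights[c] for c in clusters)
    score = 0.0
    for c in clusters:
        members = [s for s, l in zip(per_sample, labels) if l == c]
        score += (cluster_weights[c] / total) * (sum(members) / len(members))
    return score

# Cluster 1 gets three times the weight of cluster 0:
weighted_macro_from_samples([1.0, 0.8, 0.2], [0, 0, 1], {0: 0.25, 1: 0.75})
# = 0.25 * 0.9 + 0.75 * 0.2 = 0.375
```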

sil_approximation_report

sil_approximation_report(X, labels, centers=None, return_samples=False)

Compares exact silhouette scores with centroid-based approximate scores and returns the Pearson correlation between the two sets of scores alongside several error metrics:

report = sil_approximation_report(X, labels)
print(report)

Example output:

{
    "correlation": 0.96,
    "mean_absolute_error": 0.03,
    "mean_squared_error": 0.002,
    "root_mean_squared_error": 0.045,
    "max_absolute_error": 0.12,
    "mean_error": 0.01,
    "mean_exact_score": 0.52,
    "mean_approximate_score": 0.53,
    "n_samples": 300,
}

Use return_samples=True to also include the exact scores, approximate scores, and per-sample errors.
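The reported metrics are standard comparisons between two score vectors. A sketch of how they could be derived from exact and approximate per-sample scores (the field names mirror the example report above, but this is not the package's implementation):

```python
import math

def approximation_metrics(exact, approx):
    """Error metrics between exact and approximate silhouette scores."""
    n = len(exact)
    errors = [a - e for e, a in zip(exact, approx)]
    mean_e = sum(exact) / n
    mean_a = sum(approx) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(exact, approx))
    var_e = sum((e - mean_e) ** 2 for e in exact)
    var_a = sum((a - mean_a) ** 2 for a in approx)
    mse = sum(err ** 2 for err in errors) / n
    return {
        "correlation": cov / math.sqrt(var_e * var_a),
        "mean_absolute_error": sum(abs(err) for err in errors) / n,
        "mean_squared_error": mse,
        "root_mean_squared_error": math.sqrt(mse),
        "max_absolute_error": max(abs(err) for err in errors),
        "mean_error": sum(errors) / n,
        "mean_exact_score": mean_e,
        "mean_approximate_score": mean_a,
        "n_samples": n,
    }
```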


Exact vs Approximate mode

  • Exact mode: sil_samples(X, labels, approximation=False). Uses the classical silhouette definition, which requires pairwise distances between samples and therefore scales quadratically with the number of samples.
  • Approximate mode: sil_samples(X, labels, approximation=True). Uses distances from each sample to the cluster centroids, which scales with the number of samples times the number of clusters. This can be significantly faster for larger datasets.
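One common way to build such a centroid-based approximation is sketched below. This is an illustration only; the package's exact formula may differ in details such as how singleton clusters or precomputed centers are handled:

```python
import math

def approx_silhouette_samples(X, labels):
    """Centroid-based approximate silhouette (a common formulation).

    a(i) = distance from sample i to its own cluster centroid
    b(i) = distance from sample i to the nearest *other* centroid
    s(i) = (b - a) / max(a, b)
    """
    clusters = sorted(set(labels))
    centers = {}
    for c in clusters:
        members = [x for x, l in zip(X, labels) if l == c]
        centers[c] = [sum(col) / len(members) for col in zip(*members)]

    def dist(p, q):
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

    scores = []
    for x, l in zip(X, labels):
        a = dist(x, centers[l])
        b = min(dist(x, centers[c]) for c in clusters if c != l)
        scores.append((b - a) / max(a, b))
    return scores
```

On the quick-example data this sketch yields roughly [0.909, 0.889, 0.889, 0.909], close to but not equal to the exact values [0.818, 0.778, 0.778, 0.818], which is exactly the kind of discrepancy sil_approximation_report quantifies.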

Requirements

sil-score depends on:

  • NumPy
  • scikit-learn

License

This project is licensed under the MIT License.

Download files

Source distribution: sil_score-0.1.6.tar.gz (6.1 kB)
Built distribution: sil_score-0.1.6-py3-none-any.whl (6.5 kB)

