Exact and approximate silhouette scoring with micro, macro, and weighted cluster averages.

sil_score

sil-score is a small Python package for exact and fast approximate silhouette scoring.

It extends the usual silhouette workflow with:

  • per-sample silhouette scores
  • micro-averaged silhouette score
  • macro-averaged silhouette score
  • cluster-weighted macro silhouette score
  • exact vs approximate comparison report

The exact mode uses scikit-learn's silhouette_samples.
The approximate mode uses Euclidean distances to cluster centroids, making it faster but not identical to the classical silhouette definition.


Installation

Install from PyPI:

pip install sil-score

Quick example

import numpy as np
from sil_score import (
    sil_samples,
    micro_sil_score,
    macro_sil_score,
    weighted_macro_sil_score,
    sil_approximation_report,
)

X = np.array([
    [0.0],
    [2.0],
    [10.0],
    [12.0],
])

labels = np.array([0, 0, 1, 1])

samples = sil_samples(X, labels)
micro = micro_sil_score(X, labels)
macro = macro_sil_score(X, labels)

print(samples)
print(micro)
print(macro)

Output:

[0.81818182 0.77777778 0.77777778 0.81818182]
0.797979797979798
0.797979797979798
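
These values can be checked by hand from the classical definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from sample i to the other members of its own cluster and b(i) is the smallest mean distance to the members of any other cluster. The following is a minimal NumPy sketch of that definition, not the package's implementation (which delegates to scikit-learn); the helper name exact_silhouette_by_hand is hypothetical.

```python
import numpy as np

def exact_silhouette_by_hand(X, labels):
    """Classical per-sample silhouette from pairwise Euclidean distances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise distance matrix, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton clusters keep a score of 0 by convention
        # a: mean distance to the other members of the same cluster
        # (dist[i, i] is 0, so it does not affect the sum).
        a = dist[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores

X = np.array([[0.0], [2.0], [10.0], [12.0]])
labels = np.array([0, 0, 1, 1])
by_hand = exact_silhouette_by_hand(X, labels)
print(by_hand)         # per-sample values: 9/11, 7/9, 7/9, 9/11
print(by_hand.mean())  # micro average
```

For the first sample, a = 2 and b = (10 + 12) / 2 = 11, so s = 9/11 ≈ 0.818. Because both clusters have the same size and the same mean score, the micro and macro averages coincide in this example.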

Functions

sil_samples

sil_samples(X, labels, approximation=False, centers=None)

Computes the silhouette score for each sample.

By default, it computes the exact silhouette values using scikit-learn.

scores = sil_samples(X, labels)

For a faster centroid-based approximation:

scores = sil_samples(X, labels, approximation=True)

You can also pass precomputed cluster centers:

scores = sil_samples(
    X,
    labels,
    approximation=True,
    centers=centers,
)

micro_sil_score

micro_sil_score(X, labels, approximation=False, centers=None)

Computes the mean of all sample-level silhouette scores. This is the usual average silhouette score. Larger clusters naturally have more influence because they contain more samples.

# Standard usage
score = micro_sil_score(X, labels)

# Approximate version
score = micro_sil_score(X, labels, approximation=True)

macro_sil_score

macro_sil_score(X, labels, approximation=False, centers=None)

Computes the mean silhouette score inside each cluster, then averages the cluster means equally. This gives every cluster the same importance, regardless of its size.

# Standard usage
score = macro_sil_score(X, labels)

# Approximate version
score = macro_sil_score(X, labels, approximation=True)
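
The micro and macro averages only differ when clusters are unbalanced. The sketch below illustrates this with made-up per-sample scores (illustrative values, not output from the package) and plain NumPy rather than the sil_score functions.

```python
import numpy as np

# Hypothetical per-sample silhouette scores: a large, well-separated
# cluster 0 and a small, poorly separated cluster 1.
scores = np.array([0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1])
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# Micro: every sample counts once, so the large cluster dominates.
micro = scores.mean()

# Macro: average within each cluster first, then average the cluster means.
cluster_means = np.array([scores[labels == c].mean() for c in np.unique(labels)])
macro = cluster_means.mean()

print(f"micro={micro:.2f} macro={macro:.2f}")  # micro=0.74 macro=0.50
```

The macro average drops to 0.50 because the weak two-sample cluster counts as much as the strong eight-sample one.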

weighted_macro_sil_score

weighted_macro_sil_score(X, labels, cluster_weights, approximation=False, centers=None)

Computes a cluster-weighted macro silhouette score. First, it computes the mean silhouette score for each cluster, then combines those cluster means using custom cluster weights.

Using a dictionary:

weights = {
    0: 0.2,
    1: 0.3,
    2: 0.5,
}

score = weighted_macro_sil_score(X, labels, cluster_weights=weights)

Using an array:

weights = [0.2, 0.3, 0.5]

score = weighted_macro_sil_score(X, labels, cluster_weights=weights)
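
Conceptually, the weighted score is a weighted average of the per-cluster means. The sketch below uses plain NumPy and assumes that the weights are normalized by their sum and that array weights follow sorted cluster-label order; neither is confirmed by the description above, so check the package for its actual behavior. The cluster means are made-up values, and weighted_macro is a hypothetical helper.

```python
import numpy as np

def weighted_macro(cluster_means, cluster_weights):
    """Weighted average of per-cluster mean scores.

    cluster_means: dict mapping cluster label -> mean silhouette score.
    cluster_weights: dict keyed by label, or a sequence in sorted-label
    order (an assumption made for this sketch).
    """
    cluster_ids = sorted(cluster_means)
    means = np.array([cluster_means[c] for c in cluster_ids], dtype=float)
    if isinstance(cluster_weights, dict):
        weights = np.array([cluster_weights[c] for c in cluster_ids], dtype=float)
    else:
        weights = np.asarray(cluster_weights, dtype=float)
    # np.average divides by the sum of the weights, so they need not sum to 1.
    return float(np.average(means, weights=weights))

cluster_means = {0: 0.8, 1: 0.6, 2: 0.4}  # hypothetical per-cluster means
from_dict = weighted_macro(cluster_means, {0: 0.2, 1: 0.3, 2: 0.5})
from_list = weighted_macro(cluster_means, [0.2, 0.3, 0.5])
print(from_dict, from_list)
```

With these values the result is 0.2 * 0.8 + 0.3 * 0.6 + 0.5 * 0.4 = 0.54, compared with 0.60 for the unweighted macro average.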

sil_approximation_report

sil_approximation_report(X, labels, centers=None, return_samples=False)

Compares exact silhouette scores with centroid-based approximate scores. It returns the Pearson correlation between the two sets of scores, along with error metrics:

report = sil_approximation_report(X, labels)
print(report)

Example output:

{
    "correlation": 0.96,
    "mean_absolute_error": 0.03,
    "mean_squared_error": 0.002,
    "root_mean_squared_error": 0.045,
    "max_absolute_error": 0.12,
    "mean_error": 0.01,
    "mean_exact_score": 0.52,
    "mean_approximate_score": 0.53,
    "n_samples": 300,
}

Use return_samples=True to also include the exact scores, approximate scores, and per-sample errors.
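
The metric names in the report suggest standard definitions. As a rough sketch, the same quantities can be computed with NumPy from two score arrays. It assumes the error is defined as approximate minus exact, which is consistent with the example output above but not confirmed; compare_scores and the input values are hypothetical.

```python
import numpy as np

def compare_scores(exact, approx):
    """Correlation and error metrics between two per-sample score arrays."""
    exact = np.asarray(exact, dtype=float)
    approx = np.asarray(approx, dtype=float)
    err = approx - exact  # assumed sign convention
    mse = float(np.mean(err ** 2))
    return {
        "correlation": float(np.corrcoef(exact, approx)[0, 1]),  # Pearson r
        "mean_absolute_error": float(np.mean(np.abs(err))),
        "mean_squared_error": mse,
        "root_mean_squared_error": float(np.sqrt(mse)),
        "max_absolute_error": float(np.max(np.abs(err))),
        "mean_error": float(np.mean(err)),
        "mean_exact_score": float(exact.mean()),
        "mean_approximate_score": float(approx.mean()),
        "n_samples": int(exact.size),
    }

# Made-up scores for illustration only.
exact = [0.80, 0.75, 0.40, 0.10]
approx = [0.85, 0.70, 0.50, 0.10]
metrics = compare_scores(exact, approx)
print(metrics)
```

A positive mean_error under this convention means the approximation overestimates the exact scores on average.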


Exact vs Approximate mode

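  • Exact mode: sil_samples(X, labels, approximation=False). Uses the classical silhouette definition, which is based on pairwise distances between samples.
  • Approximate mode: sil_samples(X, labels, approximation=True). Uses the distance from each sample to the cluster centroids instead of to every other sample. This avoids computing all pairwise distances, so it can be significantly faster on larger datasets, but the resulting scores are not identical to the exact ones.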
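
One common centroid-based variant is the "simplified silhouette", where a(i) is the distance from a sample to its own centroid and b(i) is the distance to the nearest other centroid. The sketch below implements that variant in NumPy to illustrate the idea. It is an assumption that the approximate mode of sil-score follows this exact formula, and simplified_silhouette is a hypothetical helper, not part of the package.

```python
import numpy as np

def simplified_silhouette(X, labels, centers=None):
    """Centroid-based silhouette sketch; needs at least two clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters, idx = np.unique(labels, return_inverse=True)
    if centers is None:
        # Centroids as per-cluster means, in sorted-label order.
        centers = np.array([X[labels == c].mean(axis=0) for c in clusters])
    centers = np.asarray(centers, dtype=float)
    # Sample-to-centroid distances, shape (n, k): O(n * k) distance
    # computations instead of the O(n^2) needed for all sample pairs.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    rows = np.arange(len(X))
    a = d[rows, idx]              # distance to own centroid
    d_other = d.copy()
    d_other[rows, idx] = np.inf   # mask out the own centroid
    b = d_other.min(axis=1)       # distance to nearest other centroid
    denom = np.maximum(a, b)
    safe = np.where(denom > 0, denom, 1.0)  # avoid division by zero
    return np.where(denom > 0, (b - a) / safe, 0.0)

X = np.array([[0.0], [2.0], [10.0], [12.0]])
labels = np.array([0, 0, 1, 1])
approx_scores = simplified_silhouette(X, labels)
print(approx_scores)
```

For this four-point example the centroids are 1 and 11, so the sketch gives 10/11 ≈ 0.909 for the outer points and 8/9 ≈ 0.889 for the inner ones, slightly higher than the exact 0.818 and 0.778. This shows why approximate scores should be compared against exact ones before they are relied on.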

Requirements

sil-score depends on:

  • NumPy
  • scikit-learn

License

This project is licensed under the MIT License.
