
The SPORE clustering algorithm.


SPORE

Skeleton Propagation Over Recalibrating Expansions — a graph-based clustering algorithm for arbitrary-shape, arbitrary-scale clusters.


How it works

SPORE builds clusters in three stages:

  1. k-NN graph construction: a global nearest-neighbor graph is built (exact or approximate), with the number of neighbors per point scaling as O(log N) by default.
  2. Variance-aware BFS expansion: clusters grow outward from the densest points. An edge is accepted only if its length is statistically consistent with the cluster's evolving internal distance distribution, whose mean and variance are updated incrementally as expansion proceeds. This prevents bridging across low-density gaps while preserving irregular shapes.
  3. Reassignment: clusters smaller than min_cluster_size are either merged into nearby larger clusters, using a composite score that weighs proximity, relative size, density, and angular isotropy, or labeled as noise.
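
The second stage can be illustrated with a small, standard-library-only sketch. It is not SPORE's implementation: the names RunningStats, knn_graph, and expand_clusters, the warmup parameter, and the density ranking by mean neighbor distance are all assumptions made for illustration. It shows the core idea only: a BFS that keeps a running mean and variance of accepted edge lengths (Welford's update) and rejects any edge more than `expansion` standard deviations above the mean.

```python
import math
from collections import deque


class RunningStats:
    """Welford's online update for the mean and variance of edge lengths."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def knn_graph(points, k):
    """Brute-force k-NN: graph[i] is a list of (distance, index), nearest first."""
    graph = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append(ranked[:k])
    return graph


def expand_clusters(graph, expansion=3.0, warmup=2):
    """Label points by BFS; -1 never remains because every point can seed a cluster.

    `warmup` (hypothetical) is the number of edges accepted unconditionally
    before the z-score test starts, so the statistics have something to use.
    """
    labels = [-1] * len(graph)
    # Assumed density proxy: a smaller mean k-NN distance means a denser point.
    order = sorted(range(len(graph)),
                   key=lambda i: sum(d for d, _ in graph[i]) / len(graph[i]))
    cluster = 0
    for seed in order:
        if labels[seed] != -1:
            continue
        labels[seed] = cluster
        stats = RunningStats()
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for dist, j in graph[i]:
                if labels[j] != -1:
                    continue
                # Reject edges that are too long relative to the cluster so far.
                if stats.n >= warmup and dist > stats.mean + expansion * stats.std():
                    continue
                labels[j] = cluster
                stats.update(dist)
                queue.append(j)
        cluster += 1
    return labels


# Two 1-D groups separated by a gap; with k=3 every point has one bridging edge.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
labels = expand_clusters(knn_graph(points, k=3), expansion=3.0, warmup=2)
print(labels)
```

In this toy input the long edges across the gap are present in the k-NN graph, and it is the distance test, not the graph itself, that keeps the two groups apart. The reassignment stage and the retention_rate check are deliberately left out.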

Installation

pip install spore-clustering

Quick start

from spore_clustering import SPORE

# X: array-like of shape (n_samples, n_features)
labels = SPORE().fit_predict(X)

Key parameters

Parameter                Description
expansion                Z-score threshold controlling how aggressively clusters grow
neighborhood_percentile  Bounded alternative to expansion; typical values: 25, 50, 75, 93.75
retention_rate           Fraction of neighbors that must pass the variance filter for expansion to continue
min_cluster_size         Minimum cluster size (int) or exponent for N-relative scaling (float)

See the full API reference for all parameters.
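
To make the parameter descriptions concrete, here is one possible reading of three of them as plain Python. The helper names (z_score, should_keep_expanding, resolve_min_cluster_size) are hypothetical and are not part of the package, and the exact formulas SPORE uses may differ. In particular, treating a float min_cluster_size as N raised to that exponent is an assumption based only on the one-line description above.

```python
import math


def z_score(dist, mean, std):
    """How many standard deviations `dist` lies above the cluster's mean distance."""
    if std > 0:
        return (dist - mean) / std
    return 0.0 if dist <= mean else math.inf


def should_keep_expanding(neighbor_dists, mean, std, expansion, retention_rate):
    """True if at least `retention_rate` of a point's neighbors pass the z-score filter."""
    passed = sum(1 for d in neighbor_dists if z_score(d, mean, std) <= expansion)
    return passed / len(neighbor_dists) >= retention_rate


def resolve_min_cluster_size(min_cluster_size, n_samples):
    """Assumed reading: an int is an absolute size, a float is an exponent applied to N."""
    if isinstance(min_cluster_size, int):
        return min_cluster_size
    return max(1, round(n_samples ** min_cluster_size))


# Two of four neighbors lie within 2 standard deviations of the cluster mean.
dists = [1.0, 1.2, 5.0, 6.0]
print(should_keep_expanding(dists, mean=1.0, std=0.5, expansion=2.0, retention_rate=0.5))
print(should_keep_expanding(dists, mean=1.0, std=0.5, expansion=2.0, retention_rate=0.75))
print(resolve_min_cluster_size(0.5, 10_000))
```

Under this reading, raising expansion admits longer edges, while raising retention_rate makes a point stop propagating sooner when most of its neighbors look inconsistent with the cluster.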

Reusing a precomputed neighbor index

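from spore_clustering import SPORE

# k, neighbors, distances, and scale are placeholders for your precomputed values.
dindex = SPORE.DataIndex(
    connectivity=k,
    neighbors=neighbors,
    dists=distances,
    dataset_scale=scale,
)

labels = SPORE(dindex=dindex, retention_rate=0.25).fit_predict(X)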
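
The neighbor and distance tables passed to the index have to come from a k-NN search run beforehand. The sketch below shows one way to produce such tables with a brute-force search in pure Python. The function brute_force_knn is hypothetical, the rule k = ceil(ln N) is only an assumed stand-in for the default scaling, and the layout SPORE.DataIndex actually expects (array type, whether a point lists itself, how dataset_scale is defined) is not specified here, so check the API reference before passing these values in.

```python
import math


def brute_force_knn(points, k):
    """Return (neighbors, dists): row i holds the k nearest indices and distances."""
    neighbors, dists = [], []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        dists.append([d for d, _ in ranked])
        neighbors.append([j for _, j in ranked])
    return neighbors, dists


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
k = max(1, math.ceil(math.log(len(points))))  # assumed k ~ log N rule
neighbors, distances = brute_force_knn(points, k)
print(k, neighbors[0], distances[0])
```

Because the search is the expensive part, computing it once and reusing the result lets you try several parameter settings without rebuilding the graph.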

Complexity

With approximate k-NN and default neighbor scaling (k ~ log N):

Phase              Complexity
k-NN construction  O(Nd log N)
BFS expansion      O(N log N)
Reassignment       O(Rd log N), R ≪ N
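
The BFS bound follows from the graph size: with k ~ log N neighbors per point, the graph has about N·k edges and each is examined a bounded number of times. The short sketch below turns the first two rows into rough operation counts. The function estimated_work is hypothetical, k = ceil(ln N) is an assumed concrete form of the default scaling, and constant factors are ignored, so the numbers indicate growth rates rather than running times.

```python
import math


def estimated_work(n, d):
    """Rough operation counts for n points in d dimensions (constants ignored)."""
    k = max(1, math.ceil(math.log(n)))  # assumed k ~ ln N
    return {
        "k": k,
        "knn_construction": n * d * math.log(n),  # ~ N d log N
        "bfs_edge_visits": n * k,                 # each point scans its k neighbors
    }


for n in (1_000, 1_000_000):
    print(n, estimated_work(n, d=8))
```

Going from a thousand to a million points multiplies N by 1000 but only doubles k, which is why the expansion stage stays close to linear in practice.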

scikit-learn compatibility

SPORE follows standard scikit-learn estimator conventions: fit, fit_predict, get_params, set_params.
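
For readers unfamiliar with these conventions, the toy class below (not SPORE, and not using scikit-learn itself) shows what the four methods are expected to do: get_params returns the constructor arguments as a dict, set_params updates them and returns the estimator, fit returns the estimator, and fit_predict returns the labels. ThresholdClusterer and its labels_ attribute are illustrative assumptions.

```python
class ThresholdClusterer:
    """Toy 1-D estimator that follows the four conventions listed above."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def get_params(self, deep=True):
        # Constructor arguments, keyed by name.
        return {"threshold": self.threshold}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self  # returning self allows chaining

    def fit(self, X, y=None):
        # Label each row by which side of the threshold its first feature falls on.
        self.labels_ = [int(row[0] > self.threshold) for row in X]
        return self

    def fit_predict(self, X, y=None):
        return self.fit(X).labels_


model = ThresholdClusterer().set_params(threshold=2.5)
labels = model.fit_predict([[1.0], [2.0], [3.0], [4.0]])
print(model.get_params(), labels)
```

Any estimator with this shape can be configured and run by generic code that only knows these four methods, which is the practical benefit of following the convention.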
