
DS2L-SOM is a Local Density-based Simultaneous Two-level Algorithm for Topographic Clustering


License: MIT

DS2L-SOM

DS2L-SOM is a topological, density-based clustering algorithm. It combines a Self-Organizing Map (SOM) for prototype learning with Gaussian kernel density estimation (KDE) and gradient ascent to detect clusters, without requiring the number of clusters to be specified in advance.

Based on the papers by Cabanes, Bennani and Fresneau (2008, 2012).
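To make the density and gradient-ascent steps concrete, here is a minimal pure-Python sketch. It is not the package's implementation: the helper names `gaussian_kde_at` and `climb_to_modes` are hypothetical, the prototype neighbor graph is written by hand rather than learned by a SOM, and any refinement steps from the papers are left out. Each prototype gets a Gaussian KDE density, then moves to its densest neighbor until it reaches a local maximum; prototypes ending at the same maximum share a cluster.

```python
import math


def gaussian_kde_at(points, samples, sigma):
    """Gaussian KDE of `samples`, evaluated at each entry of `points`."""
    dim = len(samples[0])
    norm = (sigma * math.sqrt(2 * math.pi)) ** dim
    densities = []
    for p in points:
        total = 0.0
        for x in samples:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, x))
            total += math.exp(-sq_dist / (2 * sigma**2))
        densities.append(total / (len(samples) * norm))
    return densities


def climb_to_modes(densities, neighbors):
    """Label each prototype by the density maximum it climbs to."""

    def step(i):
        # Move to the densest neighbor, but only if it is strictly denser.
        best = max(neighbors[i], key=lambda j: densities[j], default=i)
        return best if densities[best] > densities[i] else i

    modes = []
    for i in range(len(densities)):
        current, nxt = i, step(i)
        while nxt != current:
            current, nxt = nxt, step(nxt)
        modes.append(current)

    # Relabel the maxima as sequential integers.
    ids = {mode: label for label, mode in enumerate(sorted(set(modes)))}
    return [ids[mode] for mode in modes]


# Toy 1-D data: two groups of samples and four prototypes in a chain.
samples = [[-0.5], [0.0], [0.2], [0.4], [9.8], [10.0], [10.3], [10.5]]
prototypes = [[0.0], [1.0], [9.0], [10.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # hand-written graph

densities = gaussian_kde_at(prototypes, samples, sigma=1.0)
labels = climb_to_modes(densities, neighbors)
print(labels)
```

In this toy setup the two outer prototypes are the density maxima, so the chain splits into two clusters without the number of clusters being given.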

Usage

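from sklearn.datasets import load_digits

from ds2l_som.ds2lsom import DS2LSOM

# Any array of shape (n_samples, n_features) works; load_digits is used here as an example.
X, _ = load_digits(return_X_y=True)

model = DS2LSOM(n_prototypes=100, threshold=10)
model.fit(X)
labels = model.predict(X)   # sequential integers; -1 = unassigned

# or, in one step:
labels = model.fit_predict(X)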

DS2LSOM implements the scikit-learn API (ClusterMixin, BaseEstimator) and is compatible with sklearn pipelines and GridSearchCV.

Parameters

| Parameter | Default | Description |
|---|---|---|
| n_prototypes | auto (10·√n) | Maximum number of SOM prototypes |
| threshold | 1 | Minimum shared samples for a prototype edge |
| sigma | auto | Bandwidth for Gaussian KDE density estimation |
| method | "som" | Quantizer backend: "som" (dbgsom) or "kmeans" |
| model_args | None | Kwargs passed to the quantizer: {"init": {...}, "train": {...}} |
| verbose | False | Print progress |
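One way to read the threshold parameter is sketched below, as an assumption rather than the package's code: each sample is assigned its two nearest prototypes, and an edge between two prototypes is kept once enough samples share that pair. The helper `prototype_edges` is hypothetical, and whether the real comparison is `>=` or `>` is not stated here.

```python
from collections import Counter


def prototype_edges(samples, prototypes, threshold=1):
    """Edges between prototypes that share at least `threshold` samples."""
    counts = Counter()
    for x in samples:
        # Rank prototypes by squared Euclidean distance to the sample.
        ranked = sorted(
            range(len(prototypes)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, prototypes[i])),
        )
        first, second = ranked[:2]  # the two best matching prototypes
        counts[frozenset((first, second))] += 1
    # Assumption: an edge needs count >= threshold.
    return {tuple(sorted(edge)) for edge, n in counts.items() if n >= threshold}


prototypes = [[0.0], [1.0], [9.0], [10.0]]
samples = [[0.2], [0.4], [0.6], [9.6], [9.9], [5.2]]

edges_loose = prototype_edges(samples, prototypes, threshold=1)
edges_strict = prototype_edges(samples, prototypes, threshold=2)
print(sorted(edges_loose))
print(sorted(edges_strict))
```

Raising the threshold drops the edge supported by the single sample lying between the two groups, so a higher value makes it harder for sparse regions to connect prototypes.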

Example with custom SOM parameters:

model = DS2LSOM(
    n_prototypes=100,
    threshold=10,
    model_args={"init": {"sigma_end": 1.0, "random_state": 42}},
)

Performance

Evaluated on load_digits (1797 samples, 64 features, 10 classes) using the pairwise Rand and Jaccard indices (as defined in the papers). DS2LSOM is not given the true number of clusters.

| Algorithm | Clusters | Noise | Rand | Jaccard |
|---|---|---|---|---|
| DS2LSOM | 9 (auto) | 37 | 0.912 | 0.461 |
| KMeans (n=10 given) | 10 | 0 | 0.906 | 0.415 |
| Agglomerative (n=10 given) | 10 | 0 | 0.930 | 0.542 |
| HDBSCAN | 6 (auto) | 1244 | | |

HDBSCAN metrics are excluded: with 69% of the samples labeled as noise (1244 of 1797), the scores are not comparable.
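For readers unfamiliar with pair-counting metrics, the sketch below shows the standard pairwise definitions of the Rand and Jaccard indices. It is an illustration, not the benchmark script: the function name `pairwise_rand_jaccard` is hypothetical, and treating the noise label -1 like any other label is an assumption that may differ from how the table above was produced.

```python
from itertools import combinations


def pairwise_rand_jaccard(labels_true, labels_pred):
    """Pair-counting Rand and Jaccard indices for two labelings."""
    both = only_true = only_pred = neither = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            both += 1          # pair grouped together in both labelings
        elif same_true:
            only_true += 1     # together in the reference only
        elif same_pred:
            only_pred += 1     # together in the prediction only
        else:
            neither += 1       # separated in both labelings
    total = both + only_true + only_pred + neither
    rand = (both + neither) / total
    grouped = both + only_true + only_pred
    jaccard = both / grouped if grouped else 1.0
    return rand, jaccard


rand, jaccard = pairwise_rand_jaccard([0, 0, 1, 1], [0, 0, 1, 2])
print(rand, jaccard)
```

The Jaccard index ignores pairs that are separated in both labelings, which is why it is typically much lower than the Rand index when there are many classes.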

Scalability

Time and memory complexity are O(n · k), where k ≈ 10·√n under the default heuristic, giving O(n^1.5) overall. The distance matrix (shape k × n) is the main memory bottleneck, so the practical limit is around 50 000–100 000 samples.
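As a rough back-of-the-envelope check of these numbers, the sketch below estimates the size of a dense k × n distance matrix. The helper `estimate_distance_matrix` is hypothetical, and both the 8 bytes per entry (float64) and the rounding of k are assumptions, not details taken from the package.

```python
import math


def estimate_distance_matrix(n_samples, bytes_per_value=8):
    """Return (k, bytes) for a dense k x n matrix with k = 10 * sqrt(n)."""
    k = round(10 * math.sqrt(n_samples))  # default heuristic from above
    n_bytes = k * n_samples * bytes_per_value
    return k, n_bytes


for n in (1_797, 50_000, 100_000):
    k, n_bytes = estimate_distance_matrix(n)
    print(f"n={n:>7}  k={k:>5}  distance matrix ~ {n_bytes / 1e9:.2f} GB")
```

Under these assumptions the matrix takes roughly 0.9 GB at 50 000 samples and 2.5 GB at 100 000 samples, which is consistent with the practical limit quoted above.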

Installing

# Development
git clone https://github.com/SandroMartens/ds2l-som.git
cd ds2l-som
uv sync

# As a dependency in another project
uv add ds2l-som

References

  • A Local Density-based Simultaneous Two-level Algorithm for Topographic Clustering, Guénaël Cabanes and Younès Bennani, 2008
  • Enriched topological learning for cluster detection and visualization, Guénaël Cabanes, Younès Bennani and Dominique Fresneau, 2012
