DS2L-SOM is a Local Density-based Simultaneous Two-level Algorithm for Topographic Clustering
DS2L-SOM
DS2L-SOM is a topological, density-based clustering algorithm. It combines a Self-Organizing Map (SOM) for prototype learning with Gaussian KDE density estimation and gradient ascent to detect clusters — without requiring the number of clusters to be specified in advance.
Based on the papers by Cabanes, Bennani and Fresneau (2008, 2012).
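The pipeline described above can be illustrated with a minimal, standard-library-only sketch. It is not the package's implementation: the prototypes are placed by hand instead of being learned by a SOM, the kernel's normalization constant is dropped, `sigma` is chosen manually, and an edge is assumed to link the two nearest prototypes of a sample. Any further refinement used by the library or the papers is not shown, and all function names are hypothetical.

```python
from math import dist, exp


def kde_density(prototypes, data, sigma):
    """Unnormalized Gaussian kernel density of the data at each prototype."""
    return [
        sum(exp(-dist(w, x) ** 2 / (2 * sigma**2)) for x in data) / len(data)
        for w in prototypes
    ]


def shared_sample_edges(prototypes, data, threshold):
    """Link two prototypes if at least `threshold` samples have them as their
    two nearest prototypes (an assumed reading of the `threshold` parameter)."""
    counts = {}
    for x in data:
        ranked = sorted(range(len(prototypes)), key=lambda i: dist(prototypes[i], x))
        edge = tuple(sorted(ranked[:2]))
        counts[edge] = counts.get(edge, 0) + 1
    return {edge for edge, count in counts.items() if count >= threshold}


def label_prototypes(density, edges):
    """Follow the steepest density ascent along edges; prototypes that reach
    the same local maximum share a cluster label."""
    neighbors = {i: {i} for i in range(len(density))}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)

    def climb(i):
        while True:
            best = max(neighbors[i], key=lambda j: density[j])
            if density[best] <= density[i]:  # no denser neighbor: local maximum
                return i
            i = best

    modes = [climb(i) for i in range(len(density))]
    ids = {mode: k for k, mode in enumerate(sorted(set(modes)))}
    return [ids[mode] for mode in modes]


def cluster(prototypes, data, sigma, threshold):
    density = kde_density(prototypes, data, sigma)
    edges = shared_sample_edges(prototypes, data, threshold)
    proto_labels = label_prototypes(density, edges)
    # Each sample inherits the label of its nearest prototype.
    return [
        proto_labels[min(range(len(prototypes)), key=lambda i: dist(prototypes[i], x))]
        for x in data
    ]


# Toy 1-D data: two well-separated groups, two hand-placed prototypes per group.
data = [(0.0,), (0.1,), (0.2,), (0.9,), (5.0,), (5.1,), (5.2,), (5.9,)]
prototypes = [(0.1,), (0.8,), (5.1,), (5.8,)]
labels = cluster(prototypes, data, sigma=0.5, threshold=1)
print(labels)
```

On this toy input the sketch separates the two groups without being told how many clusters to look for; the number of clusters is simply the number of density maxima reached.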
Usage
from ds2l_som.ds2lsom import DS2LSOM

# X: array-like of shape (n_samples, n_features)
model = DS2LSOM(n_prototypes=100, threshold=10)
model.fit(X)
labels = model.predict(X)  # sequential integers; -1 = unassigned

# or, in one step:
labels = model.fit_predict(X)
DS2LSOM implements the scikit-learn API (ClusterMixin, BaseEstimator) and is compatible with sklearn pipelines and GridSearchCV.
Parameters
| Parameter | Default | Description |
|---|---|---|
| n_prototypes | auto (10·√n) | Maximum number of SOM prototypes |
| threshold | 1 | Minimum shared samples for a prototype edge |
| sigma | auto | Bandwidth for Gaussian KDE density estimation |
| method | "som" | Quantizer backend: "som" (dbgsom) or "kmeans" |
| model_args | None | Kwargs passed to the quantizer: {"init": {...}, "train": {...}} |
| verbose | False | Print progress |
Example with custom SOM parameters:
model = DS2LSOM(
    n_prototypes=100,
    threshold=10,
    model_args={"init": {"sigma_end": 1.0, "random_state": 42}},
)
Performance
Evaluated on load_digits (1797 samples, 64 features, 10 classes) using pairwise Rand and Jaccard index (as defined in the papers). DS2LSOM does not receive the true number of clusters.
| Algorithm | Clusters | Noise | Rand | Jaccard |
|---|---|---|---|---|
| DS2LSOM | 9 (auto) | 37 | 0.912 | 0.461 |
| KMeans (n=10 given) | 10 | 0 | 0.906 | 0.415 |
| Agglomerative (n=10 given) | 10 | 0 | 0.930 | 0.542 |
| HDBSCAN | 6 (auto) | 1244 | — | — |
HDBSCAN metrics are excluded: 69% noise points make the scores not comparable.
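For reference, the pairwise Rand and Jaccard indices can be computed from pair counts as in the sketch below. It is a standard-library illustration, not the evaluation script behind the table. In particular it treats the label -1 as an ordinary cluster, which is an assumption about how unassigned samples were scored. The function names are hypothetical.

```python
from collections import Counter
from math import comb


def pair_counts(labels_true, labels_pred):
    """Count sample pairs by agreement between two labelings.

    a: same cluster in both, b: same only in labels_true,
    c: same only in labels_pred, d: different in both.
    """
    total = comb(len(labels_true), 2)
    a = sum(comb(n, 2) for n in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(n, 2) for n in Counter(labels_true).values())
    same_pred = sum(comb(n, 2) for n in Counter(labels_pred).values())
    b = same_true - a
    c = same_pred - a
    d = total - a - b - c
    return a, b, c, d


def rand_index(labels_true, labels_pred):
    """Fraction of pairs on which the two labelings agree."""
    a, b, c, d = pair_counts(labels_true, labels_pred)
    return (a + d) / (a + b + c + d)


def jaccard_index(labels_true, labels_pred):
    """Like the Rand index, but ignoring pairs separated in both labelings."""
    a, b, c, _ = pair_counts(labels_true, labels_pred)
    return a / (a + b + c)


truth = [0, 0, 1, 1]
found = [0, 0, 1, 2]
print(rand_index(truth, found), jaccard_index(truth, found))
```

Both functions need at least two samples, and the Jaccard index is undefined when no pair shares a cluster in either labeling. Because the Jaccard index ignores the many pairs that are separated in both labelings, it is typically much lower than the Rand index, as in the table above.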
Scalability
Time and memory complexity: O(n · k) where k ≈ 10·√n (default heuristic), giving O(n^1.5) overall. The distance matrix (shape k × n) is the main memory bottleneck — practical limit is around 50 000–100 000 samples.
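A back-of-the-envelope estimate shows where that practical limit comes from. The sketch below assumes the distance matrix is dense and stored as 8-byte floats, and that the heuristic is rounded to the nearest integer; neither detail is confirmed by the text above, and the helper names are hypothetical.

```python
from math import sqrt


def default_n_prototypes(n_samples):
    """Heuristic from the text: k ≈ 10·√n (the rounding is an assumption)."""
    return round(10 * sqrt(n_samples))


def distance_matrix_bytes(n_samples, bytes_per_value=8):
    """Size of a dense k × n matrix; 8 bytes per value assumes float64."""
    return default_n_prototypes(n_samples) * n_samples * bytes_per_value


for n in (1_797, 10_000, 100_000):
    k = default_n_prototypes(n)
    print(f"n={n:>7}  k={k:>5}  matrix={distance_matrix_bytes(n) / 1e9:.3f} GB")
```

Under these assumptions, 100 000 samples give k = 3162 and a distance matrix of roughly 2.5 GB, which is consistent with the stated limit on typical workstation memory.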
Installing
# Development
git clone https://github.com/SandroMartens/ds2l-som.git
cd ds2l-som
uv sync
# As a dependency in another project
uv add ds2l-som
References
- A Local Density-based Simultaneous Two-level Algorithm for Topographic Clustering, Guénaël Cabanes and Younès Bennani, 2008
- Enriched topological learning for cluster detection and visualization, Guénaël Cabanes, Younès Bennani and Dominique Fresneau, 2012