A Python implementation of the Directed Batch Growing Self-Organizing Map
DBGSOM
DBGSOM (Directed Batch Growing Self-Organizing Map): A Neural Network for Clustering, Classification, Nonlinear Projection/Manifold learning, Data Visualization.
The network automatically determines the number of prototypes needed to represent the data. Starting from four neurons, the map grows at boundary positions where the quantization error exceeds a configurable threshold, so the cluster count does not have to be specified in advance. The result is a topology-preserving 2D grid in which neighboring neurons represent similar inputs.
Features
- No cluster count needed: the map grows until the quantization error falls below a threshold; `lambda_` controls sensitivity
- sklearn-compatible: drop-in for `KMeans`, `DBSCAN`; implements `fit_predict`, `transform`, `score`, and `predict_proba`
- Topology-preserving: related samples cluster as grid neighbors; topographic error < 5% on Digits
- Faster than classical SOMs: the batch learning rule trains on all samples per epoch (vs. online, sample-by-sample)
- Built-in visualization: `plot()` renders the neuron grid coloured by density, label, error or hit count
How it works
In brief: Four neurons initialize → samples assigned to nearest neuron → weights update toward assigned samples → boundary neurons with high error spawn new neighbors → σ decays → repeat until max_neurons or n_iter reached. Neighboring neurons influence each other's weight update → topology preserved during training.
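The assignment and weight-update steps summarized above can be sketched in NumPy. This is a minimal illustration of a generic batch SOM epoch with a Gaussian neighborhood, not the library's actual code: the function name `batch_som_epoch`, the Gaussian kernel, and the hand-picked σ schedule are all assumptions made for the example.

```python
import numpy as np

def batch_som_epoch(X, weights, grid, sigma):
    """One batch epoch: assign samples to BMUs, then update all weights at once.

    Hypothetical helper for illustration; not part of the dbgsom API.
    """
    # Distance from every sample to every prototype: (n_samples, n_neurons)
    dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    bmu = dists.argmin(axis=1)

    # Gaussian neighborhood on the grid: h[i, j] couples neuron i with neuron j
    grid_d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    h = np.exp(-grid_d2 / (2.0 * sigma**2))

    # Per-neuron sums and counts of the samples mapped to each neuron
    counts = np.bincount(bmu, minlength=len(weights)).astype(float)
    sums = np.zeros_like(weights)
    np.add.at(sums, bmu, X)

    # New weight = neighborhood-weighted mean of the mapped samples
    numer = h @ sums
    denom = (h @ counts)[:, None]
    safe = np.maximum(denom, np.finfo(float).tiny)
    new_weights = np.where(denom > 0, numer / safe, weights)
    return new_weights, bmu

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
grid = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # 2x2 start map
weights = X[rng.choice(len(X), size=4, replace=False)].copy()

# Shrinking sigma moves the update from global to local structure
for sigma in (1.0, 0.5, 0.25):
    weights, bmu = batch_som_epoch(X, weights, grid, sigma)
```

Because every neuron's new weight is a mean over all samples, weighted by grid distance to each sample's BMU, neighboring neurons are pulled toward similar regions of the input space.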
DBGSOM builds a 2D rectangular prototype map in which each neuron connects to up to four neighbors. The map starts with four neurons whose weights are initialized at random from the input data. In each epoch, every sample is assigned to its nearest neuron (the best matching unit, BMU), and the weights are updated toward the mean of the mapped samples. A neighborhood function couples neighboring neurons so that the low-dimensional map ordering is preserved; the neighborhood width shrinks over time, so training moves from global to local structure. A growing mechanism inserts new neurons at boundary positions where the quantization error exceeds the growing threshold.
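The growing mechanism can be illustrated with a small pure-Python sketch. It is deliberately simplified: `grow_boundary` is a hypothetical helper, and it places the new neuron in the first free neighboring cell, whereas the directed placement rule of the original algorithm is not reproduced here.

```python
def grow_boundary(positions, errors, threshold):
    """Return grid cells for new neurons next to high-error boundary neurons.

    positions: set of (x, y) integer grid coordinates of existing neurons
    errors:    dict mapping position -> accumulated quantization error
    threshold: growing threshold (plays the role of lambda_ in this sketch)
    """
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # four grid neighbors
    occupied = set(positions)
    new_positions = []
    for x, y in sorted(positions):
        free = [(x + dx, y + dy) for dx, dy in offsets
                if (x + dx, y + dy) not in occupied]
        # A boundary neuron has at least one free neighboring cell
        if free and errors[(x, y)] > threshold:
            # Simplification: take the first free cell instead of a directed choice
            new_positions.append(free[0])
            occupied.add(free[0])
    return new_positions

positions = {(0, 0), (0, 1), (1, 0), (1, 1)}
errors = {(0, 0): 5.0, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 9.0}
print(grow_boundary(positions, errors, threshold=1.0))  # [(-1, 0), (2, 1)]
```

Only the two neurons whose error exceeds the threshold spawn a neighbor; raising the threshold would leave the map unchanged.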
How to install
Download from PyPI
Install from PyPI via uv (recommended):
uv add dbgsom
or with pip:
pip install dbgsom
Install from source
Clone and install with uv (recommended):
git clone https://github.com/SandroMartens/DBGSOM.git
cd DBGSOM
uv sync
Alternatively with pip:
git clone https://github.com/SandroMartens/DBGSOM.git
cd DBGSOM
pip install -e .
Usage
DBGSOM implements the scikit-learn API and provides two estimators:
| Class | Use case |
|---|---|
| `SomVQ` | Unsupervised clustering / vector quantization |
| `SomClassifier` | Supervised classification |
Clustering / Vector Quantization
from dbgsom import SomVQ
from sklearn.datasets import load_digits
X, y = load_digits(return_X_y=True)
vq = SomVQ(lambda_=80.0, max_neurons=80)
labels = vq.fit_predict(X)
print(f"Neurons: {len(vq.neurons_)}")
print(f"Quantization error: {vq.quantization_error_:.4f}")
print(f"Topographic error: {vq.topographic_error_:.4f}")
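For readers unfamiliar with the two error measures printed above, the following sketch computes them under their common textbook definitions: quantization error as the mean distance between a sample and its nearest prototype, and topographic error as the fraction of samples whose two nearest prototypes are not adjacent on the grid. The library's exact definitions may differ; the function name and the four-neighbor adjacency test are assumptions.

```python
import numpy as np

def quantization_and_topographic_error(X, weights, grid):
    """Hypothetical helper; not part of the dbgsom API."""
    dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    first, second = order[:, 0], order[:, 1]

    # Mean distance from each sample to its best matching unit
    qe = dists[np.arange(len(X)), first].mean()

    # Two units are adjacent if their Manhattan distance on the grid is 1
    grid_dist = np.abs(grid[first] - grid[second]).sum(axis=1)
    te = np.mean(grid_dist > 1)
    return float(qe), float(te)

weights = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X = np.array([[0.1, 0.0], [0.9, 0.0]])

# Grid that matches the layout of the weights: topology is preserved
ordered_grid = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
qe, te = quantization_and_topographic_error(X, weights, ordered_grid)

# Same weights, but units 1 and 3 swap grid cells: topology is broken
scrambled_grid = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
_, te_scrambled = quantization_and_topographic_error(X, weights, scrambled_grid)
```

The quantization error depends only on the prototype weights, while the topographic error also depends on where each prototype sits on the grid, which is why scrambling the grid changes only the second value.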
Key growth parameters:
| Parameter | Default | Effect |
|---|---|---|
| `lambda_` | 115.0 | Growing threshold: higher → fewer neurons |
| `max_neurons` | 5 x sqrt(n_samples) | Hard cap on neuron count |
| `n_iter` | 500 | Training epochs; growth only happens in the first half |
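To give a feel for the defaults in the table, the arithmetic can be worked through for the Digits dataset (1797 samples). Both helpers are hypothetical, and the rounding (floor) and the exact meaning of "first half" are assumptions, since the table does not specify them.

```python
import math

def default_max_neurons(n_samples):
    # Assumption: the "5 x sqrt(n_samples)" default is rounded down
    return int(5 * math.sqrt(n_samples))

def growth_epochs(n_iter):
    # Assumption: "first half" means epochs 0 .. n_iter // 2 - 1
    return range(n_iter // 2)

print(default_max_neurons(1797))  # 211
print(len(growth_epochs(500)))    # 250
```

Under these assumptions, the usage example's `max_neurons=80` caps the map well below the default of roughly 211 neurons for Digits.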
Classification
from dbgsom import SomClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = SomClassifier(lambda_=80.0, max_neurons=80)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test)) # accuracy
proba = clf.predict_proba(X_test) # class probabilities
Transform
Both estimators implement `transform()`, which represents each sample as a sparse, non-negative linear combination of prototype weights:
coefs = vq.transform(X) # shape (n_samples, n_prototypes)
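What "sparse non-negative linear combination" means can be shown with hand-made arrays. The sketch below does not call the library: the prototype matrix and coefficients are invented for illustration, and the assumption is that a sample is approximated by `coefs @ prototypes`.

```python
import numpy as np

# Four illustrative prototypes in a 2D feature space: (n_prototypes, n_features)
prototypes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# One row per sample: mostly zeros (sparse), no negative entries
coefs = np.array([
    [0.7, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
])

# Each sample is approximated as a weighted mix of a few prototypes
reconstructed = coefs @ prototypes
print(reconstructed)  # [[0.3 0. ] [0.5 1. ]]

sparsity = np.mean(coefs == 0)  # fraction of zero coefficients
```

Each row uses only two of the four prototypes, so the representation stays interpretable: a sample is described by the few map units it resembles.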
Visualization
plot() renders SOM neurons as dots and neighborhood edges as grey lines via seaborn objects.
vq.plot(color="density") # continuous -> colour gradient
clf.plot(color="label") # categorical -> colour legend
vq.plot(color="hit_count", pointsize="error") # colour + size encoding
vq.plot(color="density", layout="pca", palette="magma_r")
Supported attributes for color / pointsize:
'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'
| Parameter | Options | Description |
|---|---|---|
| `color` | any node attribute | Numeric attributes → continuous colour scale; int/str with ≤ 20 unique values → legend |
| `pointsize` | any numeric attribute | Node size proportional to attribute value |
| `layout` | `'grid'` (default), `'pca'` | Node placement algorithm |
| `palette` | any Matplotlib colormap | Applied to colour mapping |
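The rule for `color` in the table can be read as a small decision function. The sketch below is one possible interpretation, not the library's logic: `colour_scale_kind` is a hypothetical name, and cases the table does not cover (for example strings with more than 20 unique values) are reported as unspecified.

```python
from numbers import Real

def colour_scale_kind(values, max_levels=20):
    """Classify an attribute as 'legend', 'continuous', or 'unspecified'."""
    unique = set(values)
    # int/str attributes with few distinct values are treated as categories
    if all(isinstance(v, (int, str)) for v in unique) and len(unique) <= max_levels:
        return "legend"
    # Any other purely numeric attribute gets a continuous colour scale
    if all(isinstance(v, Real) for v in unique):
        return "continuous"
    return "unspecified"

print(colour_scale_kind([0, 1, 2, 1]))      # legend
print(colour_scale_kind([0.1, 0.5, 0.9]))   # continuous
print(colour_scale_kind(list(range(50))))   # continuous
```

Under this reading, class labels such as the ten Digits classes get a legend, while attributes like density or error get a gradient.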
Examples
- 2D input: prototypes (red) approximate the input distribution (white); the square topology is preserved.
- Fashion-MNIST: the weight of each prototype is plotted; neighboring prototypes are pairwise similar.
- MNIST digits: each prototype is coloured by its majority class; same-class samples cluster together.
Comparisons
SOM algorithm comparison (Digits, PCA projection)
DBGSOM (dynamic grid, size determined automatically) vs. MiniSom and SuSi (fixed grids) vs. KMeans (no topology). All trained on same Digits embedding.
Clustering metrics (Digits dataset)
ARI, Silhouette, Davies-Bouldin, training time. All algorithms use same cluster count — determined automatically by DBGSOM.
Full benchmark notebooks:
| Notebook | What it shows |
|---|---|
| clustering_comparison.ipynb | DBGSOM vs. KMeans, MiniBatchKMeans, AgglomerativeClustering on Iris and Digits |
| som_comparison.ipynb | DBGSOM vs. MiniSom, SuSi on Digits and Fashion-MNIST (QE, TE, training time, scaling) |
| manifold_comparison.ipynb | DBGSOM vs. Isomap, t-SNE, UMAP on MNIST: trustworthiness, continuity, folds/tears, runtime |
Dependencies
- Python >= 3.12
- numpy
- numba
- NetworkX
- tqdm
- scikit-learn
- seaborn
- pandas
Citation
If you use DBGSOM in your research, please cite:
Martens, S. (2025). DBGSOM: A Python implementation of the Directed Batch Growing Self-Organizing Map. Zenodo. https://doi.org/10.5281/zenodo.20525611
References
- A directed batch growing approach to enhance the topology preservation of self-organizing map, Mahdi Vasighi and Homa Amini, 2017, http://dx.doi.org/10.1016/j.asoc.2017.02.015
- Reference implementation by the authors in Matlab: https://github.com/mvasighi/DBGSOM
- Statistics-enhanced Direct Batch Growth Self-Organizing Mapping for efficient DoS Attack Detection, Xiaofei Qu et al., 2019, https://doi.org/10.1109/ACCESS.2019.2922737
- Entropy-Defined Direct Batch Growing Hierarchical Self-Organizing Mapping for Efficient Network Anomaly Detection, Xiaofei Qu et al., 2021, https://doi.org/10.1109/ACCESS.2021.3064200
- Self-Organizing Maps, 3rd Edition, Teuvo Kohonen, 2003
- MATLAB Implementations and Applications of the Self-Organizing Map, Teuvo Kohonen, 2014
- Smoothed self-organizing map for robust clustering, P. D'Urso, L. De Giovanni and R. Massari, 2019, https://doi.org/10.1016/j.ins.2019.06.038
License
dbgsom is licensed under the MIT license.