A Python implementation of the Directed Batch Growing Self-Organizing Map

DBGSOM

Clustering that determines its own size — the map grows until the data fits. No k to specify.

DBGSOM (Directed Batch Growing Self-Organizing Map) is a clustering algorithm that automatically determines the number of prototypes needed to represent the data. Starting from 4 neurons, the map expands at boundary positions where quantization error exceeds a configurable threshold — no need to pre-specify cluster count. The result is a topology-preserving 2D grid where neighboring neurons represent similar inputs, usable for clustering, classification, and visualization.

Features

  • No cluster count needed — the map grows until quantization error falls below threshold; lambda_ controls sensitivity
  • sklearn-compatible — drop-in for KMeans, DBSCAN: implements fit_predict, transform, score, and predict_proba
  • Topology-preserving — related samples cluster as grid neighbors; topographic error < 5% on Digits
  • Faster than classical SOMs — batch learning rule trains on all samples per epoch (vs. online, sample-by-sample)
  • Built-in visualization: plot() renders the neuron grid coloured by density, label, error, or PCA-RGB

How it works

In brief: 4 neurons initialize → samples assign to nearest neuron → weights update toward assigned samples → boundary neurons with high error spawn new neighbors → repeat until error threshold met or max_neurons reached.

The DBGSOM algorithm builds a two-dimensional map of prototypes (neurons) where each neuron is connected to its neighbors. Four neurons are initialized with random weight vectors drawn from the input data. During training every sample is assigned to its nearest neuron (best matching unit), and the neuron weights are updated towards the samples mapped to them. Neighboring neurons influence each other's updates so that the low-dimensional ordering of the map is preserved. A growing mechanism expands the map as needed: new neurons are inserted at boundary positions where the quantization error exceeds a configurable growing threshold.
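
The loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the Gaussian neighbourhood function, the `sigma` parameter, and the helper names (`batch_epoch`, `accumulated_error`, `growth_candidates`) are hypothetical, and the directed placement and weight initialisation of new neurons are left out.

```python
import math


def nearest_neuron(sample, weights):
    """Index of the best matching unit: the neuron closest to the sample."""
    return min(range(len(weights)), key=lambda i: math.dist(sample, weights[i]))


def batch_epoch(samples, weights, positions, sigma=1.0):
    """One batch update over all samples.

    Each neuron moves to the mean of all samples, weighted by a Gaussian
    of the grid distance between that neuron and each sample's BMU.
    """
    bmus = [nearest_neuron(s, weights) for s in samples]
    dim = len(samples[0])
    new_weights = []
    for pos in positions:
        numerator = [0.0] * dim
        denominator = 0.0
        for sample, bmu in zip(samples, bmus):
            grid_dist = math.dist(pos, positions[bmu])
            h = math.exp(-(grid_dist**2) / (2 * sigma**2))
            denominator += h
            for k in range(dim):
                numerator[k] += h * sample[k]
        new_weights.append([v / denominator for v in numerator])
    return new_weights


def accumulated_error(samples, weights):
    """Sum of sample-to-BMU distances, collected per neuron."""
    errors = [0.0] * len(weights)
    for sample in samples:
        bmu = nearest_neuron(sample, weights)
        errors[bmu] += math.dist(sample, weights[bmu])
    return errors


def growth_candidates(positions, errors, threshold):
    """Neurons with a free grid neighbour whose error exceeds the threshold."""
    occupied = set(positions)
    candidates = []
    for (x, y), err in zip(positions, errors):
        neighbours = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
        has_free_slot = any(p not in occupied for p in neighbours)
        if err > threshold and has_free_slot:
            candidates.append((x, y))
    return candidates


# Start from a 2 x 2 grid of four neurons (toy data, chosen for illustration).
positions = [(0, 0), (0, 1), (1, 0), (1, 1)]
weights = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
samples = [[0.1, 0.1], [0.2, 0.9], [0.9, 0.2], [3.0, 3.0], [3.2, 2.8]]

weights = batch_epoch(samples, weights, positions)
errors = accumulated_error(samples, weights)
to_grow = growth_candidates(positions, errors, threshold=1.0)
```

In a full training run, the epoch and the growth check would alternate until no boundary neuron exceeds the threshold or the neuron cap is reached.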

How to install

Download from PyPI

DBGSOM can be installed from PyPI via uv (recommended):

uv add dbgsom

or with pip:

pip install dbgsom

Install from source

Clone the repository and install with uv (recommended):

git clone https://github.com/SandroMartens/DBGSOM.git
cd DBGSOM
uv sync

Alternatively with pip:

git clone https://github.com/SandroMartens/DBGSOM.git
cd DBGSOM
pip install -e .

Usage

DBGSOM implements the scikit-learn API and provides two estimators:

Class          Use case
SomVQ          Unsupervised clustering / vector quantization
SomClassifier  Supervised classification

Clustering / Vector Quantization

from dbgsom import SomVQ
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

vq = SomVQ(lambda_=80.0, max_neurons=80)
labels = vq.fit_predict(X)

print(f"Neurons: {len(vq.neurons_)}")
print(f"Quantization error: {vq.quantization_error_:.4f}")
print(f"Topographic error:  {vq.topographic_error_:.4f}")
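
For readers new to these two measures, the sketch below shows their conventional definitions: quantization error as the mean distance from each sample to its best matching unit, and topographic error as the share of samples whose two nearest neurons are not adjacent on the grid. Whether DBGSOM computes them exactly this way (for example mean versus sum, or which grid neighbours count as adjacent) is an assumption, and the function names are hypothetical.

```python
import math


def two_nearest(sample, weights):
    """Indices of the closest and second-closest neurons to the sample."""
    order = sorted(range(len(weights)), key=lambda i: math.dist(sample, weights[i]))
    return order[0], order[1]


def quantization_error(samples, weights):
    """Mean distance between each sample and its best matching unit."""
    total = 0.0
    for sample in samples:
        bmu, _ = two_nearest(sample, weights)
        total += math.dist(sample, weights[bmu])
    return total / len(samples)


def topographic_error(samples, weights, positions):
    """Fraction of samples whose two nearest neurons are not grid neighbours."""
    violations = 0
    for sample in samples:
        first, second = two_nearest(sample, weights)
        (x1, y1), (x2, y2) = positions[first], positions[second]
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            violations += 1
    return violations / len(samples)


# Three neurons in a row on the grid, one-dimensional toy data.
positions = [(0, 0), (1, 0), (2, 0)]
samples = [[0.1], [0.9], [2.2]]

ordered_weights = [[0.0], [1.0], [2.0]]  # grid order matches data order
folded_weights = [[0.0], [2.0], [1.0]]   # grid order is scrambled

qe = quantization_error(samples, ordered_weights)
te_ordered = topographic_error(samples, ordered_weights, positions)
te_folded = topographic_error(samples, folded_weights, positions)
```

Both weight sets place prototypes at the same locations and therefore have the same quantization error; only the topographic error reveals that the second map is folded.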

Key growth parameters:

Parameter    Default              Effect
lambda_      115.0                Growing threshold; higher values yield fewer neurons
max_neurons  5 x sqrt(n_samples)  Hard cap on neuron count
n_iter       500                  Training epochs; growth only happens in the first half
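
As a quick check of what the default cap means in practice, the formula from the table can be evaluated directly. The helper name is hypothetical, and how the library rounds the result is not stated here, so the value is left unrounded.

```python
import math


def default_max_neurons(n_samples):
    """Default neuron cap from the table above: 5 x sqrt(n_samples), unrounded."""
    return 5 * math.sqrt(n_samples)


# The scikit-learn Digits dataset has 1797 samples.
cap_digits = default_max_neurons(1797)
print(f"Default cap for Digits: {cap_digits:.1f}")
```

For Digits this gives a cap of roughly 212 neurons, so the `max_neurons=80` used in the usage examples is a tighter limit than the default.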

Classification

from dbgsom import SomClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = SomClassifier(lambda_=80.0, max_neurons=80)
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))           # accuracy
proba = clf.predict_proba(X_test)          # class probabilities
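
How SomClassifier derives class probabilities is not described here. One common approach for SOM-based classifiers, sketched below as an assumption rather than as the library's method, is to count the training labels mapped to each neuron and report the class frequencies of a sample's best matching unit. All names are hypothetical, and the uniform fallback for empty neurons is an arbitrary choice.

```python
import math
from collections import Counter


def nearest_neuron(sample, weights):
    """Index of the best matching unit for the sample."""
    return min(range(len(weights)), key=lambda i: math.dist(sample, weights[i]))


def fit_label_counts(samples, labels, weights):
    """For each neuron, count the training labels of the samples mapped to it."""
    counts = [Counter() for _ in weights]
    for sample, label in zip(samples, labels):
        counts[nearest_neuron(sample, weights)][label] += 1
    return counts


def predict_proba(sample, weights, counts, classes):
    """Class frequencies of the sample's BMU, in the order given by classes."""
    hits = counts[nearest_neuron(sample, weights)]
    total = sum(hits.values())
    if total == 0:
        # Neuron received no training samples: fall back to a uniform guess.
        return [1.0 / len(classes)] * len(classes)
    return [hits[c] / total for c in classes]


# Two prototypes and five labelled one-dimensional training samples.
weights = [[0.0], [10.0]]
train_X = [[0.0], [1.0], [2.0], [9.0], [11.0]]
train_y = ["a", "a", "b", "b", "b"]
classes = ["a", "b"]

counts = fit_label_counts(train_X, train_y, weights)
proba_near_zero = predict_proba([0.5], weights, counts, classes)
proba_near_ten = predict_proba([12.0], weights, counts, classes)
```

The same per-neuron counts also give the majority class of each prototype, which is what a label-coloured plot of the map would show.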

Transform

Both estimators implement transform(), which represents each sample as a sparse non-negative linear combination of the prototype weight vectors:

coefs = vq.transform(X)   # shape (n_samples, n_prototypes)
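
To make the meaning of such a coefficient row concrete, the sketch below rebuilds a sample from a sparse, non-negative row and a small prototype matrix. The prototype values and the coefficient row are made up for illustration, and `reconstruct` is a hypothetical helper, not part of the dbgsom API.

```python
def reconstruct(coefs_row, prototypes):
    """Weighted sum of prototype vectors; zero coefficients contribute nothing."""
    dim = len(prototypes[0])
    approx = [0.0] * dim
    for coef, prototype in zip(coefs_row, prototypes):
        if coef == 0.0:
            continue  # sparse rows skip most prototypes
        for k in range(dim):
            approx[k] += coef * prototype[k]
    return approx


# Three two-dimensional prototypes and one coefficient row (illustrative values).
prototypes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
coefs_row = [0.5, 0.0, 2.0]

approx = reconstruct(coefs_row, prototypes)
print(approx)
```

Here only two of the three prototypes are active, and the sample is approximated as 0.5 times the first prototype plus 2.0 times the third.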

Visualization

plot() renders the SOM neurons as dots and the neighbourhood edges as grey lines, all via seaborn objects.

vq.plot(color="density")                       # continuous -> colour gradient
clf.plot(color="label")                        # categorical -> colour legend
vq.plot(color="hit_count", pointsize="error")  # colour + size encoding
vq.plot(color="density", layout="pca", palette="magma_r")
vq.plot(color="pca_rgb")                       # RGB colour from PCA of weight vectors

Supported attributes for color / pointsize: 'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'

Parameter  Options                  Description
color      any node attribute       Numeric attributes → continuous colour scale; int/str with ≤ 20 unique values → legend
pointsize  any numeric attribute    Node size proportional to attribute value
layout     'grid' (default), 'pca'  Node placement algorithm
palette    any Matplotlib colormap  Applied to the colour mapping

Examples

Example                    Description
example                    With two-dimensional input we can clearly see how the prototypes (red) approximate the input distribution (white) while preserving the square topology to their neighbors.
The fashion mnist dataset  After training on the Fashion-MNIST dataset we can plot the weight of each prototype. Neighboring prototypes are pairwise similar.
digits                     Each prototype is coloured by its majority class. Samples from the same class cluster together. Trained on MNIST digits.
darknet_pca                Linear transformations like PCA can colour-code relative distances between prototypes in the input space. See the darknet example notebook.

Comparisons

SOM algorithm comparison (Digits, PCA projection)

[Figure: SOM comparison]

DBGSOM (dynamic grid, size determined automatically) vs. MiniSom and SuSi (fixed grids) vs. KMeans (no topology). All trained on the same Digits embedding.

Clustering metrics (Digits dataset)

[Figure: Clustering metrics]

ARI, Silhouette, Davies-Bouldin, and training time. All algorithms use the same number of clusters — the neuron count DBGSOM determined automatically.
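
The three quality metrics named above are available in scikit-learn. The sketch below computes them for a KMeans baseline on Digits; it is not taken from the benchmark notebooks, and `n_clusters = 80` is a placeholder that mirrors the `max_neurons` cap from the usage examples rather than a neuron count reported by the benchmark.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import (
    adjusted_rand_score,
    davies_bouldin_score,
    silhouette_score,
)

X, y = load_digits(return_X_y=True)

# Placeholder cluster count (assumption); in the benchmark this would be
# the number of neurons DBGSOM settled on.
n_clusters = 80

labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

ari = adjusted_rand_score(y, labels)         # agreement with the true digit labels
silhouette = silhouette_score(X, labels)     # higher is better, range [-1, 1]
davies_bouldin = davies_bouldin_score(X, labels)  # lower is better, >= 0

print(f"ARI: {ari:.3f}  Silhouette: {silhouette:.3f}  Davies-Bouldin: {davies_bouldin:.3f}")
```

Swapping the KMeans line for any estimator with `fit_predict` yields a like-for-like comparison at the same cluster count.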

Full benchmark notebooks:

Notebook                     What it shows
clustering_comparison.ipynb  DBGSOM vs. KMeans, MiniBatchKMeans, AgglomerativeClustering on Iris and Digits
som_comparison.ipynb         DBGSOM vs. MiniSom, SuSi on Digits and Fashion-MNIST (QE, TE, training time, scaling)
manifold_comparison.ipynb    DBGSOM vs. Isomap, t-SNE, UMAP on MNIST: trustworthiness, continuity, folds/tears, runtime

Dependencies

  • Python >= 3.12
  • numpy
  • numba
  • NetworkX
  • tqdm
  • scikit-learn
  • seaborn
  • pandas

Citation

If you use DBGSOM in your research, please cite:

Martens, S. (2025). DBGSOM: A Python implementation of the Directed Batch Growing Self-Organizing Map. Zenodo. https://doi.org/10.5281/zenodo.20525611

References

  • A directed batch growing approach to enhance the topology preservation of self-organizing map, Mahdi Vasighi and Homa Amini, 2017, http://dx.doi.org/10.1016/j.asoc.2017.02.015
  • Reference implementation by the authors in Matlab: https://github.com/mvasighi/DBGSOM
  • Statistics-enhanced Direct Batch Growth Self-Organizing Mapping for efficient DoS Attack Detection, Xiaofei Qu et al., 2019, https://doi.org/10.1109/ACCESS.2019.2922737
  • Entropy-Defined Direct Batch Growing Hierarchical Self-Organizing Mapping for Efficient Network Anomaly Detection, Xiaofei Qu et al., 2021, https://doi.org/10.1109/ACCESS.2021.3064200
  • Self-Organizing Maps, 3rd Edition, Teuvo Kohonen, 2001
  • MATLAB Implementations and Applications of the Self-Organizing Map, Teuvo Kohonen, 2014
  • Smoothed self-organizing map for robust clustering, P. D'Urso, L. De Giovanni and R. Massari, 2019, https://doi.org/10.1016/j.ins.2019.06.038

License

dbgsom is licensed under the MIT license.
