
A Python implementation of the Directed Batch Growing Self-Organizing Map


DBGSOM

DBGSOM (Directed Batch Growing Self-Organizing Map) is a neural network for clustering, classification, nonlinear projection/manifold learning, and data visualization.

The network automatically determines the number of prototypes needed to represent the data. Starting from four neurons, the map grows at its boundary positions wherever the quantization error exceeds a configurable threshold, so the number of clusters does not have to be specified in advance. The result is a topology-preserving 2D grid in which neighboring neurons represent similar inputs.
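The growth step can be sketched as follows. This is a simplified illustration and not DBGSOM's exact rule. The dictionary-based grid, the hypothetical `grow` function, and the way new weights are initialized are all assumptions. The initialization extrapolates away from the inner neighbor, as some growing SOM variants do.

```python
import numpy as np

# Hypothetical map representation: grid position (row, col) -> weight vector.
Grid = dict[tuple[int, int], np.ndarray]

# 4-neighborhood of a rectangular grid.
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def grow(grid: Grid, errors: dict[tuple[int, int], float], threshold: float) -> list[tuple[int, int]]:
    """Add a neuron at every free 4-neighbor position of each neuron whose
    accumulated error exceeds `threshold`. Modifies `grid` in place and
    returns the newly created positions."""
    new_positions = []
    for pos, err in list(errors.items()):
        if err <= threshold:
            continue
        r, c = pos
        w = grid[pos]
        for dr, dc in OFFSETS:
            target = (r + dr, c + dc)
            if target in grid:
                continue  # position already occupied: pos is not a boundary in this direction
            opposite = (r - dr, c - dc)
            # Extrapolate away from the inner neighbor if one exists,
            # otherwise start the new neuron as a copy of its parent.
            grid[target] = 2 * w - grid[opposite] if opposite in grid else w.copy()
            new_positions.append(target)
    return new_positions


# Usage: a 2x2 starting map where only neuron (0, 1) exceeds the threshold.
grid = {(0, 0): np.zeros(2), (0, 1): np.array([1.0, 0.0]),
        (1, 0): np.array([0.0, 1.0]), (1, 1): np.ones(2)}
errors = {(0, 0): 0.1, (0, 1): 5.0, (1, 0): 0.2, (1, 1): 0.3}
print(grow(grid, errors, threshold=1.0))  # [(-1, 1), (0, 2)]
```

Only positions that are still free receive a neuron. Growth therefore always happens at the map boundary, and the grid stays rectangular-neighborhood-connected.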

Features

  • No cluster count needed: the map keeps growing until the quantization error falls below the threshold; lambda_ controls the sensitivity
  • sklearn-compatible: a drop-in replacement for KMeans or DBSCAN that implements fit_predict, transform, score, and predict_proba
  • Topology-preserving: related samples end up on neighboring grid nodes; topographic error < 5% on Digits
  • Faster than classical SOMs: the batch learning rule processes all samples in each epoch instead of updating sample by sample (online learning)
  • Built-in visualization: plot() renders the neuron grid coloured by density, label, error, or hit count

How it works

In brief: four neurons are initialized → each sample is assigned to its nearest neuron → weights move toward the assigned samples → boundary neurons with high error spawn new neighbors → the neighborhood width σ decays → repeat until max_neurons or n_iter is reached. Because neighboring neurons influence each other's weight updates, the topology is preserved during training.
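A per-epoch schedule can illustrate the decaying neighborhood width σ. The exponential decay and the hypothetical `sigma_start` and `sigma_end` values are assumptions, not documented DBGSOM defaults. The growth window reflects the statement in the parameter table that growth only happens during the first half of n_iter.

```python
import numpy as np


def training_schedule(n_iter: int = 500, sigma_start: float = 1.0, sigma_end: float = 0.1):
    """Hypothetical per-epoch schedule: neighborhood width and growth flag."""
    t = np.arange(n_iter) / max(n_iter - 1, 1)            # 0.0 at the first epoch, 1.0 at the last
    sigma = sigma_start * (sigma_end / sigma_start) ** t  # exponential decay: wide (global) -> narrow (local)
    may_grow = np.arange(n_iter) < n_iter // 2            # growth allowed only in the first half
    return sigma, may_grow


sigma, may_grow = training_schedule()
```

A large σ early on lets every neuron pull on the whole map, which orders it globally. The small σ at the end lets each prototype fine-tune to its own samples.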

DBGSOM builds a 2D rectangular prototype map in which each neuron is connected to its four grid neighbors. The four initial neurons get random weights drawn from the input data. In each epoch, every sample is assigned to its nearest neuron, the best-matching unit (BMU), and the weights are updated toward the mean of the samples mapped to them. A neighborhood function couples neighboring neurons so that the ordering of the low-dimensional map is preserved. The neighborhood width shrinks over time, which shifts the focus from global to local structure. A growing mechanism inserts new neurons at boundary positions where the quantization error exceeds the growing threshold.
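The sketch below shows a generic batch-SOM epoch with the BMU assignment and neighborhood-coupled mean update described above. The Gaussian neighborhood on grid coordinates is a common choice but an assumption here. The function name and array layout are hypothetical, and DBGSOM's actual update may differ in detail.

```python
import numpy as np


def batch_som_epoch(X: np.ndarray, weights: np.ndarray, positions: np.ndarray, sigma: float) -> np.ndarray:
    """One batch-SOM epoch (generic sketch, not DBGSOM's exact update).

    X:         (n_samples, n_features) input data
    weights:   (n_neurons, n_features) prototype vectors
    positions: (n_neurons, 2) integer grid coordinates of each neuron
    sigma:     neighborhood width, measured in grid units
    """
    # 1. Assign every sample to its best-matching unit (nearest prototype).
    d2 = ((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    bmu = d2.argmin(axis=1)

    # 2. Gaussian neighborhood between neurons, based on distance on the grid.
    grid_d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2)
    h = np.exp(-grid_d2 / (2 * sigma**2))  # (n_neurons, n_neurons)

    # 3. Each neuron moves to the neighborhood-weighted mean of the samples:
    #    w_j = sum_i h[j, bmu_i] * x_i / sum_i h[j, bmu_i]
    H = h[:, bmu]                          # (n_neurons, n_samples)
    denom = H.sum(axis=1, keepdims=True)
    new = (H @ X) / np.maximum(denom, 1e-12)

    # Neurons with (almost) no weighted support keep their previous weights.
    return np.where(denom > 1e-12, new, weights)
```

Because the update is a weighted mean over the whole data set, there is no learning rate. Each epoch visits all samples at once instead of one at a time.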

How to install

Download from PyPI

Install from PyPI via uv (recommended):

uv add dbgsom

or with pip:

pip install dbgsom

Install from source

Clone and install with uv (recommended):

git clone https://github.com/SandroMartens/DBGSOM.git
cd DBGSOM
uv sync

Alternatively with pip:

git clone https://github.com/SandroMartens/DBGSOM.git
cd DBGSOM
pip install -e .

Usage

DBGSOM implements the scikit-learn API and provides two estimators:

Class Use case
SomVQ Unsupervised clustering / vector quantization
SomClassifier Supervised classification

Clustering / Vector Quantization

from dbgsom import SomVQ
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

vq = SomVQ(lambda_=80.0, max_neurons=80)
labels = vq.fit_predict(X)

print(f"Neurons: {len(vq.neurons_)}")
print(f"Quantization error: {vq.quantization_error_:.4f}")
print(f"Topographic error:  {vq.topographic_error_:.4f}")
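The two printed metrics are usually defined as follows. Quantization error is the mean distance from each sample to its BMU. Topographic error is the fraction of samples whose first and second BMUs are not direct grid neighbors. The helpers below are hypothetical and use these textbook definitions. Whether DBGSOM uses plain or squared distances, or some other variant, is an assumption.

```python
import numpy as np


def quantization_error(X: np.ndarray, weights: np.ndarray) -> float:
    """Mean Euclidean distance between each sample and its BMU."""
    d = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    return float(d.min(axis=1).mean())


def topographic_error(X: np.ndarray, weights: np.ndarray, positions: np.ndarray) -> float:
    """Fraction of samples whose two closest prototypes are not 4-neighbors on the grid."""
    d = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    best_two = np.argsort(d, axis=1)[:, :2]
    # Manhattan distance 1 on a rectangular grid means "direct neighbor".
    grid_dist = np.abs(positions[best_two[:, 0]] - positions[best_two[:, 1]]).sum(axis=1)
    return float((grid_dist != 1).mean())
```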

Key growth parameters:

Parameter    | Default             | Effect
lambda_      | 115.0               | Growing threshold; higher → fewer neurons
max_neurons  | 5 × sqrt(n_samples) | Hard cap on neuron count
n_iter       | 500                 | Training epochs; growth only happens in the first half
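For the Digits data used in these examples (1797 samples), the defaults work out as shown below. The hypothetical helpers assume that max_neurons is rounded down, which the docs do not state. The usage examples override this default with max_neurons=80.

```python
import math


def default_max_neurons(n_samples: int) -> int:
    # 5 x sqrt(n_samples) from the table above; rounding down is an assumption.
    return int(5 * math.sqrt(n_samples))


def growth_epochs(n_iter: int = 500) -> int:
    # Growth only happens in the first half of training.
    return n_iter // 2


print(default_max_neurons(1797))  # 211
print(growth_epochs())            # 250
```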

Classification

from dbgsom import SomClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = SomClassifier(lambda_=80.0, max_neurons=80)
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))           # accuracy
proba = clf.predict_proba(X_test)          # class probabilities
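SOM-based classifiers often turn a trained map into class probabilities with a per-neuron label histogram. Each neuron records the classes of the training samples mapped to it, and a new sample inherits the class frequencies of its BMU. Whether SomClassifier.predict_proba works exactly this way is an assumption. The helpers below are hypothetical and take precomputed BMU indices.

```python
import numpy as np


def fit_prototype_labels(bmu: np.ndarray, y: np.ndarray, n_neurons: int, n_classes: int) -> np.ndarray:
    """Count how often each class is mapped to each neuron."""
    counts = np.zeros((n_neurons, n_classes))
    np.add.at(counts, (bmu, y), 1)  # unbuffered: repeated (neuron, class) pairs accumulate
    return counts


def predict_proba_from_counts(counts: np.ndarray, bmu: np.ndarray) -> np.ndarray:
    """Class probabilities of each sample = class frequencies at its BMU."""
    totals = counts.sum(axis=1, keepdims=True)
    # Neurons that received no training samples fall back to a uniform distribution.
    proba = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / counts.shape[1])
    return proba[bmu]


# Usage with hypothetical BMU indices for four training samples.
counts = fit_prototype_labels(np.array([0, 0, 0, 1]), np.array([0, 0, 1, 1]), n_neurons=3, n_classes=2)
proba_demo = predict_proba_from_counts(counts, np.array([0, 1, 2]))
labels_demo = proba_demo.argmax(axis=1)  # majority class per BMU
```

Predicting the argmax of these probabilities is the same as labeling each prototype by its majority class, as in the Digits example below.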

Transform

Both estimators implement transform(), which represents each sample as a sparse, non-negative linear combination of the prototype weights:

coefs = vq.transform(X)   # shape (n_samples, n_prototypes)
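The docs do not say how these coefficients are computed. The minimal sketch below assumes non-negative least squares (NNLS) via scipy.optimize.nnls. This is a swapped-in technique and not necessarily what DBGSOM does. NNLS solutions tend to be sparse because many coefficients end up exactly at the zero bound. SciPy is already installed as a dependency of scikit-learn. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def nonnegative_codes(X: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Code each sample as a non-negative combination of prototypes.

    Solves min_c ||weights.T @ c - x||_2 subject to c >= 0 for every sample x.
    Returns an array of shape (n_samples, n_prototypes).
    """
    A = weights.T  # (n_features, n_prototypes)
    return np.vstack([nnls(A, x)[0] for x in X])
```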

Visualization

plot() renders the SOM neurons as dots and the neighborhood edges as grey lines, using the seaborn.objects interface.

vq.plot(color="density")                       # continuous -> colour gradient
clf.plot(color="label")                        # categorical -> colour legend
vq.plot(color="hit_count", pointsize="error")  # colour + size encoding
vq.plot(color="density", layout="pca", palette="magma_r")

Supported attributes for color / pointsize: 'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'

Parameter  | Options                  | Description
color      | any node attribute       | int or str attributes with ≤ 20 unique values → categorical legend; other numeric attributes → continuous colour scale
pointsize  | any numeric attribute    | Node size proportional to attribute value
layout     | 'grid' (default), 'pca'  | Node placement algorithm
palette    | any Matplotlib colormap  | Applied to the colour mapping
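The rule for the color option can be written as a small decision function. This hypothetical helper is a reconstruction from the table, not plot()'s actual code. The docs do not say how a string attribute with more than 20 unique values is handled, so the sketch simply rejects it.

```python
import numpy as np


def colour_scale_kind(values, max_categories: int = 20) -> str:
    """Hypothetical reconstruction of the colour rule for plot(color=...)."""
    arr = np.asarray(values)
    n_unique = len(np.unique(arr))
    if arr.dtype.kind in "iuOU" and n_unique <= max_categories:
        return "legend"      # categorical: one colour per distinct value
    if arr.dtype.kind in "iuf":
        return "continuous"  # numeric: colour gradient
    raise ValueError("string attribute with too many unique values for a legend")
```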

Examples

Example        | Description
2D input       | Prototypes (red) approximate the input distribution (white); the square topology is preserved.
Fashion-MNIST  | The weights of each prototype are plotted; neighboring prototypes are pairwise similar.
Digits         | Each prototype is coloured by its majority class; same-class samples cluster together. Trained on MNIST digits.

Comparisons

SOM algorithm comparison (Digits, PCA projection)

[Figure: SOM comparison]

DBGSOM (dynamic grid, size determined automatically) vs. MiniSom and SuSi (fixed grids) vs. KMeans (no topology). All are trained on the same Digits embedding.

Clustering metrics (Digits dataset)

[Figure: Clustering metrics]

Metrics: adjusted Rand index (ARI), silhouette score, Davies-Bouldin index, and training time. All algorithms use the same number of clusters, which is determined automatically by DBGSOM.
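Here is a minimal sketch of how these metrics can be computed with scikit-learn for one baseline (KMeans). The cluster count of 80 is a placeholder, assumed to match the max_neurons=80 setting from the usage examples. In the benchmark, the count comes from the trained DBGSOM map. ARI compares cluster labels with the true digit classes, while silhouette and Davies-Bouldin are computed from the data alone.

```python
import time

from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import adjusted_rand_score, davies_bouldin_score, silhouette_score

X, y = load_digits(return_X_y=True)

# Placeholder: in the benchmark this count is taken from the fitted DBGSOM map.
n_clusters = 80

start = time.perf_counter()
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
elapsed = time.perf_counter() - start

ari = adjusted_rand_score(y, labels)   # agreement with the true digit classes
sil = silhouette_score(X, labels)      # in [-1, 1], higher is better
dbi = davies_bouldin_score(X, labels)  # >= 0, lower is better
print(f"ARI={ari:.3f}  silhouette={sil:.3f}  Davies-Bouldin={dbi:.3f}  time={elapsed:.2f}s")
```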

Full benchmark notebooks:

Notebook What it shows
clustering_comparison.ipynb DBGSOM vs. KMeans, MiniBatchKMeans, AgglomerativeClustering on Iris and Digits
som_comparison.ipynb DBGSOM vs. MiniSom, SuSi on Digits and Fashion-MNIST (QE, TE, training time, scaling)
manifold_comparison.ipynb DBGSOM vs. Isomap, t-SNE, UMAP on MNIST: trustworthiness, continuity, folds/tears, runtime
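Trustworthiness is available in scikit-learn as sklearn.manifold.trustworthiness. Continuity is not. A common shortcut, assumed here, is to call the same function with the input and the embedding swapped, so that neighbors in the original space are checked against ranks in the embedding. The PCA embedding is only a stand-in for the methods the notebook compares. The neighbor count k=10 is also an assumption.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

X, _ = load_digits(return_X_y=True)
emb = PCA(n_components=2).fit_transform(X)  # stand-in for any 2D embedding

k = 10
# Trustworthiness: are embedding neighbors also close in the input space?
trust = trustworthiness(X, emb, n_neighbors=k)
# Continuity: are input-space neighbors also close in the embedding? (roles swapped)
cont = trustworthiness(emb, X, n_neighbors=k)
print(f"trustworthiness={trust:.3f}  continuity={cont:.3f}")
```

Low trustworthiness points to folds, where distant points are placed together. Low continuity points to tears, where neighbors are pulled apart.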

Dependencies

  • Python >= 3.12
  • numpy
  • numba
  • NetworkX
  • tqdm
  • scikit-learn
  • seaborn
  • pandas

Citation

If you use DBGSOM in your research, please cite:

Martens, S. (2025). DBGSOM: A Python implementation of the Directed Batch Growing Self-Organizing Map. Zenodo. https://doi.org/10.5281/zenodo.20525611

References

  • A directed batch growing approach to enhance the topology preservation of self-organizing map, Mahdi Vasighi and Homa Amini, 2017, https://doi.org/10.1016/j.asoc.2017.02.015
  • Reference implementation by the authors in MATLAB: https://github.com/mvasighi/DBGSOM
  • Statistics-enhanced Direct Batch Growth Self-Organizing Mapping for efficient DoS Attack Detection, Xiaofei Qu et al., 2019, https://doi.org/10.1109/ACCESS.2019.2922737
  • Entropy-Defined Direct Batch Growing Hierarchical Self-Organizing Mapping for Efficient Network Anomaly Detection, Xiaofei Qu et al., 2021, https://doi.org/10.1109/ACCESS.2021.3064200
  • Self-Organizing Maps, 3rd Edition, Teuvo Kohonen, 2003
  • MATLAB Implementations and Applications of the Self-Organizing Map, Teuvo Kohonen, 2014
  • Smoothed self-organizing map for robust clustering, P. D'Urso, L. De Giovanni and R. Massari, 2019, https://doi.org/10.1016/j.ins.2019.06.038

License

dbgsom is licensed under the MIT license.

Download files


Source Distribution

dbgsom-1.3.0.tar.gz (41.2 kB)

Uploaded Source

Built Distribution


dbgsom-1.3.0-py3-none-any.whl (32.1 kB)

Uploaded Python 3

File details

Details for the file dbgsom-1.3.0.tar.gz.

File metadata

  • Download URL: dbgsom-1.3.0.tar.gz
  • Size: 41.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.22

File hashes

Hashes for dbgsom-1.3.0.tar.gz
Algorithm Hash digest
SHA256 873be768a364450bce5a7b4bc7e6742ec5f17a8449b7d6bb07b4296b9c3e414f
MD5 349753fa428048a479033d1e43d7ea49
BLAKE2b-256 0b71cec7ae2d1896309f243ff030583292fc90435f09c66fe2b2b6c7af2064ca


File details

Details for the file dbgsom-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: dbgsom-1.3.0-py3-none-any.whl
  • Size: 32.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.22

File hashes

Hashes for dbgsom-1.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0bc2f54e575dee330eaa2f0e012188cb7abe262e2b50ce15229918434929fb75
MD5 296fe2c0be466621e53d25c244a88c40
BLAKE2b-256 b1ed70ec92d6bd3011c7545dcbb8fe6ca6b35de94df2ac5964689309edcfc3b6

