A Python implementation of the Directed Batch Growing Self-Organizing Map
DBGSOM
DBGSOM is short for Directed Batch Growing Self-Organizing Map. A SOM is a type of artificial neural network that is used to produce a low-dimensional representation of a higher-dimensional data set while preserving the topological structure of the data. It can be used for supervised and unsupervised vector quantization, classification and many different data visualization tasks.
Features
- Compatible with scikit-learn's API — drop-in replacement for other clustering and classification algorithms
- Can handle high-dimensional and non-uniform data distributions
- Good results without extensive parameter tuning
- Better topology preservation and faster training time than classical SOMs
- Interpretability of the results through interactive plotting
How it works
The DBGSOM algorithm builds a two-dimensional map of prototypes (neurons) where each neuron is connected to its neighbors. Four neurons are initialized with random weight vectors drawn from the input data. During training every sample is assigned to its nearest neuron (best matching unit), and the neuron weights are updated towards the samples mapped to them. Neighboring neurons influence each other's updates so that the low-dimensional ordering of the map is preserved. A growing mechanism expands the map as needed: new neurons are inserted at boundary positions where the quantization error exceeds a configurable growing threshold.
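The steps above can be sketched in a few lines of plain Python. This is a simplified illustration, not the code used by dbgsom: the function names, the Gaussian neighbourhood with width `sigma`, and the use of the summed distance as the per-neuron error are all assumptions made for the example. It only flags boundary neurons as growth candidates and does not place new neurons.

```python
import math


def best_matching_unit(sample, weights):
    """Return the grid position of the neuron closest to `sample`."""
    return min(weights, key=lambda pos: math.dist(sample, weights[pos]))


def batch_step(samples, weights, sigma=1.0):
    """One simplified batch update (assumed Gaussian neighbourhood)."""
    # Assign every sample to its best matching unit.
    bmus = [best_matching_unit(s, weights) for s in samples]
    dim = len(samples[0])
    new_weights = {}
    for pos in weights:
        num, den = [0.0] * dim, 0.0
        for s, bmu in zip(samples, bmus):
            # Influence decays with the grid distance between the two neurons.
            h = math.exp(-math.dist(pos, bmu) ** 2 / (2 * sigma**2))
            den += h
            for k in range(dim):
                num[k] += h * s[k]
        # New weight: neighbourhood-weighted mean of the samples.
        new_weights[pos] = [v / den for v in num]
    return new_weights, bmus


def boundary_neurons_to_grow(samples, weights, growing_threshold):
    """Boundary neurons whose accumulated error exceeds the threshold."""
    error = {pos: 0.0 for pos in weights}
    for s in samples:
        bmu = best_matching_unit(s, weights)
        error[bmu] += math.dist(s, weights[bmu])

    def is_boundary(pos):
        x, y = pos
        neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        return any(n not in weights for n in neighbours)

    return [p for p in weights if is_boundary(p) and error[p] > growing_threshold]


# Four initial neurons on a 2x2 grid, weights drawn from the input data.
samples = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
weights = {(0, 0): [0.0, 0.0], (1, 0): [0.1, 0.0],
           (0, 1): [0.9, 1.0], (1, 1): [1.0, 1.0]}
new_weights, bmus = batch_step(samples, weights, sigma=0.5)
candidates = boundary_neurons_to_grow(samples, new_weights, growing_threshold=0.05)
```

The map is stored as a dictionary from grid position to weight vector so that a missing key marks a free grid cell, which makes the boundary test a simple membership check.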
How to install
Download from PyPI
DBGSOM can be installed from PyPI via uv (recommended):
uv add dbgsom
or with pip:
pip install dbgsom
Install from source
Clone the repository and install with uv (recommended):
git clone https://github.com/SandroMartens/DBGSOM.git
cd DBGSOM
uv sync
Alternatively with pip:
git clone https://github.com/SandroMartens/DBGSOM.git
cd DBGSOM
pip install -e .
Usage
DBGSOM implements the scikit-learn API and provides two estimators:
| Class | Use case |
|---|---|
| SomVQ | Unsupervised clustering / vector quantization |
| SomClassifier | Supervised classification |
Clustering / Vector Quantization
from dbgsom import SomVQ
from sklearn.datasets import load_digits
X, y = load_digits(return_X_y=True)
vq = SomVQ(spreading_factor=0.5, max_neurons=80)
labels = vq.fit_predict(X)
print(f"Neurons: {len(vq.neurons_)}")
print(f"Quantization error: {vq.quantization_error_:.4f}")
print(f"Topographic error: {vq.topographic_error_:.4f}")
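To make the two reported metrics more concrete, the sketch below computes them from scratch using their common textbook definitions: the quantization error as the mean distance between a sample and its best matching unit, and the topographic error as the share of samples whose two closest neurons are not grid neighbours. Whether dbgsom uses exactly these definitions is an assumption; the function names and the toy maps are made up for the example.

```python
import math


def quantization_error(samples, weights):
    """Mean distance between each sample and its closest prototype."""
    return sum(
        min(math.dist(s, w) for w in weights.values()) for s in samples
    ) / len(samples)


def topographic_error(samples, weights):
    """Share of samples whose two closest neurons are not adjacent on the grid."""
    errors = 0
    for s in samples:
        ranked = sorted(weights, key=lambda pos: math.dist(s, weights[pos]))
        first, second = ranked[0], ranked[1]
        # Grid neighbours differ by exactly one step along one axis.
        if abs(first[0] - second[0]) + abs(first[1] - second[1]) != 1:
            errors += 1
    return errors / len(samples)


# A well-ordered 2x2 map: grid neighbours are also neighbours in input space.
ordered = {(0, 0): [0.0, 0.0], (1, 0): [1.0, 0.0],
           (0, 1): [0.0, 1.0], (1, 1): [1.0, 1.0]}
samples = [[0.1, 0.0], [0.9, 1.0]]
qe = quantization_error(samples, ordered)
te = topographic_error(samples, ordered)

# A folded map: the diagonal neurons (0, 0) and (1, 1) sit next to each other
# in input space, so a sample between them counts as a topographic error.
folded = {(0, 0): [0.0, 0.0], (1, 0): [1.0, 0.0],
          (0, 1): [0.0, 1.0], (1, 1): [0.1, 0.1]}
te_folded = topographic_error([[0.04, 0.04]], folded)
```

A low quantization error means the prototypes sit close to the data, while a low topographic error means neighbouring neurons on the map also represent neighbouring regions of the input space.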
Classification
from dbgsom import SomClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = SomClassifier(spreading_factor=0.5, max_neurons=80)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test)) # accuracy
proba = clf.predict_proba(X_test) # class probabilities
Transform
Both estimators implement transform(), which represents each sample as a sparse non-negative linear combination of the prototype weight vectors:
coefs = vq.transform(X) # shape (n_samples, n_prototypes)
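The meaning of such a coefficient row can be illustrated without the library. The sketch below does not show how dbgsom computes the coefficients; it only shows how a sparse, non-negative row reconstructs an approximation of the sample from the prototypes. The prototype values, the coefficient row, and the helper `reconstruct` are hypothetical.

```python
def reconstruct(coef_row, prototypes):
    """Weighted sum of prototype vectors using one row of coefficients."""
    dim = len(prototypes[0])
    return [
        sum(c * p[k] for c, p in zip(coef_row, prototypes)) for k in range(dim)
    ]


# Four hypothetical prototypes in a two-dimensional input space.
prototypes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# One row of coefficients: non-negative, and most entries are zero (sparse).
coef_row = [0.0, 0.25, 0.0, 0.75]

approx = reconstruct(coef_row, prototypes)  # [1.0, 0.75]
```

Because most coefficients are zero, each sample is described by only a few nearby prototypes.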
Visualization
plot() renders the SOM neurons as dots and the neighbourhood edges as grey lines, all via seaborn objects.
vq.plot(color="density") # continuous → colour gradient
clf.plot(color="label") # categorical → colour legend
vq.plot(color="hit_count", pointsize="error") # colour + size encoding
vq.plot(color="density", layout="pca", palette="magma_r")
vq.plot(color="pca_rgb") # RGB colour from PCA of weight vectors
Supported attributes for color / pointsize:
'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'
| Parameter | Options | Description |
|---|---|---|
| color | any node attribute | Numeric attributes → continuous colour scale; int/str with ≤ 20 unique values → legend |
| pointsize | any numeric attribute | Node size proportional to attribute value |
| layout | 'grid' (default), 'pca' | Node placement algorithm |
| palette | any Matplotlib colormap | Applied to the colour mapping |
Examples
- With two-dimensional input, the prototypes (red) approximate the input distribution (white) while preserving the square topology to their neighbors.
- After training on the Fashion-MNIST dataset, the weight vector of each prototype can be plotted. Neighboring prototypes are pairwise similar.
- Each prototype is coloured by its majority class. Samples from the same class cluster together. Trained on MNIST digits.
- Linear transformations like PCA can colour-code relative distances between prototypes in the input space. See the darknet example notebook.
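Colouring prototypes by their majority class can be sketched as a simple vote: every labelled sample votes for its nearest prototype, and each prototype takes the most frequent label among its votes. This is an illustrative assumption about the procedure rather than the code used by dbgsom, and the helper `majority_labels` is hypothetical.

```python
import math
from collections import Counter, defaultdict


def majority_labels(samples, labels, prototypes):
    """Map each prototype index to the most common label of its samples."""
    votes = defaultdict(Counter)
    for s, y in zip(samples, labels):
        # Each sample votes for the prototype closest to it.
        nearest = min(
            range(len(prototypes)), key=lambda i: math.dist(s, prototypes[i])
        )
        votes[nearest][y] += 1
    # Prototypes that received no samples get no label.
    return {i: counts.most_common(1)[0][0] for i, counts in votes.items()}


prototypes = [[0.0, 0.0], [1.0, 1.0]]
samples = [[0.1, 0.0], [0.0, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]]
labels = ["a", "a", "b", "b", "b"]

prototype_labels = majority_labels(samples, labels, prototypes)
# {0: 'a', 1: 'b'}
```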
Dependencies
- Python ≥ 3.12
- numpy
- numba
- NetworkX
- tqdm
- scikit-learn
- scikit-image
- seaborn
- matplotlib
- pandas
- scipy
References
- A directed batch growing approach to enhance the topology preservation of self-organizing map, Mahdi Vasighi and Homa Amini, 2017, https://doi.org/10.1016/j.asoc.2017.02.015
- Reference implementation by the authors in Matlab: https://github.com/mvasighi/DBGSOM
- Statistics-enhanced Direct Batch Growth Self-Organizing Mapping for efficient DoS Attack Detection, Xiaofei Qu et al., 2019, https://doi.org/10.1109/ACCESS.2019.2922737
- Entropy-Defined Direct Batch Growing Hierarchical Self-Organizing Mapping for Efficient Network Anomaly Detection, Xiaofei Qu et al., 2021, https://doi.org/10.1109/ACCESS.2021.3064200
- Self-Organizing Maps, 3rd Edition, Teuvo Kohonen, 2003
- MATLAB Implementations and Applications of the Self-Organizing Map, Teuvo Kohonen, 2014
- Smoothed self-organizing map for robust clustering, P. D'Urso, L. De Giovanni and R. Massari, 2019, https://doi.org/10.1016/j.ins.2019.06.038
License
dbgsom is licensed under the MIT license.