A Python implementation of the Directed Batch Growing Self-Organizing Map

DBGSOM

DBGSOM is short for Directed Batch Growing Self-Organizing Map. A SOM is a type of artificial neural network that is used to produce a low-dimensional representation of a higher-dimensional data set while preserving the topological structure of the data. It can be used for supervised and unsupervised vector quantization, classification, and many data visualization tasks.

Features

  • Compatible with scikit-learn's API and can be used as a drop-in replacement for other clustering and classification algorithms
  • Can handle high-dimensional and non-uniform data distributions
  • Good results without parameter tuning
  • Better topology preservation and faster training time than classical SOMs
  • Interpretability of the results through plotting

How it works

The DBGSOM algorithm constructs a two-dimensional map of prototypes (neurons) in which each neuron is connected to its neighbors. The first neurons on the map are initialized with weights drawn at random from the input data. The input data is then presented to the SOM, and each sample is assigned to its nearest neuron. Each neuron's weights are then updated toward the samples that were mapped to it. Neighboring neurons affect each other's updates, so the low-dimensional ordering of the map is preserved. DBGSOM uses a growing mechanism to expand the map as needed: new neurons are added at the edge of the map wherever the quantization error of a boundary neuron exceeds a given growing threshold.
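The steps described above can be illustrated with a small NumPy sketch. This is not the dbgsom implementation: the Gaussian neighborhood function, the `sigma` parameter, the 4-connected grid, and all function names are assumptions chosen for illustration, and the directed placement and weight initialization of new neurons is left out. It only shows the assignment step, a batch update, and the threshold test on boundary neurons.

```python
import numpy as np


def assign_to_nearest(X, weights):
    """Return, for every sample, the index of its nearest prototype."""
    # Pairwise squared Euclidean distances, shape (n_samples, n_neurons).
    d2 = ((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def batch_update(X, weights, positions, sigma=1.0):
    """One batch step: move each neuron to a neighborhood-weighted sample mean."""
    winners = assign_to_nearest(X, weights)
    n_neurons = len(weights)
    # Per-neuron sample counts and sample sums.
    counts = np.bincount(winners, minlength=n_neurons).astype(float)
    sums = np.zeros_like(weights)
    np.add.at(sums, winners, X)
    # Gaussian neighborhood on grid distance (an assumed choice of kernel).
    grid_d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2)
    h = np.exp(-grid_d2 / (2.0 * sigma**2))
    numer = h @ sums    # shape (n_neurons, n_features)
    denom = h @ counts  # shape (n_neurons,)
    new_weights = weights.copy()
    mask = denom > 0
    new_weights[mask] = numer[mask] / denom[mask, None]
    return new_weights


def quantization_error(X, weights):
    """Sum of distances between each neuron and the samples mapped to it."""
    winners = assign_to_nearest(X, weights)
    dist = np.linalg.norm(X - weights[winners], axis=1)
    return np.bincount(winners, weights=dist, minlength=len(weights))


def boundary_neurons_to_grow(positions, errors, threshold):
    """Indices of boundary neurons whose error exceeds the growing threshold."""
    occupied = {(int(px), int(py)) for px, py in positions}
    grow = []
    for i, (px, py) in enumerate(positions):
        x, y = int(px), int(py)
        neighbors = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        # A neuron is on the boundary if at least one grid neighbor is free.
        is_boundary = any(n not in occupied for n in neighbors)
        if is_boundary and errors[i] > threshold:
            grow.append(i)
    return grow


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
positions = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])  # initial 2x2 map
weights = X[rng.choice(len(X), size=4, replace=False)]  # random input samples

for _ in range(10):
    weights = batch_update(X, weights, positions)

errors = quantization_error(X, weights)
grow = boundary_neurons_to_grow(positions, errors, threshold=errors.mean())
```

Because neighboring neurons share part of each other's sample sums through `h`, prototypes that are close on the grid stay close in the input space, which is the ordering property the paragraph above refers to.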

How to install

DBGSOM can be installed from PyPI via pip.

pip install DBGSOM

Usage

dbgsom implements the scikit-learn API. It provides SomClassifier for classification and SomVQ for clustering and vector quantization.

from dbgsom import SomVQ, SomClassifier
from sklearn.datasets import load_digits

digits_X, digits_y = load_digits(return_X_y=True)

quantizer = SomVQ()
classifier = SomClassifier()

quantizer.fit_predict(X=digits_X)
classifier.fit_predict(X=digits_X, y=digits_y)
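The snippet above predicts on the same data it was trained on. A hold-out evaluation could look roughly like the sketch below. The helper `holdout_accuracy` and the stand-in `NearestCentroid` class are hypothetical and not part of dbgsom; the sketch only assumes an estimator with scikit-learn style `fit(X, y)` and `predict(X)` methods, so a `SomClassifier` instance could presumably be passed in place of the stand-in.

```python
import numpy as np


class NearestCentroid:
    """Tiny stand-in estimator with fit/predict (hypothetical, not dbgsom)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d2 = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d2.argmin(axis=1)]


def holdout_accuracy(estimator, X, y, test_fraction=0.25, seed=0):
    """Fit on a random training split and return accuracy on the held-out rest."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    estimator.fit(X[train_idx], y[train_idx])
    predictions = estimator.predict(X[test_idx])
    return float(np.mean(predictions == y[test_idx]))


# Synthetic, well-separated two-class data for the demonstration.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(100, 4)),
    rng.normal(2.0, 1.0, size=(100, 4)),
])
y = np.repeat([0, 1], 100)

accuracy = holdout_accuracy(NearestCentroid(), X, y)
```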

Examples

Here are a few example use cases for DBGSOM.

Example | Description
example | With a two-dimensional input, we can clearly see how the prototypes (red) approximate the input distribution (white) while still preserving the square topology to their neighbors.
The Fashion-MNIST dataset | After training the SOM on the Fashion-MNIST dataset, we can plot the nearest neighbor of each prototype. The SOM has ordered the prototypes so that neighboring prototypes are pairwise similar.
digits | We can show the majority class each prototype represents. Samples from the same class are clustered together. The SOM was trained on MNIST digits.
darknet_pca | We can use linear transformations like PCA to color code relative distances between prototypes in the input space. See the darknet example notebook.
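One way the PCA color coding from the last example might be done is sketched below. This is an assumption about the approach, not code from the darknet notebook: the function name `pca_colors` is hypothetical, and mapping the first three principal components to RGB channels is just one plausible choice. The sketch requires at least three prototypes and three input features.

```python
import numpy as np


def pca_colors(prototypes):
    """Map each prototype to an RGB triple from its first three principal components."""
    centered = prototypes - prototypes.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:3].T
    # Rescale every component to [0, 1] so it can serve as a color channel.
    lo = projected.min(axis=0)
    span = projected.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against a constant component
    return (projected - lo) / span


rng = np.random.default_rng(0)
prototypes = rng.normal(size=(25, 10))  # e.g. a 5x5 map with 10 input features
colors = pca_colors(prototypes)         # one RGB row per prototype
```

Prototypes that are close in the input space get similar colors, so abrupt color changes between neighboring cells of the map hint at large input-space distances.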

Dependencies

  • Python > 3.7
  • NumPy
  • NetworkX
  • tqdm
  • scikit-learn
  • seaborn
  • pandas

References

  • A directed batch growing approach to enhance the topology preservation of self-organizing map, Mahdi Vasighi and Homa Amini, 2017, http://dx.doi.org/10.1016/j.asoc.2017.02.015
  • Reference implementation by the authors in Matlab: https://github.com/mvasighi/DBGSOM
  • Statistics-enhanced Direct Batch Growth Self-organizing Mapping for efficient DoS Attack Detection, Xiaofei Qu et al., 2019, doi:10.1109/ACCESS.2019.2922737
  • Entropy-Defined Direct Batch Growing Hierarchical Self-Organizing Mapping for Efficient Network Anomaly Detection, Xiaofei Qu et al., 2021, doi:10.1109/ACCESS.2021.3064200
  • Self-Organizing Maps, 3rd Edition, Teuvo Kohonen, 2003
  • MATLAB Implementations and Applications of the Self-Organizing Map, Teuvo Kohonen, 2014

License

dbgsom is licensed under the MIT license.
