IntraSOM: Library for Self-Organizing Maps with missing data, hexagonal lattice and toroidal projection

IntraSOM


IntraSOM is a fully Python-based implementation of self-organizing maps (SOM) developed by the Integrated Technology for Rock and Fluid Analysis (InTRA) research center (https://www.usp.br/intra/). IntraSOM is built using Object-Oriented Programming and includes support for hexagonal grids, toroidal topologies, and a wide range of visualization tools to enhance the analysis, exploration, and classification of complex datasets. Furthermore, IntraSOM includes features for handling missing data during training and efficient clustering algorithms. This library aims to make Self-Organizing techniques more accessible to researchers and professionals in various fields by providing a comprehensive Python implementation of SOM and a framework for easily expanding and implementing other SOM-based algorithms.

-----

Framework


A visualization library

U-Matrix

U-Matrix with Samples Label

U-Matrix with Watermark Neuron Template

Component Plots

Clustering

Clustering with Merged Visualization

Clustering with Neuron Template


Structure

The structure of this library is based on the structure of the SOMPY library by Moosavi et al. (2014), with implementations of:

  • Training on a projected toroidal topology
  • Training on a hexagonal lattice
  • Training with missing data
  • Data imputation
  • Loading a previously performed training
  • Module for evaluating semi-supervised training with ROC curve plotting
  • Module for plotting and calculating the U-matrix and component maps of the training
  • Saving training data
  • Generation of Training Report
  • Projection of new data onto a trained map
  • Clustering module for trained neurons using k-means and visualization of the results
  • Accelerated distance matrix calculation using matrix shifts
  • Parquet format for input and output of data and training results
  • Label plotting on the U-matrix
  • Implementation of representative sample analysis and visualization on the U-matrix
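
Two of the features listed above, toroidal topology and training with missing data, can be illustrated with a short dependency-free sketch. This is not IntraSOM's implementation: it assumes a rectangular (not hexagonal) grid for brevity, and the function names toroidal_grid_distance and masked_bmu are hypothetical. The first function wraps grid distances around the map edges; the second finds the best-matching unit (BMU) using only the observed (non-NaN) components of a sample.

```python
import math


def toroidal_grid_distance(a, b, rows, cols):
    """Euclidean distance between grid cells a and b on a map whose edges wrap around."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    # On a torus, going around the edge may be shorter than crossing the map.
    dr = min(dr, rows - dr)
    dc = min(dc, cols - dc)
    return math.hypot(dr, dc)


def masked_bmu(sample, codebook):
    """Index of the codebook vector closest to `sample`, ignoring NaN components."""
    observed = [i for i, value in enumerate(sample) if not math.isnan(value)]
    if not observed:
        raise ValueError("sample has no observed values")

    def squared_distance(weights):
        # Only the observed features contribute to the distance.
        return sum((weights[i] - sample[i]) ** 2 for i in observed)

    return min(range(len(codebook)), key=lambda k: squared_distance(codebook[k]))


# Opposite corners of a 10 x 10 map are diagonal neighbours on a torus.
corner_distance = toroidal_grid_distance((0, 0), (9, 9), rows=10, cols=10)

# Three neurons with three features each; the sample is missing its second feature.
codebook = [
    [0.0, 0.0, 0.0],
    [1.0, 5.0, 1.0],
    [4.0, 4.0, 4.0],
]
sample = [1.1, float("nan"), 0.9]
bmu_index = masked_bmu(sample, codebook)
```

Because the missing feature is skipped rather than filled in, the second neuron is selected even though its value for that feature is far from the others. A vectorized library would typically express the same idea with array masks instead of Python loops.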

Documentation and Examples

For documented examples of usage of the functions and features of this library, please refer to the Jupyter Notebook: IntraSOM: Documented Examples
Note: GitHub does not render this notebook because of its size, but it can be opened in any IDE that supports Jupyter Notebooks.

Documented Examples in Jupyter Notebook

Open on Google Colab:

It is possible to access and visualize this notebook through Google Colab:
Open In Colab

To run the notebook inside your Google Drive and connect it to Google Colab, follow these steps:

  • Upload the notebook and the data files to your Google Drive.
  • Open Google Colab (https://colab.research.google.com/) in your web browser.
  • Click on "File" in the menu and select "Open Notebook".
  • In the "Notebook" tab, select the "Google Drive" option.
  • Navigate to the location where you uploaded the notebook file in your Google Drive and select it.
  • The notebook will open in Google Colab, and you will have access to your Google Drive files from within the notebook.

Here's an example of code you can use inside the notebook to access files in your Google Drive:

from google.colab import drive
import pandas as pd

# Mount Google Drive and accept the connection prompt
drive.mount('/content/drive')

# Folder in your Google Drive that holds the notebook and its data files
base_path = '/content/drive/MyDrive/path/to/your/folder/'

# Install IntraSOM (versioning warnings can be ignored)
!pip install intrasom

# Continue running the notebook
# Load the dataframe
data = pd.read_excel(base_path + "data/Animais_missing.xlsx", index_col=0)

Access to Methods Docstrings

All functions in the IntraSOM library have documentation for input and output parameters in the form of Docstrings, which can be accessed using the Python help(...) built-in function.

Example:

>>> help(som_test.train)
Help on method train in module intrasom.intrasom:

train(bootstrap=False, bootstrap_proportion=0.8, n_job=-1, save=True, summary=True, dtypes='parquet', shared_memory=False, train_rough_len=None, train_rough_radiusin=None, train_rough_radiusfin=None, train_finetune_len=None, train_finetune_radiusin=None, train_finetune_radiusfin=None, train_len_factor=1, maxtrainlen=1000, history_plot=False, previous_epoch=False) method of intrasom.intrasom.SOM instance
    Class method for training the SOM object.
    
    Args:
        n_job: number of jobs to use and parallelize training.
    
        shared_memory: flag to enable shared memory.
    
        train_rough_len: number of iterations during rough training.
    
        train_rough_radiusin: initial BMU fetching radius during
            rough training.
    
        train_rough_radiusfin: BMU search final radius during
            rough training.
    
        train_finetune_len: number of iterations during fine training.
    
        train_finetune_radiusin: initial BMU scan radius during
            fine training.
    
        train_finetune_radiusfin: BMU search final radius during
            fine training.
    
        train_len_factor: factor that multiplies the values of the training
            extension (rough, fine, etc)
    
        maxtrainlen: maximum value of desired interactions.
            Default: np.Inf (infinity).
    
    Returns:
        SOM object trained according to the chosen parameters.
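
Besides help(...), docstrings and default values can also be read programmatically with the standard-library inspect module. The sketch below uses a hypothetical stand-in function, train_stub, so that it runs without IntraSOM installed; with the library available, the same calls could presumably be applied to a method such as som_test.train.

```python
import inspect


def train_stub(n_job=-1, save=True, maxtrainlen=1000):
    """Hypothetical stand-in for a documented training method.

    Args:
        n_job: number of jobs used to parallelize training.
    """


# getdoc returns the docstring with its indentation cleaned up.
doc = inspect.getdoc(train_stub)
print(doc.splitlines()[0])

# signature exposes every parameter together with its default value.
defaults = {
    name: parameter.default
    for name, parameter in inspect.signature(train_stub).parameters.items()
}
print(defaults)
```

Reading the defaults from the signature is useful when the docstring and the signature disagree, since the signature reflects what the code actually uses.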

Dependencies

The IntraSOM dependencies are:

Library         Version
matplotlib      3.7.1
scipy           1.10.1
joblib          1.2.0
scikit-learn    1.2.2
pandas          2.0.1
tqdm            4.65.0
plotly          5.14.1
scikit-image    0.20.0
pyarrow         9.0.0
openpyxl        3.1.2
geopandas       0.13.0
shapely         2.0.1
ipywidgets      8.0.6
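
To compare an environment against the versions listed above, a minimal sketch using the standard-library importlib.metadata module is shown below. The helper name installed_versions is hypothetical, and whether the listed versions are exact requirements or simply the versions the library was tested with is not stated here, so the script only reports differences and does not enforce them.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the dependency list above.
LISTED = {
    "matplotlib": "3.7.1",
    "scipy": "1.10.1",
    "joblib": "1.2.0",
    "scikit-learn": "1.2.2",
    "pandas": "2.0.1",
    "tqdm": "4.65.0",
    "plotly": "5.14.1",
    "scikit-image": "0.20.0",
    "pyarrow": "9.0.0",
    "openpyxl": "3.1.2",
    "geopandas": "0.13.0",
    "shapely": "2.0.1",
    "ipywidgets": "8.0.6",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(LISTED)
for name, listed in LISTED.items():
    installed = report[name] or "not installed"
    print(f"{name:<14} listed {listed:<8} installed {installed}")
```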

Installation

Progress Bar

For the progress bar to work in Jupyter Notebook or JupyterLab:

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension

Repository

# Clone repository
git clone https://github.com/InTRA-USP/IntraSOM.git

# Access directory where IntraSOM is placed
cd IntraSOM

# Install the package from the cloned source (uses setup.py)
pip install .

Pip

pip install intrasom

Citation

de Gouvêa, R. C. T., Gioria, R. dos S., Marques, G. R., & Carneiro, C. de C. (2023). IntraSOM: A comprehensive Python library for Self-Organizing Maps with hexagonal toroidal maps training and missing data handling. Software Impacts, 17, 100570. https://doi.org/10.1016/j.simpa.2023.100570

BibTeX

@article{DEGOUVEA2023100570,
title = {IntraSOM: A comprehensive Python library for Self-Organizing Maps with hexagonal toroidal maps training and missing data handling},
journal = {Software Impacts},
volume = {17},
pages = {100570},
year = {2023},
issn = {2665-9638},
doi = {https://doi.org/10.1016/j.simpa.2023.100570},
url = {https://www.sciencedirect.com/science/article/pii/S2665963823001070},
author = {Rodrigo César Teixeira {de Gouvêa} and Rafael dos Santos Gioria and Gustavo Rodovalho Marques and Cleyton de Carvalho Carneiro},
keywords = {Self-Organizing Maps, Python, Missing data, Imputation, Visualization},
abstract = {IntraSOM is a new Python library that implements Self-Organizing Maps (SOM). It supports hexagonal lattices, toroidal topology, and provides visualization tools for analyzing complex data sets. The library handles missing data during training and offers efficient clustering algorithms. IntraSOM aims to make SOM more accessible to researchers and practitioners by providing a comprehensive Python implementation. It has an expandable framework and can be integrated with other Python algorithms and libraries. The IntraSOM library is available on GitHub at (https://github.com/InTRA-USP/IntraSOM).}
}

Code Ocean Reproducibility Badge

Open in Code Ocean


Main Authors

  • InTRA
  • Rodrigo Gouvêa
  • Cleyton Carneiro
  • Rafael Gioria
  • Gustavo Rodovalho

Acknowledgments

  • USP
  • PMI
  • PPGEMin
  • LCT - USP
  • IGCe - USP
  • ICMC - USP
  • CeMEAI - USP

Thanks to people who directly or indirectly contributed to the development of this library:


  • Stephen Fraser, VectORE Pty Ltd
  • Michel J Friedel, University of Colorado/University of Hawaii
  • Carina Ulsen, PMI/InTRA/LCT - USP
  • Jean Ferrari, PMI/InTRA - USP
  • Michele Kuroda, Cepetro - Unicamp
  • Guilherme Barreto, Universidade Federal do Ceará
  • Afonso Paiva Neto, ICMC - USP
  • Cibele Russo, ICMC - USP

License

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
