
IntraSOM: Library for Self-Organizing Maps with missing data, hexagonal lattice and toroidal projection


IntraSOM


IntraSOM is a fully Python-based implementation of self-organizing maps (SOM) developed by the Integrated Technology for Rock and Fluid Analysis (InTRA) research center (https://www.usp.br/intra/). IntraSOM is built using Object-Oriented Programming and includes support for hexagonal grids, toroidal topologies, and a wide range of visualization tools to enhance the analysis, exploration, and classification of complex datasets. Furthermore, IntraSOM includes features for handling missing data during training and efficient clustering algorithms. This library aims to make Self-Organizing techniques more accessible to researchers and professionals in various fields by providing a comprehensive Python implementation of SOM and a framework for easily expanding and implementing other SOM-based algorithms.
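This README does not describe how IntraSOM handles missing data internally. One common approach is to compare a sample with each neuron using only the features the sample actually contains, then fill the missing entries from the best-matching unit (BMU). The plain-Python sketch below assumes that approach for illustration only; it is not IntraSOM's implementation, and `find_bmu` and `impute` are hypothetical names.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing entries."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def find_bmu(sample, codebook):
    """Return the index of the best-matching unit (BMU) for `sample`.

    Hypothetical helper: the squared Euclidean distance is accumulated only
    over the features present in `sample`, so missing entries are ignored.
    """
    observed = [i for i, x in enumerate(sample) if not is_missing(x)]
    if not observed:
        raise ValueError("sample has no observed features")
    best_idx, best_dist = -1, math.inf
    for idx, weights in enumerate(codebook):
        dist = sum((sample[i] - weights[i]) ** 2 for i in observed)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def impute(sample, codebook):
    """Hypothetical helper: fill each missing entry with the BMU's weight."""
    bmu_weights = codebook[find_bmu(sample, codebook)]
    return [w if is_missing(x) else x for x, w in zip(sample, bmu_weights)]

# Toy codebook: 3 neurons, 3 features each
codebook = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
sample = [0.9, None, 1.2]
print(find_bmu(sample, codebook))  # 1
print(impute(sample, codebook))    # [0.9, 1.0, 1.2]
```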

-----

Framework


A visualization library

U-Matrix

U-Matrix with Samples Label

U-Matrix with Watermark Neuron Template

Component Plots

Clustering

Clustering with Merged Visualization

Clustering with Neuron Template
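The U-Matrix shown in these plots assigns each neuron the average distance between its weight vector and those of its lattice neighbours, so that high values mark borders between groups. The sketch below computes it in plain Python for a hexagonal toroidal map. The "odd rows shifted right" layout and the even-row requirement are assumptions made for this sketch, not IntraSOM's documented conventions, and `hex_torus_neighbors` and `u_matrix` are hypothetical names.

```python
import math

def hex_torus_neighbors(row, col, n_rows, n_cols):
    """Six neighbours of (row, col) on a hexagonal lattice whose odd rows are
    shifted half a cell to the right, with edges wrapped into a torus.
    Assumes n_rows is even so row parity stays consistent across the wrap."""
    if row % 2 == 0:
        offsets = [(0, -1), (0, 1), (-1, -1), (-1, 0), (1, -1), (1, 0)]
    else:
        offsets = [(0, -1), (0, 1), (-1, 0), (-1, 1), (1, 0), (1, 1)]
    return [((row + dr) % n_rows, (col + dc) % n_cols) for dr, dc in offsets]

def u_matrix(codebook, n_rows, n_cols):
    """Mean Euclidean distance from each neuron's weights to its neighbours'.

    `codebook[r][c]` holds the weight vector of the neuron at (r, c)."""
    if n_rows % 2:
        raise ValueError("n_rows must be even for this hexagonal torus")
    umat = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            dists = [math.dist(codebook[r][c], codebook[nr][nc])
                     for nr, nc in hex_torus_neighbors(r, c, n_rows, n_cols)]
            umat[r][c] = sum(dists) / len(dists)
    return umat

# Toy 4 x 4 map with 1-D weights: every neuron is 0.0 except (0, 0).
codebook = [[[0.0] for _ in range(4)] for _ in range(4)]
codebook[0][0] = [6.0]
umat = u_matrix(codebook, 4, 4)
print(umat[0][0], umat[1][0], umat[2][2])  # 6.0 1.0 0.0
```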


Structure

The structure of this library is based on that of the SOMPY library by Moosavi et al. (2014), with implementations of:

  • Training on a projected toroidal topology
  • Training on hexagonal lattice
  • Training with missing data
  • Data imputation
  • Loading a previously trained map
  • Module for evaluating semi-supervised training with ROC curve plotting
  • Module for plotting and calculating the U-matrix and component maps of the training
  • Saving training data
  • Generation of training reports
  • Projection of new data onto a trained map
  • Clustering module for trained neurons using k-means and visualization of the results
  • Accelerated distance matrix calculation using matrix shifts
  • Parquet format for input and output of data and training results
  • Label plotting on the U-matrix
  • Implementation of representative sample analysis and visualization on the U-matrix
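Training on a hexagonal lattice with a toroidal topology changes how distances between neurons are measured: opposite edges of the map are glued together, so the shortest path may wrap around. The sketch below places neurons on a hexagonal grid and computes that wrapped distance, plus a Gaussian neighbourhood weight of the kind commonly used in SOM training. The layout (odd rows shifted half a cell, rows spaced by √3/2), the even-row requirement, and the names `hex_position`, `toroidal_distance` and `gaussian_neighbourhood` are assumptions for illustration, not IntraSOM's API.

```python
import math

SQRT3_2 = math.sqrt(3) / 2  # vertical spacing between hexagon rows

def hex_position(row, col):
    """Planar (x, y) centre of neuron (row, col); odd rows shifted by half a cell."""
    return col + 0.5 * (row % 2), row * SQRT3_2

def toroidal_distance(a, b, n_rows, n_cols):
    """Shortest distance between neurons a=(row, col) and b=(row, col) on a
    hexagonal lattice whose opposite edges are glued together (a torus)."""
    if n_rows % 2:
        raise ValueError("n_rows must be even for a consistent hexagonal torus")
    width, height = n_cols, n_rows * SQRT3_2
    (xa, ya), (xb, yb) = hex_position(*a), hex_position(*b)
    dx, dy = abs(xa - xb), abs(ya - yb)
    # Wrapping around the torus may be shorter than crossing the map directly.
    dx, dy = min(dx, width - dx), min(dy, height - dy)
    return math.hypot(dx, dy)

def gaussian_neighbourhood(distance, radius):
    """Influence of the BMU on a neuron at `distance` for a given radius."""
    return math.exp(-distance ** 2 / (2 * radius ** 2))

# On a 4 x 4 map, neurons (0, 0) and (3, 3) are direct neighbours once the
# edges wrap around, so they are one lattice step apart.
print(round(toroidal_distance((0, 0), (3, 3), 4, 4), 6))  # 1.0
```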

Documentation and Examples

For documented examples of how to use the functions and features of this library, see the Jupyter Notebook IntraSOM: Documented Examples.
Note: GitHub does not render this notebook because of its size, but it can be opened in any IDE that supports Jupyter Notebooks.

Documented Examples in Jupyter Notebook

Open on Google Colab:

It is possible to access and visualize this notebook through Google Colab:
Open In Colab

To run the notebook inside your Google Drive and connect it to Google Colab, follow these steps:

  • Upload the notebook and the data files to your Google Drive.
  • Open Google Colab (https://colab.research.google.com/) in your web browser.
  • Click on "File" in the menu and select "Open Notebook".
  • In the dialog that opens, select the "Google Drive" tab.
  • Navigate to the location where you uploaded the notebook file in your Google Drive and select it.
  • The notebook will open in Google Colab, and you will have access to your Google Drive files from within the notebook.

Here's an example of code you can use inside the notebook to access files in your Google Drive:

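from google.colab import drive
import warnings

import pandas as pd

# Mount Google Drive and accept the connection
drive.mount('/content/drive')

# Folder in your Google Drive that contains the notebook's data files
# (adjust to your own layout; keep the trailing slash)
file_path = '/content/drive/MyDrive/path/to/your/folder/'

# Install IntraSOM
!pip install intrasom

# Ignore versioning warnings
warnings.filterwarnings("ignore")

# Continue running the notebook
# Load the example dataframe (the first column is used as the index)
data = pd.read_excel(file_path + "data/Animais_missing.xlsx", index_col=0)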

Access to Methods Docstrings

All functions in the IntraSOM library document their input and output parameters in docstrings, which can be read with Python's built-in help(...) function.

Example, where som_test is a SOM instance:

>>> help(som_test.train)
Help on method train in module intrasom.intrasom:

train(bootstrap=False, bootstrap_proportion=0.8, n_job=-1, save=True, summary=True, dtypes='parquet', shared_memory=False, train_rough_len=None, train_rough_radiusin=None, train_rough_radiusfin=None, train_finetune_len=None, train_finetune_radiusin=None, train_finetune_radiusfin=None, train_len_factor=1, maxtrainlen=1000, history_plot=False, previous_epoch=False) method of intrasom.intrasom.SOM instance
    Class method for training the SOM object.
    
    Args:
        n_job: number of jobs to use and parallelize training.
    
        shared_memory: flag to enable shared memory.
    
        train_rough_len: number of iterations during rough training.
    
        train_rough_radiusin: initial BMU fetching radius during
            rough training.
    
        train_rough_radiusfin: BMU search final radius during
            rough training.
    
        train_finetune_len: number of iterations during fine training.
    
        train_finetune_radiusin: initial BMU scan radius during
            fine training.
    
        train_finetune_radiusfin: BMU search final radius during
            fine training.
    
        train_len_factor: factor that multiplies the values of the training
            extension (rough, fine, etc)
    
        maxtrainlen: maximum value of desired interactions.
            Default: np.Inf (infinity).
    
    Returns:
        SOM object trained according to the chosen parameters.
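Training runs in a rough phase and a fine-tuning phase, each controlled by a length and an initial and final neighbourhood radius (train_rough_len, train_rough_radiusin, train_rough_radiusfin, and their finetune counterparts). The docstring does not state how the radius moves between those values. The sketch below assumes a linear decay, a common choice in batch SOM implementations; `linear_radius_schedule` is a hypothetical helper and the numbers are illustrative, not IntraSOM defaults.

```python
def linear_radius_schedule(radius_in, radius_fin, n_iter):
    """Neighbourhood radius for each iteration, decreasing linearly from
    radius_in to radius_fin (both endpoints included). Assumed schedule."""
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    if n_iter == 1:
        return [float(radius_in)]
    step = (radius_fin - radius_in) / (n_iter - 1)
    return [radius_in + i * step for i in range(n_iter)]

# Illustrative values: a rough phase with a wide, quickly shrinking radius,
# followed by a fine-tuning phase with a small radius.
rough = linear_radius_schedule(4.0, 1.0, 4)     # [4.0, 3.0, 2.0, 1.0]
finetune = linear_radius_schedule(1.0, 0.5, 3)  # [1.0, 0.75, 0.5]
print(rough + finetune)
```

The wide early radius lets a BMU pull large regions of the map into a global order, while the small late radius only adjusts local detail.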

Dependencies

The IntraSOM dependencies are:

Library         Version
matplotlib      3.7.1
scipy           1.10.1
joblib          1.2.0
scikit-learn    1.2.2
pandas          2.0.1
tqdm            4.65.0
plotly          5.14.1
scikit-image    0.20.0
pyarrow         9.0.0
openpyxl        3.1.2
geopandas       0.13.0
shapely         2.0.1
ipywidgets      8.0.6
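To compare an environment against the versions in the table above, the standard-library importlib.metadata module can report what is installed. The table does not say whether these are exact pins or minimum versions, so this sketch simply lists every difference; `PINNED` and `check_versions` are hypothetical names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the dependency table (PyPI distribution names)
PINNED = {
    "matplotlib": "3.7.1", "scipy": "1.10.1", "joblib": "1.2.0",
    "scikit-learn": "1.2.2", "pandas": "2.0.1", "tqdm": "4.65.0",
    "plotly": "5.14.1", "scikit-image": "0.20.0", "pyarrow": "9.0.0",
    "openpyxl": "3.1.2", "geopandas": "0.13.0", "shapely": "2.0.1",
    "ipywidgets": "8.0.6",
}

def check_versions(pinned):
    """Return {name: (listed, installed_or_None)} for every mismatch."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # distribution is not installed
        if installed != expected:
            report[name] = (expected, installed)
    return report

if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: listed {expected}, installed {installed or 'not installed'}")
```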

Installation

Progress Bar

For the progress bar to work in Jupyter Notebook or JupyterLab:

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension

Repository

# Clone repository
git clone https://github.com/InTRA-USP/IntraSOM.git

# Access directory where IntraSOM is placed
cd IntraSOM

# Install the package from the cloned source
pip install .

Pip

pip install intrasom

Citation

de Gouvêa, R. C. T., Gioria, R. dos S., Marques, G. R., & Carneiro, C. de C. (2023). IntraSOM: A comprehensive Python library for Self-Organizing Maps with hexagonal toroidal maps training and missing data handling. Software Impacts, 17, 100570. https://doi.org/10.1016/j.simpa.2023.100570

BibTeX

@article{DEGOUVEA2023100570,
title = {IntraSOM: A comprehensive Python library for Self-Organizing Maps with hexagonal toroidal maps training and missing data handling},
journal = {Software Impacts},
volume = {17},
pages = {100570},
year = {2023},
issn = {2665-9638},
doi = {10.1016/j.simpa.2023.100570},
url = {https://www.sciencedirect.com/science/article/pii/S2665963823001070},
author = {Rodrigo César Teixeira {de Gouvêa} and Rafael dos Santos Gioria and Gustavo Rodovalho Marques and Cleyton de Carvalho Carneiro},
keywords = {Self-Organizing Maps, Python, Missing data, Imputation, Visualization},
abstract = {IntraSOM is a new Python library that implements Self-Organizing Maps (SOM). It supports hexagonal lattices, toroidal topology, and provides visualization tools for analyzing complex data sets. The library handles missing data during training and offers efficient clustering algorithms. IntraSOM aims to make SOM more accessible to researchers and practitioners by providing a comprehensive Python implementation. It has an expandable framework and can be integrated with other Python algorithms and libraries. The IntraSOM library is available on GitHub at (https://github.com/InTRA-USP/IntraSOM).}
}

Code Ocean Reproducibility Badge

Open in Code Ocean


Main Authors

InTRA
Rodrigo Gouvêa
Cleyton Carneiro
Rafael Gioria
Gustavo Rodovalho

Acknowledgments

USP
PMI
PPGEMin
LCT - USP
IGCe - USP
ICMC - USP
CeMEAI - USP

Thanks to people who directly or indirectly contributed to the development of this library:


Stephen Fraser VectORE Pty Ltd
Michel J Friedel University of Colorado/University of Hawaii
Carina Ulsen PMI/InTRA/LCT - USP
Jean Ferrari PMI/InTRA - USP
Michele Kuroda Cepetro - Unicamp
Guilherme Barreto Universidade Federal do Ceará
Afonso Paiva Neto ICMC - USP
Cibele Russo ICMC - USP

License

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



Download files


Source Distribution

IntraSOM-1.0.4.4.tar.gz (3.8 MB)

Built Distribution

IntraSOM-1.0.4.4-py3-none-any.whl (3.8 MB, Python 3)
