IntraSOM: Library for Self-Organizing Maps with missing data, hexagonal lattice and toroidal projection
IntraSOM
IntraSOM is a fully Python-based implementation of self-organizing maps (SOM) developed by the Integrated Technology for Rock and Fluid Analysis (InTRA) research center (https://www.usp.br/intra/). IntraSOM is built using Object-Oriented Programming and includes support for hexagonal grids, toroidal topologies, and a wide range of visualization tools to enhance the analysis, exploration, and classification of complex datasets. Furthermore, IntraSOM includes features for handling missing data during training and efficient clustering algorithms. This library aims to make Self-Organizing techniques more accessible to researchers and professionals in various fields by providing a comprehensive Python implementation of SOM and a framework for easily expanding and implementing other SOM-based algorithms.
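To make the missing-data idea concrete, the sketch below finds a best-matching unit (BMU) while skipping features that are missing from a sample. It is a generic, pure-Python illustration of one common approach (distance computed over observed features only), not IntraSOM's internal implementation. The name `masked_bmu` and the toy values are assumptions made for this example.

```python
import math

def masked_bmu(sample, codebook):
    """Return the index of the closest codebook vector, ignoring NaN features.

    Hypothetical helper for illustration only; not part of the IntraSOM API.
    """
    if all(math.isnan(x) for x in sample):
        raise ValueError("sample has no observed features")
    best_index, best_dist = -1, math.inf
    for index, weights in enumerate(codebook):
        # Squared Euclidean distance restricted to the observed features
        dist = sum((w - x) ** 2 for w, x in zip(weights, sample)
                   if not math.isnan(x))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

# Toy codebook with three neurons and three features
codebook = [[0.0, 0.0, 0.0],
            [1.0, 1.0, 1.0],
            [5.0, 5.0, 5.0]]
sample = [0.9, math.nan, 1.2]  # the second feature is missing

print(masked_bmu(sample, codebook))  # prints 1
```

The missing feature simply drops out of the sum, so samples with gaps can still be matched to a neuron.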
Framework
A visualization library
U-Matrix
U-Matrix with Samples Label
U-Matrix with Watermark Neuron Template
Component Plots
Clustering
Clustering with Merged Visualization
Clustering with Neuron Template
Structure
The structure of this library is based on the structure of the SOMPY library by Moosavi et al. (2014), with implementations of:
- Training with a projected toroidal topology
- Training on a hexagonal lattice
- Training with missing data
- Data imputation
- Loading a previously performed training
- Module for evaluating semi-supervised training with ROC curve plotting
- Module for plotting and calculating the U-matrix and component maps of the training
- Saving training data
- Generation of a training report
- Projection of new data onto a trained map
- Clustering module for trained neurons using k-means and visualization of the results
- Accelerated distance matrix calculation using matrix shifts
- Parquet format for input and output of data and training results
- Label plotting on the U-matrix
- Implementation of representative sample analysis and visualization on the U-matrix
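As a rough illustration of what a toroidal topology means for neighborhood calculations, the sketch below computes the distance between two neurons on a grid whose edges wrap around. It assumes a plain rectangular grid for simplicity (IntraSOM uses a hexagonal lattice), and `toroidal_distance` is a hypothetical name introduced for this example, not a library function.

```python
import math

def toroidal_distance(a, b, rows, cols):
    """Euclidean distance between grid cells a and b with wrap-around edges.

    a and b are (row, col) tuples. Illustrative only; assumes a rectangular grid.
    """
    d_row = abs(a[0] - b[0])
    d_col = abs(a[1] - b[1])
    # On a torus, the shorter way around may cross the edge of the map
    d_row = min(d_row, rows - d_row)
    d_col = min(d_col, cols - d_col)
    return math.hypot(d_row, d_col)

# Opposite edges of a 10 x 10 map are direct neighbors on the torus
print(toroidal_distance((0, 0), (0, 9), rows=10, cols=10))  # prints 1.0
```

On a flat map the same two cells would be 9 units apart, which is why a toroidal projection removes the border effect for neurons at the map edges.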
Documentation and Examples
For documented examples of usage of the functions and features of this library, please refer to the Jupyter Notebook:
IntraSOM: Documented Examples
Note: GitHub does not render this notebook because of its size, but it can be opened in any IDE that supports Jupyter Notebooks.
Documented Examples in Jupyter Notebook
Open on Google Colab:
It is possible to access and visualize this notebook through Google Colab:
To run the notebook inside your Google Drive and connect it to Google Colab, follow these steps:
- Upload the notebook and the data files to your Google Drive.
- Open Google Colab (https://colab.research.google.com/) in your web browser.
- Click on "File" in the menu and select "Open Notebook".
- In the "Notebook" tab, select the "Google Drive" option.
- Navigate to the location where you uploaded the notebook file in your Google Drive and select it.
- The notebook will open in Google Colab, and you will have access to your Google Drive files from within the notebook.
Here's an example of code you can use inside the notebook to access files in your Google Drive:
from google.colab import drive
import pandas as pd
# Mount Google Drive and accept the connection prompt
drive.mount('/content/drive')
# Folder in your Google Drive that holds the notebook and its data (adjust to your setup)
base_path = '/content/drive/MyDrive/path/to/your/folder/'
# Install IntraSOM (versioning warnings can be ignored)
!pip install intrasom
# Continue running the notebook
# Load dataframe
data = pd.read_excel(base_path + "data/Animais_missing.xlsx", index_col=0)
Access to Methods Docstrings
All functions in the IntraSOM library have documentation for input and output parameters in the form of Docstrings, which can be accessed using the Python help(...) built-in function.
Example:
>>> help(som_test.train)
Help on method train in module intrasom.intrasom:
train(bootstrap=False, bootstrap_proportion=0.8, n_job=-1, save=True, summary=True, dtypes='parquet', shared_memory=False, train_rough_len=None, train_rough_radiusin=None, train_rough_radiusfin=None, train_finetune_len=None, train_finetune_radiusin=None, train_finetune_radiusfin=None, train_len_factor=1, maxtrainlen=1000, history_plot=False, previous_epoch=False) method of intrasom.intrasom.SOM instance
Class method for training the SOM object.
Args:
n_job: number of jobs to use and parallelize training.
shared_memory: flag to enable shared memory.
train_rough_len: number of iterations during rough training.
train_rough_radiusin: initial BMU fetching radius during
rough training.
train_rough_radiusfin: BMU search final radius during
rough training.
train_finetune_len: number of iterations during fine training.
train_finetune_radiusin: initial BMU scan radius during
fine training.
train_finetune_radiusfin: BMU search final radius during
fine training.
train_len_factor: factor that multiplies the values of the training
extension (rough, fine, etc)
maxtrainlen: maximum value of desired interactions.
Default: np.Inf (infinity).
Returns:
SOM object trained according to the chosen parameters.
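Besides `help(...)`, the standard-library `inspect` module can read the same docstrings and default values programmatically. The sketch below uses a stand-in function, `demo_train`, because it is meant to run without a trained SOM object. Both `demo_train` and `default_arguments` are hypothetical names introduced for this example.

```python
import inspect

def demo_train(n_job=-1, maxtrainlen=1000):
    """Stand-in for a documented method; not IntraSOM code."""
    return None

def default_arguments(func):
    """Map each parameter that has a default value to that default."""
    signature = inspect.signature(func)
    return {name: param.default
            for name, param in signature.parameters.items()
            if param.default is not inspect.Parameter.empty}

print(inspect.getdoc(demo_train))     # the docstring text, as help() would show it
print(default_arguments(demo_train))  # prints {'n_job': -1, 'maxtrainlen': 1000}
```

With a SOM object such as the `som_test` instance from the example above, the same helper could presumably be called as `default_arguments(som_test.train)`.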
Dependencies
The IntraSOM dependencies are:
Library | Version |
---|---|
matplotlib | 3.7.1 |
scipy | 1.10.1 |
joblib | 1.2.0 |
scikit-learn | 1.2.2 |
pandas | 2.0.1 |
tqdm | 4.65.0 |
plotly | 5.14.1 |
scikit-image | 0.20.0 |
pyarrow | 9.0.0 |
openpyxl | 3.1.2 |
geopandas | 0.13.0 |
shapely | 2.0.1 |
ipywidgets | 8.0.6 |
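The table does not state whether these versions are minimums or exact pins. As a hedged aid, the sketch below compares the listed versions with what is installed in the current environment, using only the standard-library `importlib.metadata`. The helper name `compare_versions` is an assumption made for this example, and the script only reports differences without judging compatibility.

```python
from importlib import metadata

# Versions copied from the table above
LISTED = {
    "matplotlib": "3.7.1", "scipy": "1.10.1", "joblib": "1.2.0",
    "scikit-learn": "1.2.2", "pandas": "2.0.1", "tqdm": "4.65.0",
    "plotly": "5.14.1", "scikit-image": "0.20.0", "pyarrow": "9.0.0",
    "openpyxl": "3.1.2", "geopandas": "0.13.0", "shapely": "2.0.1",
    "ipywidgets": "8.0.6",
}

def compare_versions(listed):
    """Return {package: (listed_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in listed.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report

for name, (wanted, installed) in compare_versions(LISTED).items():
    status = "same" if installed == wanted else "differs"
    print(f"{name}: listed {wanted}, installed {installed} ({status})")
```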
Installation
Progress Bar
For the progress bar to work in Jupyter Notebook or JupyterLab:
pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension
Repository
# Clone repository
git clone https://github.com/InTRA-USP/IntraSOM.git
# Access directory where IntraSOM is placed
cd IntraSOM
# Install the package from the cloned source (uses setup.py)
pip install .
Pip
pip install intrasom
Citation
de Gouvêa, R. C. T., Gioria, R. dos S., Marques, G. R., & Carneiro, C. de C. (2023). IntraSOM: A comprehensive Python library for Self-Organizing Maps with hexagonal toroidal maps training and missing data handling. Software Impacts, 17, 100570. https://doi.org/10.1016/j.simpa.2023.100570
BibTeX
@article{DEGOUVEA2023100570,
title = {IntraSOM: A comprehensive Python library for Self-Organizing Maps with hexagonal toroidal maps training and missing data handling},
journal = {Software Impacts},
volume = {17},
pages = {100570},
year = {2023},
issn = {2665-9638},
doi = {https://doi.org/10.1016/j.simpa.2023.100570},
url = {https://www.sciencedirect.com/science/article/pii/S2665963823001070},
author = {Rodrigo César Teixeira {de Gouvêa} and Rafael dos Santos Gioria and Gustavo Rodovalho Marques and Cleyton de Carvalho Carneiro},
keywords = {Self-Organizing Maps, Python, Missing data, Imputation, Visualization},
abstract = {IntraSOM is a new Python library that implements Self-Organizing Maps (SOM). It supports hexagonal lattices, toroidal topology, and provides visualization tools for analyzing complex data sets. The library handles missing data during training and offers efficient clustering algorithms. IntraSOM aims to make SOM more accessible to researchers and practitioners by providing a comprehensive Python implementation. It has an expandable framework and can be integrated with other Python algorithms and libraries. The IntraSOM library is available on GitHub at (https://github.com/InTRA-USP/IntraSOM).}
}
Code Ocean Reproducibility Badge
Main Authors
InTRA
- Rodrigo Gouvêa (Lattes)
- Cleyton Carneiro (Lattes)
- Rafael Gioria (Lattes)
- Gustavo Rodovalho (Lattes)
Acknowledgments
- USP
- PMI
- PPGEMin
- LCT - USP
- IGCe - USP
- ICMC - USP
- CeMEAI - USP
Thanks to people who directly or indirectly contributed to the development of this library:
Name | Affiliation |
---|---|
Stephen Fraser | VectORE Pty Ltd |
Michel J Friedel | University of Colorado/University of Hawaii |
Carina Ulsen | PMI/InTRA/LCT - USP |
Jean Ferrari | PMI/InTRA - USP |
Michele Kuroda | Cepetro - Unicamp |
Guilherme Barreto | Universidade Federal do Ceará |
Afonso Paiva Neto | ICMC - USP |
Cibele Russo | ICMC - USP |
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.