A Python package for tabular data analysis using TabMap.

TabMap

Interpretable Discovery of Patterns in Tabular Data via Spatially Semantic Topographic Maps

Nature Biomedical Engineering, 2024.

TL;DR: Python implementation of TabMap, the method proposed in our paper.

  • TabMap unravels intertwined relationships in tabular data by transforming each data sample into a spatially semantic 2D topographic map, which we refer to as TabMap.
  • A TabMap preserves the original feature values as pixel intensities, with the relationships among the features spatially encoded in the map (the strength of two inter-related features correlates with their distance on the map).
  • Our approach makes it possible to apply 2D convolutional neural networks to extract association patterns in the data to facilitate data analysis, and offers interpretability by ranking features according to importance.
  • We demonstrate TabMap's superior predictive performance across a diverse set of biomedical datasets.

Set Up the Conda Environment

git clone https://github.com/rui-yan/TabMap.git
cd TabMap
conda env create -f tabmap_conda.yml
conda activate tabmap
  • NVIDIA GPU (tested on a single NVIDIA Quadro RTX 8000 48 GB on a local workstation)
  • Python (3.10.13), torch (1.13.1), numpy (1.23.1), pandas (1.5.3), scikit-learn (1.4.2), scipy (1.10.1), seaborn (0.12.2). For further details on the software and package versions used, refer to the tabmap_conda.yml file.

Train and evaluate the TabMap classifier

TabMap construction: transforming tabular data into 2D topographic maps

from tabmap_construction import TabMapGenerator
generator = TabMapGenerator(metric='correlation', loss_fun='kl_loss')
X_tabmap = generator.fit_transform(X)

Parameters:

  • metric: Metric used to compute the feature inter-relationships. {'correlation', 'euclidean', 'gower'}
  • loss_fun: Loss function used for computing the optimal transport. {'kl_loss', 'sqeuclidean', 'square_loss'}
  • epsilon: Entropic regularization parameter (>=0). default=0 (no regularization applied)
  • version: Version of the distance matrix calculation algorithm. default='v2.0'
    • Versions 'v1.0' and 'v2.0' use different methods for computing grid distances.
  • num_iter: Number of iterations for the optimal transport problem. default=10
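
To make the role of the metric parameter more concrete, the following is a minimal sketch of how a 'correlation' feature-to-feature distance matrix could be computed. It assumes the usual definition (one minus the Pearson correlation between feature columns); the helper name correlation_distance is hypothetical and is not part of the TabMap API, and the package's internal computation may differ.

```python
import numpy as np


def correlation_distance(X):
    """Pairwise correlation distance between the features (columns) of X.

    Hypothetical helper for illustration only, not a TabMap function.
    X has shape (n_samples, n_features); the result has shape
    (n_features, n_features).
    """
    corr = np.corrcoef(X, rowvar=False)  # Pearson correlation between columns
    return 1.0 - corr  # 0 for perfectly correlated features, 2 for anti-correlated


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] = 2.0 * X[:, 0] + 0.5  # make feature 1 a linear function of feature 0

D = correlation_distance(X)
print(D.shape)
```

In this toy example, features 0 and 1 end up at a distance of (numerically) zero, which is the kind of relationship a map layout would be expected to reflect by placing them near each other.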

TabMapGenerator class functions:

  • fit(X, truncate=False): Computes the coupling matrix to map the feature space to the 2D map space. X is of shape (n_samples, n_features). The truncate parameter determines whether to truncate or zero-pad the data to fit the 2D map.
  • transform(X): Performs the mapping from feature space to image space.
  • fit_transform(X, truncate=False): Fits the generator to the data and then performs the transformation.
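
The truncate option can be pictured with a small sketch of the two ways a feature vector can be made to fit a 2D grid. This is an illustration under stated assumptions, not the package's implementation: it assumes a square map, keeps features in their original column order (whereas TabMap places them according to the learned coupling matrix), and drops trailing features when truncating. The function name to_square_maps is hypothetical.

```python
import math

import numpy as np


def to_square_maps(X, truncate=False):
    """Reshape each row of X into a square 2D array.

    Hypothetical helper for illustration only, not a TabMap function.
    With truncate=True, trailing features that do not fit the largest
    square are dropped; otherwise the rows are zero-padded up to the
    next square size.
    """
    n_samples, n_features = X.shape
    side = math.isqrt(n_features)  # floor of the square root
    if truncate:
        X_fit = X[:, : side * side]
    else:
        if side * side < n_features:
            side += 1  # round up so every feature has a pixel
        pad = side * side - n_features
        X_fit = np.pad(X, ((0, 0), (0, pad)))  # zero-pad on the right
    return X_fit.reshape(n_samples, side, side)


X = np.arange(20, dtype=float).reshape(2, 10)
padded = to_square_maps(X, truncate=False)
truncated = to_square_maps(X, truncate=True)
print(padded.shape, truncated.shape)
```

With 10 features, zero-padding yields 4 x 4 maps with six empty pixels, while truncation yields 3 x 3 maps and discards one feature.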

Train a 2D convolutional neural network (CNN) model for classification

python main.py

Refer to the main.py file for details on model training and evaluation. This file also includes k-fold cross-validation, hyperparameter tuning, and comparisons with other classifiers used to generate the results presented in our paper.
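
For readers unfamiliar with k-fold cross-validation, the splitting step can be sketched as follows. This is a generic, NumPy-only illustration; the function name k_fold_indices is hypothetical, and main.py may use a different splitter (for example, a stratified one from scikit-learn).

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k shuffled, disjoint folds.

    Hypothetical helper for illustration only; not taken from main.py.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)  # k nearly equal-sized index arrays
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each sample appears in exactly one test fold, so every sample is used for evaluation once and for training in the remaining folds.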

Example Jupyter notebooks for using TabMap

Citation

If you find our work helpful in your research, or if you use any of the source code, please cite our paper.

@article{yan2024interpretable,
  title={Interpretable discovery of patterns in tabular data via spatially semantic topographic maps},
  author={Yan, Rui and Islam, Md Tauhidul and Xing, Lei},
  journal={Nature Biomedical Engineering},
  pages={1--12},
  year={2024},
  publisher={Nature Publishing Group UK London}
}
