
Converting tabular data into images


TINTOlib


TINTOlib is a Python library that transforms tidy data (also known as tabular data) into synthetic images, so that vision models such as Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) can be applied to traditionally structured data. This transformation bridges the gap between tabular data and vision-based machine learning models for tasks such as regression and classification.

Citing TINTO: If you use TINTO in your work, please cite the SoftwareX paper:

@article{softwarex_TINTO,
    title = {TINTO: Converting Tidy Data into Image for Classification with 2-Dimensional Convolutional Neural Networks},
    journal = {SoftwareX},
    author = {Manuel Castillo-Cara and Reewos Talla-Chumpitaz and Raúl García-Castro and Luis Orozco-Barbosa},
    volume={22},
    pages={101391},
    year = {2023},
    issn = {2352-7110},
    doi = {https://doi.org/10.1016/j.softx.2023.101391}
}

Please also cite the use case developed in the Information Fusion (INFFUS) paper:

@article{inffus_TINTO,
    title = {A novel deep learning approach using blurring image techniques for Bluetooth-based indoor localisation},
    journal = {Information Fusion},
    author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro},
    volume = {91},
    pages = {173-186},
    year = {2023},
    issn = {1566-2535},
    doi = {https://doi.org/10.1016/j.inffus.2022.10.011}
}

Features

  • Input data formats (2 options):

    • Pandas Dataframe
    • Files with the following format:
      • Tabular files: The input data must be a CSV file in Tidy Data format.
      • Tidy Data: The target (the variable to be predicted) must be the last column of the dataset, so all preceding columns are features.
      • All data must be in numerical form.
  • Runs on Linux, Windows and macOS systems.

  • Compatible with Python 3.7 or higher.
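The input requirements above can be checked before data is handed to the library. The following is a minimal sketch that uses only pandas; the helper name `to_tidy_frame` and the example column names are hypothetical and not part of TINTOlib.

```python
import pandas as pd


def to_tidy_frame(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Return a copy of df with the target as the last column.

    Raises ValueError if any column is not numeric.
    (Hypothetical helper, not part of TINTOlib.)
    """
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"Non-numeric columns: {non_numeric}")
    # Features first, target last
    features = [c for c in df.columns if c != target]
    return df[features + [target]].copy()


# Example frame where the target is (incorrectly) the first column
raw = pd.DataFrame({
    "target": [1, 2, 3],
    "sepal length": [4.9, 7.0, 6.3],
    "sepal width": [3.0, 3.2, 3.3],
})
tidy = to_tidy_frame(raw, target="target")
print(list(tidy.columns))
```

The resulting DataFrame can be passed wherever the methods below accept `data`, or written out with `tidy.to_csv(path, index=False)` if a CSV file is preferred.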


Models

TINTOlib includes a variety of models for generating synthetic images. Below is a summary of the supported models and their hyperparameters:

Model           Class             Hyperparameters
TINTO           TINTO()           problem, normalize, verbose, pixels, algorithm, blur, submatrix, amplification, distance, steps, option, times, train_m, zoom, random_seed
IGTD            IGTD()            problem, normalize, verbose, scale, fea_dist_method, image_dist_method, error, max_step, val_step, switch_t, min_gain, zoom, random_seed
REFINED         REFINED()         problem, normalize, verbose, hcIterations, n_processors, zoom, random_seed
BarGraph        BarGraph()        problem, normalize, verbose, pixel_width, gap, zoom
DistanceMatrix  DistanceMatrix()  problem, normalize, verbose, zoom
Combination     Combination()     problem, normalize, verbose, zoom
SuperTML        SuperTML()        problem, normalize, verbose, pixels, feature_importance, font_size, random_seed
FeatureWrap     FeatureWrap()     problem, normalize, verbose, size, bins, zoom
BIE             BIE()             problem, normalize, verbose, precision, zoom

Getting Started

You can install TINTOlib from PyPI:

    pip install torchmetrics pytorch_lightning TINTOlib imblearn keras_preprocessing mpi4py

To import a specific model, use:

    from TINTOlib.tinto import TINTO

Create the model. If you don't set any hyperparameters, the model uses the default values. For details, refer to the Models section or the TINTO Documentation.

    model = TINTO(blur=True)

Generating Synthetic Images

To generate synthetic images, use the following workflow with the fit, transform, and fit_transform methods:

Fitting the Model

The fit method trains the model on the tabular data and prepares it for image generation.

    model.fit(data)

Parameters:

  • data: A path to a CSV file or a Pandas DataFrame containing the features and targets.
    • The target column must be the last column.

Generating Synthetic Images

The transform method generates and saves synthetic images in a specified folder. It requires the model to be fitted first.

    model.transform(data, folder)

Parameters:

  • data: A path to a CSV file or a Pandas DataFrame containing the features and targets.
    • The target column must be the last column.
  • folder: Path to the folder where the synthetic images will be saved.

Combining Fit and Transform

The fit_transform method combines the training and image generation steps. It fits the model to the data and generates synthetic images in one step.

    model.fit_transform(data, folder)

Parameters:

  • data: A path to a CSV file or a Pandas DataFrame containing the features and targets.
    • The target column must be the last column.
  • folder: Path to the folder where the synthetic images will be saved.

Notes:

  • The model must be fitted before using the transform method. If the model isn't fitted, a RuntimeError will be raised.
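One common reason to call fit and transform separately is to fit on a training split and then generate images for both splits. The sketch below illustrates that call order, assuming only the `fit(data)` and `transform(data, folder)` methods described above. The helper `images_for_split` and the "train"/"test" folder names are hypothetical, and whether the library creates missing folders is not stated here.

```python
from pathlib import Path


def images_for_split(model, train_data, test_data, out_dir):
    """Fit on the training split only, then generate images for both splits.

    `model` is any object exposing fit(data) and transform(data, folder).
    (Hypothetical helper, not part of TINTOlib.)
    """
    train_dir = Path(out_dir) / "train"
    test_dir = Path(out_dir) / "test"
    model.fit(train_data)                        # must run before any transform call
    model.transform(train_data, str(train_dir))  # images for the training rows
    model.transform(test_data, str(test_dir))    # fitted model reused for the test rows
    return train_dir, test_dir
```

Because the helper calls `fit` first, the RuntimeError mentioned in the note above should not occur in this workflow.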

Documentation

For detailed usage, examples, and tutorials, visit the TINTOlib Documentation.

How to use TINTOlib - Google Colab crash course

To get started with TINTOlib, a dedicated crash course repository is available. This repository provides a comprehensive guide to using TINTOlib for transforming tabular data into synthetic images and applying these images to machine learning tasks. It includes:

  • Slides and Jupyter notebooks demonstrating how to:

    • Transform tabular data into images using TINTOlib.
    • Apply state-of-the-art vision models like Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) to classification and regression problems.
  • Integration of Hybrid Neural Networks (HyNNs), where:

    • One branch (MLP) processes the original tabular data.
    • Another branch (CNN or ViT) processes synthetic images.

This architecture leverages the strengths of both tabular and image-based data representations, enabling improved performance on complex machine learning tasks. The repository is ideal for those looking to integrate image-based deep learning techniques into tabular data workflows.
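The two-branch idea can be sketched in PyTorch as follows. This is a toy illustration rather than the architecture used in the crash course: the class name `HybridNet`, the layer sizes, and the assumption of 3-channel images are all hypothetical.

```python
import torch
from torch import nn


class HybridNet(nn.Module):
    """Toy hybrid network: an MLP branch for tabular rows plus a CNN branch for images."""

    def __init__(self, n_features: int, n_outputs: int, channels: int = 3):
        super().__init__()
        # Branch 1: MLP over the original tabular features
        self.mlp = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        # Branch 2: small CNN over the synthetic images
        self.cnn = nn.Sequential(
            nn.Conv2d(channels, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # (N, 8, 1, 1) regardless of image size
            nn.Flatten(),             # (N, 8)
        )
        # Head: concatenated branch outputs mapped to the prediction
        self.head = nn.Linear(32 + 8, n_outputs)

    def forward(self, tabular: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.mlp(tabular), self.cnn(image)], dim=1)
        return self.head(fused)


# Two samples: 4 tabular features each, paired with 20x20 three-channel images
model = HybridNet(n_features=4, n_outputs=3)
logits = model(torch.randn(2, 4), torch.randn(2, 3, 20, 20))
print(logits.shape)  # torch.Size([2, 3])
```

Replacing the CNN branch with a ViT encoder keeps the same structure: each branch produces a feature vector, and the head operates on their concatenation.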

Converting Tidy Data into image

For example, the following table shows how the classic IRIS dataset should look in CSV form before it is processed:

sepal length  sepal width  petal length  petal width  target
4.9           3.0          1.4           0.2          1
7.0           3.2          4.7           1.4          2
6.3           3.3          6.0           2.5          3

Simple example without Blurring

The following example shows how to create 20x20 images with characteristic pixels, i.e. without blurring. Because no other parameters are given, the default values apply:

  • Image size: 20x20 pixels.
  • Blurring: No blurring is used.
  • Seed: The random seed is set to 20.

TINTO characteristic pixel

More specific example

The following example shows how to create images with blurring, using more specific parameters.

The images are created with the following considerations regarding the parameters used:

  • Blurring (-B): Create the images with the blurring technique.
  • Dimensionality reduction algorithm (-alg): t-SNE is used.
  • Blurring option (-oB): Create the images using the maximum value of overlapping pixels.
  • Image size (-px): 30x30 pixels.
  • Blurring steps (-sB): Expand the blurring by 5 pixels.
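The settings above can be expressed through the Python API roughly as follows. The hyperparameter names come from the Models table, but the exact value spellings (for example "t-SNE" and "maximum") are assumptions and should be checked against the TINTOlib Documentation. The function `generate_blurred_images` is a hypothetical wrapper.

```python
# Hyperparameter names come from the Models table; the values mirror the
# list above, but their exact spelling is an assumption.
BLUR_PARAMS = {
    "blur": True,          # -B: enable the blurring technique
    "algorithm": "t-SNE",  # -alg: dimensionality reduction algorithm
    "option": "maximum",   # -oB: keep the maximum value where pixels overlap
    "pixels": 30,          # -px: 30x30 images
    "steps": 5,            # -sB: blurring expands by 5 pixels
}


def generate_blurred_images(data, folder):
    """Fit a TINTO model with BLUR_PARAMS and write the images to `folder`."""
    # Imported lazily so the configuration can be inspected without TINTOlib installed
    from TINTOlib.tinto import TINTO

    model = TINTO(**BLUR_PARAMS)
    model.fit_transform(data, folder)
    return model
```

A call such as `generate_blurred_images("iris.csv", "images_blur")` would then produce the images; both names are placeholders.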

TINTO blurring


License

TINTOlib is available under the Apache License 2.0.

Authors

Contributors

Ontology Engineering Group, Universidad Politécnica de Madrid, Universidad Nacional de Educación a Distancia, Universidad de Castilla-La Mancha
