Converting tabular data into images
TINTOlib
TINTOlib is a state-of-the-art Python library that transforms tidy data (also known as tabular data) into synthetic images, enabling the application of advanced deep learning techniques, including Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs), to traditionally structured data. This transformation bridges the gap between tabular data and powerful vision-based machine learning models, unlocking new possibilities for tackling regression, classification, and other complex tasks.
Citing TINTO: If you use TINTO in your work, please cite the SoftwareX article:
@article{softwarex_TINTO,
title = {TINTO: Converting Tidy Data into Image for Classification with 2-Dimensional Convolutional Neural Networks},
journal = {SoftwareX},
author = {Manuel Castillo-Cara and Reewos Talla-Chumpitaz and Raúl García-Castro and Luis Orozco-Barbosa},
volume={22},
pages={101391},
year = {2023},
issn = {2352-7110},
doi = {https://doi.org/10.1016/j.softx.2023.101391}
}
And the use case developed in the Information Fusion (INFFUS) paper:
@article{inffus_TINTO,
title = {A novel deep learning approach using blurring image techniques for Bluetooth-based indoor localisation},
journal = {Information Fusion},
author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro},
volume = {91},
pages = {173-186},
year = {2023},
issn = {1566-2535},
doi = {https://doi.org/10.1016/j.inffus.2022.10.011}
}
Features
- Input data formats (2 options):
  - Pandas DataFrame
  - Files with the following format
- Runs on Linux, Windows and macOS systems.
- Compatible with Python 3.7 or higher.
Models
TINTOlib includes a variety of models for generating synthetic images. Below is a summary of the supported models and their hyperparameters:
Models | Class | Hyperparameters |
---|---|---|
TINTO | TINTO() | problem normalize verbose pixels algorithm blur submatrix amplification distance steps option times train_m zoom random_seed |
IGTD | IGTD() | problem normalize verbose scale fea_dist_method image_dist_method error max_step val_step switch_t min_gain zoom random_seed |
REFINED | REFINED() | problem normalize verbose hcIterations n_processors zoom random_seed |
BarGraph | BarGraph() | problem normalize verbose pixel_width gap zoom |
DistanceMatrix | DistanceMatrix() | problem normalize verbose zoom |
Combination | Combination() | problem normalize verbose zoom |
SuperTML | SuperTML() | problem normalize verbose pixels feature_importance font_size random_seed |
FeatureWrap | FeatureWrap() | problem normalize verbose size bins zoom |
BIE | BIE() | problem normalize verbose precision zoom |
Getting Started
You can install TINTOlib from PyPI:
pip install torchmetrics pytorch_lightning TINTOlib imblearn keras_preprocessing mpi4py
To import a specific model, use:
from TINTOlib.tinto import TINTO
Create the model. If you don't set any hyperparameters, the model uses the default values. For details, refer to the Models section or the TINTO Documentation.
model = TINTO(blur=True)
Generating Synthetic Images
To generate synthetic images, use the following workflow with the fit, transform, and fit_transform methods:
Fitting the Model
The fit method trains the model on the tabular data and prepares it for image generation.
model.fit(data)
Parameters:
- data: A path to a CSV file or a Pandas DataFrame containing the features and targets.
- The target column must be the last column.
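When the target is not already the last column of a DataFrame, it can be moved there before calling fit. The following is a minimal sketch using plain pandas; the helper name move_target_last and the sample column names are hypothetical and are not part of TINTOlib.

```python
import pandas as pd


def move_target_last(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Return a new DataFrame with the `target` column moved to the end."""
    features = [col for col in df.columns if col != target]
    return df[features + [target]]


# Illustrative data where the target column comes first.
df = pd.DataFrame(
    {
        "target": [1, 2],
        "sepal length": [4.9, 7.0],
        "sepal width": [3.0, 3.2],
    }
)
ordered = move_target_last(df, "target")
```

The reordered DataFrame can then be passed as data to the methods described in this section.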
Generating Synthetic Images
The transform method generates and saves synthetic images in a specified folder. It requires the model to be fitted first.
model.transform(data, folder)
Parameters:
- data: A path to a CSV file or a Pandas DataFrame containing the features and targets.
- The target column must be the last column.
- folder: Path to the folder where the synthetic images will be saved.
Combining Fit and Transform
The fit_transform method combines the training and image generation steps. It fits the model to the data and generates synthetic images in one step.
model.fit_transform(data, folder)
Parameters:
- data: A path to a CSV file or a Pandas DataFrame containing the features and targets.
- The target column must be the last column.
- folder: Path to the folder where the synthetic images will be saved.
Notes:
- The model must be fitted before using the transform method. If the model isn't fitted, a RuntimeError will be raised.
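The fit-before-transform contract described in the note can be illustrated with a small stand-in class. This is only a sketch of the general pattern: ToyImageModel is hypothetical, it is not a TINTOlib class, and it rescales columns instead of producing images.

```python
class ToyImageModel:
    """Hypothetical stand-in that mimics the fit/transform contract."""

    def __init__(self):
        self._fitted = False

    def fit(self, rows):
        # Learn the per-column minimum and maximum from the data.
        columns = list(zip(*rows))
        self._min = [min(col) for col in columns]
        self._max = [max(col) for col in columns]
        self._fitted = True
        return self

    def transform(self, rows):
        # Refuse to run until fit() has stored the column ranges.
        if not self._fitted:
            raise RuntimeError("call fit() before transform()")
        return [
            [
                (value - lo) / (hi - lo) if hi > lo else 0.0
                for value, lo, hi in zip(row, self._min, self._max)
            ]
            for row in rows
        ]

    def fit_transform(self, rows):
        # Equivalent to calling fit() and then transform() on the same data.
        return self.fit(rows).transform(rows)


model = ToyImageModel()
try:
    model.transform([[1.0, 2.0]])
except RuntimeError as exc:
    error_message = str(exc)

scaled = model.fit_transform([[0.0, 10.0], [5.0, 20.0]])
```

Calling fit_transform avoids the error because the fitting step always runs first.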
Documentation
For detailed usage, examples, and tutorials, visit the TINTOlib Documentation.
How to use TINTOlib - Google Colab crash course
To get started with TINTOlib, a dedicated crash course repository is available. This repository provides a comprehensive guide to using TINTOlib for transforming tabular data into synthetic images and applying these images to machine learning tasks. It includes:
-
Slides and Jupyter notebooks demonstrating how to:
- Transform tabular data into images using TINTOlib.
- Apply state-of-the-art vision models like Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) to classification and regression problems.
-
Integration of Hybrid Neural Networks (HyNNs), where:
- One branch (MLP) processes the original tabular data.
- Another branch (CNN or ViT) processes synthetic images.
This architecture leverages the strengths of both tabular and image-based data representations, enabling improved performance on complex machine learning tasks. The repository is ideal for those looking to integrate image-based deep learning techniques into tabular data workflows.
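The two-branch idea can be sketched as a toy forward pass in NumPy. This is an illustration under stated assumptions, not the architecture used in the crash course: the layer sizes, random weights, and function names are all hypothetical, and a real HyNN would be built and trained with a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp_branch(x, w, b):
    """Dense layer with ReLU over the original tabular features."""
    return np.maximum(x @ w + b, 0.0)


def cnn_branch(img, kernel):
    """One valid 2-D cross-correlation with ReLU, then global average pooling."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return np.array([np.maximum(out, 0.0).mean()])


def hybrid_forward(x, img, params):
    """Concatenate both branch outputs and apply a linear head."""
    fused = np.concatenate(
        [
            mlp_branch(x, params["w_tab"], params["b_tab"]),
            cnn_branch(img, params["kernel"]),
        ]
    )
    return fused @ params["w_head"] + params["b_head"]


params = {
    "w_tab": rng.normal(size=(4, 8)),   # 4 tabular features -> 8 hidden units
    "b_tab": np.zeros(8),
    "kernel": rng.normal(size=(3, 3)),  # single 3x3 filter
    "w_head": rng.normal(size=(9, 3)),  # 8 + 1 fused features -> 3 classes
    "b_head": np.zeros(3),
}
x = np.array([4.9, 3.0, 1.4, 0.2])  # one row of tabular features
img = rng.random((20, 20))          # stand-in for a 20x20 synthetic image
logits = hybrid_forward(x, img, params)
```

The key design point is the concatenation step: each branch reduces its input to a feature vector, and the head sees both at once.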
Converting Tidy Data into Images
For example, the following table shows how the classic IRIS dataset should look as CSV input for a run:
sepal length | sepal width | petal length | petal width | target |
---|---|---|---|---|
4.9 | 3.0 | 1.4 | 0.2 | 1 |
7.0 | 3.2 | 4.7 | 1.4 | 2 |
6.3 | 3.3 | 6.0 | 2.5 | 3 |
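A file with this layout can be produced with Python's standard csv module. The sketch below writes the three rows shown in the table to a temporary directory; including a header row and the file name iris.csv are assumptions, so check the TINTOlib Documentation for the exact file expectations.

```python
import csv
import tempfile
from pathlib import Path

header = ["sepal length", "sepal width", "petal length", "petal width", "target"]
rows = [
    [4.9, 3.0, 1.4, 0.2, 1],
    [7.0, 3.2, 4.7, 1.4, 2],
    [6.3, 3.3, 6.0, 2.5, 3],
]

# Write the table to a CSV file, keeping the target as the last column.
csv_path = Path(tempfile.mkdtemp()) / "iris.csv"
with csv_path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)

# Read the header back to confirm the column order.
with csv_path.open(newline="") as f:
    first_line = next(csv.reader(f))
target_is_last = first_line[-1] == "target"
```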
Simple example without Blurring
The following example shows how to create 20x20 images with characteristic pixels, i.e. without blurring. As no other parameters are specified, the following default values are used:
- Image size: 20x20 pixels
- Blurring: No blurring will be used.
- Seed: 20.
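A minimal sketch of such a run, written with the import path and the fit_transform call shown in the Getting Started section. The keyword names pixels, blur, and random_seed come from the TINTO() row of the hyperparameter table; that they correspond one-to-one to the defaults listed above is an assumption, and the helper make_images is hypothetical.

```python
# Defaults from the list above, spelled with names from the hyperparameter table.
default_kwargs = {"pixels": 20, "blur": False, "random_seed": 20}


def make_images(data, folder, **overrides):
    """Fit a TINTO model on `data` and save its images to `folder`."""
    # Imported lazily so this sketch can be read without TINTOlib installed.
    from TINTOlib.tinto import TINTO

    model = TINTO(**{**default_kwargs, **overrides})
    model.fit_transform(data, folder)
    return model
```

A call such as make_images("iris.csv", "./images") would then generate the images; both paths are placeholders.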
More specific example
The following example shows how to create images with blurring, using more specific parameters.
The images are created with the following considerations regarding the parameters used:
- Blurring (-B): Create the images with the blurring technique.
- Dimensional Reduction Algorithm (-alg): t-SNE is used.
- Blurring option (-oB): Create the images using the maximum value of overlapping pixels.
- Image size (-px): 30x30 pixels
- Blurring steps (-sB): Expand the blurring by 5 pixels.
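The flags above can be translated into keyword arguments for the TINTO() class. The sketch below is hedged: the keyword names appear in the hyperparameter table, but the pairing of each flag with a keyword and the string values "t-SNE" and "maximum" are assumptions to verify against the TINTO Documentation.

```python
# Each flag from the list above paired with an assumed TINTO() keyword and value.
flag_to_kwarg = {
    "-B": ("blur", True),
    "-alg": ("algorithm", "t-SNE"),
    "-oB": ("option", "maximum"),
    "-px": ("pixels", 30),
    "-sB": ("steps", 5),
}

# Collapse the (keyword, value) pairs into a plain keyword dictionary.
blur_kwargs = dict(flag_to_kwarg.values())


def build_blur_model():
    """Create a TINTO model with the blurring settings above."""
    # Imported lazily so the mapping can be inspected without TINTOlib installed.
    from TINTOlib.tinto import TINTO

    return TINTO(**blur_kwargs)
```

The resulting model can be used with fit, transform, or fit_transform as described in Generating Synthetic Images.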
License
TINTOlib is available under the Apache License 2.0.
Authors
- Manuel Castillo-Cara
- Raúl García-Castro
- Borja Reinoso -borjareinoso@gmail.com
- David González Fernández
- Jiayun Liu