Griddify high-dimensional tabular data for easy visualization and deep learning

Griddify

Redistribute tabular data into a grid for easy visualization and image-based deep learning. This library is greatly inspired by the excellent MolMap library.

Installation

git clone https://github.com/ersilia-os/griddify.git
cd griddify
pip install -e .

Step by step

Get a multidimensional dataset and preprocess it

In this example, we will use a dataset of 200 physicochemical descriptors calculated for about 10k compounds. You can get these data with the following command.

from griddify import datasets

data = datasets.get_compound_descriptors()

It is important that you preprocess your data (impute missing values, normalize, etc.). We provide functionality to do so.

from griddify import Preprocessing

pp = Preprocessing()
pp.fit(data)
data = pp.transform(data)
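
To make the preprocessing step concrete, here is a minimal sketch of what "impute missing values, normalize" can look like. It is not the implementation behind `Preprocessing`; the function name `preprocess_sketch` is hypothetical, and median imputation plus z-scoring are assumptions chosen for illustration.

```python
import numpy as np

def preprocess_sketch(X):
    """Median-impute missing values, then z-score each column (illustrative only)."""
    X = np.asarray(X, dtype=float).copy()
    # Replace NaNs in each column with that column's median.
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    # Standardize each column; constant columns are left at zero.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - mean) / std

# Toy data: 3 samples, 2 features, one missing value.
raw = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])
clean = preprocess_sketch(raw)
```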

Create a 2D cloud of data features

Start by calculating distances between features.

from griddify import FeatureDistances

fd = FeatureDistances(metric="cosine").calculate(data)
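
Note that distances are computed between features (columns), not between samples. The sketch below shows one way to obtain a cosine distance matrix between columns with NumPy. The helper `cosine_feature_distances` is hypothetical and is not necessarily how `FeatureDistances` works internally.

```python
import numpy as np

def cosine_feature_distances(X):
    """Return an (n_features, n_features) cosine distance matrix between columns of X."""
    F = np.asarray(X, dtype=float).T              # one row per feature
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                       # avoid division by zero for all-zero features
    F = F / norms
    sim = np.clip(F @ F.T, -1.0, 1.0)             # cosine similarity, clipped for rounding error
    return 1.0 - sim

# Toy data: 3 samples, 3 features; the second feature is twice the first.
X = np.array([[1.0, 2.0, -1.0],
              [2.0, 4.0, 1.0],
              [3.0, 6.0, 0.0]])
D = cosine_feature_distances(X)
```

Features that are scaled copies of each other end up at distance zero, so they should land close together in the cloud.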

You can now obtain a 2D cloud of your data features. By default, UMAP is used.

from griddify import Tabular2Cloud

tc = Tabular2Cloud()
tc.fit(fd)
Xc = tc.transform(fd)

It is always good to inspect the resulting projection. The cloud contains one point per feature in your dataset.

from griddify.plots import cloud_plot

cloud_plot(Xc)

Rearrange the 2D cloud onto a grid

Distribute cloud points on a grid using a linear assignment algorithm.

from griddify import Cloud2Grid

cg = Cloud2Grid()
cg.fit(Xc)
Xg = cg.transform(Xc)

You can check the rearrangement with an arrows plot.

from griddify.plots import arrows_plot

arrows_plot(Xc, Xg)

To continue with the next steps, it is actually more convenient to get mappings as integers. The following method gives you the size of the grid as well.

mappings, side = cg.get_mappings(Xc)
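
The idea of snapping a cloud onto a grid by linear assignment can be sketched with SciPy. This is an illustration under stated assumptions, not the code behind `Cloud2Grid`: the function name `cloud_to_grid_sketch`, the square grid of side `ceil(sqrt(n))`, the rescaling to the unit square, and the squared Euclidean cost are all choices made for this example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def cloud_to_grid_sketch(cloud):
    """Assign each 2D cloud point to a distinct cell of a square grid."""
    cloud = np.asarray(cloud, dtype=float)
    n = cloud.shape[0]
    side = int(np.ceil(np.sqrt(n)))               # smallest square grid that fits n points
    # Rescale the cloud to the unit square so it is comparable to the grid.
    lo = cloud.min(axis=0)
    span = cloud.max(axis=0) - lo
    span[span == 0] = 1.0
    unit = (cloud - lo) / span
    # Grid cell positions, also in the unit square.
    ticks = np.linspace(0.0, 1.0, side)
    cells = np.array([(x, y) for x in ticks for y in ticks])
    # Minimize the total squared displacement; rectangular cost matrices are allowed.
    cost = cdist(unit, cells, metric="sqeuclidean")
    _, cols = linear_sum_assignment(cost)
    mappings = np.column_stack((cols // side, cols % side))  # integer (i, j) per point
    return mappings, side

rng = np.random.default_rng(0)
cloud = rng.normal(size=(10, 2))                  # 10 features projected to 2D
mappings, side = cloud_to_grid_sketch(cloud)
```

With 10 points the grid has side 4, so 6 of the 16 cells stay empty. Each point receives a unique cell because the assignment is one-to-one.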

Rearrange your flat data points into grids

Let's go back to the original tabular data. We want to transform the input data, where each sample is a one-dimensional array, into output data where each sample is an image (i.e. a two-dimensional grid). Please ensure that the data are normalized or scaled.

from griddify import Flat2Grid

fg = Flat2Grid(mappings, side)
Xi = fg.transform(data)
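
Conceptually, this step scatters each feature value into its assigned grid cell. The sketch below assumes that `mappings` is an array of integer (row, column) pairs, one per feature, and that unassigned cells are filled with zeros. The source does not state either detail, and `flat_to_grid_sketch` is a hypothetical helper, not the `Flat2Grid` implementation.

```python
import numpy as np

def flat_to_grid_sketch(X, mappings, side, fill=0.0):
    """Turn (n_samples, n_features) rows into (n_samples, side, side) images."""
    X = np.asarray(X, dtype=float)
    mappings = np.asarray(mappings, dtype=int)
    grids = np.full((X.shape[0], side, side), fill)
    # Feature k of every sample goes to cell (mappings[k, 0], mappings[k, 1]).
    grids[:, mappings[:, 0], mappings[:, 1]] = X
    return grids

# Toy data: 2 samples, 3 features, placed on a 2x2 grid.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
toy_mappings = np.array([[0, 0], [1, 1], [0, 1]])
grids = flat_to_grid_sketch(X, toy_mappings, side=2)
```

The resulting array can be fed to an image model after adding a channel axis, for example with `grids[..., np.newaxis]`.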

Explore one sample.

from griddify.plots import grid_plot

grid_plot(Xi[0])

Full pipeline

You can run the full pipeline described above in only a few lines of code.

from griddify import datasets
from griddify import Griddify

data = datasets.get_compound_descriptors()

gf = Griddify(preprocess=True)
gf.fit(data)
Xi = gf.transform(data)

You can find more examples as Jupyter Notebooks in the notebooks folder.

Learn more

The Ersilia Open Source Initiative is on a mission to strengthen research capacity in low-income countries. Please reach out to us if you want to contribute: hello@ersilia.io
