GIMIC (Smoothed Graph IMages of the MICrobiome)

This code accompanies the paper "GIMIC - Smoothed Graph-Image representation of Microbiome Samples induce an optimal distance". GIMIC (Smoothed Graph IMages of the MICrobiome) represents each microbiome sample as a smoothed tree-based image. GIMIC highlights consistent patterns across different cohorts, and as a metric (the difference between two GIMIC images) it outperforms existing metrics in a wide array of tasks over many 16S and WGS datasets, both within and between cohorts. Technically, GIMIC applies a fast Fourier transform (FFT) with an adjustable threshold to smooth the images, reducing noise and accentuating meaningful information. Various distance metrics, such as SAM and MSE, can then be applied to the processed images.
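To make the smoothing step concrete, here is a minimal sketch of FFT-based low-pass filtering of a 2D image with NumPy. It is an illustration under stated assumptions, not the package's implementation: the function name `fft_lowpass` is hypothetical, and the rectangular frequency mask is an assumed filter shape. The only behaviour taken from this README is that the cutoff lies in [0, 1] and that 1 means no smoothing.

```python
import numpy as np


def fft_lowpass(image: np.ndarray, cutoff: float) -> np.ndarray:
    """Keep only the lowest `cutoff` fraction of spatial frequencies per axis.

    Hypothetical helper: the rectangular mask is an assumption, not
    necessarily the filter GIMIC uses internally.
    """
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must lie in [0, 1]")
    rows, cols = image.shape
    # Move the zero-frequency component to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Frequency magnitude per axis, rescaled so the highest frequency is 1.
    fy = np.abs(np.fft.fftshift(np.fft.fftfreq(rows))) / 0.5
    fx = np.abs(np.fft.fftshift(np.fft.fftfreq(cols))) / 0.5
    # Boolean mask that is True for frequencies at or below the cutoff.
    mask = (fy[:, None] <= cutoff) & (fx[None, :] <= cutoff)
    # Zero out the rejected frequencies and transform back to image space.
    smoothed = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return smoothed.real


# Usage: a lower cutoff removes more high-frequency detail.
rng = np.random.default_rng(0)
noisy = rng.random((8, 8))
smooth = fft_lowpass(noisy, cutoff=0.5)
```

With `cutoff=1.0` every frequency passes and the image is returned unchanged; with `cutoff=0.0` only the zero-frequency term survives, so every pixel becomes the image mean.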

How to apply GIMIC

GIMIC's code is available on GitHub as well as on PyPI.

GIMIC's GitHub

GIMIC as a metric

An example is provided in example_use.py. Follow these steps:

  1. Load the raw ASV table in the following format: the first column is named "ID", each row represents a sample, and each column represents an ASV. The last row, named "taxonomy", contains the taxonomy information.

    df = pd.read_csv("example_data/for_preprocess.csv")
    
  2. Apply the MIPMLP with the default parameters (see MIPMLP for more explanations).

    processed = MIPMLP.preprocess(df)
    
  3. Run micro2matrix (translates the microbiome into a matrix according to iMic) and save the images in a prepared folder.

     folder = "example_data/2D_images"
     micro2matrix(processed, folder, save=True)
    
  4. Calculate the distance matrix according to GIMIC. You can choose the FFT cutoff (in the range [0,1]) and the final metric (one of "sam", "mse", "d1", "d2", "d3").

     DM = build_SAMBA_distance_matrix(folder,cutoff=CUTOFF,metric=METRIC)
    
  5. If a tag table is available, load the tag file and visualize the data according to the SAMBA metric with the plot_umap function. "example_data" is the folder path where the output is saved.

    tag = pd.read_csv("example_data/tag.csv",index_col=0)
    plot_umap(DM,tag,"example_data")
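As an illustration of what a distance matrix over sample images looks like, the sketch below computes pairwise SAM (spectral angle) and MSE distances with NumPy. The helper names `sam`, `mse`, and `pairwise_distances` are hypothetical and this is not the code behind build_SAMBA_distance_matrix; the "d1", "d2", and "d3" metrics are left out because this README does not define them.

```python
import numpy as np


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared pixel-wise difference between two images."""
    return float(np.mean((a - b) ** 2))


def sam(a: np.ndarray, b: np.ndarray) -> float:
    """Spectral angle in radians between two images flattened to vectors."""
    u, v = a.ravel(), b.ravel()
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        # The angle is undefined for an all-zero image; 0.0 is an arbitrary choice.
        return 0.0
    # Clip to guard against rounding slightly outside [-1, 1].
    cosine = np.clip(np.dot(u, v) / denom, -1.0, 1.0)
    return float(np.arccos(cosine))


METRICS = {"mse": mse, "sam": sam}


def pairwise_distances(images, metric="sam"):
    """Return a symmetric n x n matrix of distances between equally shaped images."""
    dist = METRICS[metric]
    n = len(images)
    dm = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dm[i, j] = dm[j, i] = dist(images[i], images[j])
    return dm
```

SAM ignores overall scale (an image and a multiple of it are at angle 0), whereas MSE is sensitive to it, which is one reason to compare both on the same data.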
    

GIMIC as a cross-cohort visualization tool

An example is provided in example_use_meta_analysis.py.
Follow these steps:

  1. Set a cutoff for the smoothing (a float between 0 and 1, where 1 means no smoothing).

     CUTOFF = 0.8
    
  2. Provide a list of dataset names

    NOTE: each data folder should contain the CSVs 'for_preprocess.csv' and 'tag.csv', in the format of the files in the example_data_meta folder.

    list_data_names = ["D1","D2","D3"]
    
  3. Provide a folder where the datasets are saved

     folder = "example_data_meta"
    
  4. Call the 'apply_meta_analysis' function

    apply_meta_analysis(folder,list_data_names,CUTOFF)
    

    This function plots a convenient meta-analysis visualization and saves it.
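Before calling apply_meta_analysis, it can help to confirm that every dataset has the two required CSVs. The sketch below uses only the standard library. The helper `missing_inputs` is hypothetical, and it assumes each dataset lives in a subfolder of the main folder named after the dataset (for example example_data_meta/D1), which this README implies but does not state explicitly.

```python
from pathlib import Path

# File names required in each dataset folder, as listed in the NOTE above.
REQUIRED_FILES = ("for_preprocess.csv", "tag.csv")


def missing_inputs(folder, dataset_names):
    """Return the paths of required CSVs that do not exist.

    Assumes the layout <folder>/<dataset name>/<file name>.
    """
    root = Path(folder)
    return [
        root / name / filename
        for name in dataset_names
        for filename in REQUIRED_FILES
        if not (root / name / filename).is_file()
    ]


# Usage: report anything missing before running the meta-analysis.
for path in missing_inputs("example_data_meta", ["D1", "D2", "D3"]):
    print(f"missing input file: {path}")
```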

GIMIC's PyPI

  1. Install the GIMIC package

    pip install samba-metric

  2. Apply the MIPMLP with the default parameters

    processed = MIPMLP.preprocess(df)

    or load MIPMLP-processed data directly:

    processed = pd.read_csv("example_data.csv",index_col=0)

  3. Apply the GIMIC metric to MIPMLP-processed data:

    import pandas as pd
    import MIPMLP
    from samba import *

    CLASS = False
    # Load the raw data in the required format
    df = pd.read_csv("example_data/for_preprocess.csv")
    # If a tag file is available
    tag = pd.read_csv("example_data/tag.csv", index_col=0)

    # Apply the MIPMLP with the default parameters
    processed = MIPMLP.preprocess(df)

    # micro2matrix can optionally save the images in a prepared folder; by default nothing is saved
    folder = "example_data/2D_images"
    array_of_imgs, bact_names, ordered_df = micro2matrix(processed, folder, save=False)

    # Calculate the distance matrix according to GIMIC
    DM = build_SAMBA_distance_matrix(folder,imgs=array_of_imgs,ordered_df=ordered_df,bact_names=bact_names,class_=CLASS)

    # Plot UMAP according to the distance matrix and some tag (NOTE: only when tag is available)
    plot_umap(DM, tag, "example_data")

Output

  4. Apply GIMIC cross-cohort visualization to several cohorts with the same phenotype. NOTE: each data folder should contain the CSVs 'for_preprocess.csv' and 'tag.csv', in the format of the files in the example_data_meta folder.

    # Set a cutoff for the smoothing
    CUTOFF = 0.8
    # List of dataset names
    list_data_names = ["D1","D2","D3"]

    # Folder where the datasets are saved
    folder = "example_data_meta"

    apply_meta_analysis(folder,list_data_names,CUTOFF)
    
