SAMBA (Smoothed imAge MicroBiome distAnce)

This code accompanies the paper "SAMBA: A Novel Smoothed Image-Based Metric for Microbial Sequencing Data Analysis". SAMBA is a novel distance metric for microbiome data. It uses the iMic method to transform microbial data into images that incorporate phylogenetic structure and abundance similarity. This image-based representation aids data visualization and analysis. SAMBA then applies a fast Fourier transform (FFT) with an adjustable threshold to smooth the images, reducing noise and accentuating meaningful information. Various distance metrics, such as SAM and MSE, can be applied to the processed images.
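
The smoothing and comparison idea can be sketched with plain NumPy. This is a minimal illustration, not the package's implementation: the function names, the square low-pass window, the way `cutoff` is mapped to a window size, and the reading of SAM as the spectral angle between flattened images are all assumptions made here.

```python
import numpy as np

def fft_smooth(image, cutoff):
    """Low-pass filter a 2D array, keeping a central window of low frequencies.

    `cutoff` in [0, 1] is the fraction of each frequency axis that is kept
    (hypothetical parameterisation, chosen for this sketch only).
    """
    rows, cols = image.shape
    # Shift the spectrum so that the zero frequency sits at the centre
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    centre_r, centre_c = rows // 2, cols // 2
    half_r, half_c = int(cutoff * rows / 2), int(cutoff * cols / 2)
    mask = np.zeros(spectrum.shape, dtype=bool)
    mask[max(0, centre_r - half_r): centre_r + half_r + 1,
         max(0, centre_c - half_c): centre_c + half_c + 1] = True
    filtered = np.where(mask, spectrum, 0)
    # Undo the shift and return to image space; the imaginary part is numerical noise
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))

def mse(a, b):
    """Mean squared error between two equally shaped images."""
    return float(np.mean((a - b) ** 2))

def sam(a, b):
    """Spectral angle (radians) between two flattened images.

    Assumes neither image is all zeros.
    """
    a, b = a.ravel(), b.ravel()
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Two made-up 4x4 "images" compared after smoothing
img_a = np.arange(16, dtype=float).reshape(4, 4)
img_b = img_a[::-1].copy()
smooth_a, smooth_b = fft_smooth(img_a, 0.5), fft_smooth(img_b, 0.5)
print(mse(smooth_a, smooth_b), sam(smooth_a, smooth_b))
```

In this sketch a cutoff of 1 keeps the whole spectrum and returns the image unchanged, while a cutoff of 0 keeps only the zero-frequency term and returns a constant image equal to the mean.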

How to apply SAMBA

SAMBA's code is available on GitHub and on PyPI.

SAMBA's GitHub

An example is provided in example_use.py. Follow these steps:

  1. Load the raw ASV table in the following format: the first column is named "ID", each row represents a sample, and each column represents an ASV. The last row, named "taxonomy", contains the taxonomy information.

    import pandas as pd
    df = pd.read_csv("example_data/for_preprocess.csv")
    
  2. Apply MIPMLP with the default parameters (see MIPMLP for more explanations).

    import MIPMLP
    processed = MIPMLP.preprocess(df)
    
  3. Run micro2matrix to translate the microbiome into matrices according to iMic, and save the images in a prepared folder.

    folder = "example_data/2D_images"
    micro2matrix(processed, folder, save=True)
    
  4. Calculate the distance matrix according to SAMBA. You can choose the FFT cutoff (in the range [0,1]) and the final metric (one of "sam","mse","d1","d2","d3").

    DM = build_SAMBA_distance_matrix(folder,cutoff=CUTOFF,metric=METRIC)
    
  5. If a tag table is available, load the tag file and visualize the data according to the SAMBA metric with the plot_umap function. "example_data" is the folder path for saving.

    tag = pd.read_csv("example_data/tag.csv",index_col=0)
    plot_umap(DM,tag,"example_data")
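
Before running the steps above, it can help to confirm that the raw table matches the layout described in step 1. The following is a small sketch using only the standard library; `check_asv_table` is a hypothetical helper, not part of SAMBA or MIPMLP, and the requirement that abundances be numeric is an assumption made here.

```python
import csv
import io

def check_asv_table(csv_text):
    """Return a list of layout problems in a raw ASV table (empty list if none)."""
    # Skip completely blank lines so a trailing newline is harmless
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if len(rows) < 3:
        return ["expected a header, at least one sample row, and a taxonomy row"]
    problems = []
    header, samples, last = rows[0], rows[1:-1], rows[-1]
    if header[0] != "ID":
        problems.append('first column must be named "ID"')
    if last[0] != "taxonomy":
        problems.append('last row must be named "taxonomy"')
    for row in samples:
        if len(row) != len(header):
            problems.append(f"sample {row[0]!r} has {len(row)} cells, expected {len(header)}")
            continue
        try:
            [float(cell) for cell in row[1:]]
        except ValueError:
            problems.append(f"sample {row[0]!r} has a non-numeric abundance")
    return problems

# Made-up table with two samples and two ASVs (hypothetical values and taxonomy strings)
example = (
    "ID,ASV1,ASV2\n"
    "sample1,10,0\n"
    "sample2,3,7\n"
    "taxonomy,k__Bacteria;p__Firmicutes,k__Bacteria;p__Bacteroidetes\n"
)
print(check_asv_table(example))  # []
```

An empty list means the table has the "ID" header, numeric sample rows, and a final "taxonomy" row.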
    

SAMBA's PyPI

  1. Install the SAMBA package:

    pip install samba-metric

  2. Apply MIPMLP with the default parameters (df is the raw ASV table, in the format described for the GitHub example):

    import pandas as pd
    import MIPMLP
    processed = MIPMLP.preprocess(df)

     Or load MIPMLP-processed data directly:

    processed = pd.read_csv("example_data.csv",index_col=0)

  3. Apply SAMBA to the MIPMLP-processed data:

    from samba import *

    folder = "FOLDER_NAME"
    micro2matrix(processed, folder, save=True)

    # Calculate the distance matrix according to SAMBA
    DM = build_SAMBA_distance_matrix(folder)
    DM.to_csv(f"{folder}/samba_dists.csv")

  4. If a tag table is available, load the tag file and visualize the data according to the SAMBA metric with the plot_umap function. "example_data" is the folder path for saving.

    tag = pd.read_csv("example_data/tag.csv",index_col=0)
    plot_umap(DM,tag,"example_data")
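
To picture what a sample-by-sample distance matrix contains, here is a minimal pure-Python sketch that compares every pair of images with one metric. It is an illustration only: `pairwise_distances` and the toy images are hypothetical, and the internals of build_SAMBA_distance_matrix are not shown in this README.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2D grids (lists of rows)."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def pairwise_distances(images, metric=mse):
    """Build a symmetric nested dict of distances between all named images."""
    names = sorted(images)
    table = {name: {} for name in names}
    for i, first in enumerate(names):
        table[first][first] = 0.0  # a sample is at distance zero from itself
        for second in names[i + 1:]:
            d = metric(images[first], images[second])
            # Each pair is computed once and stored in both directions
            table[first][second] = d
            table[second][first] = d
    return table

# Three made-up 2x2 images keyed by sample name
images = {
    "s1": [[0, 0], [0, 0]],
    "s2": [[1, 1], [1, 1]],
    "s3": [[3, 0], [0, 0]],
}
dists = pairwise_distances(images)
print(dists["s1"]["s2"], dists["s1"]["s3"], dists["s2"]["s3"])  # 1.0 2.25 1.75
```

Any function that takes two images and returns a number can be passed as `metric`, which mirrors the idea of choosing the final metric after the images have been smoothed.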
