miMic (Mann-Whitney image microbiome)
This repository is attached to the paper "miMic - a novel multi-layer statistical test for microbiome disease".
miMic is a straightforward yet remarkably versatile and scalable approach for differential abundance analysis.
miMic consists of three main steps:
-
Data preprocessing and translation to a cladogram of means.
-
An apriori nested ANOVA (or nested GLM for continuous labels) to detect overall microbiome-label relations.
-
A post hoc test along the cladogram trajectories.
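To make the first step more concrete, the following is a minimal sketch of averaging ASV abundances up a taxonomy hierarchy with pandas. It is not miMic's implementation: the semicolon-separated lineage names, the toy values, and the helper `means_at_level` are all assumptions made for illustration.

```python
import pandas as pd

# Toy abundance table: rows are samples, columns are ASVs named by a
# semicolon-separated lineage (an assumed naming convention for this sketch).
abundance = pd.DataFrame(
    {
        "k__Bacteria;p__Firmicutes;c__Bacilli": [1.0, 3.0],
        "k__Bacteria;p__Firmicutes;c__Clostridia": [3.0, 5.0],
        "k__Bacteria;p__Bacteroidota;c__Bacteroidia": [2.0, 4.0],
    },
    index=["sample_1", "sample_2"],
)


def means_at_level(table: pd.DataFrame, level: int) -> pd.DataFrame:
    """Average the columns that share the same first `level` lineage ranks."""
    prefixes = pd.Index(
        [";".join(name.split(";")[:level]) for name in table.columns],
        name="taxon",
    )
    # Transpose so taxa are rows, group by the truncated lineage, then average.
    return table.T.groupby(prefixes).mean().T


# Hypothetical usage: collapse the class-level columns to phylum-level means.
phylum_means = means_at_level(abundance, level=2)
print(phylum_means)
```

Repeating the call for each level gives one table of means per taxonomic rank, which is the kind of layered structure a cladogram of means is built from.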
miMic
miMic is available through the following platforms:
Install the package
pip install mimic-da
How to apply miMic
See example_use.py for an example of how to use miMic.
The example contains the following steps:
-
Import miMic and additional packages.
from mimic import apply_mimic
import pandas as pd
-
Load the raw ASVs table in the following format:
- The first column is named "ID"
- Each row represents a sample and each column represents an ASV.
- The last row contains the taxonomy information, named "taxonomy".
df = pd.read_csv("example_data/for_process.csv")
- Note: for_process.csv is a file that contains the raw ASVs table in the required format. You can find an example file in the example_data folder.
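As a rough illustration of the layout described above, the sketch below builds a tiny table and checks the two structural rules (first column named "ID", last row labelled "taxonomy"). The sample names, counts, taxonomy strings, and the helper `check_raw_table` are hypothetical and are not part of miMic.

```python
import pandas as pd

# Tiny table in the layout described above; all values are made up.
raw = pd.DataFrame(
    {
        "ID": ["sample_1", "sample_2", "taxonomy"],
        "ASV_1": [10, 0, "k__Bacteria;p__Firmicutes"],
        "ASV_2": [3, 7, "k__Bacteria;p__Bacteroidota"],
    }
)


def check_raw_table(df: pd.DataFrame) -> list[str]:
    """Return a list of format problems; an empty list means the layout looks right."""
    problems = []
    if df.columns[0] != "ID":
        problems.append('first column must be named "ID"')
    elif df["ID"].iloc[-1] != "taxonomy":
        problems.append('last row must be labelled "taxonomy"')
    return problems


print(check_raw_table(raw))
```

Running a check like this on your own file before preprocessing can catch a misplaced taxonomy row early.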
-
Load a tag table as csv, such that the tag column is named "Tag".
tag = pd.read_csv("example_data/tag.csv",index_col=0)
- Note: tag.csv is a file that contains the tag table in the required format. You can find an example tag file in the example_data folder.
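A quick sanity check before running the test could look like the sketch below. It assumes the tag table is indexed by the same sample IDs as the ASV table (suggested by `index_col=0`, but not stated explicitly here); the IDs and labels are made up.

```python
import pandas as pd

# Made-up sample IDs and labels; only the "Tag" column name comes from the text above.
tag = pd.DataFrame({"Tag": [0, 1, 1]}, index=["sample_1", "sample_2", "sample_3"])
sample_ids = ["sample_1", "sample_2", "sample_4"]  # IDs found in the ASV table

has_tag_column = "Tag" in tag.columns
missing_tags = sorted(set(sample_ids) - set(tag.index))  # samples without a label
unused_tags = sorted(set(tag.index) - set(sample_ids))   # labels without a sample

print(has_tag_column, missing_tags, unused_tags)
```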
-
Apply MIPMLP.
- MIPMLP is applied with its default parameters; see the note below for more details.
- taxonomy_group: one of ["sub PCA", "mean", "sum"]; the "sub PCA" method is preferred.
processed = apply_mimic(folder=folder, tag=tag, mode="preprocess", preprocess=True, rawData=df, taxnomy_group='sub PCA')
- Note: MIPMLP is a package used to preprocess the raw ASVs table; see MIPMLP PyPi or the MIPMLP website for more explanations.
If you have your own processed data, set preprocess to False, and pass your processed data as the processed parameter in the next step.
-
Apply miMic test.
miMic uses the following hyperparameters:
- eval: evaluation method, ["man", "corr", "cat"]. Default is "man".
- "man" for binary labels.
- "corr" for continuous labels.
- "cat" for categorical labels.
- sis: apply sister correction, ["fdr_bh", "bonferroni", "no"]. Default is "fdr_bh".
- correct_first: apply FDR correction to the starting taxonomy level according to the sis parameter, [True, False]. Default is True.
- mode: 2 different formats of running, ["test", "plot"]. Default is "test".
- save: whether to save the corrs_df of the miMic test to disk, [True, False]. Default is True.
- tax: starting taxonomy of the post hoc test, ["None", 1, 2, 3, "noAnova", "nosignificant"].
- In "test" mode the default value is "None".
- In "plot" mode the tax is set automatically to the taxonomy selected in "test" mode [1, 2, 3, "noAnova"].
- "noAnova": the apriori nested ANOVA test is not significant.
- "nosignificant": the apriori nested ANOVA test is not significant and miMic did not find any significant taxa in the leaves. In this case, the post hoc test will not be applied.
- colorful: Determines whether to apply colorful mode on the plots [True, False]. Default is True.
- threshold_p: the threshold for significant values. Default is 0.05.
- THRESHOLD_edge: the threshold for having an edge in "interaction" plot. Default is 0.5.
- processed: the processed data from the previous step. Default is None.
if processed is not None:
    taxonomy_selected = apply_mimic(folder, tag, eval="man", threshold_p=0.05, save=True, processed=processed)
    if taxonomy_selected is not None:
        apply_mimic(folder, tag, mode="plot", tax=taxonomy_selected, eval="man", sis='fdr_bh', save=False, threshold_p=0.05, THRESHOLD_edge=0.5)
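Since the eval value depends on the label type, a small helper can pick it from the tag column. The sketch below is a hypothetical heuristic and is not part of miMic: the function name `suggest_eval` and the `max_categories` cutoff are assumptions, and borderline cases (for example integer scores with few distinct values) should still be decided by hand.

```python
import pandas as pd


def suggest_eval(labels: pd.Series, max_categories: int = 10) -> str:
    """Hypothetical heuristic mapping a label column to an `eval` value."""
    values = labels.dropna()
    n_unique = values.nunique()
    if n_unique == 2:
        return "man"   # binary labels
    if pd.api.types.is_numeric_dtype(values) and n_unique > max_categories:
        return "corr"  # continuous labels
    return "cat"       # categorical labels


# Made-up label columns covering the three cases.
print(suggest_eval(pd.Series([0, 1, 1, 0])))
print(suggest_eval(pd.Series([i * 0.5 for i in range(20)])))
print(suggest_eval(pd.Series(["control", "mild", "severe"])))
```

The returned string could then be passed as the eval argument in both the "test" and "plot" calls so the two runs stay consistent.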
miMic output
miMic will output the following:
-
If save is set to True, the following csv files will be saved to your specified folder:
- corrs_df: a dataframe containing the results of the miMic test (including Utest results).
- just_mimic: a dataframe containing the results of the miMic test without the Utest results.
- u_test_without_mimic: a dataframe containing the results of the Utest without the miMic results.
- miMic&Utest: a dataframe containing the joint results of miMic and Utest tests.
-
If mode is set to "plot", plots will be saved in the folder named 'plots' in your current working directory.
The following plots will be saved:
-
tax_vs_rp_sp_anova_p: plots RP vs SP over the different taxonomy levels and the p-values of the apriori test as a function of taxonomy.
-
rsp_vs_beta: calculate RSP score for different betas and create the appropriate plot.
-
hist: a histogram of the ASVs in each taxonomy level.
-
corrs_within_family: a plot of the correlation between the significant ASVs within the family level. If colorful is set to True, each family will be colored.
-
interaction: a plot of the interaction between the significant ASVs.
-
correlations_tree: a correlation cladogram in which the size of each node is set according to the -log(p-value), the color of each node represents the sign of the post hoc test, and the shape of the node (circle, square, sphere) indicates whether the result comes from miMic, Utest, or both, respectively. If colorful is set to True, the background of the node is colored according to the family color.
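To illustrate how a -log(p-value) node size behaves, here is a minimal sketch. The base of the logarithm, the `scale` factor, and the `floor` guard are assumptions chosen for illustration; the actual scaling used by the plotting code may differ.

```python
import math


def node_size(p_value: float, scale: float = 10.0, floor: float = 1e-300) -> float:
    """Size proportional to -log10(p); `scale` and `floor` are illustrative choices."""
    # The floor avoids log(0) when a p-value underflows to zero.
    return scale * -math.log10(max(p_value, floor))


# Smaller p-values give larger nodes.
for p in (0.05, 0.01, 0.0001):
    print(p, round(node_size(p), 2))
```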