
MetaDR package

Project description

MetaDR is a pipeline that integrates several sources of information to predict human diseases. It consists of a predictor and an interpreter. The predictor embeds taxonomic relationships into microbial features and ensembles the prediction results from multiple perspectives. The interpreter extracts and elucidates biological insights from different microbial contexts.

For a metagenomic dataset, MetaDR provides reference biomarkers drawn from both known and unknown microbial organisms, while achieving competitive prediction performance for human diseases.

For application, we designed an operator that chooses the best model from several candidate models according to their performance on the validation set. Because only the best model is chosen, rather than the last ensemble model, discrepancies across different combinations of abundance profiles and taxonomic information are avoided.
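The selection step described above can be pictured with a minimal sketch. It is not MetaDR's own operator: the function names (`accuracy`, `select_best_model`), the use of accuracy as the validation metric, and the toy candidate models are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical convention: a "model" is any callable mapping one sample to a label.
Model = Callable[[Sequence[float]], int]


def accuracy(model: Model, X: List[Sequence[float]], y: List[int]) -> float:
    """Fraction of validation samples the model labels correctly."""
    correct = sum(1 for xi, yi in zip(X, y) if model(xi) == yi)
    return correct / len(y)


def select_best_model(
    candidates: Dict[str, Model],
    X_val: List[Sequence[float]],
    y_val: List[int],
) -> Tuple[str, float]:
    """Score every candidate on the validation set and keep the best one."""
    scores = {name: accuracy(m, X_val, y_val) for name, m in candidates.items()}
    best_name = max(scores, key=scores.get)  # ties go to the first candidate
    return best_name, scores[best_name]


# Toy validation set: two made-up features per sample, binary labels.
X_val = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.9, 0.6]]
y_val = [1, 0, 1, 0]

# Toy candidates standing in for models trained on different inputs.
candidates = {
    "known_only": lambda x: int(x[0] < 0.5),
    "unknown_only": lambda x: int(x[1] > 0.5),
    "always_positive": lambda x: 1,
}

best_name, best_score = select_best_model(candidates, X_val, y_val)
print(best_name, best_score)
```

In this toy run the candidate with the highest validation accuracy is kept, regardless of the order in which the candidates were trained.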

Inputs of our MetaDR:

The output of MicroPro can be used directly as the input of our pipeline. Alternatively, users can prepare their data in the following formats using other analysis pipelines.

Four files need to be prepared as input for our pipeline. Assuming the name of the example set is ‘Karlsson_T2D’, the file names should be ‘Karlsson_T2D_known.csv’, ‘Karlsson_T2D_unknown.csv’, ‘Karlsson_T2D_y.csv’, and ‘Unknown_name.xlsx’. The first two files are the abundance tables of known and unknown features. ‘Karlsson_T2D_y.csv’ is the label file for each patient. ‘Unknown_name.xlsx’ includes the genus-level assignments for each metagenome-assembled genome (MAG).
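As a convenience, the naming convention above can be checked before running the pipeline. The sketch below is not part of MetaDR: `expected_input_files` and `missing_input_files` are hypothetical helpers, and the assumption that all four files sit in one directory is ours.

```python
from pathlib import Path
from typing import List


def expected_input_files(dataset: str, data_dir: str = ".") -> List[Path]:
    """Build the four file paths implied by the naming convention.

    `dataset` is the example-set name, e.g. "Karlsson_T2D".
    Placing all files in a single `data_dir` is an assumption.
    """
    base = Path(data_dir)
    return [
        base / f"{dataset}_known.csv",    # abundance table of known features
        base / f"{dataset}_unknown.csv",  # abundance table of unknown features
        base / f"{dataset}_y.csv",        # label file for each patient
        base / "Unknown_name.xlsx",       # genus-level assignments for each MAG
    ]


def missing_input_files(dataset: str, data_dir: str = ".") -> List[Path]:
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_input_files(dataset, data_dir) if not p.is_file()]


names = [p.name for p in expected_input_files("Karlsson_T2D")]
print(names)
```

Calling `missing_input_files("Karlsson_T2D", "path/to/data")` then lists whichever of the four files still need to be prepared.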

Outputs of our MetaDR:

A txt file that saves all the metrics and prediction results will be generated for each function.

For WRF:

“WeighRF.train” will return the final ensemble RF model, the weights for known and unknown features, and the final ensemble prediction results.

“WeighRF.select” will return the top 30 features.
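To give a rough idea of what a weighted ensemble of known and unknown features can look like, here is a minimal sketch. It does not reproduce how “WeighRF.train” computes its weights or combines its models; the function `weighted_ensemble`, the normalised weighted average, and the example numbers are all assumptions.

```python
from typing import List


def weighted_ensemble(
    p_known: List[float],
    p_unknown: List[float],
    w_known: float,
    w_unknown: float,
) -> List[float]:
    """Combine two per-sample probability vectors by a normalised weighted average.

    p_known / p_unknown: predicted disease probabilities from a model trained on
    known or unknown features (hypothetical inputs).
    w_known / w_unknown: non-negative weights, e.g. derived from validation scores.
    """
    if len(p_known) != len(p_unknown):
        raise ValueError("probability vectors must have the same length")
    total = w_known + w_unknown
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [(w_known * a + w_unknown * b) / total for a, b in zip(p_known, p_unknown)]


# Made-up probabilities for two patients; known features are weighted 3:1.
combined = weighted_ensemble([0.9, 0.2], [0.5, 0.4], w_known=3.0, w_unknown=1.0)
labels = [int(p >= 0.5) for p in combined]  # threshold at 0.5 for a binary call
print(combined, labels)
```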

For EPCNN:

“EpCNN.phygen” will generate four “csv” files: known_level, known_postorder, unknown_level, and unknown_postorder. These hold, respectively, the level phylogenetic sorting based on known features, the postorder phylogenetic sorting based on known features, the level phylogenetic sorting based on unknown features, and the postorder phylogenetic sorting based on unknown features.
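The difference between the two sorting schemes can be seen on a small tree. The sketch below shows generic postorder and level-order traversals; it is not the code behind “EpCNN.phygen”, and the toy taxonomy (with intermediate ranks skipped) and the function names are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List

# Toy taxonomy as parent -> children; nodes without an entry are leaves.
tree: Dict[str, List[str]] = {
    "root": ["Firmicutes", "Bacteroidetes"],
    "Firmicutes": ["Clostridium", "Lactobacillus"],
    "Bacteroidetes": ["Bacteroides"],
}


def postorder(tree: Dict[str, List[str]], node: str = "root") -> List[str]:
    """Visit all children (left to right) before the node itself."""
    order: List[str] = []
    for child in tree.get(node, []):
        order.extend(postorder(tree, child))
    order.append(node)
    return order


def level_order(tree: Dict[str, List[str]], root: str = "root") -> List[str]:
    """Visit nodes level by level, starting from the root."""
    order: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order


post = postorder(tree)
level = level_order(tree)
print(post)
print(level)
```

Postorder keeps each clade's members next to each other and places the parent right after them, while level order groups taxa of the same depth together, so the two orderings expose different neighbourhoods of features.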

“EpCNN.train” will return 4 single models and the final ensemble prediction results.

See https://github.com/Microbiods/MetaDR for more details.

The tutorial for MetaDR is made available at GitHub: https://github.com/Microbiods/MetaDR/blob/main/test_MetaDR_pypi.ipynb

Contact: Xingjian Chen, email: xingjchen3-c@my.cityu.edu.hk

Download files

Source Distribution: MetaDR-3.0.0.tar.gz (8.2 kB)

Built Distribution: MetaDR-3.0.0-py3-none-any.whl (10.5 kB, Python 3)
