Skip to main content

Metagenomics toolkit.

Project description

Build Status PyPi package Downloads

Metagenomics toolkit enables scientists to download all of the sample metadata for a given study or sequence to a single csv file.

Install metagenomics toolkit

pip install -U mg-toolkit


$ mg-toolkit -h
usage: mg-toolkit [-h] [-V] [-d]
                  {original_metadata,sequence_search,bulk_download} ...

Metagenomics toolkit

positional arguments:
    original_metadata   Download original metadata.
    sequence_search     Search non-redundant protein database using HMMER
    bulk_download       Download result files in bulks for an entire study.

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         print version information
  -d, --debug           print debugging information


Download metadata:

$ mg-toolkit original_metadata -a ERP001736

Search non-redundant protein database using HMMER and fetch metadata:

$ mg-toolkit sequence_search -seq test.fasta -db full evalue -incE 0.02

- full - Full length sequences (default)
- all - All sequences
- partial - Partial sequences

How to bulk download result files for an entire study?

usage: mg-toolkit bulk_download [-h] -a ACCESSION [-o OUTPUT_PATH]
                                [-p {1.0,2.0,3.0,4.0,4.1,5.0}]
                                [-g {statistics,sequence_data,functional_analysis,taxonomic_analysis,taxonomic_analysis_ssu_rrna,taxonomic_analysis_lsu_rrna,non-coding_rnas,taxonomic_analysis_itsonedb,taxonomic_analysis_unite,taxonomic_analysis_motupathways_and_systems}]

optional arguments:
-h, --help            show this help message and exit
                        Provide the study/project accession of your interest, e.g. ERP001736, SRP000319. The study must be publicly available in MGnify.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Location of the output directory, where the downloadable files are written to.
                        DEFAULT: CWD
-p {1.0,2.0,3.0,4.0,4.1,5.0}, --pipeline {1.0,2.0,3.0,4.0,4.1,5.0}
                        Specify the version of the pipeline you are interested in.
                        Lets say your study of interest has been analysed with
                        multiple version, but you are only interested in a particular
                        version then used this option to filter down the results by
                        the version you interested in.
                        DEFAULT: Downloads all versions
-g {statistics,sequence_data,functional_analysis,taxonomic_analysis,taxonomic_analysis_ssu_rrna,taxonomic_analysis_lsu_rrna,non-coding_rnas,taxonomic_analysis_itsonedb,taxonomic_analysis_unite,taxonomic_analysis_motupathways_and_systems}, --result_group {statistics,sequence_data,functional_analysis,taxonomic_analysis,taxonomic_analysis_ssu_rrna,taxonomic_analysis_lsu_rrna,non-coding_rnas,taxonomic_analysis_itsonedb,taxonomic_analysis_unite,taxonomic_analysis_motupathways_and_systems}
                        Provide a single result group if needed.
                        Supported result groups are:
                        - statistics
                        - sequence_data (all versions)
                        - functional_analysis (all versions)
                        - taxonomic_analysis (1.0-3.0)
                        - taxonomic_analysis_ssu_rrna (>=4.0)
                        - taxonomic_analysis_lsu_rrna (>=4.0)
                        - non-coding_rnas (>=4.0)
                        - taxonomic_analysis_itsonedb (>= 5.0)
                        - taxonomic_analysis_unite (>= 5.0)
                        - taxonomic_analysis_motu  (>= 5.0)
                        - pathways_and_systems (>= 5.0)
                        DEFAULT: Downloads all result groups if not provided.
                        (default: None).

How to download all files for a given study accession?

$ mg-toolkit -d bulk_download -a ERP009703

How to download results of a specific version for given study accession?

$ mg-toolkit -d bulk_download -a ERP009703 -v 4.0

How to download specific result file groups (e.g. functional analysis only) for given study accession?

$ mg-toolkit -d bulk_download -a ERP009703 -g functional_analysis


Thanks goes to these wonderful people (emoji key):

This project follows the all-contributors specification. Contributions of any kind welcome!


If the documentation do not answer your questions, please contact us.

Project details

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for mg-toolkit, version 0.9.1
Filename, size File type Python version Upload date Hashes
Filename, size mg_toolkit-0.9.1-py3-none-any.whl (20.9 kB) File type Wheel Python version py3 Upload date Hashes View
Filename, size mg-toolkit-0.9.1.tar.gz (17.1 kB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page