Metagenomics toolkit.

Project description

Metagenomics toolkit enables scientists to download all of the sample metadata for a given study or sequence into a single CSV file.

Install metagenomics toolkit

pip install -U mg-toolkit

Usage

$ mg-toolkit -h
usage: mg-toolkit [-h] [-V] [-d]
                  {original_metadata,sequence_search,bulk_download} ...

Metagenomics toolkit
--------------------

positional arguments:
  {original_metadata,sequence_search,bulk_download}
    original_metadata   Download original metadata.
    sequence_search     Search non-redundant protein database using HMMER
    bulk_download       Download result files in bulks for an entire study.

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         print version information
  -d, --debug           print debugging information

Examples

Download metadata:

$ mg-toolkit original_metadata -a ERP001736

Search non-redundant protein database using HMMER and fetch metadata:

$ mg-toolkit sequence_search -seq test.fasta -db full evalue -incE 0.02

Databases:
- full - Full length sequences (default)
- all - All sequences
- partial - Partial sequences
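
For scripted use, the search command above can be assembled programmatically. The following is a minimal sketch and not part of mg-toolkit: `build_sequence_search_cmd` and `run_sequence_search` are hypothetical helpers that reuse only the flags shown in the example (`-seq`, `-db`, `evalue`, `-incE`) and reject database names outside the three listed above.

```python
import subprocess

# Database names taken from the list above; "full" is the documented default.
DATABASES = ("full", "all", "partial")


def build_sequence_search_cmd(fasta_path, database="full", inc_e=0.02):
    """Return the argv list for an `mg-toolkit sequence_search` call.

    Hypothetical helper: it mirrors the example command and does not
    cover any option that the example does not show.
    """
    if database not in DATABASES:
        raise ValueError(
            f"unknown database {database!r}; expected one of {DATABASES}"
        )
    return [
        "mg-toolkit", "sequence_search",
        "-seq", str(fasta_path),
        "-db", database,
        "evalue", "-incE", str(inc_e),
    ]


def run_sequence_search(fasta_path, database="full", inc_e=0.02):
    """Run the search. Requires mg-toolkit on PATH and network access."""
    cmd = build_sequence_search_cmd(fasta_path, database, inc_e)
    subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is sent to the search service.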

How to bulk download result files for an entire study?

usage: mg-toolkit bulk_download [-h] -a ACCESSION [-o OUTPUT_PATH]
                                [-p {1.0,2.0,3.0,4.0,4.1,5.0}]
                                [-g {statistics,sequence_data,functional_analysis,taxonomic_analysis,taxonomic_analysis_ssu_rrna,taxonomic_analysis_lsu_rrna,non-coding_rnas,taxonomic_analysis_itsonedb,taxonomic_analysis_unite,taxonomic_analysis_motupathways_and_systems}]

optional arguments:
-h, --help            show this help message and exit
-a ACCESSION, --accession ACCESSION
                        Provide the study/project accession of your interest, e.g. ERP001736, SRP000319. The study must be publicly available in MGnify.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Location of the output directory, where the downloadable files are written to.
                        DEFAULT: CWD
-p {1.0,2.0,3.0,4.0,4.1,5.0}, --pipeline {1.0,2.0,3.0,4.0,4.1,5.0}
                        Specify the version of the pipeline you are interested in.
                        Lets say your study of interest has been analysed with
                        multiple version, but you are only interested in a particular
                        version then used this option to filter down the results by
                        the version you interested in.
                        DEFAULT: Downloads all versions
-g {statistics,sequence_data,functional_analysis,taxonomic_analysis,taxonomic_analysis_ssu_rrna,taxonomic_analysis_lsu_rrna,non-coding_rnas,taxonomic_analysis_itsonedb,taxonomic_analysis_unite,taxonomic_analysis_motupathways_and_systems}, --result_group {statistics,sequence_data,functional_analysis,taxonomic_analysis,taxonomic_analysis_ssu_rrna,taxonomic_analysis_lsu_rrna,non-coding_rnas,taxonomic_analysis_itsonedb,taxonomic_analysis_unite,taxonomic_analysis_motupathways_and_systems}
                        Provide a single result group if needed.
                        Supported result groups are:
                        - statistics
                        - sequence_data (all versions)
                        - functional_analysis (all versions)
                        - taxonomic_analysis (1.0-3.0)
                        - taxonomic_analysis_ssu_rrna (>=4.0)
                        - taxonomic_analysis_lsu_rrna (>=4.0)
                        - non-coding_rnas (>=4.0)
                        - taxonomic_analysis_itsonedb (>= 5.0)
                        - taxonomic_analysis_unite (>= 5.0)
                        - taxonomic_analysis_motu  (>= 5.0)
                        - pathways_and_systems (>= 5.0)
                        DEFAULT: Downloads all result groups if not provided.
                        (default: None).
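
The version ranges in the help text above can be checked before starting a download. This is a minimal sketch, not part of mg-toolkit: `group_available` is a hypothetical helper, and treating `statistics` as available for every version is an assumption, since the help text gives no range for it. The group names follow the bulleted list; the `-g` choices line prints the last two names run together, so the exact string the CLI accepts for those two groups is not confirmed here.

```python
# Pipeline versions accepted by -p, as listed in the usage text.
PIPELINE_VERSIONS = ("1.0", "2.0", "3.0", "4.0", "4.1", "5.0")

# (lowest, highest) pipeline version per result group; None means unbounded.
RESULT_GROUP_RANGES = {
    "statistics": (None, None),  # assumption: no range given in the help text
    "sequence_data": (None, None),
    "functional_analysis": (None, None),
    "taxonomic_analysis": ("1.0", "3.0"),
    "taxonomic_analysis_ssu_rrna": ("4.0", None),
    "taxonomic_analysis_lsu_rrna": ("4.0", None),
    "non-coding_rnas": ("4.0", None),
    "taxonomic_analysis_itsonedb": ("5.0", None),
    "taxonomic_analysis_unite": ("5.0", None),
    "taxonomic_analysis_motu": ("5.0", None),
    "pathways_and_systems": ("5.0", None),
}


def _as_tuple(version):
    """Turn '4.1' into (4, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def group_available(group, version):
    """Return True if the result group is listed for the pipeline version."""
    if version not in PIPELINE_VERSIONS:
        raise ValueError(f"unknown pipeline version {version!r}")
    low, high = RESULT_GROUP_RANGES[group]
    current = _as_tuple(version)
    if low is not None and current < _as_tuple(low):
        return False
    if high is not None and current > _as_tuple(high):
        return False
    return True
```

For example, `group_available("taxonomic_analysis", "4.0")` is false because that group is listed only for versions 1.0 to 3.0.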

How to download all files for a given study accession?

$ mg-toolkit -d bulk_download -a ERP009703

How to download results of a specific pipeline version for a given study accession?

$ mg-toolkit -d bulk_download -a ERP009703 -p 4.0

How to download a specific result file group (e.g. functional analysis only) for a given study accession?

$ mg-toolkit -d bulk_download -a ERP009703 -g functional_analysis
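
When several studies or result groups need to be fetched, the commands above can be generated from a script. The sketch below is hypothetical and not part of mg-toolkit: `build_bulk_download_cmd` only assembles the argument list from the options shown in the usage text (`-d`, `-a`, `-o`, `-p`, `-g`); actually running it requires mg-toolkit on PATH and network access.

```python
import subprocess

# Pipeline versions accepted by -p, as listed in the usage text.
PIPELINE_VERSIONS = ("1.0", "2.0", "3.0", "4.0", "4.1", "5.0")


def build_bulk_download_cmd(accession, output_path=None, pipeline=None,
                            result_group=None, debug=False):
    """Return the argv list for an `mg-toolkit bulk_download` call."""
    cmd = ["mg-toolkit"]
    if debug:
        cmd.append("-d")  # global flag, placed before the subcommand
    cmd += ["bulk_download", "-a", accession]
    if output_path is not None:
        cmd += ["-o", str(output_path)]
    if pipeline is not None:
        if pipeline not in PIPELINE_VERSIONS:
            raise ValueError(f"unknown pipeline version {pipeline!r}")
        cmd += ["-p", pipeline]
    if result_group is not None:
        cmd += ["-g", result_group]
    return cmd


def run_bulk_download(*args, **kwargs):
    """Build and run the command; raises CalledProcessError on failure."""
    subprocess.run(build_bulk_download_cmd(*args, **kwargs), check=True)
```

Omitting `pipeline` and `result_group` leaves the corresponding flags off the command line, which matches the documented defaults of downloading all versions and all result groups.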

Contributors

Thanks go to these wonderful people (emoji key):


Ola Tarkowska

💻📖

Maxim Scheremetjew

💻📖

Martin Beracochea

💻

This project follows the all-contributors specification. Contributions of any kind welcome!

Contact

If the documentation does not answer your questions, please contact us.

Project details


Download files

Source Distribution

mg-toolkit-0.8.0.tar.gz (17.0 kB)

Uploaded Source

Built Distribution

mg_toolkit-0.8.0-py3-none-any.whl (20.8 kB)

Uploaded Python 3

File details

Details for the file mg-toolkit-0.8.0.tar.gz.

File metadata

  • Download URL: mg-toolkit-0.8.0.tar.gz
  • Upload date:
  • Size: 17.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.12

File hashes

Hashes for mg-toolkit-0.8.0.tar.gz
Algorithm Hash digest
SHA256 198c56dc5296775d0e0110873ea88ebd2bda6723570ae5e21bb34c05bb1055e7
MD5 663c931469abceb0c05f257d13f152f6
BLAKE2b-256 28113193fed069341811a8eda94a8ad58af1c76c5daa7d5eb704e67874982051

File details

Details for the file mg_toolkit-0.8.0-py3-none-any.whl.

File metadata

  • Download URL: mg_toolkit-0.8.0-py3-none-any.whl
  • Upload date:
  • Size: 20.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.12

File hashes

Hashes for mg_toolkit-0.8.0-py3-none-any.whl
Algorithm Hash digest
SHA256 69ef3fb695d039d702a3df6d2d04af603774f36b5e7666a56b9983a461b5d292
MD5 4f033fde5163cf16b6ffd4a69b77a5d6
BLAKE2b-256 433c4a2d40e66f62ad7dfc118972fe01e54a0a991d369bfaef5ca8b6be4eae0b
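
A downloaded file can be compared against the SHA256 digests listed above. This is a minimal sketch using Python's standard `hashlib` module; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the local path is whatever location the file was saved to.

```python
import hashlib

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "mg-toolkit-0.8.0.tar.gz":
        "198c56dc5296775d0e0110873ea88ebd2bda6723570ae5e21bb34c05bb1055e7",
    "mg_toolkit-0.8.0-py3-none-any.whl":
        "69ef3fb695d039d702a3df6d2d04af603774f36b5e7666a56b9983a461b5d292",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, filename):
    """Compare a local file against the digest listed for `filename`."""
    return sha256_of(path) == EXPECTED_SHA256[filename]
```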
