Metagenomics toolkit.


Metagenomics toolkit enables scientists to download all of the sample metadata for a given study or sequence into a single CSV file.

Install metagenomics toolkit

pip install -U mg-toolkit

Usage

$ mg-toolkit -h
usage: mg-toolkit [-h] [-V] [-d]
                  {original_metadata,sequence_search,bulk_download} ...

Metagenomics toolkit
--------------------

positional arguments:
  {original_metadata,sequence_search,bulk_download}
    original_metadata   Download original metadata.
    sequence_search     Search non-redundant protein database using HMMER
    bulk_download       Download result files in bulk for an entire study.

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         print version information
  -d, --debug           print debugging information

Examples

Download metadata:

$ mg-toolkit original_metadata -a ERP001736
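
Once the metadata has been downloaded, the CSV file can be inspected with Python's standard library. The following is a minimal sketch, not part of mg-toolkit: `summarize_metadata` is a hypothetical helper, the output filename is not stated here so `csv_path` must be supplied by you, and the file is assumed to have a header row.

```python
import csv
from collections import Counter


def summarize_metadata(csv_path):
    """Return (row_count, column_names, non_empty_counts) for a metadata CSV.

    Assumes the first row is a header. Column names are read from the file,
    none are hard-coded.
    """
    non_empty = Counter()
    rows = 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        for row in reader:
            rows += 1
            for name in columns:
                # Short rows yield None for missing cells, so guard with `or ""`.
                if (row.get(name) or "").strip():
                    non_empty[name] += 1
    return rows, columns, dict(non_empty)
```

Calling `summarize_metadata("path/to/metadata.csv")` gives a quick view of how many samples were fetched and which metadata fields are actually populated.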

Search non-redundant protein database using HMMER and fetch metadata:

$ mg-toolkit sequence_search -seq test.fasta -db full evalue -incE 0.02

Databases:
- full - Full length sequences (default)
- all - All sequences
- partial - Partial sequences
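
When the search is driven from a script, it can help to validate the database name before calling the tool. The sketch below only builds the argument list, using the flags shown in the example above. `sequence_search_cmd` is a hypothetical helper and not part of mg-toolkit.

```python
import shlex

# Database choices as listed above; "full" is the documented default.
DATABASES = ("full", "all", "partial")


def sequence_search_cmd(fasta_path, database="full", inc_e=None):
    """Build the argument list for `mg-toolkit sequence_search`."""
    if database not in DATABASES:
        raise ValueError(
            f"unknown database {database!r}; expected one of {DATABASES}"
        )
    cmd = ["mg-toolkit", "sequence_search", "-seq", str(fasta_path), "-db", database]
    if inc_e is not None:
        # Mirrors the `evalue -incE 0.02` form used in the example above.
        cmd += ["evalue", "-incE", str(inc_e)]
    return cmd


# Print the command instead of running it, so it can be checked first.
print(shlex.join(sequence_search_cmd("test.fasta", "full", 0.02)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once mg-toolkit is installed.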

How to bulk download result files for an entire study?

usage: mg-toolkit bulk_download [-h] -a ACCESSION [-o OUTPUT_PATH]
                                [-p {1.0,2.0,3.0,4.0,4.1,5.0}]
                                [-g {statistics,sequence_data,functional_analysis,taxonomic_analysis,taxonomic_analysis_ssu_rrna,taxonomic_analysis_lsu_rrna,non-coding_rnas,taxonomic_analysis_itsonedb,taxonomic_analysis_unite,taxonomic_analysis_motupathways_and_systems}]

optional arguments:
-h, --help            show this help message and exit
-a ACCESSION, --accession ACCESSION
                        Provide the study/project accession of your interest, e.g. ERP001736, SRP000319. The study must be publicly available in MGnify.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Location of the output directory, where the downloadable files are written to.
                        DEFAULT: CWD
-p {1.0,2.0,3.0,4.0,4.1,5.0}, --pipeline {1.0,2.0,3.0,4.0,4.1,5.0}
                        Specify the version of the pipeline you are interested in.
                        Let's say your study of interest has been analysed with
                        multiple versions, but you are only interested in a particular
                        version; then use this option to filter the results down to
                        the version you are interested in.
                        DEFAULT: Downloads all versions
-g {statistics,sequence_data,functional_analysis,taxonomic_analysis,taxonomic_analysis_ssu_rrna,taxonomic_analysis_lsu_rrna,non-coding_rnas,taxonomic_analysis_itsonedb,taxonomic_analysis_unite,taxonomic_analysis_motupathways_and_systems}, --result_group {statistics,sequence_data,functional_analysis,taxonomic_analysis,taxonomic_analysis_ssu_rrna,taxonomic_analysis_lsu_rrna,non-coding_rnas,taxonomic_analysis_itsonedb,taxonomic_analysis_unite,taxonomic_analysis_motupathways_and_systems}
                        Provide a single result group if needed.
                        Supported result groups are:
                        - statistics
                        - sequence_data (all versions)
                        - functional_analysis (all versions)
                        - taxonomic_analysis (1.0-3.0)
                        - taxonomic_analysis_ssu_rrna (>=4.0)
                        - taxonomic_analysis_lsu_rrna (>=4.0)
                        - non-coding_rnas (>=4.0)
                        - taxonomic_analysis_itsonedb (>= 5.0)
                        - taxonomic_analysis_unite (>= 5.0)
                        - taxonomic_analysis_motu  (>= 5.0)
                        - pathways_and_systems (>= 5.0)
                        DEFAULT: Downloads all result groups if not provided.
                        (default: None).
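
The version ranges in the help text above can be turned into a small lookup that tells you which result groups exist for a given pipeline version. This is a sketch under stated assumptions: `groups_for_pipeline` is a hypothetical helper, the group names follow the "Supported result groups" list, and `statistics` is assumed to be available for all versions because the help text gives no range for it. Check `mg-toolkit bulk_download -h` for the exact choices your installed version accepts.

```python
PIPELINE_VERSIONS = ("1.0", "2.0", "3.0", "4.0", "4.1", "5.0")

# (minimum, maximum) pipeline version per result group, from the help text.
RESULT_GROUP_RANGES = {
    "statistics": ("1.0", "5.0"),  # assumption: no range is stated
    "sequence_data": ("1.0", "5.0"),
    "functional_analysis": ("1.0", "5.0"),
    "taxonomic_analysis": ("1.0", "3.0"),
    "taxonomic_analysis_ssu_rrna": ("4.0", "5.0"),
    "taxonomic_analysis_lsu_rrna": ("4.0", "5.0"),
    "non-coding_rnas": ("4.0", "5.0"),
    "taxonomic_analysis_itsonedb": ("5.0", "5.0"),
    "taxonomic_analysis_unite": ("5.0", "5.0"),
    "taxonomic_analysis_motu": ("5.0", "5.0"),
    "pathways_and_systems": ("5.0", "5.0"),
}


def _as_tuple(version):
    """Convert '4.1' to (4, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def groups_for_pipeline(version):
    """List the result groups whose version range includes `version`."""
    if version not in PIPELINE_VERSIONS:
        raise ValueError(f"unsupported pipeline version: {version!r}")
    wanted = _as_tuple(version)
    return [
        group
        for group, (low, high) in RESULT_GROUP_RANGES.items()
        if _as_tuple(low) <= wanted <= _as_tuple(high)
    ]
```

For example, `groups_for_pipeline("4.1")` includes the SSU and LSU rRNA groups but not the plain `taxonomic_analysis` group, which the help text limits to versions 1.0 to 3.0.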

How to download all files for a given study accession?

$ mg-toolkit -d bulk_download -a ERP009703

How to download results of a specific pipeline version for a given study accession?

$ mg-toolkit -d bulk_download -a ERP009703 -p 4.0

How to download specific result file groups (e.g. functional analysis only) for a given study accession?

$ mg-toolkit -d bulk_download -a ERP009703 -g functional_analysis
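
Because `-g` accepts a single result group, downloading several groups means running the command once per group. The sketch below builds one command per group using only the options from the usage text (`-a`, `-o`, `-p`, `-g`). `bulk_download_cmds` and `run_all` are hypothetical helpers and not part of mg-toolkit.

```python
import subprocess


def bulk_download_cmds(accession, groups=(), pipeline=None, output_path=None):
    """Return one `mg-toolkit bulk_download` argument list per result group.

    With no groups, a single command is returned that downloads everything.
    """
    base = ["mg-toolkit", "bulk_download", "-a", accession]
    if output_path is not None:
        base += ["-o", str(output_path)]
    if pipeline is not None:
        base += ["-p", str(pipeline)]
    if not groups:
        return [base]
    return [base + ["-g", group] for group in groups]


def run_all(cmds, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a download fails.
            subprocess.run(cmd, check=True)


run_all(
    bulk_download_cmds(
        "ERP009703",
        groups=["functional_analysis", "sequence_data"],
        pipeline="4.0",
    )
)
```

The dry run is the default so the commands can be reviewed before any files are fetched. Pass `dry_run=False` to execute them.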

Contributors

Thanks goes to these wonderful people (emoji key):


- Ola Tarkowska 💻📖
- Maxim Scheremetjew 💻📖
- Martin Beracochea 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

Contact

If the documentation does not answer your questions, please contact us.
