Metagenomics toolkit

[![Build Status](https://travis-ci.org/EBI-Metagenomics/emg-toolkit.svg?branch=master)](https://travis-ci.org/EBI-Metagenomics/emg-toolkit) [![PyPi package](https://badge.fury.io/py/mg-toolkit.svg)](https://badge.fury.io/py/mg-toolkit) [![Downloads](http://pepy.tech/badge/mg-toolkit)](http://pepy.tech/project/mg-toolkit)


The Metagenomics toolkit lets scientists download all of the sample
metadata for a given study or sequence into a single CSV file.


Install metagenomics toolkit
============================

pip install -U mg-toolkit


Usage
=====

$ mg-toolkit -h
usage: mg-toolkit [-h] [-V] [-d]
                  {original_metadata,sequence_search,bulk_download} ...

Metagenomics toolkit
--------------------

positional arguments:
  {original_metadata,sequence_search,bulk_download}
    original_metadata   Download original metadata.
    sequence_search     Search non-redundant protein database using HMMER
    bulk_download       Download result files in bulks for an entire study.

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         print version information
  -d, --debug           print debugging information


Examples
========

Download metadata:

$ mg-toolkit original_metadata -a ERP001736
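
Once the command has finished, the metadata can be inspected from Python. The sketch below is not part of mg-toolkit: it only uses the standard `csv` module, and the helper names `load_metadata` and `summarise` are hypothetical. It assumes the tool wrote a CSV file with a header row. The file name in the usage note is also an assumption, so substitute whatever file the command actually produced.

```python
import csv
from pathlib import Path


def load_metadata(csv_path):
    """Return the rows of a metadata CSV as a list of dicts keyed by column name."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def summarise(rows):
    """Report the number of rows and the column names taken from the first row."""
    columns = list(rows[0].keys()) if rows else []
    return {"rows": len(rows), "columns": columns}
```

For example, `summarise(load_metadata("ERP001736.csv"))` would report how many rows were downloaded and which metadata columns are present, assuming the file is named after the accession.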


Search the non-redundant protein database using HMMER and fetch metadata:

$ mg-toolkit sequence_search -seq test.fasta -db full evalue -incE 0.02

Databases:
- full - Full length sequences (default)
- all - All sequences
- partial - Partial sequences


How to bulk download result files for an entire study?

$ mg-toolkit bulk_download -h
usage: mg-toolkit bulk_download [-h] -a ACCESSION [-o OUTPUT_PATH]
                                [-p {1.0,2.0,3.0,4.0,4.1}]
                                [-g {sequence_data,functional_analysis,taxonomic_analysis,taxonomic_analysis_ssu,taxonomic_analysis_lsu,stats,non_coding_rna}]

optional arguments:
  -h, --help            show this help message and exit
  -a ACCESSION, --accession ACCESSION
                        Provide the study/project accession of your interest,
                        e.g. ERP001736, SRP000319. The study must be publicly
                        available in MGnify.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Location of the output directory, where the
                        downloadable files are written to. DEFAULT: CWD
  -p {1.0,2.0,3.0,4.0,4.1}, --pipeline {1.0,2.0,3.0,4.0,4.1}
                        Specify the version of the pipeline you are interested
                        in. Lets say your study of interest has been analysed
                        with multiple version, but you are only interested in
                        a particular version then used this option to filter
                        down the results by the version you interested in.
                        DEFAULT: Downloads all versions
  -g {sequence_data,functional_annotations,taxonomic_annotations,taxonomic_annot_ssu,taxonomic_annot_lsu,stats,non_coding_rna}, --result_group {sequence_data,functional_annotations,taxonomic_annotations,taxonomic_annot_ssu,taxonomic_annot_lsu,stats,non_coding_rna}
                        Provide a single result group if needed. Supported
                        result groups are: [sequence_data (all version),
                        functional_annotations (all version),
                        taxonomic_annotations (1.0-3.0), taxonomic_annot_ssu
                        (>=4.0), taxonomic_annot_lsu (>=4.0), stats,
                        non_coding_rna (>=4.0) DEFAULT: Downloads all result
                        groups if not provided. (default: None).
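
The version notes in the `-g` description can be turned into a small lookup that lists which result groups apply to a given pipeline version. This is only an illustrative sketch, not code from mg-toolkit: `RESULT_GROUP_VERSIONS` and `groups_for_pipeline` are hypothetical names, and the ranges are transcribed from the help text above. The help text gives no version note for `stats`, so treating it as available for every version is an assumption. The usage line spells some group names differently from the option description, so check `mg-toolkit bulk_download -h` on your installed version before relying on these names.

```python
# Pipeline versions accepted by -p, as listed in the help text.
PIPELINE_VERSIONS = ("1.0", "2.0", "3.0", "4.0", "4.1")

# (lowest, highest) pipeline version per result group; None means unbounded.
# "stats" has no version note in the help text, so (None, None) is an assumption.
RESULT_GROUP_VERSIONS = {
    "sequence_data": (None, None),
    "functional_annotations": (None, None),
    "taxonomic_annotations": (1.0, 3.0),
    "taxonomic_annot_ssu": (4.0, None),
    "taxonomic_annot_lsu": (4.0, None),
    "stats": (None, None),
    "non_coding_rna": (4.0, None),
}


def groups_for_pipeline(version):
    """Return the result groups whose version range includes `version`."""
    if version not in PIPELINE_VERSIONS:
        raise ValueError(f"unknown pipeline version: {version!r}")
    number = float(version)
    selected = []
    for name, (lowest, highest) in RESULT_GROUP_VERSIONS.items():
        if lowest is not None and number < lowest:
            continue
        if highest is not None and number > highest:
            continue
        selected.append(name)
    return selected
```

Calling `groups_for_pipeline("3.0")` keeps `taxonomic_annotations` and drops the SSU, LSU and non-coding RNA groups, while `groups_for_pipeline("4.1")` does the opposite.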

How to download all files for a given study accession?

$ mg-toolkit -d bulk_download -a ERP009703

How to download results of a specific pipeline version for a given study accession?

$ mg-toolkit -d bulk_download -a ERP009703 -p 4.0

How to download a specific result file group (e.g. functional annotations only) for a given study accession?

$ mg-toolkit -d bulk_download -a ERP009703 -g functional_annotations
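
When several studies need to be fetched, the same invocations can be scripted. The following is a minimal sketch rather than an official API: `bulk_download_command` is a hypothetical helper that only assembles the command line from the flags documented in the help output above (`-d`, `-a`, `-o`, `-p`, `-g`), and the `results/` directory layout is an assumption. The loop prints each command instead of running it.

```python
import shlex


def bulk_download_command(accession, output_path=None, pipeline=None,
                          result_group=None, debug=False):
    """Build the argument list for one `mg-toolkit bulk_download` call."""
    command = ["mg-toolkit"]
    if debug:
        command.append("-d")  # global flag, goes before the subcommand
    command += ["bulk_download", "-a", accession]
    if output_path is not None:
        command += ["-o", str(output_path)]
    if pipeline is not None:
        command += ["-p", str(pipeline)]
    if result_group is not None:
        command += ["-g", result_group]
    return command


# Accessions taken from the examples in this README; the output
# directory names are hypothetical.
for accession in ["ERP001736", "ERP009703"]:
    command = bulk_download_command(accession, output_path=f"results/{accession}")
    print(" ".join(shlex.quote(part) for part in command))
    # To execute instead of printing: subprocess.run(command, check=True)
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without going through a shell.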
