
MetaPathways is a modular pipeline to build PGDBs from Metagenomic sequences.


# MetaPathways 2: A master-worker model for environmental Pathway/Genome Database construction on grids and clouds

Niels W. Hanson, Kishori M. Konwar, Shang-Ju Wu, and Steven J. Hallam

## Updates

November 27, 2014: [MetaPathways v2.5 released](https://github.com/hallamlab/metapathways2/releases/tag/v2.5) with upgrades to the pipeline:

  • LAST homology searches with BLAST-equivalent output and E-values

  • Reads per kilobase per million mapped (RPKM) coverage measure for Contig annotations calculated from raw reads (.fastq) or mapping files (.SAM) using [bwa](http://bio-bwa.sourceforge.net)

  • Addition of the [CAZy sequence database](http://www.cazy.org) as a new compatible functional hierarchy

  • GUI keyword search for annotation subsetting and projection onto different functional hierarchies (KEGG, COG, SEED, MetaCyc, and now CAZy)

See [the release page](https://github.com/hallamlab/metapathways2/releases/tag/v2.5) and [the wiki](https://github.com/hallamlab/metapathways2/wiki) for more information.
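As a rough illustration of the RPKM measure mentioned above, the following sketch computes reads per kilobase per million mapped reads for a single contig. It is a standalone example of the standard formula, not the pipeline's own code; the function name `rpkm` and its parameters are hypothetical.

```python
def rpkm(mapped_reads, contig_length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads for one contig.

    Hypothetical helper for illustration only; not part of MetaPathways.
    """
    if contig_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("contig length and total mapped reads must be positive")
    kilobases = contig_length_bp / 1000.0      # contig length in kb
    millions = total_mapped_reads / 1e6        # library size in millions of mapped reads
    return mapped_reads / (kilobases * millions)


# 500 reads on a 2,000 bp contig in a library of 10 million mapped reads
print(rpkm(500, 2000, 10000000))  # 25.0
```

Dividing by both contig length and library size makes coverage values comparable across contigs of different lengths and across samples of different sequencing depth.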

## Abstract

The development of high-throughput sequencing technologies over the past decade has generated a tidal wave of environmental sequence information from a variety of natural and human-engineered ecosystems. The resulting flood of information into public databases and archived sequencing projects has exponentially expanded computational resource requirements, rendering most local homology-based search methods inefficient. We recently introduced MetaPathways v1.0, a modular annotation and analysis pipeline for constructing environmental Pathway/Genome Databases (ePGDBs) from environmental sequence information, capable of using the Sun Grid Engine for external resource partitioning. However, a command-line interface and facile task management introduced user activation barriers with a concomitant decrease in fault tolerance.

Here we present MetaPathways v2.0, incorporating a graphical user interface (GUI) and refined task management methods. The MetaPathways GUI provides an intuitive display for setup and process monitoring and supports interactive data visualization and sub-setting via a custom Knowledge Engine data structure. A master-worker model is adopted for task management, allowing users to scavenge computational results from a number of worker grids in an ad hoc, asynchronous, distributed network that dramatically increases fault tolerance. This model facilitates the use of EC2 instances, extending ePGDB construction to the Amazon Elastic Compute Cloud.
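The master-worker idea can be sketched in a few lines. The example below is a toy model, not the MetaPathways implementation: workers are plain Python callables standing in for remote grids, and the names `run_master`, `workers`, and `max_attempts` are hypothetical. It uses `concurrent.futures`, so it assumes Python 3. A task that fails on one worker is resubmitted to the next one, which is the source of the fault tolerance.

```python
from concurrent.futures import ThreadPoolExecutor


def run_master(tasks, workers, max_attempts=3):
    """Hand each task to a worker; on failure, retry it on the next worker.

    Toy sketch: `workers` are callables standing in for compute grids.
    Returns a dict mapping each task to its result, or None if every attempt failed.
    """
    results = {}
    pending = [(idx, task, 0) for idx, task in enumerate(tasks)]
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        while pending:
            submitted = []
            for idx, task, attempt in pending:
                # Rotate through workers so a retry lands on a different one.
                worker = workers[(idx + attempt) % len(workers)]
                submitted.append((idx, task, attempt, pool.submit(worker, task)))
            pending = []
            for idx, task, attempt, future in submitted:
                try:
                    results[task] = future.result()
                except Exception:
                    if attempt + 1 < max_attempts:
                        pending.append((idx, task, attempt + 1))
                    else:
                        results[task] = None  # give up after max_attempts
    return results


def healthy_grid(task):
    return task.upper()


def offline_grid(task):
    raise RuntimeError("grid offline")


print(run_master(["a", "b", "c"], [healthy_grid, offline_grid]))
```

In this run, task `"b"` is first sent to the offline worker, fails, and is then completed by the healthy one, so all three tasks finish.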

## Installation

MetaPathways v2.5 requires Python 2.7 or greater and [Pathway Tools](http://bioinformatics.ai.sri.com/ptools/) developed by SRI International for full functionality.

The MetaPathways Python codebase and the compiled GUI binaries for Mac OS X and Ubuntu are self-contained in this GitHub distribution. GUI source code can be [obtained here](https://github.com/hallamlab/MetaPathwaysGUI).

Please see the [MetaPathways v2.5 wiki](https://github.com/hallamlab/metapathways2/wiki) for more installation details.

A template [MetaPathways_DBs.zip (Updated: October 2014)](https://www.dropbox.com/s/ye3kpve041e0r39/MetaPathways_DBs.zip?dl=0) contains starter protein and taxonomic databases.

### Installation steps and information

  • The folder containing the script MetaPathways.py is referred to as METAPATHWAYS_FOLDER.

  • Copy the files template_config.txt and template_params.txt to METAPATHWAYS_FOLDER.

  • The LAST, BLAST, and other third-party executables should be in a subfolder of METAPATHWAYS_FOLDER.

  • The formatted databases should be in a separate folder, referred to as the METAPATHWAYS_DB folder.

  • The entries in template_config.txt should point to the actual folders.

  • template_params.txt is used while running the pipeline.

  • Before running the script, the user must run “source <METAPATHWAYS_FOLDER>/MetaPathwaysrc”.
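The layout described in the steps above can be sanity-checked with a short script. This is a minimal sketch under stated assumptions: the function `check_layout` is hypothetical, the executables subfolder name `executables` is a placeholder (the steps do not name it), and the parameter file is assumed to be called `template_params.txt`.

```python
import os


def check_layout(mp_folder, db_folder, exe_subfolder="executables"):
    """Return a list of problems with the install layout; empty means it looks fine.

    Hypothetical helper; `exe_subfolder` is an assumed name, adjust to your setup.
    """
    problems = []
    expected_files = (
        "MetaPathways.py",
        "template_config.txt",
        "template_params.txt",  # assumed spelling of the parameter file name
        "MetaPathwaysrc",
    )
    for name in expected_files:
        if not os.path.isfile(os.path.join(mp_folder, name)):
            problems.append("missing file: " + name)
    if not os.path.isdir(os.path.join(mp_folder, exe_subfolder)):
        problems.append("missing executables subfolder: " + exe_subfolder)
    if not os.path.isdir(db_folder):
        problems.append("missing database folder: " + db_folder)
    return problems


if __name__ == "__main__":
    # Placeholder paths; replace with your METAPATHWAYS_FOLDER and METAPATHWAYS_DB.
    for problem in check_layout("/path/to/METAPATHWAYS_FOLDER", "/path/to/METAPATHWAYS_DB"):
        print(problem)
```

The check only confirms that files and folders exist; it does not validate the contents of the configuration file or the database formatting.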

## Citation

If using MetaPathways v2.0 for research work, please cite:

Niels W. Hanson, Kishori M. Konwar, Shang-Ju Wu, Steven J. Hallam. MetaPathways v2.0: A master-worker model for environmental Pathway/Genome Database construction on grids and clouds. Proceedings of the 2014 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2014), Honolulu, HI, USA, May 21-24, 2014. [doi:10.1109/CIBCB.2014.6845516](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6845516)


