Skip to main content

Flexible multi-omic pipeline system

Project description

Install with Bioconda Github Unit Tests Read the Docs Codacy grade Codecov

YMP is a tool that makes it easy to process large amounts of NGS read data. It comes “batteries included” with everything needed to preprocess your reads (QC, trimming, contaminant removal), assemble metagenomes, annotate assemblies, or assemble and quantify RNA-Seq transcripts, offering a choice of tools for each of those procecssing stages. When your needs exceed what the stock YMP processing stages provide, you can easily add your own, using YMP to drive novel tools, tools specific to your area of research, or tools you wrote yourself.

Note:

Intrigued, but think YMP doesn’t exactly fit your needs?

Missing processing stages for your favorite tool? Found a bug?

Open an issue, create a PR, or better yet, join the team!

The YMP documentation is available at readthedocs.

Features:

batteries included

YMP comes with a large number of Stages implementing common read processing steps. These stages cover the most common topics, including quality control, filtering and sorting of reads, assembly of metagenomes and transcripts, read mapping, community profiling, visualisation and pathway analysis.

For a complete list, check the documentation or the source.

get started quickly

Simply point YMP at a folder containing read files, at a mapping file, a list of URLs or even an SRA RunTable and YMP will configure itself. Use tab expansion to complete your desired series of stages to be applied to your data. YMP will then proceed to do your bidding, downloading raw read files and reference databases as needed, installing requisite software environments and scheduling the execution of tools either locally or on your cluster.

explore alternative workflows

Not sure which assembler works best for your data, or what the effect of more stringent quality trimming would be? YMP is made for this! By keeping the output of each stage in a folder named to match the stack of applied stages, YMP can manage many variant workflows in parallel, while minimizing the amount of duplicate computation and storage.

go beyond the beaten path

Built on top of Bioconda and Snakemake, YMP is easily extended with your own Snakefiles, allowing you to integrate any type of processing you desire into YMP, including your own, custom made tools. Within the YMP framework, you can also make use of the extensions to the Snakemake language provided by YMP (default values, inheritance, recursive wildcard expansion, etc.), making writing rules less error prone and repetative.

Background

Bioinformatical data processing workflows can easily get very complex, even convoluted. On the way from the raw read data to publishable results, a sizeable collection of tools needs to be applied, intermediate outputs verified, reference databases selected, and summary data produced. A host of data files must be managed, processed individually or aggregated by host or spatial transect along the way. And, of course, to arrive at a workflow that is just right for a particular study, many alternative workflow variants need to be evaluated. Which tools perform best? Which parameters are right? Does re-ordering steps make a difference? Should the data be assembled individually, grouped, or should a grand co-assembly be computed? Which reference database is most appropriate?

Answering these questions is a time consuming process, justifying the plethora of published ready made pipelines each providing a polished workflow for a typical study type or use case. The price for the convenience of such a polished pipeline is the lack of flexibility - they are not meant to be adapted or extended to match the needs of a particular study. Workflow management systems on the other hand offer great flexibility by focussing on the orchestration of user defined workflows, but typicially require significant initial effort as they come without predefined workflows.

YMP strives to walk the middle ground between these. It brings everything needed to classic metagenome and RNA-Seq workflows, yet built on the workflow management system Snakemake, it can be easily expanded by simply adding Snakemake rules files. Designed around the needs of processing primarily multi-omic NGS read data, it brings a framework for handling read file meta data, provisioning reference databases, and organizing rules into semantic stages.

Working with the Github Development Version

Installing from GitHub

  1. Clone the repository:

    git clone  --recurse-submodules https://github.com/epruesse/ymp.git

    Or, if your have github ssh keys set up:

    git clone --recurse-submodules git@github.com:epruesse/ymp.git
  2. Create and activate conda environment:

    conda env create -n ymp --file environment.yaml
    source activate ymp
  3. Install YMP into conda environment:

    pip install -e .
  4. Verify that YMP works:

    source activate ymp
    ymp --help

Updating Development Version

Usually, all you need to do is a pull:

git pull
git submodule update --recursive --remote

If environments where updated, you may want to regenerate the local installations and clean out environments no longer used to save disk space:

source activate ymp
ymp env update
ymp env clean
# alternatively, you can just delete existing envs and let YMP
# reinstall as needed:
# rm -rf ~/.ymp/conda*
conda clean -a

If you see errors before jobs are executed, the core requirements may have changed. To update the YMP conda environment, enter the folder where you installed YMP and run the following:

source activate ymp
conda env update --file environment.yaml

If something changed in setup.py, a re-install may be necessary:

source activate ymp
pip install -U -e .

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ymp-0.3.2.tar.gz (1.5 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ymp-0.3.2-py36-none-any.whl (1.6 MB view details)

Uploaded Python 3.6

File details

Details for the file ymp-0.3.2.tar.gz.

File metadata

  • Download URL: ymp-0.3.2.tar.gz
  • Upload date:
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ymp-0.3.2.tar.gz
Algorithm Hash digest
SHA256 5ae961a45d4c07930e9e660a27569b0af72efc0938dd5422673ffd5144bc30d1
MD5 cf5e60651589d7b67218f7ba371cfb9f
BLAKE2b-256 c5a4a6239ce90b0188236ef33c44399290edf22939093d856e4906ccdf82f23a

See more details on using hashes here.

File details

Details for the file ymp-0.3.2-py36-none-any.whl.

File metadata

  • Download URL: ymp-0.3.2-py36-none-any.whl
  • Upload date:
  • Size: 1.6 MB
  • Tags: Python 3.6
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ymp-0.3.2-py36-none-any.whl
Algorithm Hash digest
SHA256 55542b895ef36d3bd188e3bccb42ded66fbb5cfacd5403c651ab696206d6f085
MD5 94f9b2fb9c9315b40ef43c7f4b075aff
BLAKE2b-256 0c7af05246b43ce7ce6ead0b8d08e4c774c3b869ea67523b96301c4ad82b19a2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page