MONSDA, Modular Organizer of Nextflow and Snakemake driven hts Data Analysis

MONSDA

Welcome to MONSDA, Modular Organizer of Nextflow and Snakemake driven hts Data Analysis

MONSDA automates HTS analysis, from data download, preprocessing and mapping to postprocessing/analysis and track generation, centered on a single config file. It can create Snakemake and Nextflow workflows based on a user-defined configuration. These workflows can either be saved to disk for manual inspection and execution or be executed automatically.

For details on Snakemake and Nextflow and their features please refer to the corresponding Snakemake or Nextflow documentation.

In general, you need to write a configuration file containing information on paths, files to process, and non-default settings for mapping and other tools. The template on which an analysis is based can be found in the config directory.

For MONSDA to be as FAIR as possible, one needs to use conda, the faster drop-in replacement mamba, or conda-libmamba-solver. The latter is a new (experimental) solver for the conda package manager that speeds up conda without the need to install mamba. For details on any of these, please refer to the corresponding conda, mamba or conda-libmamba-solver manual.

This workflow collection makes heavy use of conda and especially the bioconda channel.

Install MONSDA via conda or pip

To install via conda/mamba simply run

conda install -c bioconda -c conda-forge monsda
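Since either conda or mamba can be used for the installation, a small helper can pick whichever is available. The following is a minimal sketch, not part of MONSDA: the function name `pick_frontend` is hypothetical, and the script only prints the install command instead of running it.

```shell
#!/bin/sh
# Print the first conda frontend found on PATH, preferring mamba.
# Prints "none" and returns 1 if neither is installed.
# (pick_frontend is a hypothetical helper, not a MONSDA command.)
pick_frontend() {
    for candidate in mamba conda; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    printf 'none\n'
    return 1
}

if frontend=$(pick_frontend); then
    # Only print the install command; remove "echo" to actually run it.
    echo "$frontend" install -c bioconda -c conda-forge monsda
else
    echo "Neither mamba nor conda was found on PATH" >&2
fi
```

Preferring mamba mirrors the recommendation above; the channels are the same ones used in the conda command.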

To install via pip you first need to create the MONSDA environment as found in the envs directory of this repository like so:

conda env create -n monsda -f MONSDA/envs/monsda.yaml

The envs directory holds all the environments needed to run the pipelines in the workflows directory; these will be installed automatically alongside MONSDA.

Then activate the monsda environment and run pip:

conda activate monsda
pip install MONSDA

More information can be found in the official documentation

How does it work

This repository hosts the executable MONSDA.py, which acts as a wrapper around Snakemake and the config.json file. The config.json holds all the information needed to run the jobs; MONSDA.py parses it and splits it into sub-configs that can later be found in the directory SubSnakes or SubFlows, respectively.

To successfully run an analysis pipeline, a few steps have to be followed:

  • Directory structure: The directory structure is dictated by the condition-tree in the config file.
  • Config file: This is the central part of the analysis. Based on this file, MONSDA.py determines the processing steps and generates the corresponding config and Snakemake/Nextflow workflow files to run each subworkflow until all processing steps are done.
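To illustrate how a condition-tree could translate into directories, here is a minimal sketch. Everything specific in it is an assumption made for illustration: the top-level directory name `FASTQ`, the condition levels `Ecoli` and `WT`, and the helper `condition_dir` are hypothetical and not taken from this README. Consult the official documentation for the layout MONSDA actually expects.

```shell
#!/bin/sh
# Join a base directory and a list of condition levels into one path.
# (condition_dir is a hypothetical helper; the names passed to it below
# are illustrative assumptions, not MONSDA requirements.)
condition_dir() {
    path=$1
    shift
    for level in "$@"; do
        path="$path/$level"
    done
    printf '%s\n' "$path"
}

# Show the directory that would mirror a two-level condition-tree.
# Only printed here; wrap the path in "mkdir -p" to actually create it.
echo "would create: $(condition_dir FASTQ Ecoli WT)"
```

The point of the sketch is only that each level of nesting in the condition-tree corresponds to one level of nesting on disk.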

Run the pipeline

Run

monsda

to see the help and available options that will be passed through to Snakemake or Nextflow, and run

monsda_configure

to spin up the configurator that guides you through the creation of config.json files.
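Before starting a run, it can save time to confirm that the config file exists and parses as JSON. The sketch below is an optional pre-flight check and not a MONSDA feature: `check_config` is a hypothetical helper, it assumes `python3` is on the PATH (it uses Python's standard `json.tool` module), and it only checks JSON syntax, not whether the keys are ones MONSDA understands.

```shell
#!/bin/sh
# Return 0 if the given file exists and is syntactically valid JSON.
# (check_config is a hypothetical helper; it does not validate MONSDA keys.)
check_config() {
    config=$1
    if [ ! -f "$config" ]; then
        echo "config file not found: $config" >&2
        return 1
    fi
    if ! python3 -m json.tool "$config" >/dev/null 2>&1; then
        echo "not valid JSON: $config" >&2
        return 1
    fi
    return 0
}

# Usage: check_config "${CONFIG}.json" && echo "config looks parseable"
```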

Once a config.json is available you can start a Snakemake run with

monsda -j ${THREADS} --configfile ${CONFIG}.json --directory ${PWD} --conda-frontend mamba --conda-prefix ${PATH_TO_conda_envs}

and add additional arguments for Snakemake as you see fit.

For a Nextflow run use

monsda --nextflow -j ${THREADS} --configfile ${CONFIG}.json --directory ${PWD}

and add additional arguments for Nextflow as you see fit.
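The two invocations above differ only in a few flags, so they can be wrapped in one small function. This is a sketch under stated assumptions: `run_monsda` and the `DRY_RUN` variable are hypothetical names introduced here, and only the flags shown in the commands above are used. By default the function prints the command instead of executing it.

```shell
#!/bin/sh
# Build a monsda command line for either engine.
# Usage: run_monsda <snakemake|nextflow> <threads> <config.json> <workdir> [conda-prefix]
# With DRY_RUN=1 (the default in this sketch) the command is only printed.
# (run_monsda and DRY_RUN are hypothetical, not part of MONSDA.)
run_monsda() {
    engine=$1; threads=$2; config=$3; workdir=$4; prefix=${5:-}
    case "$engine" in
        snakemake)
            set -- -j "$threads" --configfile "$config" --directory "$workdir" \
                --conda-frontend mamba --conda-prefix "$prefix"
            ;;
        nextflow)
            set -- --nextflow -j "$threads" --configfile "$config" --directory "$workdir"
            ;;
        *)
            echo "unknown engine: $engine" >&2
            return 2
            ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "monsda $*"
    else
        monsda "$@"
    fi
}

# Print the Nextflow variant for 4 threads without running anything.
run_monsda nextflow 4 config.json "$PWD"
```

Set `DRY_RUN=0` to execute the command; additional Snakemake or Nextflow arguments would need to be appended to the argument lists inside the function.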

Run on workload manager

SLURM

You can either use the slurm profile adapted from Snakemake-Profiles, which can be found in the profile_Snakemake directory, or create one manually, either with the cookiecutter example in the Snakemake-Profiles repository or on your own. For Nextflow, a minimal example profile can be found under profile_Nextflow.

Then run

monsda -j ${THREADS} --configfile ${CONFIG}.json --directory ${PWD} --conda-frontend mamba --profile ${SLURMPROFILE} --conda-prefix ${PATH_TO_conda_envs}

or

export NXF_EXECUTOR=slurm; monsda --nextflow -j ${THREADS} --configfile ${CONFIG}.json --directory ${PWD}

respectively.
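Note that `export NXF_EXECUTOR=slurm;` keeps the variable set for the rest of the shell session. If that is not wanted, standard shell syntax allows prefixing the assignment to a single command so that it applies to that command only. The sketch below demonstrates the scoping with `sh -c` as a stand-in for the monsda call; the stand-in is an assumption for illustration, while the variable name comes from the command above.

```shell
#!/bin/sh
# Start from a clean state so the demonstration is reproducible.
unset NXF_EXECUTOR

# The assignment prefix applies to this one child process only.
# In practice the child would be: NXF_EXECUTOR=slurm monsda --nextflow ...
scoped=$(NXF_EXECUTOR=slurm sh -c 'printf "%s" "${NXF_EXECUTOR:-unset}"')

# Back in the calling shell the variable was never set.
after=${NXF_EXECUTOR:-unset}

echo "inside the command: $scoped"   # prints: inside the command: slurm
echo "afterwards: $after"            # prints: afterwards: unset
```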

For other workload managers please refer to the documentation of Snakemake and Nextflow.

Contribute

If you like this project, are missing features, want to contribute, or want to report bugs, please open an issue or contact me directly.

To contribute new tools, feel free to adapt existing ones; there should be a number of examples available that cover implementation details for almost all sorts of tools. If you need to add new Python/Groovy functions for processing options or parameters, add them to the corresponding file in the MONSDA directory. New environments go into the envs directory, new subworkflows into the workflows directory. Do not forget to also extend the template.json and add some documentation.

PRs are always welcome.
