Merge barcoded or non-barcoded fastq files generated by Nanopore runs

This is the nanomerge pipeline from the Sequana project.

Overview:

merge fastq files generated by a Nanopore run and generate raw data QC.

Input:

individual fastq files generated by nanopore demultiplexing

Output:

merged fastq files for each barcode (or unique sample)

Status:

production

Citation:

Cokelaer et al., (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, doi:10.21105/joss.00352

Installation

You can install the package using pip:

pip install sequana_nanomerge --upgrade

pycoQC is an optional requirement, which can be installed with conda/mamba using e.g.:

conda install pycoQC
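
As a small convenience, the following sketch checks whether the optional pycoQC executable is visible on the current PATH before the pipeline is set up. It uses only the Python standard library; the messages are illustrative, and nothing here is part of sequana_nanomerge itself.

```python
import shutil

# Look up the optional executable on the current PATH.
# shutil.which returns the full path as a string, or None if it is not found.
pycoqc_path = shutil.which("pycoQC")

if pycoqc_path is None:
    print("pycoQC not found on PATH (it is an optional requirement)")
else:
    print(f"pycoQC found at {pycoqc_path}")
```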

Usage

sequana_nanomerge --help

If your data is barcoded, the files are usually in sub-directories barcoded/barcodeXY:

sequana_nanomerge --input-directory DATAPATH/barcoded --samplesheet samplesheet.csv
    --summary summary.txt --input-pattern '*/*fastq.gz'

otherwise all fastq files are in DATAPATH/:

sequana_nanomerge --input-directory DATAPATH --samplesheet samplesheet.csv
    --summary summary.txt --input-pattern '*fastq.gz'

The --summary option is optional and takes as input the output of albacore demultiplexing, usually a file called sequencing_summary.txt.

Note that the difference between the two commands is the extra */ before the *fastq.gz pattern, since barcoded files are in individual subdirectories.
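
To see why the extra */ matters, here is a minimal sketch that applies both patterns to a throwaway directory tree. It assumes the pattern behaves like a shell-style glob evaluated relative to the input directory, which is an assumption about the pipeline rather than something stated here; the directory layout and file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path

# Hypothetical layout standing in for DATAPATH; file names are made up.
datapath = Path(tempfile.mkdtemp())
for rel in ("barcoded/barcode01/reads_0.fastq.gz",
            "barcoded/barcode02/reads_0.fastq.gz",
            "reads_0.fastq.gz"):
    path = datapath / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()

barcoded_dir = datapath / "barcoded"

# '*/*fastq.gz' descends exactly one directory level (the barcodeXY folders).
nested = sorted(p.relative_to(barcoded_dir).as_posix()
                for p in barcoded_dir.glob("*/*fastq.gz"))

# '*fastq.gz' only sees files located directly inside the directory.
flat_in_barcoded = sorted(p.name for p in barcoded_dir.glob("*fastq.gz"))
flat_in_datapath = sorted(p.name for p in datapath.glob("*fastq.gz"))

print(nested)
print(flat_in_barcoded)
print(flat_in_datapath)
```

In this toy tree, the flat pattern finds nothing inside the barcoded directory, which is the situation the extra */ is meant to avoid.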

In both cases, the command creates a directory with the pipeline and configuration file. You will then need to execute the pipeline:

cd nanomerge
sh nanomerge.sh  # for a local run

This launches a snakemake pipeline. If you are familiar with snakemake, you can retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters:

snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt

Alternatively, use the Sequanix interface.

Concerning the sample sheet, whether your data is barcoded or not, it should be a CSV file:

barcode,project,sample
barcode01,main,A
barcode02,main,B
barcode03,main,C

For a non-barcoded run, you must still provide a sample sheet; the barcode column can be left empty or removed entirely:

barcode,project,sample
,main,A

or:

project,sample
main,A
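
The three sample sheet layouts above can be read uniformly with the standard csv module. The sketch below is only an illustration: read_samplesheet is a hypothetical helper, not a function provided by sequana_nanomerge, and it simply treats a missing or empty barcode column as an empty string.

```python
import csv
import io


def read_samplesheet(text):
    """Return (barcode, project, sample) tuples; barcode is '' when absent or empty.

    Hypothetical helper for illustration, not part of sequana_nanomerge.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # row.get returns None when the barcode column does not exist.
        rows.append((row.get("barcode") or "", row["project"], row["sample"]))
    return rows


barcoded_sheet = "barcode,project,sample\nbarcode01,main,A\nbarcode02,main,B\n"
empty_barcode_sheet = "barcode,project,sample\n,main,A\n"
no_barcode_sheet = "project,sample\nmain,A\n"

print(read_samplesheet(barcoded_sheet))
print(read_samplesheet(empty_barcode_sheet))
print(read_samplesheet(no_barcode_sheet))
```

Both non-barcoded variants yield the same row, which matches the statement that the barcode column may be empty or removed.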

Usage with apptainer:

With apptainer, initiate the working directory as follows:

sequana_nanomerge --use-apptainer

Images are downloaded in the working directory, but you can store them in a global directory, e.g.:

sequana_nanomerge --use-apptainer --apptainer-prefix ~/.sequana/apptainers

and then:

cd nanomerge
sh nanomerge.sh

If you decide to use snakemake manually, do not forget to add the apptainer options:

snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt --use-apptainer --apptainer-prefix ~/.sequana/apptainers --apptainer-args "-B /home:/home"

Requirements

This pipeline requires the following executable, which is optional:

  • pycoQC

https://raw.githubusercontent.com/sequana/nanomerge/main/sequana_pipelines/nanomerge/dag.png

Details

This pipeline runs nanomerge in parallel on the input fastq files (paired or not). A brief sequana summary report is also produced.
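
To illustrate what merging means for one barcode, here is a minimal sketch that concatenates gzipped fastq chunks into a single file. It relies on the fact that the gzip format allows several compressed members to be appended back to back. This is not the pipeline's actual implementation; merge_fastq_gz and the demo file names are hypothetical.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_fastq_gz(parts, merged):
    """Concatenate gzipped fastq files byte-for-byte into one gzip file.

    Hypothetical helper for illustration, not part of sequana_nanomerge.
    """
    with open(merged, "wb") as out:
        for part in sorted(parts):
            with open(part, "rb") as handle:
                shutil.copyfileobj(handle, out)


# Made-up demo data: two small chunks for a single barcode.
workdir = Path(tempfile.mkdtemp())
barcode_dir = workdir / "barcoded" / "barcode01"
barcode_dir.mkdir(parents=True)
chunks = [("chunk_0.fastq.gz", "@r1\nACGT\n+\nIIII\n"),
          ("chunk_1.fastq.gz", "@r2\nTTGA\n+\nIIII\n")]
for name, record in chunks:
    with gzip.open(barcode_dir / name, "wt") as fh:
        fh.write(record)

merged = workdir / "barcode01.fastq.gz"
merge_fastq_gz(barcode_dir.glob("*fastq.gz"), merged)

# The gzip module reads all concatenated members transparently.
with gzip.open(merged, "rt") as fh:
    merged_lines = fh.read().splitlines()
print(len(merged_lines) // 4, "reads in merged file")
```

Plain concatenation avoids decompressing and recompressing the data, which is why it is a common choice for this kind of merge.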

Rules and configuration details

Here is the latest documented configuration file to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.

Changelog

Version  Description
1.0.0    Stable release ready for production
0.0.1    First release.

