Multi-sample denovo assembly of FastQ sequences (short read)

Project description

Python 3.8 | 3.9 | 3.10

This is the denovo pipeline from the Sequana project.

Overview:

a de-novo assembly pipeline for short-read sequencing data

Input:

A set of FastQ files

Output:

Fasta, VCF, HTML report

Status:

production

Citation:

Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, doi:10.21105/joss.00352

Installation

sequana_denovo is based on Python3. Install the package as follows:

pip install sequana --upgrade

You will need third-party software such as fastqc. Please see below for details.

Usage

The following command scans all files ending in .fastq.gz found in the local directory and creates a directory called denovo/ in which a snakemake pipeline is stored. Depending on the number of files and their sizes, the process may take a long time:

sequana_denovo --help
sequana_denovo --input-directory DATAPATH

This creates a directory with the pipeline and configuration file. You will then need to execute the pipeline:

cd denovo
sh denovo.sh  # for a local run

This launches a snakemake pipeline. If you are familiar with snakemake, you can retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters:

snakemake -s denovo.smk --configfile config.yaml --cores 4 --stats stats.txt

Alternatively, use the Sequanix interface.
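Before setting up the pipeline, it can help to preview which files in a directory match the .fastq.gz suffix mentioned above. The following is a minimal sketch, not part of sequana_denovo: the function name `find_fastq_files` is hypothetical, and it assumes that only files located directly in the directory (no sub-directories) are of interest.

```python
from pathlib import Path


def find_fastq_files(directory="."):
    """Return the sorted .fastq.gz files found directly inside `directory`.

    Hypothetical helper for illustration; the pipeline's own file
    discovery may use a different pattern or options.
    """
    return sorted(p for p in Path(directory).glob("*.fastq.gz") if p.is_file())


if __name__ == "__main__":
    # List the candidate input files of the current directory.
    for path in find_fastq_files("."):
        print(path.name)
```

If the list is empty or contains unexpected files, it is probably worth checking the input directory before creating the pipeline.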

Requirements

This pipeline requires the following executable(s):

  • spades

  • busco

  • bwa

  • khmer: there is no executable called khmer but a set of executables (e.g. normalize-by-median.py)

  • freebayes

  • picard

  • prokka

  • quast

  • sambamba

  • samtools

https://raw.githubusercontent.com/sequana/sequana_denovo/main/sequana_pipelines/denovo/dag.png

Details

This Snakemake de-novo assembly pipeline is dedicated to small genomes such as bacteria. It is based on SPAdes. The assembler corrects reads and then assembles them using several k-mer sizes. If the corresponding option is set, SPAdes corrects mismatches and short INDELs in the contigs using BWA.

The sequencing depth can be normalised with khmer. Digital normalisation converts the existing high-coverage regions into a Gaussian distribution centered around a lower sequencing depth. To put it another way, genome regions covered at 200x will be covered at 20x after normalisation. Thus, some reads from high-coverage regions are discarded to reduce the quantity of data. Although the coverage is drastically reduced, the assembly will be as good as or better than one built from the unnormalised data. Furthermore, SPAdes is notably faster and uses less memory with normalised data than without digital normalisation. Above all, khmer does this in fixed, low memory and without needing any reference sequence.
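The idea behind digital normalisation can be illustrated with a toy sketch. This is not khmer's implementation: khmer relies on a fixed-memory probabilistic counting structure, whereas the sketch below uses an exact in-memory `Counter` and ignores reverse complements. The function names and the values of `k` and `cutoff` are illustrative assumptions. A read is kept only while the median count of its k-mers, among the reads kept so far, is below the cutoff.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Yield every k-mer (substring of length k) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def digital_normalisation(reads, k=4, cutoff=20):
    """Toy median-based normalisation (illustrative, not khmer itself).

    A read is kept, and its k-mers counted, only if the median count
    of its k-mers among previously kept reads is below `cutoff`.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k: nothing to count
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


if __name__ == "__main__":
    # 200 identical reads mimic a region covered at 200x.
    reads = ["ACGTTGCA"] * 200
    print(len(digital_normalisation(reads, k=4, cutoff=20)))  # prints 20
```

In this toy case, 200 identical reads are reduced to 20, mirroring the 200x to 20x example above. Real data contains sequencing errors and overlapping reads, so the resulting depth is only approximately equal to the cutoff.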

The pipeline assesses the assembly with several tools and approaches. The first one is Quast, a tool for evaluating and comparing genome assemblies. It provides an HTML report with useful metrics such as N50, the number of mismatches, and so on. Furthermore, it creates a contig viewer called Icarus.
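For readers unfamiliar with N50, the metric can be sketched in a few lines. This is an illustrative computation from a plain list of contig lengths, not Quast's code; the function name `n50` and the example lengths are made up for the illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length of the contig at which, going from the longest
    contig to the shortest, the cumulative length first reaches at
    least half of the total assembly length.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    # Total is 290; 80 + 70 = 150 is the first cumulative sum >= 145.
    print(n50([80, 70, 50, 40, 30, 20]))  # prints 70
```

A higher N50 generally indicates a more contiguous assembly, although it says nothing about correctness, which is why the report combines it with other metrics.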

The second approach is to characterise coverage with sequana coverage and to detect mismatches and short INDELs with Freebayes.

Last but not least, BUSCO provides quantitative measures for assessing a genome assembly, based on expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.
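BUSCO results are usually condensed into a one-line summary of the percentages of complete (single-copy and duplicated), fragmented and missing orthologs. The sketch below shows how such a line can be derived from raw counts. It is an illustration only, not BUSCO's code: the function name `busco_summary` and the example counts are hypothetical.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Format ortholog counts as a BUSCO-style one-line summary.

    Hypothetical helper: C = complete (S single-copy + D duplicated),
    F = fragmented, M = missing, n = total number of orthologs searched.
    """
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("at least one ortholog is required")

    def pct(count):
        return 100.0 * count / total

    complete = single + duplicated
    return (
        f"C:{pct(complete):.1f}%[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{total}"
    )


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(busco_summary(single=120, duplicated=1, fragmented=1, missing=2))
```

A high proportion of complete single-copy orthologs suggests that the gene space is well assembled, whereas many duplicated or missing orthologs may point to redundancy or gaps in the assembly.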

Version   Description
0.10.0    use click / include multiqc apptainer
0.9.0     Major refactoring to include apptainers, use wrappers
0.8.5     add multiqc and use newest version of sequana
0.8.4     update pipeline to use new pipetools features
0.8.3     fix requirements (spades -> spades.py)
0.8.2     fix readtag, update config to account for new coverage setup
0.8.1
0.8.0     First release.
