Multi-sample denovo assembly of FastQ sequences (short read)
This is the denovo pipeline from the Sequana project.
- Overview: De-novo assembly pipeline for short-read Illumina data (bacterial genomes)
- Input: A set of paired-end or single-end FastQ files
- Output: Assembled FASTA contigs, annotation (GFF/GenBank), variant calls (VCF), HTML reports
- Status: Production
- Documentation: This README and https://sequana.readthedocs.io
- Citation: Cokelaer et al. (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, https://doi.org/10.21105/joss.00352
Installation
If you already have all requirements, install the package with pip:
pip install sequana_denovo --upgrade
You will need third-party tools (spades, prokka, quast, etc.). Install all dependencies at once:
mamba env create -f environment.yml
Usage
Scan FastQ files in a directory and set up the pipeline (replace DATAPATH with your input directory):
sequana_denovo --input-directory DATAPATH
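Before setting up the pipeline, it can help to confirm that the input directory actually contains FastQ files. The following is a minimal sketch, not part of sequana_denovo: the helper names `list_fastq` and `require_fastq` are hypothetical, and the `*.fastq.gz` / `*.fastq` file patterns are an assumption about how your files are named.

```shell
# List FastQ files directly inside a directory (non-recursive), one per line.
# The two name patterns are assumptions; adapt them to your naming scheme.
list_fastq() {
    find "$1" -maxdepth 1 -type f \( -name '*.fastq.gz' -o -name '*.fastq' \) | sort
}

# Report how many FastQ files were found; fail when there are none.
require_fastq() {
    n=$(( $(list_fastq "$1" | wc -l) ))
    if [ "$n" -eq 0 ]; then
        echo "no FastQ files found in $1" >&2
        return 1
    fi
    echo "$n FastQ file(s) found in $1"
}
```

A possible use is `require_fastq DATAPATH && sequana_denovo --input-directory DATAPATH`, so that the setup step only runs when input files are present.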
To skip Prokka annotation:
sequana_denovo --input-directory DATAPATH --skip-prokka
To tune SPAdes memory (default 64 Gb) and digital normalisation:
sequana_denovo --input-directory DATAPATH --spades-memory 32 --digital-normalisation-max-memory-usage 1e9
This creates a denovo/ directory with the pipeline and configuration file. Execute the pipeline locally:
cd denovo
sh denovo.sh
If you are familiar with Snakemake, you can also run the pipeline directly:
snakemake -s denovo.rules --cores 4 --stats stats.txt
See .sequana/profile/config.yaml to tune Snakemake behaviour (cores, cluster settings, etc.).
Usage with apptainer
With apptainer, initialise the working directory as follows:
sequana_denovo --input-directory DATAPATH --use-apptainer
Images are downloaded into the working directory. To store them in a shared location instead:
sequana_denovo --input-directory DATAPATH --use-apptainer --apptainer-prefix ~/.sequana/apptainers
Then run as usual:
cd denovo
sh denovo.sh
Requirements
This pipeline requires the following executables (install via bioconda/conda):
spades or unicycler — de-novo assembler (--assembler option)
khmer — digital normalisation (normalize-by-median.py, filter-abund.py, etc.)
quast — assembly quality assessment
prokka — genome annotation (optional, --skip-prokka)
busco — assembly completeness assessment (optional)
checkm-genome — genome completeness and contamination (optional)
bwa + sambamba — read mapping back to assembly
freebayes — variant calling
samtools — BAM/SAM processing
seqkit — contig filtering by length
blast — taxonomic identification of contigs (optional)
multiqc — aggregated HTML report
graphviz — pipeline DAG image
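A quick way to see which of these tools are missing from the current environment is to test each executable with `command -v`. This is a sketch only: `missing_tools` is a hypothetical helper, and the executable names passed to it (for example `spades.py`, `quast.py`, and `dot` for graphviz) are assumptions that may differ depending on how the packages were installed.

```shell
# Print every tool from the argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command resolves.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Executable names below are assumptions; adjust them to your installation.
missing_tools spades.py normalize-by-median.py quast.py prokka bwa sambamba \
    freebayes samtools seqkit multiqc dot
```

An empty output means every listed executable was found; optional tools such as busco or checkm can be added to the list when those steps are enabled.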
Details
This Snakemake pipeline assembles bacterial (or other small) genomes from short Illumina reads.
Digital normalisation (khmer): optionally reduces sequencing depth to a target coverage level, discarding redundant reads. This lowers memory usage and speeds up assembly without significantly impacting quality.
Assembly: SPAdes (default) or Unicycler. SPAdes uses multiple k-mer sizes and is recommended for most bacterial genomes. Unicycler is designed for hybrid or circular assemblies.
Quality assessment (QUAST): reports assembly statistics (N50, number of contigs, total length, GC%, coverage depth) with an interactive Icarus contig browser.
Annotation (Prokka): rapid prokaryotic genome annotation producing GFF, GenBank, and other standard formats.
Coverage analysis (sequana_coverage): reads are mapped back to the assembly with BWA, duplicates flagged with Sambamba, and per-contig coverage profiles computed and visualised.
Variant calling (Freebayes): detects SNPs and small indels between the assembled consensus and the mapped reads.
Completeness (BUSCO / CheckM): optionally assess assembly completeness against conserved single-copy orthologs (BUSCO) or lineage-specific marker genes (CheckM).
Taxonomic identification (BLAST): optionally BLASTs the top contigs against the nt database to identify their taxonomy.
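To judge whether digital normalisation is worth enabling, a rough estimate of the raw sequencing depth is useful: depth is approximately the number of sequenced bases divided by the genome size. The sketch below is independent of the pipeline; `fastq_bases` and `mean_depth` are hypothetical helpers, and the genome size is a value you have to supply yourself.

```shell
# Sum the lengths of all reads in a FastQ stream read from stdin
# (assumes the standard four-line record layout).
fastq_bases() {
    awk 'NR % 4 == 2 { total += length($0) } END { printf "%.0f\n", total }'
}

# Mean depth = sequenced bases / genome size, printed with one decimal.
mean_depth() {
    awk -v bases="$1" -v genome="$2" 'BEGIN { printf "%.1f\n", bases / genome }'
}

# Example: 500 Mb of reads over an assumed 5 Mb genome.
mean_depth 500000000 5000000   # prints 100.0
```

For compressed input, the base count can be obtained with `gzip -dc reads.fastq.gz | fastq_bases`, where `reads.fastq.gz` stands for one of your own files; for paired-end data, add the counts of both files.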
A summary HTML report (summary.html) with per-sample assembly statistics and embedded coverage plots is generated at the end of the run, alongside a MultiQC report.
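Among the assembly statistics reported, N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The following sketch recomputes it from a FASTA file with standard tools; it is an illustration of the definition, not the code QUAST or the pipeline uses, and `fasta_lengths` and `n50` are hypothetical helper names.

```shell
# Print the length of each sequence in a FASTA file, one per line.
fasta_lengths() {
    awk '/^>/ { if (seen) print n; n = 0; seen = 1; next }
         { n += length($0) }
         END { if (seen) print n }' "$1"
}

# Read contig lengths (one per line) on stdin and print the N50.
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                running += len[i]
                if (running >= half) { print len[i]; exit }
            }
        }'
}

# Lengths 500 + 400 = 900 reach half of the 1500 total, so N50 is 400.
printf '%s\n' 100 200 300 400 500 | n50   # prints 400
```

With a real assembly, `fasta_lengths contigs.fasta | n50` gives a value to compare against the report, where `contigs.fasta` stands for your own assembly file.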
Rules and configuration details
See the latest documented configuration file for all available parameters.
Changelog
| Version | Description |
|---|---|
| 0.12.0 | |
| 0.11.1 | |
| 0.11.0 | |
| 0.10.0 | |
| 0.9.0 | |
| 0.8.5 | |
| 0.8.4 | |
| 0.8.3 | |
| 0.8.2 | |
| 0.8.1 | |
| 0.8.0 | First release. |
Contribute & Code of Conduct
To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.