A fastqc pipeline from the Sequana project.
Project description
This is the fastqc pipeline from the Sequana project.
- Overview:
Runs fastqc and multiqc on a set of sequencing data to produce quality control reports
- Input:
A set of FastQ files (paired or single-end) compressed or not
- Output:
An HTML file summary.html (individual fastqc reports, multi-sample report)
- Status:
production
- Wiki:
- Documentation:
This README file, the Wiki from the github repository (link above) and https://sequana.readthedocs.io
- Citation:
Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, https://doi.org/10.21105/joss.00352
Installation
You must install Sequana first (use --upgrade to get the latest version installed):
pip install sequana --upgrade
Then, just install this package:
pip install sequana_fastqc --upgrade
Usage
This command scans all files ending in .fastq.gz found in the local directory and creates a directory called fastqc/, where a snakemake pipeline is launched automatically. Depending on the number of files and their sizes, the process may take a long time:
sequana_fastqc --run
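The scanning step described above can be pictured with a small Python sketch that collects the .fastq.gz files of a directory and groups them by sample, so that paired files end up together. This is only an illustration, not Sequana's actual implementation: the helper name `group_fastq_files` and the default read tag pattern `_R[12]_` are assumptions made for this example.

```python
import re
from collections import defaultdict
from pathlib import Path


def group_fastq_files(directory=".", pattern="*.fastq.gz", readtag=r"_R[12]_"):
    """Group FastQ files by sample name (hypothetical helper, not part of Sequana).

    The sample name is everything before the read tag. Files without a read
    tag are treated as single-end samples named after the text before the
    first dot.
    """
    samples = defaultdict(list)
    for path in sorted(Path(directory).glob(pattern)):
        match = re.search(readtag, path.name)
        if match:
            sample = path.name[: match.start()]
        else:
            sample = path.name.split(".")[0]
        samples[sample].append(path.name)
    return dict(samples)
```

A sample with two entries would be treated as paired-end and a sample with a single entry as single-end.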
To learn more about the options (e.g., adding a different pattern to restrict the execution to a subset of the input files, or changing the output/working directory):
sequana_pipelines_fastqc --help
sequana_pipelines_fastqc --input-directory DATAPATH
This creates a directory called fastqc. You just need to execute the pipeline:
cd fastqc
sh fastqc.sh # for a local run
This launches a snakemake pipeline. If you are familiar with snakemake, you can retrieve the fastqc.rules and config.yaml files and then execute the pipeline yourself with specific parameters:
snakemake -s fastqc.rules --cores 4 --stats stats.txt
Or use the Sequanix interface.
Please see the Wiki for more examples and features.
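If you prefer to drive the manual snakemake call shown above from Python, for instance to vary the number of cores, a minimal sketch could look like the following. The function names `snakemake_command` and `run_pipeline` are hypothetical; only the flags already shown in this README (-s, --cores, --stats) are used, and whether --stats is accepted may depend on your snakemake version.

```python
import shlex
import subprocess


def snakemake_command(snakefile="fastqc.rules", cores=4, stats="stats.txt"):
    """Return the snakemake invocation from this README as an argument list."""
    return ["snakemake", "-s", snakefile, "--cores", str(cores), "--stats", stats]


def run_pipeline(workdir="fastqc", **kwargs):
    """Run snakemake inside the working directory (snakemake must be on PATH)."""
    cmd = snakemake_command(**kwargs)
    print("Running:", shlex.join(cmd))
    # check=True raises CalledProcessError if snakemake exits with an error.
    return subprocess.run(cmd, cwd=workdir, check=True)
```

For example, `run_pipeline(cores=8)` would run the same command with 8 cores inside the fastqc directory.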
Tutorial
You can retrieve test data from sequana_fastqc (https://github.com/sequana/sequana_fastqc) or type:
wget https://raw.githubusercontent.com/sequana/sequana_fastqc/master/sequana_pipelines/fastqc/data/data_R1_001.fastq.gz
wget https://raw.githubusercontent.com/sequana/sequana_fastqc/master/sequana_pipelines/fastqc/data/data_R2_001.fastq.gz
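If wget is not available, the same two files can be fetched with the Python standard library. The sketch below is an illustration only: the helper names `data_urls` and `download_test_data` are hypothetical, and the URLs are the ones given above.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = (
    "https://raw.githubusercontent.com/sequana/sequana_fastqc/"
    "master/sequana_pipelines/fastqc/data"
)
TEST_FILES = ("data_R1_001.fastq.gz", "data_R2_001.fastq.gz")


def data_urls():
    """Return the URLs of the two test files listed in this README."""
    return [f"{BASE_URL}/{name}" for name in TEST_FILES]


def download_test_data(target_dir="."):
    """Download the test files, skipping those already present locally."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in data_urls():
        destination = target / url.rsplit("/", 1)[-1]
        if not destination.exists():
            urlretrieve(url, str(destination))  # requires network access
        paths.append(destination)
    return paths
```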
Then, prepare and run the pipeline:
sequana_fastqc --input-directory .
cd fastqc
sh fastqc.sh
# once done, remove temporary files (snakemake and others)
make clean
Open the HTML entry page called summary.html. A multiqc report is also available. The reports contain the expected fastqc images.
Please see the Wiki for more examples and features.
Requirements
This pipeline requires the following executable(s):
fastqc
falco (optional)
sequana (Python: pip install sequana)
For Linux users, we provide a singularity image available through damona:
pip install damona
damona install fastqc
# and add the ~/.config/damona/bin path to your binary PATH
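Before launching the pipeline, you may want to verify that the executables listed above can be found on your PATH. The following sketch uses `shutil.which` from the standard library; the helper name `check_executables` is hypothetical, and treating falco as optional follows the list above.

```python
import shutil

REQUIRED = ("fastqc",)
OPTIONAL = ("falco",)


def check_executables(required=REQUIRED, optional=OPTIONAL, which=shutil.which):
    """Return two lists: missing required and missing optional executables.

    The `which` argument defaults to shutil.which and can be replaced,
    for instance in tests.
    """
    missing_required = [name for name in required if which(name) is None]
    missing_optional = [name for name in optional if which(name) is None]
    return missing_required, missing_optional
```

If the first list is not empty, the pipeline cannot run; a missing optional tool only matters if you intend to use it.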
Details
This pipeline runs fastqc in parallel on the input fastq files (paired or not) and then executes multiqc. A brief sequana summary report is also produced.
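The logic described above (fastqc on each file in parallel, then a single multiqc run) can be sketched in plain Python. The pipeline itself is implemented with snakemake rules, so this is only an illustration of the idea; the names `run_command` and `run_qc` and the default output directory are assumptions made for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_command(cmd):
    """Run one external command and raise if it fails."""
    subprocess.run(cmd, check=True)
    return cmd


def run_qc(fastq_files, outdir="fastqc_out", threads=4, runner=run_command):
    """Run fastqc on each file in parallel, then multiqc once on the results."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    fastqc_cmds = [["fastqc", "-o", str(outdir), str(f)] for f in fastq_files]
    # Threads are sufficient here: the heavy work happens in external processes.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        done = list(pool.map(runner, fastqc_cmds))
    # multiqc aggregates the individual fastqc reports found in outdir.
    done.append(runner(["multiqc", str(outdir), "-o", str(outdir)]))
    return done
```

The `runner` argument exists so that the commands can be inspected or logged without actually being executed.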
Rules and configuration details
Here is the latest documented configuration file to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.
Changelog
Version | Description
---|---
1.1.0 |
1.0.1 |
1.0.0 |
0.9.15 |
0.9.14 |
0.9.13 |
0.9.12 |
0.9.11 |
0.9.10 |
0.9.9 |
0.9.8 |
0.9.7 |
0.9.6 | add the readtag option
Contribute & Code of Conduct
To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.