Downsample NGS data sets (FastQ/FastA) using the Sequana framework
This is the downsampling pipeline from the Sequana project.
- Overview:
Downsample NGS data sets (FastQ or FastA).
- Input:
A set of FastQ or FastA files (single or paired-end).
- Output:
Downsampled FastQ or FastA files.
- Status:
Production
- Citation:
Cokelaer et al. (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352. DOI: https://doi.org/10.21105/joss.00352
Installation
pip install sequana_downsampling --upgrade
You will also need pigz available on your PATH.
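Before setting up the pipeline, it can be worth confirming that pigz is actually visible on your PATH. The following is a minimal sketch using only the Python standard library; the helper name `tool_available` is hypothetical and not part of Sequana.

```python
import shutil


def tool_available(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    # pigz is the external tool this README lists as required.
    for tool in ("pigz",):
        status = "found" if tool_available(tool) else "MISSING"
        print(f"{tool}: {status}")
```

If pigz is reported as missing, install it (for example through conda/bioconda) before running the pipeline.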
Quick Start
1. Set up the pipeline:
sequana_downsampling --input-directory DATAPATH
2. Run the pipeline:
cd downsampling
bash downsampling.sh
Usage
sequana_downsampling --help
Key pipeline-specific options:
- --downsampling-input-format
Input format: fastq (default), fasta, or sam.
- --downsampling-method
random (default, keeps a fixed number of reads) or random_pct (keeps a percentage of reads).
- --downsampling-max-entries
Number of reads to keep when using random (default: 1000).
- --downsampling-percent
Percentage of reads to keep when using random_pct (default: 10).
- --downsampling-threads
Number of threads used by pigz to compress output (default: 4).
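To illustrate the difference between the two methods, here is a rough sketch of what "keep a fixed number of reads" and "keep a percentage of reads" can look like. It is not the Sequana implementation: the function names `sample_fixed` and `sample_percent`, the `seed` parameter, and the use of reservoir sampling are assumptions made for this example only.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sample_fixed(records: Iterable[T], max_entries: int, seed: int = 0) -> List[T]:
    """Keep at most `max_entries` records, chosen uniformly in a single pass.

    Uses reservoir sampling (Algorithm R), so the total number of records
    does not need to be known in advance. Input order is not preserved.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, rec in enumerate(records):
        if i < max_entries:
            # Fill the reservoir with the first `max_entries` records.
            reservoir.append(rec)
        else:
            # Replace a random slot with probability max_entries / (i + 1).
            j = rng.randint(0, i)
            if j < max_entries:
                reservoir[j] = rec
    return reservoir


def sample_percent(records: Iterable[T], percent: float, seed: int = 0) -> Iterator[T]:
    """Keep each record independently with probability `percent` / 100."""
    rng = random.Random(seed)
    threshold = percent / 100.0
    for rec in records:
        if rng.random() < threshold:
            yield rec


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(10_000)]
    print(len(sample_fixed(reads, 1000)))        # exactly 1000
    print(len(list(sample_percent(reads, 10))))  # close to 1000, not exact
```

The practical difference is that the fixed-count approach returns exactly the requested number of reads (or fewer if the input is smaller), whereas the percentage approach returns a number that only approximates the requested fraction.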
Examples:
sequana_downsampling --input-directory DATAPATH \
--downsampling-method random --downsampling-max-entries 100
sequana_downsampling --input-directory DATAPATH \
--downsampling-method random_pct --downsampling-percent 10 \
--downsampling-input-format fasta --input-pattern "*.fasta"
Run on a SLURM cluster:
cd downsampling
sbatch downsampling.sh
Or drive Snakemake directly:
snakemake -s downsampling.rules --cores 4 --stats stats.txt
Requirements
The following tools must be available:
- sequana: FastQ/FastA selection (Python API)
- pigz: parallel gzip compression of outputs
Install them via conda/bioconda:
mamba env create -f environment.yml
Pipeline overview
The pipeline randomly selects reads from the input files (single or paired). If the inputs are paired, the one-to-one mapping between R1 and R2 is preserved. FastQ inputs can be gzipped; outputs are gzipped with pigz. FastA inputs and outputs are uncompressed.
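One way to preserve the R1/R2 mapping is to choose the read positions once and apply the same selection to both files. The sketch below shows that idea on in-memory lines. It is an illustration under stated assumptions, not the code used by the pipeline: the helper names (`read_fastq`, `choose_indices`, `downsample_pair`) are hypothetical, records are assumed to be strict four-line FastQ entries, and everything is loaded into memory for simplicity.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, ...]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield FastQ records as 4-tuples (header, sequence, plus, quality)."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FastQ record")
        yield record


def choose_indices(n_reads: int, n_keep: int, seed: int = 0) -> Set[int]:
    """Pick `n_keep` distinct read positions (or all of them if fewer exist)."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_reads), min(n_keep, n_reads)))


def downsample_pair(
    r1_lines: Iterable[str], r2_lines: Iterable[str], n_keep: int, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Apply one shared index selection to R1 and R2 so mates stay paired."""
    r1 = list(read_fastq(r1_lines))
    r2 = list(read_fastq(r2_lines))
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 contain different numbers of reads")
    keep = choose_indices(len(r1), n_keep, seed)
    out1 = [rec for i, rec in enumerate(r1) if i in keep]
    out2 = [rec for i, rec in enumerate(r2) if i in keep]
    return out1, out2


if __name__ == "__main__":
    # Toy paired data: 20 reads named read<i>/1 and read<i>/2.
    r1 = [l for i in range(20) for l in (f"@read{i}/1", "ACGT", "+", "IIII")]
    r2 = [l for i in range(20) for l in (f"@read{i}/2", "TGCA", "+", "IIII")]
    kept1, kept2 = downsample_pair(r1, r2, n_keep=5)
    for a, b in zip(kept1, kept2):
        print(a[0], b[0])
```

Because the selection is made on positions rather than independently per file, the i-th read kept from R1 is always the mate of the i-th read kept from R2.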
Configuration
Refer to the latest documented configuration file for details. Key sections:
- downsampling: method (random / random_pct), max_entries, percent, threads, and input_format (fastq / fasta)
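As a rough picture of the downsampling section, the sketch below models its keys as a Python dataclass, using the defaults documented for the command-line options. The actual layout of the configuration file is not shown here, so the class name `DownsamplingConfig` and the validation rules are assumptions for illustration only.

```python
from dataclasses import asdict, dataclass


@dataclass
class DownsamplingConfig:
    """Hypothetical model of the `downsampling` configuration section."""

    method: str = "random"       # "random" or "random_pct"
    max_entries: int = 1000      # reads kept when method is "random"
    percent: float = 10          # percentage kept when method is "random_pct"
    threads: int = 4             # threads used by pigz for compression
    input_format: str = "fastq"  # "fastq" or "fasta"; the CLI also lists "sam"

    def validate(self) -> None:
        """Raise ValueError if a value falls outside the documented choices."""
        if self.method not in ("random", "random_pct"):
            raise ValueError(f"unknown method: {self.method}")
        if self.input_format not in ("fastq", "fasta", "sam"):
            raise ValueError(f"unknown input format: {self.input_format}")
        if self.max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        if not 0 < self.percent <= 100:
            raise ValueError("percent must be in the range (0, 100]")
        if self.threads < 1:
            raise ValueError("threads must be at least 1")


if __name__ == "__main__":
    cfg = DownsamplingConfig(method="random_pct", percent=25)
    cfg.validate()
    print({"downsampling": asdict(cfg)})
```

Checking values like these before launching a run can catch typos such as an unsupported method name early, before any reads are processed.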
Changelog
| Version | Description |
|---|---|
| 0.10.0 | |
| 0.9.0 | |
| 0.8.5 | |
| 0.8.4 | |
| 0.8.3 | |
| 0.8.2 | |
| 0.8.1 | |
| 0.8.0 | First release. |
Contribute & Code of Conduct
To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.