A Snakemake-based pipeline for amplicon processing
eDentity-metabarcoding-pipeline
Overview
eDentity is a Snakemake-based metabarcoding workflow for Illumina and AVITI paired-end data. It chains fastp, VSEARCH, and cutadapt to quality-control, merge, primer-trim, filter, dereplicate, and denoise paired-end FASTQ reads into Exact Sequence Variants (ESVs). The pipeline is inspired by APSCALE; please also cite APSCALE if you use this pipeline.
Installation
Install eDentity and its dependencies into a new conda environment with the command below:
conda create -n edentity-env \
python=3.12.8 \
fastp=0.24.0 \
cutadapt=4.9 \
vsearch=2.28.1 \
biopython=1.84 \
multiqc=1.27.1 \
nbitk=0.5.9 \
"edentity>1.4.8" \
polars=1.30.0 \
-c conda-forge -c bioconda -y && \
conda activate edentity-env
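After activating the environment, you can quickly confirm that the main tools are available by looking them up on PATH. The Python sketch below does this with shutil.which. It assumes that each package's executable has the same name as the package (edentity, fastp, cutadapt, vsearch, multiqc). check_tools is a hypothetical helper and is not part of eDentity.

```python
# Hypothetical post-install check (not part of eDentity): report whether the
# executables installed by the conda command above can be found on PATH.
# Assumption: each executable shares its package's name.
import shutil

REQUIRED_TOOLS = ["edentity", "fastp", "cutadapt", "vsearch", "multiqc"]


def check_tools(tools=REQUIRED_TOOLS):
    """Return {tool name: True if an executable with that name is on PATH}."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        print(f"{tool:<10} {'found' if found else 'MISSING'}")
```

Run it with python inside edentity-env. If any tool is reported as MISSING, the installation probably failed or is incomplete.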
Usage
After installation, run the pipeline from the command line with edentity. Parameters can be passed either as command-line arguments or through a YAML configuration file.
Using Command Line Arguments
Replace the example parameters with those specific to your project:
edentity --raw_data_dir /path/to/your/raw_fastq_files/ \
--work_dir /path/to/your/work_directory \
--forward_primer FORWARD_PRIMER_SEQUENCE \
--reverse_primer REVERSE_PRIMER_SEQUENCE \
--min_length 200 \
--max_length 600
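Each primer placeholder must be passed as a single argument. When you launch runs from a script, building the command as a list of arguments avoids shell word-splitting entirely. The sketch below uses only the flags shown above. build_edentity_command and the primer strings are hypothetical placeholders.

```python
# Hypothetical helper (not part of eDentity): assemble the edentity command as
# an argument list so each value, e.g. a primer, stays a single argument.
import shlex


def build_edentity_command(raw_data_dir, work_dir, forward_primer,
                           reverse_primer, min_length=200, max_length=600):
    """Return the edentity invocation as a list of arguments (not executed)."""
    return [
        "edentity",
        "--raw_data_dir", raw_data_dir,
        "--work_dir", work_dir,
        "--forward_primer", forward_primer,
        "--reverse_primer", reverse_primer,
        "--min_length", str(min_length),
        "--max_length", str(max_length),
    ]


# Placeholder primer sequences: replace them with the primers used in your PCR.
cmd = build_edentity_command(
    "/path/to/your/raw_fastq_files/",
    "/path/to/your/work_directory",
    forward_primer="ACGTACGTACGTACGT",
    reverse_primer="TGCATGCATGCATGCA",
)
print(shlex.join(cmd))  # prints an equivalent, safely quoted shell command
# To actually run it: subprocess.run(cmd, check=True)
```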
Using a Configuration File
Create a params_config.yaml file and copy the YAML template below into it. Adjust the parameters to match your project:
# project specific
raw_data_dir: # "/path/to/your/raw_fastq_files/"
work_dir: # "/path/to/your/work_directory"
make_json_reports: False
dataType: "Illumina" # one of: Illumina, AVITI
cpu_cores: 20
# general quality control (Fastp)
average_qual: 25
length_required: 100
n_base_limit: 0
# PE_merging (these are set to vsearch default values)
maxdiffpct: 100
maxdiffs: 10
minovlen: 10
# primer_trimming (cutadapt)
forward_primer: # "FORWARD_PRIMER_SEQUENCE"
reverse_primer: # "REVERSE_PRIMER_SEQUENCE"
anchoring: False
discard_untrimmed: True
# quality_filtering (vsearch)
min_length: 100
max_length: 600
maxEE: 1
# dereplication (vsearch)
fasta_width: 0
# denoising (vsearch)
alpha: 2
minsize: 4
Then run the pipeline with:
edentity --config_file params_config.yaml
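The quality_filtering block of the template sets min_length, max_length, and maxEE. This sketch assumes these values are passed to VSEARCH's --fastq_minlen, --fastq_maxlen, and --fastq_maxee options; with printshellcmds enabled in the Snakemake profile, you can check the exact commands. Under that assumption, a read is kept only if its length lies within the allowed range and its expected number of errors does not exceed maxEE. The expected number of errors is the sum of 10^(-Q/10) over the read's Phred quality scores. The sketch illustrates that rule only and is not eDentity's implementation.

```python
# Sketch of a length + expected-error (EE) read filter. Assumption: eDentity
# passes min_length, max_length and maxEE to VSEARCH's --fastq_minlen,
# --fastq_maxlen and --fastq_maxee. Illustration only, not eDentity's code.


def expected_errors(quality_string: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities 10 ** (-Q / 10).

    Q is decoded from each FASTQ quality character using the Phred offset
    (33 for standard Phred+33 FASTQ files).
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality_string)


def passes_quality_filter(sequence: str, quality_string: str,
                          min_length: int = 100, max_length: int = 600,
                          max_ee: float = 1.0) -> bool:
    """Keep a read if min_length <= length <= max_length and EE <= max_ee."""
    if not min_length <= len(sequence) <= max_length:
        return False
    return expected_errors(quality_string) <= max_ee


# A 150 bp read with every base at Q30 ('?' in Phred+33): EE = 150 * 0.001.
read_seq = "A" * 150
read_qual = "?" * 150
print(round(expected_errors(read_qual), 3))        # 0.15
print(passes_quality_filter(read_seq, read_qual))  # True
```

Because the expected error accumulates over every base, a stricter maxEE tends to remove long or uniformly mediocre reads as well as reads with a few very poor bases.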
Parameters:
- --forward_primer: Forward primer sequence.
- --reverse_primer: Reverse primer sequence.
- --raw_data_dir: Directory containing your raw sequencing data.
- --work_dir: Directory for pipeline outputs and intermediate files.
- --make_json_reports: Set to True to create extended JSON reports.
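The denoising block of the template sets alpha and minsize. This sketch assumes they are forwarded to VSEARCH's --cluster_unoise as --unoise_alpha and --minsize. In that case they control the UNOISE rule. Unique sequences with an abundance below minsize are dropped. A less abundant sequence that differs from a more abundant centroid by d positions is treated as an error of that centroid when their abundance ratio is at most β(d) = 1/2^(αd+1). The sketch is a simplified illustration: it uses Hamming distance on equal-length sequences instead of alignment and does no chimera removal. unoise is a hypothetical helper, not VSEARCH's or eDentity's code.

```python
# Simplified sketch of UNOISE-style denoising. Assumption: eDentity's alpha and
# minsize are passed to VSEARCH --cluster_unoise as --unoise_alpha and --minsize.
# Not VSEARCH's implementation: distances are Hamming (no alignment/indels) and
# there is no chimera filtering.


def beta(d: int, alpha: float = 2.0) -> float:
    """Largest abundance ratio at which a sequence d differences away from a
    centroid is still treated as an error of that centroid."""
    return 1 / 2 ** (alpha * d + 1)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def unoise(uniques: dict[str, int], alpha: float = 2.0,
           minsize: int = 4) -> dict[str, int]:
    """Denoise {sequence: abundance}; return {centroid: summed abundance}."""
    # Drop rare uniques, then visit the rest from most to least abundant.
    ranked = sorted(
        ((seq, size) for seq, size in uniques.items() if size >= minsize),
        key=lambda item: item[1],
        reverse=True,
    )
    centroid_size: dict[str, int] = {}  # centroid -> its own abundance
    totals: dict[str, int] = {}         # centroid -> abundance incl. absorbed reads
    for seq, size in ranked:
        for centroid, c_size in centroid_size.items():
            d = hamming(seq, centroid)
            if len(seq) == len(centroid) and size / c_size <= beta(d, alpha):
                totals[centroid] += size  # absorbed as a presumed sequencing error
                break
        else:
            centroid_size[seq] = size     # no suitable centroid: a new ESV
            totals[seq] = size
    return totals


reads = {"ACGTACGT": 1000, "ACGTACGA": 10, "TTTTACGT": 300, "ACGTACCC": 3}
print(unoise(reads))  # {'ACGTACGT': 1010, 'TTTTACGT': 300}
```

With alpha = 2, a sequence one difference away from a centroid is absorbed only if it is at most 1/8 as abundant, and a sequence two differences away only if it is at most 1/32 as abundant. A higher alpha therefore absorbs fewer sequences and retains more ESVs.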
Configuring Snakemake Parameters via Profile
You can control Snakemake-specific settings (such as the executor, resource limits, or whether incomplete jobs are rerun) with a profile YAML file. This is useful for running the pipeline on HPC clusters or for customizing workflow execution.
Create a snakemake-profile.yaml file with content like:
executor: local # or a cluster executor, e.g. slurm, lsf, aws-batch (see the Snakemake documentation)
jobs: "30"
max-jobs-per-second: "10"
max-status-checks-per-second: "10"
local-cores: 44
latency-wait: "30"
printshellcmds: "True"
rerun-incomplete: "False"
keep-incomplete: "True"
conda-cleanup-envs: "False"
dryrun: true
resources:
mem_mb: 16000
threads: 8
- executor: Execution backend, e.g. local or a cluster scheduler such as SLURM.
- jobs: Maximum number of parallel jobs.
- resources: Default resource limits for jobs.
- dryrun: Set to true to perform a dry run (no jobs will be executed).
For more details on these and other Snakemake parameters, see the Snakemake documentation.
To use this profile, run:
edentity --profile snakemake-profile.yaml --config_file params_config.yaml
Snakemake parameters can also be provided directly on the command line, but they must be given in their long form (e.g., --jobs instead of -j). Command-line parameters take precedence over those defined in the profile configuration file and over the default parameters.
For example, you can use both a profile configuration file and override specific parameters via the command line:
edentity --profile snakemake-profile.yaml --config_file params_config.yaml \
--jobs 50 --latency-wait 60 --until merge
In this example:
- The --config_file option specifies the parameters specific to eDentity, such as input directories, primers, and quality-control settings.
- The --profile option specifies the Snakemake profile configuration file, which controls how Snakemake runs the workflow: job execution, resource limits, and cluster settings.
- The --jobs and --latency-wait parameters override the corresponding values in the profile configuration file, and --until merge tells Snakemake to run only up to the rule or target named merge.
- Command-line parameters always take priority over the profile or default settings.
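The precedence rule (command line over profile over defaults) works like a layered lookup. The sketch below models it with collections.ChainMap, using the values from the example command above. The defaults dictionary is hypothetical and does not reflect eDentity's or Snakemake's actual defaults. Profile values are written as plain numbers rather than the quoted strings used in the profile file.

```python
# Sketch of the documented precedence: command line > profile > defaults.
# ChainMap looks a key up in each mapping in order and returns the first hit.
from collections import ChainMap

defaults = {"jobs": 1, "latency-wait": 5, "dryrun": False}  # hypothetical defaults
profile = {"jobs": 30, "latency-wait": 30, "dryrun": True}  # from snakemake-profile.yaml
cli = {"jobs": 50, "latency-wait": 60, "until": "merge"}    # from the command line

effective = ChainMap(cli, profile, defaults)
print(dict(effective))
```

Here jobs and latency-wait come from the command line, dryrun comes from the profile, and until exists only on the command line.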
For a full list of options, run:
edentity --help
Pipeline Output Directory Structure
After successful execution, the pipeline generates a structured set of output directories and files within your specified work_dir. All file names are prefixed with your work_dir. The main components are:
work_dir/
├── Results/
│   ├── ESVs_fasta/                  # Directory containing the FASTA file of ESVs
│   └── reports/                     # Reports generated by the pipeline
│       ├── ESV_table.tsv            # Table of Exact Sequence Variants (ESVs)
│       ├── summary_report.tsv       # Summary statistics for the run
│       ├── metabarcoding_run.json   # JSON report with run metadata and parameters
│       └── multiqc_report/          # Directory containing MultiQC output
│           └── multiqc.html         # Interactive MultiQC report
├── logs/                            # Log files for each step of the pipeline
└── edentity_pipeline_settings/      # Configuration files used for the pipeline run
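To check a finished run from a script, you can locate the outputs listed in the tree above with pathlib. The sketch below is hypothetical: find_outputs is not part of eDentity, and the glob patterns are derived from the tree. The leading * allows for prefixed file names, and ** matches any depth below Results/.

```python
# Hypothetical helper (not part of eDentity) that locates the main outputs of a
# finished run. Patterns follow the directory tree above; the leading '*' allows
# for prefixed file names and '**' matches any depth below Results/.
from pathlib import Path

EXPECTED_OUTPUTS = {
    "ESV table": "Results/**/*ESV_table.tsv",
    "summary report": "Results/**/*summary_report.tsv",
    "run JSON report": "Results/**/*metabarcoding_run.json",
    "MultiQC report": "Results/**/*multiqc.html",
    "ESV FASTA": "Results/ESVs_fasta/*",
}


def find_outputs(work_dir: str) -> dict[str, list[Path]]:
    """Return {description: sorted matching paths}; an empty list means not found."""
    root = Path(work_dir)
    return {name: sorted(root.glob(pattern))
            for name, pattern in EXPECTED_OUTPUTS.items()}
```

For example, find_outputs("/path/to/your/work_directory")["ESV table"] returns the path(s) of the ESV table. It returns an empty list if the run did not produce one.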
Download files
Source Distribution: edentity-1.5.2.tar.gz
Built Distribution: edentity-1.5.2-py3-none-any.whl
File details
Details for the file edentity-1.5.2.tar.gz.
File metadata
- Download URL: edentity-1.5.2.tar.gz
- Upload date:
- Size: 74.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ce19a45758a4f5226598c6e00ee7cfc18e9a6ca0f362ff3a3f573a9b1f605db9 |
| MD5 | 25409340213d317cf860007d5e200bed |
| BLAKE2b-256 | b698111c99495f55e85d0ca0a9063761b14042df260e2817047215d0f52cd729 |
File details
Details for the file edentity-1.5.2-py3-none-any.whl.
File metadata
- Download URL: edentity-1.5.2-py3-none-any.whl
- Upload date:
- Size: 42.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 341776d3f1335e5fc47293451328ca7e9bad7a744ebbde2f57cada199cd8cd9e |
| MD5 | 418dd4446f7ef39f0b2d77a469c7c6b4 |
| BLAKE2b-256 | 304fe2bf4bb0eb9b56133b1988cb994baf8acf26e223279125569c8975c1aff6 |