
A Snakemake-based pipeline for amplicon processing


eDentity-metabarcoding-pipeline



Overview

eDentity is a Snakemake-based metabarcoding workflow designed for Illumina/AVITI paired-end data. It automates VSEARCH commands to denoise paired-end FASTQ reads and generate Exact Sequence Variants (ESVs). The pipeline is inspired by APSCALE; please cite APSCALE if you use this pipeline.

Installation

Copy the dependencies below into a file (e.g., edentity-env.yaml), then create and activate the environment with:

name: edentity-env
channels:
    - conda-forge
    - bioconda
    - nodefaults
dependencies:
    - conda
    - snakemake
    - pip
    - cutadapt=4.9
    - biopython=1.84
    - fastp=0.24.0
    - multiqc=1.27.1
    - vsearch=2.28.1
    - pip:
        - edentity

Install:

conda env create -f edentity-env.yaml --name edentity-env && conda activate edentity-env
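
Not part of the pipeline itself, but a quick way to confirm the environment resolved correctly: a small Python check (tool names taken from the dependency list above) that the core executables are on PATH. Run it inside the activated edentity-env.

```python
import shutil

def check_tools(tools):
    """Map each executable name to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

# Executables the eDentity workflow drives, per the environment file above.
status = check_tools(["fastp", "cutadapt", "vsearch", "multiqc", "snakemake"])
for tool, found in status.items():
    print(f"{tool}: {'OK' if found else 'MISSING'}")
```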

Usage

After installation, the pipeline can be run from the command line. Parameters can be provided either directly via command line arguments or through a configuration file.

Using Command Line Arguments

Replace the example parameters with those specific to your project:

edentity --raw_data_dir /path/to/your/raw_fastq_files/ \
--work_dir /path/to/your/work_directory \
--forward_primer <forward PCR primer sequence> \
--reverse_primer <reverse PCR primer sequence> \
--min_length 200 \
--max_length 600

Using a Configuration File

Create a params_config.yaml file and copy the YAML template below into it. Adjust the parameters to your project specifications:

# project specific
raw_data_dir: "/path/to/your/raw_fastq_files/"
work_dir: "/path/to/your/work_directory"
make_json_reports: False
dataType: "Illumina" # one of: Illumina, AVITI
cpu_cores: 20

# general quality control (Fastp)
average_qual: 25
length_required: 100
n_base_limit: 0

# PE_merging (these are set to vsearch default values)
maxdiffpct: 100
maxdiffs: 10
minovlen: 10

# primer_trimming (cutadapt)
forward_primer:   
reverse_primer: 
anchoring: False
discard_untrimmed: True

# quality_filtering (vsearch)
min_length: 100
max_length: 600
maxEE: 1

# dereplication (vsearch)
fasta_width: 0

# denoising (vsearch)
alpha: 2
minsize: 4
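
As a rough guide to the denoising settings: alpha controls how aggressively low-abundance variants are collapsed into more abundant centroids, and minsize discards variants below that raw abundance before denoising. In Edgar's UNOISE model, which VSEARCH's --cluster_unoise implements, a variant d mismatches away from a centroid is treated as noise when its abundance skew is at most β(d) = 1/2^(αd+1). A minimal sketch of that threshold (an illustration of the model, not the pipeline's internal code):

```python
def unoise_beta(alpha: float, d: int) -> float:
    """Maximum abundance skew (variant_size / centroid_size) at which a
    variant d mismatches from a centroid is treated as sequencing noise,
    per the UNOISE model: beta(d) = 1 / 2**(alpha * d + 1)."""
    return 1.0 / 2 ** (alpha * d + 1)

# With the default alpha = 2, a 1-mismatch variant must be at least
# 8x less abundant than the centroid to be merged into it.
```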

Then run the pipeline with:

edentity --config_file params_config.yaml
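
On the maxEE value under quality_filtering: it caps the expected number of errors per read (VSEARCH's --fastq_maxee), computed as the sum of per-base error probabilities derived from Phred scores. A minimal sketch, assuming standard Phred+33 quality encoding:

```python
def expected_errors(quality_string: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities p = 10**(-Q/10) for a FASTQ
    quality string (Phred+33 encoding by default)."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in quality_string)

# Forty Q30 bases ('?' encodes Q30) give EE = 40 * 0.001 = 0.04,
# comfortably under maxEE = 1.
```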

Parameters:

  • --forward_primer: Forward primer sequence.
  • --reverse_primer: Reverse primer sequence.
  • --raw_data_dir: Directory containing your raw sequencing data.
  • --work_dir: Directory for pipeline outputs and intermediate files.
  • --make_json_reports: Set to true to create extended JSON reports.
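
The pipeline discovers samples from raw_data_dir. A minimal sketch of the kind of R1/R2 pairing involved, assuming the common Illumina _R1_/_R2_ filename convention (the pipeline's actual matching rules may differ):

```python
from pathlib import Path

def pair_fastqs(raw_data_dir: str) -> dict:
    """Group *_R1_*/*_R2_* FASTQ files by sample name.
    Returns {sample: {"R1": path, "R2": path}}."""
    pairs = {}
    for path in sorted(Path(raw_data_dir).glob("*.fastq.gz")):
        for tag in ("_R1_", "_R2_"):
            if tag in path.name:
                sample = path.name.split(tag)[0]
                pairs.setdefault(sample, {})[tag.strip("_")] = path
    return pairs
```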

Configuring Snakemake Parameters via Profile

You can control Snakemake-specific parameters (such as the executor, job limits, resource defaults, and flags like rerun-incomplete) using a profile YAML configuration. This is useful for running the pipeline on HPC clusters or customizing workflow execution.

Create a snakemake-profile.yaml file with content like:

executor: local # cluster executors, e.g., slurm, lsf, aws-batch; see the Snakemake documentation
jobs: 30
max-jobs-per-second: 10
max-status-checks-per-second: 10
local-cores: 44
latency-wait: 30
printshellcmds: true
rerun-incomplete: false
keep-incomplete: true
conda-cleanup-envs: false
dryrun: false # set to true for a dry run (no jobs executed)
resources:
    mem_mb: 16000
    threads: 8

  • executor: Cluster scheduler (e.g., SLURM).
  • jobs: Maximum number of parallel jobs.
  • resources: Default resource limits for jobs.
  • dryrun: Set to true to perform a dry-run (no jobs will be executed).

For more details on these and other Snakemake parameters, see the Snakemake documentation.

To use this profile, run:

edentity --profile snakemake-profile.yaml --config_file params_config.yaml

You can combine this with your pipeline configuration file for full control over both workflow and execution parameters.

For a full list of options:

edentity --help

Pipeline Output Directory Structure

After successful execution, the pipeline generates a structured set of output directories and files within your specified work_dir. Output file names are prefixed with the name of your work_dir. The main components are:

work_dir/
├── Results/                       # Final processed data and reports
│   ├── ESV_table.tsv              # Table of Exact Sequence Variants (ESVs)
│   ├── summary_report.tsv         # Summary statistics for the run
│   ├── metabarcoding_run.json     # JSON report with run metadata and parameters
│   └── multiqc_report/            # Directory containing MultiQC output
│       └── multiqc.html           # Interactive MultiQC report
├── logs/                          # Log files for each step of the pipeline
└── edentity_pipeline_settings/    # Configuration files used for the pipeline run
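
The ESV table is the main result. A minimal sketch of summarizing per-sample read counts from it, assuming a layout with an ID column followed by one count column per sample (check your own header first, since the exact columns depend on the run):

```python
import csv
import io

def sample_totals(tsv_text: str) -> dict:
    """Sum the numeric count columns of an ESV table, per sample.
    Assumes the first column is the ESV identifier."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    totals = {name: 0 for name in header[1:]}
    for row in reader:
        for name, value in zip(header[1:], row[1:]):
            try:
                totals[name] += int(value)
            except ValueError:
                pass  # skip non-numeric columns, e.g. a sequence column
    return totals
```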
