Advanced Pipeline for Simple yet Comprehensive AnaLysEs of DNA metabarcoding data - Nanopore application
apscale
Advanced Pipeline for Simple yet Comprehensive AnaLysEs of DNA metabarcoding data
apscale-nanopore
Introduction
Apscale-nanopore is a modified version of the metabarcoding pipeline apscale and is used for the processing of Oxford Nanopore data.
Programs used: cutadapt, vsearch, and BLASTn (via apscale-blast).
Input:
- Non-demultiplexed Nanopore sequence data in .fastq format.
- Demultiplexed Nanopore sequence data in .fastq format.
Output:
- read table, taxonomy table, log files, report
Installation
Apscale-nanopore can be installed on all common operating systems (Windows, Linux, macOS). It requires Python 3.10 or higher and can be installed via pip from any command line:
pip install apscale_nanopore
To update apscale-nanopore, run:
pip install --upgrade apscale_nanopore
The easiest installation option is the Conda apscale environment. This way, all dependencies will automatically be installed.
Once the environment is installed, activate it:
conda activate apscale
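To confirm that the installation meets the stated requirements, a short check along the following lines may help. This is only a sketch: the helper names `python_is_supported` and `installed_version` are hypothetical, and the distribution name is assumed to be the same `apscale_nanopore` used in the pip command above.

```python
import sys
from importlib import metadata


def python_is_supported(version_info=sys.version_info) -> bool:
    """True if the interpreter meets the stated minimum of Python 3.10."""
    return tuple(version_info[:2]) >= (3, 10)


def installed_version(distribution: str = "apscale_nanopore"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("Python OK:", python_is_supported())
print("apscale_nanopore version:", installed_version())
```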
Create project
First, create a new project:
apscale_nanopore create -p PATH/TO/PROJECT
A new project will be created. Follow the instructions and fill out the settings file accordingly.
/YOUR_PROJECT_PATH/My_new_project/
├───1_raw_data
│   └───data
├───2_index_demultiplexing
│   └───data
├───3_primer_trimming
│   └───data
├───4_quality_filtering
│   └───data
├───5_clustering_denoising
│   └───data
├───6_read_table
│   └───data
├───7_taxonomic_assignment
│   └───data
├───8_nanopore_report
My_new_project_settings.xlsx
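The layout above can be checked programmatically before copying data into it. The following is a minimal sketch using only the folder names shown in the tree; the function name `missing_project_paths` is hypothetical and not part of apscale-nanopore, and the settings file is assumed to be named after the project folder as in the example.

```python
from pathlib import Path

# Step folders from the project tree; each one contains a "data" sub-folder.
STEP_FOLDERS = [
    "1_raw_data",
    "2_index_demultiplexing",
    "3_primer_trimming",
    "4_quality_filtering",
    "5_clustering_denoising",
    "6_read_table",
    "7_taxonomic_assignment",
]
REPORT_FOLDER = "8_nanopore_report"


def missing_project_paths(project: Path) -> list[str]:
    """Return the expected paths (relative to the project) that do not exist."""
    expected = [Path(name) / "data" for name in STEP_FOLDERS]
    expected.append(Path(REPORT_FOLDER))
    # Assumption: the settings file is named "<project folder>_settings.xlsx".
    expected.append(Path(f"{project.name}_settings.xlsx"))
    return [p.as_posix() for p in expected if not (project / p).exists()]
```

An empty list means every folder and the settings file were found.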
Settings file
Sample index and primer combinations (Example)
| Forward index 5'-3' | Forward primer 5'-3' | Reverse index 5'-3' | Reverse primer 5'-3' | ID |
|---|---|---|---|---|
| AGAACGACTTCCATACTCGTGTGA | RGCHTTYCCHCGWATAAAYAAYATAAG | AGAACGACTTCCATACTCGTGTGA | GRGGRTAWACWGTTCAWCCWGTNCC | Sample_1 |
| AACGAGTCTCTTGGGACCCATAGA | RGCHTTYCCHCGWATAAAYAAYATAAG | AACGAGTCTCTTGGGACCCATAGA | GRGGRTAWACWGTTCAWCCWGTNCC | Sample_2 |
| AGGTCTACCTCGCTAACACCACTG | RGCHTTYCCHCGWATAAAYAAYATAAG | AGGTCTACCTCGCTAACACCACTG | GRGGRTAWACWGTTCAWCCWGTNCC | Sample_3 |
| CGTCAACTGACAGTGGTTCGTACT | RGCHTTYCCHCGWATAAAYAAYATAAG | CGTCAACTGACAGTGGTTCGTACT | GRGGRTAWACWGTTCAWCCWGTNCC | Sample_4 |
| ACCCTCCAGGAAAGTACCTCTGAT | RGCHTTYCCHCGWATAAAYAAYATAAG | ACCCTCCAGGAAAGTACCTCTGAT | GRGGRTAWACWGTTCAWCCWGTNCC | Sample_5 |
| CCAAACCCAACAACCTAGATAGGC | RGCHTTYCCHCGWATAAAYAAYATAAG | CCAAACCCAACAACCTAGATAGGC | GRGGRTAWACWGTTCAWCCWGTNCC | Sample_6 |
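Typos in this sheet are easy to miss, so it can be worth checking the rows before a run. The sketch below assumes the rows have already been read into tuples in the column order shown above; `check_rows` is a hypothetical helper, not a function provided by apscale-nanopore. It only verifies that every sequence uses IUPAC nucleotide codes and that sample IDs are unique. Blank index cells are accepted, since the index columns may be left empty for demultiplexed data.

```python
# The 15 IUPAC nucleotide codes (four bases plus degenerate codes).
IUPAC_CODES = set("ACGTRYSWKMBDHVN")


def check_rows(rows):
    """rows: (fwd_index, fwd_primer, rev_index, rev_primer, sample_id) tuples."""
    problems = []
    seen_ids = set()
    for row in rows:
        *sequences, sample_id = row
        for seq in sequences:
            # An empty sequence has no invalid characters and passes.
            if set(seq.upper()) - IUPAC_CODES:
                problems.append(f"{sample_id}: invalid sequence {seq!r}")
        if sample_id in seen_ids:
            problems.append(f"{sample_id}: duplicate ID")
        seen_ids.add(sample_id)
    return problems


rows = [
    ("AGAACGACTTCCATACTCGTGTGA", "RGCHTTYCCHCGWATAAAYAAYATAAG",
     "AGAACGACTTCCATACTCGTGTGA", "GRGGRTAWACWGTTCAWCCWGTNCC", "Sample_1"),
]
print(check_rows(rows))
```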
Apscale-nanopore settings (Example)
| Step | Variable | Value | Comment |
|---|---|---|---|
| General | cpu count | 7 | Number of cores to use |
| demultiplexing (index) | allowed errors index | 3 | Allowed errors during index demultiplexing |
| primer trimming | allowed errors primer | 4 | Allowed errors during primer trimming |
| quality filtering | minimum length | 54 | Reads below this length will be discarded |
| quality filtering | maximum length | 74 | Reads above this length will be discarded |
| quality filtering | minimum quality | 20 | Reads below this average PHRED quality score will be discarded |
| clustering/denoising | mode | denoised OTUs | Choose clustering/denoising algorithm |
| clustering/denoising | percid | 0.97 | Vsearch clustering percentage identity |
| clustering/denoising | alpha | 1 | Vsearch denoising alpha value |
| clustering/denoising | d | 1 | Swarm's d value |
| read table | minimum reads | 10 | Discard reads below this threshold |
| taxonomic assignment | apscale blast | Yes | Run APSCALE megablast (yes or no) |
| taxonomic assignment | apscale db | ... | Path to local database |
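A few of these values constrain each other (for example, the minimum length should not exceed the maximum length). The following sketch shows one way such sanity checks could look. The dictionary simply mirrors the example values from the table; `check_settings` is a hypothetical helper and does not reflect how apscale-nanopore itself reads the settings file.

```python
# Example values copied from the settings table above.
settings = {
    "cpu count": 7,
    "allowed errors index": 3,
    "allowed errors primer": 4,
    "minimum length": 54,
    "maximum length": 74,
    "minimum quality": 20,
    "percid": 0.97,
    "minimum reads": 10,
}


def check_settings(s: dict) -> list[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if s["cpu count"] < 1:
        problems.append("cpu count must be at least 1")
    if s["minimum length"] > s["maximum length"]:
        problems.append("minimum length exceeds maximum length")
    if not 0 < s["percid"] <= 1:
        problems.append("percid must be a fraction in (0, 1]")
    for key in ("allowed errors index", "allowed errors primer",
                "minimum quality", "minimum reads"):
        if s[key] < 0:
            problems.append(f"{key} must not be negative")
    return problems


print(check_settings(settings))
```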
Run apscale-nanopore
Apscale-nanopore can be run in four modes:
1) Raw-data processing of non-demultiplexed data
- Copy your non-demultiplexed .fastq(.gz) files to the "1_raw_data/data" folder.
apscale_nanopore run -p PATH/TO/PROJECT
- Apscale-nanopore will demultiplex all your files according to the demultiplexing sheet.
2) Raw-data processing of demultiplexed data
- Copy your demultiplexed .fastq(.gz) files to the "1_raw_data/data" folder.
apscale_nanopore run -p PATH/TO/PROJECT -sd
- Apscale-nanopore will skip the demultiplexing and immediately start with the raw-data processing.
- Important: Enter the primer sequences (5'-3') in the first row of the demultiplexing sheet. The index columns can be left blank.
3) Live raw-data processing of non-demultiplexed data
- Output your non-demultiplexed .fastq(.gz) files to the "1_raw_data/data" folder during sequencing.
- Apscale-nanopore will automatically scan the folder for incoming files and automatically process them.
- Press Ctrl+C to interrupt the live processing.
apscale_nanopore run -p PATH/TO/PROJECT -l
- Apscale-nanopore will demultiplex all your files according to the demultiplexing sheet.
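The folder scanning described for live processing can be pictured as a simple polling loop. The sketch below is an illustration only and not the pipeline's actual implementation: `new_fastq_files` and `watch` are hypothetical names, the polling interval is arbitrary, and no attempt is made to detect files that are still being written.

```python
import time
from pathlib import Path

FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def new_fastq_files(folder: Path, seen: set) -> list:
    """Return FASTQ files in the folder that have not been reported before."""
    found = sorted(
        p for p in folder.iterdir()
        if p.name.endswith(FASTQ_SUFFIXES) and p.name not in seen
    )
    seen.update(p.name for p in found)
    return found


def watch(folder: Path, handle, interval: float = 5.0) -> None:
    """Call handle(path) for each incoming file until Ctrl+C is pressed."""
    seen = set()
    try:
        while True:
            for path in new_fastq_files(folder, seen):
                handle(path)
            time.sleep(interval)
    except KeyboardInterrupt:
        print("Live processing stopped.")
```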
4) Live raw-data processing of demultiplexed data
- Output your demultiplexed .fastq(.gz) files to the "1_raw_data/data" folder during sequencing.
- Apscale-nanopore will automatically scan the folder for incoming files and automatically process them.
- Press Ctrl+C to interrupt the live processing.
apscale_nanopore run -p PATH/TO/PROJECT -l -sd
- Apscale-nanopore will skip the demultiplexing and immediately start with the raw-data processing.
- Important: Enter the primer sequences (5'-3') in the first row of the demultiplexing sheet. The index columns can be left blank.
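The four modes differ only in whether the `-l` (live) and `-sd` (skip demultiplexing) flags are present. When launching the pipeline from a script, the command line could be assembled as in this sketch; `build_run_command` is a hypothetical helper, and only the flags documented above are used.

```python
import shlex


def build_run_command(project: str, live: bool = False,
                      skip_demultiplexing: bool = False) -> list[str]:
    """Assemble the apscale_nanopore command line for one of the four modes."""
    cmd = ["apscale_nanopore", "run", "-p", project]
    if live:
        cmd.append("-l")    # live processing of incoming files
    if skip_demultiplexing:
        cmd.append("-sd")   # input files are already demultiplexed
    return cmd


# Mode 4: live processing of demultiplexed data.
print(shlex.join(build_run_command("PATH/TO/PROJECT", live=True,
                                   skip_demultiplexing=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to start the run.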
Run individual steps
- Apscale-nanopore can run an individual step (-step X) or all steps after a specific module (-steps X).
Step indices:
- 1 = Index demultiplexing
- 2 = Primer trimming
- 3 = Quality filtering
- 4 = Clustering/denoising
- 5 = Read table
- 6 = Taxonomic assignment
Example: Run "clustering/denoising":
apscale_nanopore run -p PATH/TO/PROJECT -step 4
Example: Run all steps after the "quality filtering":
apscale_nanopore run -p PATH/TO/PROJECT -steps 3
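To avoid looking up step indices by hand, the list above can be kept as a small lookup table. This is a sketch only; `step_arguments` is a hypothetical helper that returns the extra arguments to append to `apscale_nanopore run -p PATH/TO/PROJECT`.

```python
# Step indices as listed above.
STEP_INDEX = {
    "index demultiplexing": 1,
    "primer trimming": 2,
    "quality filtering": 3,
    "clustering/denoising": 4,
    "read table": 5,
    "taxonomic assignment": 6,
}


def step_arguments(step_name: str, run_following: bool = False) -> list[str]:
    """Return ["-step", "X"], or ["-steps", "X"] when run_following is True."""
    index = STEP_INDEX[step_name.lower()]
    flag = "-steps" if run_following else "-step"
    return [flag, str(index)]


print(step_arguments("Clustering/denoising"))
print(step_arguments("Quality filtering", run_following=True))
```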
Quality control
Quality control can be run on all .fastq files in the project:
apscale_nanopore qc -p PATH/TO/PROJECT
Bioinformatics Workflow Overview
1) Demultiplexing
Tool: cutadapt
Settings: Allowed errors (default=3)
Demultiplex raw sequencing reads based on barcode sequences to generate sample-specific FASTQ files.
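The idea behind error-tolerant demultiplexing can be illustrated with a deliberately simplified sketch. It is not cutadapt's algorithm: it only compares the start of each read against every index by counting substitutions (no insertions or deletions, forward index only), and the function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read: str, indices: dict, allowed_errors: int = 3):
    """Return the sample whose index best matches the read start, or None.

    A read is left unassigned if no index is within allowed_errors or if
    two indices match equally well.
    """
    best, best_dist, tie = None, allowed_errors + 1, False
    for sample, index in indices.items():
        prefix = read[:len(index)]
        if len(prefix) < len(index):
            continue  # read is shorter than the index
        dist = hamming(prefix, index)
        if dist < best_dist:
            best, best_dist, tie = sample, dist, False
        elif dist == best_dist and dist <= allowed_errors:
            tie = True
    return None if tie else best


indices = {
    "Sample_1": "AGAACGACTTCCATACTCGTGTGA",
    "Sample_2": "AACGAGTCTCTTGGGACCCATAGA",
}
# One substitution in the Sample_1 index, followed by the insert.
print(assign_sample("TGAACGACTTCCATACTCGTGTGA" + "ACGTACGT", indices))
```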
2) Primer Trimming
Tool: cutadapt
Settings: Allowed errors (default=4)
Remove primer sequences from demultiplexed reads to retain only target regions.
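Because the primers contain degenerate IUPAC codes, a position counts as a mismatch only if the read base is not among the bases that the code allows. The sketch below illustrates this for a forward primer at the start of a read. It is a simplification under stated assumptions, not cutadapt's behaviour: substitutions only, uppercase reads, and hypothetical function names.

```python
# IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, segment: str) -> int:
    """Count positions where the read base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, segment))


def trim_forward_primer(read: str, primer: str, allowed_errors: int = 4):
    """Return the read without its leading primer, or None if it does not match."""
    segment = read[:len(primer)]
    if len(segment) < len(primer):
        return None
    if count_mismatches(primer, segment) > allowed_errors:
        return None
    return read[len(primer):]


primer = "RGCHTTYCCHCGWATAAAYAAYATAAG"
read = "AGCTTTTCCACGAATAAACAACATAAG" + "ACGTACGT"
print(trim_forward_primer(read, primer))
```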
3) Quality Filtering
Tools: python, vsearch
Settings: Min. mean Q-Score (default=20), Min. and max. length (fragment-specific)
Filter reads based on:
- Mean PHRED quality score
- Minimum and maximum fragment length
This step ensures only high-quality reads are retained for downstream processing.
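The two filters can be sketched in a few lines. This example assumes Phred+33 encoded quality strings and takes the arithmetic mean of the per-base scores; the pipeline may compute the mean differently (some tools average error probabilities instead). The function names are hypothetical, and the defaults are the example values from the settings table.

```python
def mean_phred(quality_line: str, offset: int = 33) -> float:
    """Arithmetic mean of the per-base PHRED scores of one FASTQ quality line."""
    if not quality_line:
        return 0.0
    return sum(ord(c) - offset for c in quality_line) / len(quality_line)


def passes_filter(sequence: str, quality_line: str, min_length: int = 54,
                  max_length: int = 74, min_quality: float = 20) -> bool:
    """Keep reads within the length window and at or above the mean quality."""
    if not min_length <= len(sequence) <= max_length:
        return False
    return mean_phred(quality_line) >= min_quality


# "I" encodes Q40 and "+" encodes Q10 in Phred+33.
print(passes_filter("A" * 60, "I" * 60))
print(passes_filter("A" * 60, "+" * 60))
```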
4) Clustering / Denoising
Tool: vsearch
Settings: d (default=1), percentage identity (default=0.97), alpha (default=1)
Choose from the following processing strategies:
- Swarm denoising: Local clustering using the Swarm algorithm for fine-scale resolution.
- Swarm OTUs: Swarm denoising followed by similarity clustering.
- ESV denoising: Error-correction to obtain Exact Sequence Variants.
- Denoised OTUs: Denoising followed by similarity clustering.
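Each strategy combines one or two stages, and each stage is governed by one of the settings listed above (d for Swarm, alpha for denoising, percid for similarity clustering). The lookup below summarises this relationship as a sketch. The lowercase mode keys and the helper `settings_used` are assumptions for illustration; only "denoised OTUs" appears verbatim in the example settings.

```python
# Stages that make up each strategy, in the order they are applied.
MODES = {
    "swarm denoising": ["swarm"],
    "swarm otus": ["swarm", "clustering"],
    "esv denoising": ["denoising"],
    "denoised otus": ["denoising", "clustering"],
}

# Setting that controls each stage, taken from the settings table.
STAGE_SETTING = {"swarm": "d", "denoising": "alpha", "clustering": "percid"}


def settings_used(mode: str) -> list[str]:
    """Return the names of the settings relevant for the chosen mode."""
    return [STAGE_SETTING[stage] for stage in MODES[mode.lower()]]


print(settings_used("denoised OTUs"))
```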
5) Read Table Construction and Filtering
Tool: python
Settings: minimum reads (default=10)
Construct an abundance table (ESVs/OTUs × samples).
Apply a minimum read threshold to remove low-abundance features.
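As a rough illustration of this step, the sketch below counts reads per feature and sample and then drops low-abundance features. It assumes the threshold is applied to the total read count of a feature across all samples; the pipeline may apply it differently (for example per sample). The function names and the input format are hypothetical.

```python
from collections import Counter, defaultdict


def build_read_table(assignments):
    """Count reads per feature and sample from (sample, feature) pairs."""
    table = defaultdict(Counter)
    for sample, feature in assignments:
        table[feature][sample] += 1
    return table


def filter_table(table, minimum_reads: int = 10) -> dict:
    """Keep features whose total read count reaches the threshold."""
    return {
        feature: dict(counts)
        for feature, counts in table.items()
        if sum(counts.values()) >= minimum_reads
    }


assignments = [("Sample_1", "OTU_1")] * 12 + [("Sample_2", "OTU_2")] * 3
print(filter_table(build_read_table(assignments)))
```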
6) Taxonomic Assignment
Tool: BLASTn via apscale-blast
Settings: Apscale-blast database
Assign taxonomy to representative sequences using a local reference database.
7) Quality Control and Reporting
Tool: python
Generate summary statistics and visual diagnostics.