Pure-Python DADA2 — exact amplicon sequence variant inference from amplicon sequencing data.
dada2-py
A pure-Python re-implementation of DADA2 (Callahan et al., Nature Methods 2016) for exact amplicon sequence variant (ASV) inference from amplicon sequencing data.
- AnnData / pandas-friendly: drop-in for downstream microbiome analysis
- No rpy2, no R install, no Bioconductor dependency
- Same API as the R dada2 workflow (filter_and_trim→learn_errors→dada→merge_pairs→make_sequence_table→remove_bimera_denovo→assign_taxonomy)
- ASV-level identity matches R DADA2 on the canonical MiSeq SOP test data
This is a standalone mirror of the canonical implementation that lives in omicverse. All algorithmic work is developed upstream in omicverse and synced here for users who want DADA2 without the full omicverse stack.
Install
pip install pydada2
Quick-start
from pydada2 import (
    filter_and_trim, learn_errors, dada,
    merge_pairs, make_sequence_table,
    remove_bimera_denovo, assign_taxonomy,
)

# 1) Quality filter + trim
filter_and_trim(
    fwd="raw/F.fastq.gz", filt="filt/F.fastq.gz",
    rev="raw/R.fastq.gz", filt_rev="filt/R.fastq.gz",
    trunc_len=(240, 160), max_ee=(2, 2), trunc_q=2,
)

# 2) Learn the error model
errF = learn_errors("filt/F.fastq.gz")
errR = learn_errors("filt/R.fastq.gz")

# 3) Run the divisive amplicon denoising algorithm
ddF = dada("filt/F.fastq.gz", err=errF)
ddR = dada("filt/R.fastq.gz", err=errR)

# 4) Merge paired-end reads
mergers = merge_pairs(ddF, "filt/F.fastq.gz", ddR, "filt/R.fastq.gz")

# 5) Sample × sequence table
seqtab = make_sequence_table(mergers)

# 6) Chimera removal
seqtab_nochim = remove_bimera_denovo(seqtab, method="consensus")

# 7) Taxonomy assignment
taxa = assign_taxonomy(seqtab_nochim, ref_fasta="silva_nr99.fa.gz")
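To make the trunc_len and max_ee arguments in step 1 more concrete, here is a minimal standalone sketch of the expected-errors criterion that DADA2-style filtering is based on. A read is truncated to a fixed length, and it is kept only if the sum of its per-base error probabilities, 10^(-Q/10), stays at or below the threshold. The helper names `expected_errors` and `passes_filter` are hypothetical and are not part of pydada2; Phred+33 quality encoding is assumed, and the sketch ignores trunc_q.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities for a Phred-encoded quality string.

    Assumes Phred+33 encoding by default (hypothetical helper, not pydada2 API).
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_filter(quality: str, trunc_len: int, max_ee: float) -> bool:
    """Truncate to trunc_len, then apply the expected-errors threshold."""
    if len(quality) < trunc_len:
        # Reads shorter than the truncation length are dropped in this sketch.
        return False
    return expected_errors(quality[:trunc_len]) <= max_ee


high_quality = "I" * 250  # 'I' encodes Q40 under Phred+33
low_quality = "+" * 250   # '+' encodes Q10 under Phred+33

keep_high = passes_filter(high_quality, trunc_len=240, max_ee=2)
keep_low = passes_filter(low_quality, trunc_len=240, max_ee=2)
print(keep_high, keep_low)
```

With these inputs the Q40 read accumulates about 0.024 expected errors over 240 bases and is kept, while the Q10 read accumulates about 24 and is discarded.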
What's included
| Module | Function | Purpose |
|---|---|---|
| pydada2.filter | filter_and_trim, fastq_filter | Quality filtering + trimming |
| pydada2.io | derep_fastq, get_uniques, get_sequences | FASTQ I/O + dereplication |
| pydada2.errors | learn_errors, loess_errfun, inflate_err | Error-rate model |
| pydada2.align | nwalign, nwhamming | Needleman-Wunsch ends-free alignment |
| pydada2.kmers | kmer_dist, kord_dist | k-mer distance pre-screen |
| pydada2.dada | dada | The divisive amplicon denoising algorithm |
| pydada2.paired | merge_pairs | Paired-end merging |
| pydada2.seqtab | make_sequence_table, merge_sequence_tables, collapse_no_mismatch | ASV table assembly |
| pydada2.chimeras | is_bimera_denovo, remove_bimera_denovo | Chimera detection |
| pydada2.taxonomy | assign_taxonomy, assign_species, add_species | RDP naive Bayes + exact-match species |
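The k-mer distance pre-screen listed for pydada2.kmers can be illustrated with a small standalone sketch. Comparing k-mer counts is much cheaper than a full Needleman-Wunsch alignment, so sequence pairs that share few k-mers can be skipped before alignment. The formula below (one minus the number of shared k-mers divided by the number of k-mers in the shorter sequence) is assumed to mirror the upstream definition; the names `kmer_counts` and `kmer_distance` and the default k are illustrative and are not the pydada2 API.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(seq1: str, seq2: str, k: int = 5) -> float:
    """Return 1 - (shared k-mers / k-mers in the shorter sequence).

    Hypothetical helper: 0.0 means identical k-mer content,
    1.0 means no k-mers in common.
    """
    shortest = min(len(seq1), len(seq2))
    if shortest < k:
        raise ValueError("both sequences must be at least k bases long")
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((kmer_counts(seq1, k) & kmer_counts(seq2, k)).values())
    return 1.0 - shared / (shortest - k + 1)


same = kmer_distance("ACGTACGTAC", "ACGTACGTAC", k=3)
one_mismatch = kmer_distance("ACGTACGTAC", "ACGTTCGTAC", k=3)
unrelated = kmer_distance("AAAAAAAA", "CCCCCCCC", k=3)
print(same, one_mismatch, unrelated)
```

A pre-screen built on this would compare the distance against a cutoff and only align pairs that fall below it; the cutoff value itself is a tuning choice not specified here.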
Relationship to R DADA2
This is a Python port of the canonical R/C++ DADA2 package
(benjjneb/dada2). All algorithmic
behaviour is checked against the R reference on the MiSeq SOP test
data (see tests/test_r_parity.py). The R reference is invoked from
/scratch/users/steorra/env/CMAP during testing.
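As a rough illustration of what an ASV-level parity check involves, the sketch below compares the set of sequences found by two pipelines. It is not the content of tests/test_r_parity.py; the function name `compare_asv_tables` and the nested-dict table layout (sample name to sequence to read count) are assumptions made for this example only.

```python
def compare_asv_tables(py_table: dict, r_table: dict) -> dict:
    """Compare the ASV sequences present in two sample-by-sequence tables.

    Each table maps sample name -> {sequence: read count} (assumed layout).
    Only sequence identity is compared; abundances are ignored.
    """
    py_asvs = {seq for counts in py_table.values() for seq in counts}
    r_asvs = {seq for counts in r_table.values() for seq in counts}
    return {
        "shared": py_asvs & r_asvs,
        "only_python": py_asvs - r_asvs,
        "only_r": r_asvs - py_asvs,
    }


# Toy tables with made-up sequences and counts, for illustration only.
py_table = {"S1": {"ACGT": 10, "TTGA": 3}, "S2": {"ACGT": 7}}
r_table = {"S1": {"ACGT": 11, "TTGA": 3}, "S2": {"ACGT": 7, "GGCC": 1}}

report = compare_asv_tables(py_table, r_table)
print(sorted(report["shared"]), sorted(report["only_python"]), sorted(report["only_r"]))
```

Full ASV-level identity corresponds to both `only_python` and `only_r` being empty; a stricter check could also compare per-sample abundances.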
Citation
If you use this package, please cite the original DADA2 paper:
Callahan, B.J. et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods 13, 581–583 (2016).
and acknowledge omicverse / this repo for the Python port.
License
LGPL-2 — matches upstream R DADA2.