Amplicon processing protocol
Project description
Performs quality control based on read quality, can trim adapters, and removes sequences matching a contaminant database
Handles paired-end read merging
Integrates de novo and reference-based chimera filtering
Clusters sequences and annotates using databases that are downloaded as needed
Generates standard outputs for these data, such as a Newick tree, a tab-delimited OTU table with taxonomy, and a .biom file.
This workflow is built using Snakemake and makes use of Bioconda to install its dependencies.
Documentation
For complete documentation and install instructions, see:
Install
This protocol leverages the work of Bioconda and depends on conda. For complete setup of these, please see:
https://bioconda.github.io/#using-bioconda
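Really, you just need to make sure conda is executable and that you’ve set up your channels (steps 1 and 2 in that guide). Then install the dependencies and hundo itself. The version specifiers are quoted so the shell does not treat ">" as an output redirect:
    conda install "python>=3.6" click \
        pyyaml "snakemake>=5.1.4" biopython
    pip install hundo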
Usage
Running samples through annotation requires that input FASTQs be paired-end and named in a semi-conventional style: each file name starts with the sample ID, contains an “_R1” (or “_r1”) or “_R2” (or “_r2”) index identifier, and has the extension “.fastq” or “.fq”. The files may be gzipped, in which case the name ends with “.gz”. By default, both R1 and R2 need to be larger than 10K in size. This cutoff is arbitrary and can be set using --prefilter-file-size.
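The naming rules above can be checked before launching a run. The following is a minimal sketch, not hundo's own matching logic: the function name find_pairs, the regular expression, the assumption that the sample ID is everything before the first _R1/_R2 token, and the reading of "10K" as 10 * 1024 bytes are all illustrative assumptions.

```python
import re
from pathlib import Path

# Assumption: the sample ID is everything before the first _R1/_R2 (or _r1/_r2)
# token, and that token is followed by "_" or "." (so "_R10" is not a match).
READ_TOKEN = re.compile(r"^(?P<sample>.+?)_[Rr](?P<read>[12])(?=[_.])")
VALID_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")
MIN_BYTES = 10 * 1024  # assumption: "10K" interpreted as 10 KiB


def find_pairs(fastq_dir, min_bytes=MIN_BYTES):
    """Return {sample_id: (r1_path, r2_path)} for complete pairs above the cutoff."""
    reads = {}
    for path in sorted(Path(fastq_dir).iterdir()):
        # Only consider regular files with an accepted FASTQ extension.
        if not path.is_file() or not path.name.endswith(VALID_SUFFIXES):
            continue
        match = READ_TOKEN.match(path.name)
        if match is None:
            continue
        reads.setdefault(match.group("sample"), {})[match.group("read")] = path

    pairs = {}
    for sample, found in reads.items():
        if set(found) != {"1", "2"}:
            continue  # one mate is missing
        # Both mates must be strictly larger than the size cutoff.
        if all(p.stat().st_size > min_bytes for p in found.values()):
            pairs[sample] = (found["1"], found["2"])
    return pairs
```

Calling find_pairs("mothur_sop_data") returns the samples that pass this sketch's checks; any sample absent from the result either lacks a mate, has an unrecognized name, or falls below the size cutoff.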
Using the example data of the mothur SOP located in our tests directory, we can annotate across SILVA using:
    cd example
    hundo annotate \
        --filter-adapters qc_references/adapters.fa.gz \
        --filter-contaminants qc_references/phix174.fa.gz \
        --out-dir mothur_sop_silva \
        --database-dir annotation_references \
        --reference-database silva \
        mothur_sop_data
Dependencies are installed by default in the results directory defined on the command line as --out-dir. If you want to re-use dependencies across many analyses and not have to re-install each time you update the output directory, use Snakemake’s --conda-prefix:
    hundo annotate \
        --out-dir mothur_sop_silva \
        --database-dir annotation_references \
        --reference-database silva \
        mothur_sop_data \
        --conda-prefix /Users/brow015/devel/hundo/example/conda
Output
OTU.biom
Biom table with raw counts per sample and their associated taxonomic assignment formatted to be compatible with downstream tools like phyloseq.
OTU.fasta
Representative DNA sequences of each OTU.
OTU.tree
Newick tree representation of aligned OTU sequences.
OTU.txt
Tab-delimited text table with an OTU ID column first, then one column per sample, and the taxonomy assignment in the final column as a comma-delimited list.
OTU_aligned.fasta
OTU sequences after alignment using Clustal Omega.
all-sequences.fasta
Quality-controlled, dereplicated DNA sequences of all samples. The header of each record identifies the sample of origin and the count resulting from dereplication.
blast-hits.txt
The BLAST assignments per OTU sequence.
summary.html
Captures and summarizes data of the experimental dataset, including sequence quality, counts per sample at varying stages of pre-processing, and summarized taxonomic composition per sample across phylum, class, and order.
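The OTU.txt layout described above (OTU ID first, one column per sample, taxonomy last as a comma-delimited list) can be read with the standard library alone. This is a rough sketch under assumptions: it assumes the first line is a header row, and the function name read_otu_table and its return shape are illustrative, not part of hundo.

```python
import csv


def read_otu_table(handle):
    """Parse a tab-delimited OTU table from an open text handle.

    Assumes a header row, the OTU ID in the first column, one count column
    per sample, and a comma-delimited taxonomy string in the last column.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:-1]  # everything between OTU ID and taxonomy
    counts, taxonomy = {}, {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        otu_id = row[0]
        # int(float(...)) accepts counts written as "12" or "12.0".
        counts[otu_id] = dict(zip(samples, (int(float(v)) for v in row[1:-1])))
        taxonomy[otu_id] = [level.strip() for level in row[-1].split(",")]
    return samples, counts, taxonomy
```

For example, with open("mothur_sop_silva/OTU.txt") as handle, read_otu_table(handle) gives the sample names, a per-OTU dictionary of counts keyed by sample, and a per-OTU list of taxonomy levels.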
Citing
Cite this as:
Brown J, Zavoshy N, Brislawn CJ, McCue LA. (2018) Hundo: a Snakemake workflow for microbial community sequence data. PeerJ Preprints 6:e27272v1 https://doi.org/10.7287/peerj.preprints.27272v1