Amplicon processing protocol

Project description

  • Performs quality control based on read quality, optionally trims adapters, and removes sequences matching a contaminant database

  • Handles paired-end read merging

  • Integrates de novo and reference-based chimera filtering

  • Clusters sequences and annotates using databases that are downloaded as needed

  • Generates standard outputs for these data, such as a Newick tree, a tab-delimited OTU table with taxonomy, and a .biom file

This workflow is built using Snakemake and makes use of Bioconda to install its dependencies.

Documentation

For complete documentation and install instructions, see:

https://hundo.readthedocs.io

Install

This protocol leverages the work of Bioconda and depends on conda. For complete setup of these, please see:

https://bioconda.github.io/#using-bioconda

In practice, you only need to make sure conda is executable and that your channels are set up (steps 1 and 2 of the Bioconda instructions linked above). Then:

conda install python=3.6 pyyaml snakemake biopython biom-format=2.1.5
pip install hundo

Usage

Running samples through annotation requires that input FASTQs be paired-end and named in a semi-conventional style: each file name starts with the sample ID, contains an “_R1” (or “_r1”) or “_R2” (or “_r2”) index identifier, and has the extension “.fastq” or “.fq”. The files may be gzipped, in which case they end with “.gz”. By default, both R1 and R2 need to be larger than 10K in size. This cutoff is arbitrary and can be changed using --prefilter-file-size.
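To check a directory against these naming rules before a run, something like the following may help. It is a minimal sketch and not hundo's own implementation: the regular expression, the function name find_pairs, and the reading of “10K” as 10,000 bytes are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed reading of the naming rules: sample ID first, then an _R1/_R2
# (or _r1/_r2) identifier, ending in .fastq or .fq, optionally gzipped.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_[Rr](?P<read>[12]).*\.(fastq|fq)(\.gz)?$"
)
MIN_BYTES = 10_000  # assumption: "10K" interpreted as 10,000 bytes


def find_pairs(fastq_dir, min_bytes=MIN_BYTES):
    """Return {sample_id: (r1_path, r2_path)} for complete, large-enough pairs."""
    reads = {}
    for path in sorted(Path(fastq_dir).iterdir()):
        match = READ_PATTERN.match(path.name)
        if match is None or not path.is_file():
            continue  # not a FASTQ file named in the expected style
        reads.setdefault(match.group("sample"), {})[match.group("read")] = path

    pairs = {}
    for sample, found in reads.items():
        if set(found) != {"1", "2"}:
            continue  # one file of the pair is missing
        if all(p.stat().st_size > min_bytes for p in found.values()):
            pairs[sample] = (found["1"], found["2"])
    return pairs
```

Calling find_pairs("mothur_sop_data") would list the samples that satisfy this interpretation of the rules; samples with a missing or undersized read file are simply left out.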

Using the example data of the mothur SOP located in our tests directory, we can annotate against the SILVA database using:

cd example
hundo annotate \
    --filter-adapters qc_references/adapters.fa.gz \
    --filter-contaminants qc_references/phix174.fa.gz \
    --out-dir mothur_sop_silva \
    --database-dir annotation_references \
    --reference-database silva \
    mothur_sop_data

Dependencies are installed by default in the results directory defined on the command line as --out-dir. If you want to re-use dependencies across many analyses and avoid re-installing them each time you change the output directory, use Snakemake’s --conda-prefix:

hundo annotate \
    --out-dir mothur_sop_silva \
    --database-dir annotation_references \
    --reference-database silva \
    mothur_sop_data \
    --conda-prefix /Users/brow015/devel/hundo/example/conda
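When scripting several analyses that share one dependency location, the command can be assembled programmatically. The sketch below only uses the options shown in the commands above; the helper name annotate_command, the shared directory name, and the output directory names are hypothetical.

```python
import shlex


def annotate_command(fastq_dir, out_dir, database_dir, reference_database,
                     conda_prefix=None):
    """Assemble a `hundo annotate` argument list from the options shown above."""
    cmd = [
        "hundo", "annotate",
        "--out-dir", str(out_dir),
        "--database-dir", str(database_dir),
        "--reference-database", reference_database,
        str(fastq_dir),
    ]
    if conda_prefix is not None:
        # Pointing every run at the same prefix lets the environments be re-used.
        cmd += ["--conda-prefix", str(conda_prefix)]
    return cmd


shared_envs = "shared_conda_envs"  # hypothetical shared location
for out_dir in ("run_a", "run_b"):  # hypothetical output directories
    cmd = annotate_command("mothur_sop_data", out_dir, "annotation_references",
                           "silva", conda_prefix=shared_envs)
    # Print only; to execute, pass `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```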

Output

OTU.biom

BIOM table with raw counts per sample and the associated taxonomic assignments, formatted to be compatible with downstream tools like phyloseq.

OTU.fasta

Representative DNA sequences of each OTU.

OTU.tree

Newick tree representation of aligned OTU sequences.

OTU.txt

Tab-delimited text table with an OTU ID column, one column per sample, and the taxonomy assignment in the final column as a comma-delimited list.
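A table in this layout can be read with the Python standard library. The following is a sketch under assumptions: it presumes a single header row, and the column names, OTU IDs, and taxonomy strings in the example are made up for illustration rather than taken from real hundo output.

```python
import csv
import io


def read_otu_table(handle):
    """Parse an OTU.txt-style table into (sample_names, {otu_id: record})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:-1]  # first column is the OTU ID, last is taxonomy
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts = {s: int(float(v)) for s, v in zip(samples, row[1:-1])}
        taxonomy = [t.strip() for t in row[-1].split(",")]
        table[row[0]] = {"counts": counts, "taxonomy": taxonomy}
    return samples, table


# Made-up two-sample table in the layout described above.
example = (
    "OTU ID\tsampleA\tsampleB\ttaxonomy\n"
    "OTU_1\t12\t0\tk__Bacteria,p__Firmicutes\n"
    "OTU_2\t3\t7\tk__Bacteria,p__Bacteroidetes\n"
)
samples, table = read_otu_table(io.StringIO(example))
print(samples)
print(table["OTU_1"]["taxonomy"])
```

For a real run, the same function could be applied to an open file handle, for example open("mothur_sop_silva/OTU.txt", newline=""), assuming the file sits directly in the output directory.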

OTU_aligned.fasta

OTU sequences after alignment using Clustal Omega.

all-sequences.fasta

Quality-controlled, dereplicated DNA sequences of all samples. The header of each record identifies the sample of origin and the count resulting from dereplication.

blast-hits.txt

The BLAST assignments per OTU sequence.

summary.html

Summarizes the experimental dataset, including sequence quality, counts per sample at varying stages of pre-processing, and taxonomic composition per sample summarized at the phylum, class, and order levels.

