
Amplicon processing protocol

Project description


  • Performs quality-based filtering, trims adapters, and removes sequences matching a contaminant database

  • Handles paired-end read merging

  • Integrates de novo and reference-based chimera filtering

  • Clusters sequences into OTUs and annotates them using reference databases that are downloaded as needed

  • Generates standard outputs such as a Newick tree, a tab-delimited OTU table with taxonomy, and a .biom file.

This workflow is built using Snakemake and makes use of Bioconda to install its dependencies.

Documentation

For complete documentation and install instructions, see:

https://hundo.readthedocs.io

Install

This protocol relies on Bioconda and therefore requires conda. For complete setup instructions for both, see:

https://bioconda.github.io/#using-bioconda

In practice, you only need to make sure conda is executable and that you have set up your channels (steps 1 and 2 on that page). Then:

conda install python=3.6 pyyaml snakemake biopython biom-format=2.1.5
pip install hundo
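After installing, a quick sanity check is that the command-line tools this section relies on can be found on PATH. The minimal sketch below uses the standard library's shutil.which; the helper name missing_executables is hypothetical, and the list of names (conda, snakemake, hundo) is taken from the commands in this README rather than from an official requirements list.

```python
import shutil
from typing import Iterable, List

# Executables used by the install and usage commands in this README.
REQUIRED = ("conda", "snakemake", "hundo")


def missing_executables(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the names that shutil.which cannot locate on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("conda, snakemake, and hundo are all on PATH")
```

Note that shutil.which only finds executables; a conda that exists solely as a shell function in the current shell would not be detected.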

Usage

Running samples through annotation requires paired-end input FASTQs whose names start with the sample ID, contain an “_R1” (or “_r1”) or “_R2” (or “_r2”) read identifier, and have the extension “.fastq” or “.fq”. The files may be gzipped, in which case the name ends with “.gz”. By default, both the R1 and R2 files must be larger than 10K. This cutoff is arbitrary and can be changed with --prefilter-file-size.
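The naming and size rules above can be expressed as a small pre-flight check. This is a minimal sketch, not hundo's own implementation: the regular expression, the helper names pair_fastqs and passes_size_filter, the choice to treat everything before the first _R1/_R2 token as the sample ID, and reading "10K" as 10,000 bytes are all assumptions.

```python
import os
import re
from collections import defaultdict
from typing import Dict, Iterable

# Assumed pattern: <sample>_R1|_r1|_R2|_r2<anything>.fastq|.fq, optionally .gz
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_[Rr](?P<read>[12]).*\.f(?:ast)?q(?:\.gz)?$"
)


def pair_fastqs(filenames: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group filenames into {sample_id: {"R1": path, "R2": path}}."""
    pairs: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        match = FASTQ_PATTERN.match(os.path.basename(name))
        if match is None:
            continue  # name does not follow the convention; ignore it
        pairs[match.group("sample")]["R" + match.group("read")] = name
    # Keep only samples where both mates were found.
    return {s: r for s, r in pairs.items() if "R1" in r and "R2" in r}


def passes_size_filter(r1_path: str, r2_path: str, min_size_bytes: int = 10_000) -> bool:
    """Both files must be strictly larger than the cutoff (assumed to be bytes)."""
    return (os.path.getsize(r1_path) > min_size_bytes
            and os.path.getsize(r2_path) > min_size_bytes)
```

Under this pattern, a file named F3D0_S188_L001_R1_001.fastq is assigned the sample ID F3D0_S188_L001, and an R1 file without a matching R2 is dropped.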

Using the mothur SOP example data located in our tests directory, we can annotate against the SILVA database with:

cd example
hundo annotate \
    --filter-adapters qc_references/adapters.fa.gz \
    --filter-contaminants qc_references/phix174.fa.gz \
    --out-dir mothur_sop_silva \
    --database-dir annotation_references \
    --reference-database silva \
    mothur_sop_data

By default, dependencies are installed into the results directory given on the command line by --out-dir. To reuse dependencies across many analyses, rather than reinstalling them every time you change the output directory, use Snakemake’s --conda-prefix:

hundo annotate \
    --out-dir mothur_sop_silva \
    --database-dir annotation_references \
    --reference-database silva \
    mothur_sop_data \
    --conda-prefix /Users/brow015/devel/hundo/example/conda
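When scripting several runs, the two commands above can be generated from one helper so that every output directory shares a single conda prefix. This is a hedged sketch: build_annotate_command is a hypothetical helper that uses only the options shown in this README, the run directory names and the /path/to/shared/conda location are placeholders, and --conda-prefix is placed last to mirror the example above.

```python
import shlex
from typing import List, Optional


def build_annotate_command(
    fastq_dir: str,
    out_dir: str,
    database_dir: str,
    reference_database: str = "silva",
    adapters: Optional[str] = None,
    contaminants: Optional[str] = None,
    conda_prefix: Optional[str] = None,
) -> List[str]:
    """Assemble a `hundo annotate` argument list from the README's options."""
    cmd = ["hundo", "annotate"]
    if adapters:
        cmd += ["--filter-adapters", adapters]
    if contaminants:
        cmd += ["--filter-contaminants", contaminants]
    cmd += [
        "--out-dir", out_dir,
        "--database-dir", database_dir,
        "--reference-database", reference_database,
        fastq_dir,
    ]
    if conda_prefix:
        # Snakemake's --conda-prefix: one environment cache for every out_dir.
        cmd += ["--conda-prefix", conda_prefix]
    return cmd


if __name__ == "__main__":
    shared_prefix = "/path/to/shared/conda"  # placeholder location
    for run in ["run_a", "run_b"]:  # placeholder input directories
        cmd = build_annotate_command(
            run, run + "_silva", "annotation_references", conda_prefix=shared_prefix
        )
        # Print a shell-quoted preview; subprocess.run(cmd, check=True) would execute it.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building an argument list rather than a single string avoids shell-quoting problems when paths contain spaces.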

Output

OTU.biom

BIOM table of raw counts per sample, with the taxonomic assignment of each OTU, formatted for compatibility with downstream tools such as phyloseq.

OTU.fasta

Representative DNA sequences of each OTU.

OTU.tree

Newick tree representation of aligned OTU sequences.

OTU.txt

Tab-delimited text table with an OTU ID column, one column per sample, and a final column giving the taxonomic assignment as a comma-delimited list.
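A minimal standard-library sketch for loading a table with this layout, assuming the first row is a header and that the sample columns sit between the OTU ID column and the taxonomy column. The function names read_otu_table and sample_totals, and the demo header and values, are hypothetical; for richer analyses the biom-format package installed above can read OTU.biom directly.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

Counts = Dict[str, Dict[str, int]]


def read_otu_table(handle: TextIO) -> Tuple[List[str], Counts, Dict[str, List[str]]]:
    """Parse an OTU.txt-style table (assumed layout: ID, samples..., taxonomy)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:-1]  # columns between the OTU ID and the taxonomy column
    counts: Counts = {}
    taxonomy: Dict[str, List[str]] = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        otu_id, values, lineage = row[0], row[1:-1], row[-1]
        # int(float(...)) also accepts counts written as e.g. "12.0"
        counts[otu_id] = {s: int(float(v)) for s, v in zip(samples, values)}
        # The lineage is a comma-delimited list; drop empty levels.
        taxonomy[otu_id] = [lvl.strip() for lvl in lineage.split(",") if lvl.strip()]
    return samples, counts, taxonomy


def sample_totals(samples: List[str], counts: Counts) -> Dict[str, int]:
    """Sum OTU counts per sample."""
    return {s: sum(row[s] for row in counts.values()) for s in samples}


if __name__ == "__main__":
    demo = io.StringIO(  # hypothetical two-sample table
        "OTU_ID\tA\tB\ttaxonomy\n"
        "OTU_1\t5\t0\tk__Bacteria,p__Firmicutes\n"
    )
    samples, counts, taxonomy = read_otu_table(demo)
    print(sample_totals(samples, counts))
```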

OTU_aligned.fasta

OTU sequences after alignment using Clustal Omega.

all-sequences.fasta

Quality-controlled, dereplicated DNA sequences of all samples. The header of each record identifies the sample of origin and the count resulting from dereplication.
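The exact header syntax is not specified here, so the following sketch assumes VSEARCH-style ";key=value;" annotations with hypothetical keys "sample" and "size" (for example ">seq1;sample=F3D0;size=10;"). Inspect a real record and adjust the keys or parse_annotations before relying on it. Under that assumption, it totals dereplicated read counts per sample; all function names are hypothetical.

```python
import io
from collections import Counter
from typing import Dict, Iterator, TextIO


def fasta_headers(handle: TextIO) -> Iterator[str]:
    """Yield FASTA header lines without the leading '>'."""
    for line in handle:
        if line.startswith(">"):
            yield line[1:].strip()


def parse_annotations(header: str) -> Dict[str, str]:
    """Turn 'id;key=value;key=value;' into {key: value} (assumed syntax)."""
    fields: Dict[str, str] = {}
    for part in header.split(";")[1:]:  # element 0 is the record ID
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key] = value
    return fields


def reads_per_sample(handle: TextIO, sample_key: str = "sample",
                     size_key: str = "size") -> Counter:
    """Total dereplicated counts per sample; records lacking either key are skipped."""
    totals: Counter = Counter()
    for header in fasta_headers(handle):
        fields = parse_annotations(header)
        if sample_key in fields and size_key in fields:
            totals[fields[sample_key]] += int(fields[size_key])
    return totals


if __name__ == "__main__":
    demo = io.StringIO(">seq1;sample=F3D0;size=10;\nACGT\n>seq2;sample=F3D1;size=2;\nACGG\n")
    print(reads_per_sample(demo))
```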

blast-hits.txt

The BLAST assignments per OTU sequence.

summary.html

Summarizes the experimental dataset, including sequence quality, counts per sample at successive pre-processing stages, and taxonomic composition per sample at the phylum, class, and order levels.

Download files

Source Distribution

hundo-1.1.6.tar.gz (26.9 kB)

Uploaded Source

Built Distributions

hundo-1.1.6-py3.5.egg (41.8 kB)

Uploaded Source

hundo-1.1.6-py3-none-any.whl (30.0 kB)

Uploaded Python 3

File details

Details for the file hundo-1.1.6.tar.gz.

File metadata

  • Download URL: hundo-1.1.6.tar.gz
  • Size: 26.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for hundo-1.1.6.tar.gz
Algorithm     Hash digest
SHA256        375f16f32ddeeb7bc593281d974c1627a054ee4c35b017dd078d6fbeb865b010
MD5           0fbab5a85417efb3c8f9897c0e625d96
BLAKE2b-256   9a2cff1443d6c5a1b03d29c88275fbc2584a2fd203314419b245d27f63bfffcf

File details

Details for the file hundo-1.1.6-py3.5.egg.

File metadata

  • Download URL: hundo-1.1.6-py3.5.egg
  • Size: 41.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for hundo-1.1.6-py3.5.egg
Algorithm     Hash digest
SHA256        2e36cdea06219981f4b770775f5da691e5997890cfa090ef8a64b073e5474946
MD5           9c6ebe985d5afaa288622046e614f9ab
BLAKE2b-256   62eb5e94c40fedfb1d7ac9403a23a20f294d98071877f795118b07fcbbab2a71

File details

Details for the file hundo-1.1.6-py3-none-any.whl.

File hashes

Hashes for hundo-1.1.6-py3-none-any.whl
Algorithm     Hash digest
SHA256        516e3bacfe3a12262b7485a053e6431bec9fe92c116802de65bbdf1709dda667
MD5           1f14e8843f50bb073203ca92696fc875
BLAKE2b-256   58bd7f94f6d46872afb559eb7b82b3ffd91863d9e9f00d5e4beb1a8a4316f51d
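To check a downloaded file against the SHA256 digests listed above, a streaming hash with Python's hashlib is enough. This is a minimal sketch; the helper names file_chunks, sha256_of_chunks, and verify are hypothetical, and the script assumes it is run from the directory containing the downloaded source distribution.

```python
import hashlib
import sys
from typing import Iterable, Iterator

# SHA256 digest listed above for hundo-1.1.6.tar.gz.
SDIST_SHA256 = "375f16f32ddeeb7bc593281d974c1627a054ee4c35b017dd078d6fbeb865b010"


def file_chunks(path: str, chunk_size: int = 1 << 16) -> Iterator[bytes]:
    """Yield a file's bytes in fixed-size chunks so large files need not fit in memory."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex SHA256 digest of the concatenated chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """True when the file's SHA256 matches the published digest."""
    return sha256_of_chunks(file_chunks(path)) == expected_hex.lower()


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python verify_hash.py hundo-1.1.6.tar.gz
    print("OK" if verify(sys.argv[1], SDIST_SHA256) else "MISMATCH")
```

pip's hash-checking mode (--require-hashes) is an alternative, though it requires every dependency to be pinned with a hash as well.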
