Amplicon processing protocol

Project description

  • Performs quality control based on sequence quality, can trim adapters, and can remove sequences matching a contaminant database

  • Handles paired-end read merging

  • Integrates de novo and reference-based chimera filtering

  • Clusters sequences and annotates using databases that are downloaded as needed

  • Generates standard outputs for these data, such as a Newick tree, a tab-delimited OTU table with taxonomy, and a .biom file.

This workflow is built using Snakemake and makes use of Bioconda to install its dependencies.

Documentation

For complete documentation and install instructions, see:

https://hundo.readthedocs.io

Install

This protocol leverages the work of Bioconda and depends on conda. For complete setup of these, please see:

https://bioconda.github.io/#using-bioconda

In short, make sure conda is executable and that you have set up your channels (steps 1 and 2 of the Bioconda instructions). Then:

conda install "python>=3.6" click \
    pyyaml "snakemake>=5.1.4" biopython
pip install hundo

Usage

Running samples through annotation requires that input FASTQs be paired-end and named in a semi-conventional style: each file name starts with the sample ID, contains an “_R1” (or “_r1”) or “_R2” (or “_r2”) index identifier, and has the extension “.fastq” or “.fq”. The files may be gzipped and end with “.gz”. By default, both R1 and R2 need to be larger than 10K in size. This cutoff is arbitrary and can be set using --prefilter-file-size.
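The naming and size rules above can be checked before starting a run. The following is a minimal sketch, not hundo's own matching logic: the regular expression, the helper names parse_fastq_name and find_pairs, the choice to treat everything before the read identifier as the sample ID, and the reading of "10K" as 10,000 bytes are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed pattern: sample ID, then _R1/_r1 or _R2/_r2, then .fastq or .fq,
# optionally gzipped. hundo's real matching may differ.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+?)_[Rr](?P<read>[12])(?P<rest>.*)\.(fastq|fq)(\.gz)?$"
)


def parse_fastq_name(filename):
    """Return (sample_id, read_number), or None if the name does not fit."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        return None
    return match.group("sample"), int(match.group("read"))


def find_pairs(directory, min_size=10_000):
    """Map sample ID -> {1: path, 2: path} for complete pairs.

    A pair is kept only if both files are larger than min_size bytes
    (assumed here to correspond to the default 10K cutoff).
    """
    samples = {}
    for path in sorted(Path(directory).iterdir()):
        parsed = parse_fastq_name(path.name) if path.is_file() else None
        if parsed is None:
            continue
        sample, read = parsed
        samples.setdefault(sample, {})[read] = path
    return {
        sample: reads
        for sample, reads in samples.items()
        if set(reads) == {1, 2}
        and all(p.stat().st_size > min_size for p in reads.values())
    }
```

Calling find_pairs on an input directory returns only the samples that have both reads present and above the size cutoff, which makes it easy to spot samples that would be skipped.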

Using the example data from the mothur SOP, located in our tests directory, we can annotate against SILVA using:

cd example
hundo annotate \
    --filter-adapters qc_references/adapters.fa.gz \
    --filter-contaminants qc_references/phix174.fa.gz \
    --out-dir mothur_sop_silva \
    --database-dir annotation_references \
    --reference-database silva \
    mothur_sop_data

Dependencies are installed by default in the results directory defined on the command line as --out-dir. If you want to re-use dependencies across many analyses and not have to re-install each time you update the output directory, use Snakemake’s --conda-prefix:

hundo annotate \
    --out-dir mothur_sop_silva \
    --database-dir annotation_references \
    --reference-database silva \
    mothur_sop_data \
    --conda-prefix /Users/brow015/devel/hundo/example/conda
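When several analyses should share one set of installed dependencies, the same --conda-prefix value can be passed to each run. The sketch below only assembles the command lines, using the options shown above; the helper build_annotate_command, the shared prefix path, and the output directory names are hypothetical.

```python
import shlex


def build_annotate_command(fastq_dir, out_dir, conda_prefix,
                           database_dir="annotation_references",
                           reference_database="silva"):
    """Assemble a `hundo annotate` call using only options shown above."""
    return [
        "hundo", "annotate",
        "--out-dir", str(out_dir),
        "--database-dir", str(database_dir),
        "--reference-database", reference_database,
        str(fastq_dir),
        "--conda-prefix", str(conda_prefix),
    ]


# Hypothetical locations: one shared environment directory, two result directories.
shared_prefix = "/path/to/shared/conda"
for out_dir in ("run_a", "run_b"):
    command = build_annotate_command("mothur_sop_data", out_dir, shared_prefix)
    print(shlex.join(command))
    # To execute for real: subprocess.run(command, check=True)
```

Because only --out-dir changes between the two commands, the dependencies installed under the shared prefix during the first run should be found again by the second.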

Output

OTU.biom

Biom table with raw counts per sample and their associated taxonomic assignment formatted to be compatible with downstream tools like phyloseq.

OTU.fasta

Representative DNA sequences of each OTU.

OTU.tree

Newick tree representation of aligned OTU sequences.

OTU.txt

Tab-delimited text table with an OTU ID column, one column per sample, and the taxonomy assignment in the final column as a comma-delimited list.
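A table with this layout can be read with the standard library alone. This is a minimal sketch under stated assumptions: that the file starts with a header row, that counts are numeric, and that the helper name read_otu_table and the example rows are made up for illustration rather than taken from real hundo output.

```python
import csv
import io


def read_otu_table(handle):
    """Parse a table laid out as: OTU ID, one column per sample, taxonomy."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)      # assumed header row
    samples = header[1:-1]     # everything between the ID and taxonomy columns
    table = {}
    for row in reader:
        if not row:
            continue
        otu_id, counts, taxonomy = row[0], row[1:-1], row[-1]
        table[otu_id] = {
            "counts": {s: int(float(c)) for s, c in zip(samples, counts)},
            "taxonomy": [t.strip() for t in taxonomy.split(",")],
        }
    return table


# Made-up rows for illustration; not real hundo output.
example = io.StringIO(
    "OTU ID\tsampleA\tsampleB\ttaxonomy\n"
    "OTU_1\t12\t0\tBacteria,Firmicutes,Clostridia\n"
)
print(read_otu_table(example))
```

For a real run, pass an open file handle for OTU.txt in place of the in-memory example.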

OTU_aligned.fasta

OTU sequences after alignment using Clustal Omega.

all-sequences.fasta

Quality-controlled, dereplicated DNA sequences of all samples. The header of each record identifies the sample of origin and the count resulting from dereplication.

blast-hits.txt

The BLAST assignments per OTU sequence.

summary.html

Captures and summarizes the experimental dataset, including sequence quality, counts per sample at varying stages of pre-processing, and summarized taxonomic composition per sample across phylum, class, and order.


Download files

Source Distribution

hundo-1.1.21.tar.gz (28.1 kB view details)

Uploaded Source

Built Distributions

hundo-1.1.21-py3.6.egg (53.5 kB view details)

Uploaded Source

hundo-1.1.21-py3-none-any.whl (31.6 kB view details)

Uploaded Python 3

File details

Details for the file hundo-1.1.21.tar.gz.

File metadata

  • Download URL: hundo-1.1.21.tar.gz
  • Size: 28.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.10.0 pkginfo/1.4.1 requests/2.18.4 setuptools/38.4.0 requests-toolbelt/0.8.0 tqdm/4.23.1 CPython/3.6.4

File hashes

Hashes for hundo-1.1.21.tar.gz
Algorithm Hash digest
SHA256 b387f2144cf36966a235a613829b9621ea8af06a4eb233c33912ea022f8641b5
MD5 307940b8c4fc5b09aac9d77bb581cb25
BLAKE2b-256 d896dec9420994be130a2c3d925fbf20e8b45a6336d6216c1b2bcf3c8994ef3a

File details

Details for the file hundo-1.1.21-py3.6.egg.

File metadata

  • Download URL: hundo-1.1.21-py3.6.egg
  • Size: 53.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.10.0 pkginfo/1.4.1 requests/2.18.4 setuptools/38.4.0 requests-toolbelt/0.8.0 tqdm/4.23.1 CPython/3.6.4

File hashes

Hashes for hundo-1.1.21-py3.6.egg
Algorithm Hash digest
SHA256 c4203d835c320c1141aebeb5b713a1b7ace80495a6525713613faeead825f445
MD5 b36140fd21fc0464e1203a72503add06
BLAKE2b-256 d3c3d5e11b6b03ac05d7dbb9e61ce69bcfccc32b4e2c851f4828c17ac6974047

File details

Details for the file hundo-1.1.21-py3-none-any.whl.

File metadata

  • Download URL: hundo-1.1.21-py3-none-any.whl
  • Size: 31.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.10.0 pkginfo/1.4.1 requests/2.18.4 setuptools/38.4.0 requests-toolbelt/0.8.0 tqdm/4.23.1 CPython/3.6.4

File hashes

Hashes for hundo-1.1.21-py3-none-any.whl
Algorithm Hash digest
SHA256 1409fc7bdfcaaab54525bd0a0e26153692b0ef745631ecb48801b0a9799dc5b9
MD5 fbe1170a57f2729d3da9069182daf3e5
BLAKE2b-256 d06ec007b42346b9467ca2b42ef7142544b258efebeec6603199a7568767252f
