
Rapidly trim sequences down to their Internally Transcribed Spacer (ITS) regions

Project description

Zenodo DOI: 10.5281/zenodo.1304349

Author

  • Adam R. Rivers, US Department of Agriculture, Agricultural Research Service

Citation

Rivers AR, Weber KC, Gardner TG et al. ITSxpress: Software to rapidly trim internally transcribed spacer sequences with quality scores for marker gene analysis [version 1; referees: awaiting peer review]. F1000Research 2018, 7:1418 (doi: 10.12688/f1000research.15704.1)

Introduction

The internally transcribed spacer (ITS) region lies between the highly conserved small subunit (SSU) and large subunit (LSU) rRNA genes. In eukaryotes it contains the 5.8S rRNA gene and two variable-length spacer regions, ITS1 and ITS2. In amplicon sequencing studies it is common practice to trim off the conserved regions (SSU, 5.8S, or LSU). Bengtsson-Palme et al. (2013) published the software package ITSx to do this.

ITSxpress is designed to support the calling of exact sequence variants rather than operational taxonomic units (OTUs). This newer error-correction approach requires the quality scores of each sequence, so every input sequence must be trimmed with its quality scores intact. ITSxpress makes this possible by taking FASTQ data, dereplicating the sequences, and then identifying the start and stop sites of the ITS region with hmmsearch. The results are parsed and trimmed FASTQ files are returned. The ITS1 region, the ITS2 region, or the entire ITS region including the 5.8S rRNA gene can be selected. ITSxpress uses the HMM models from ITSx, so results are comparable.
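Because the output must keep per-base quality scores, trimming a FASTQ record means cutting the sequence line and the quality line at exactly the same positions. The sketch below illustrates only that idea; it is not ITSxpress's implementation. The function name `trim_fastq`, the coordinate file format (read ID, 1-based start, 1-based inclusive stop), and the choice to drop reads without coordinates are all assumptions made for this example. In ITSxpress itself the coordinates come from hmmsearch.

```bash
#!/usr/bin/env bash
# Illustrative sketch, not ITSxpress code.
# Assumptions: uncompressed FASTQ with exactly 4 lines per record, and a
# hypothetical coordinate file with lines "read_id start stop" (1-based, inclusive).
trim_fastq() {
  local coords="$1" fastq="$2"
  awk '
    # First file: remember the start/stop coordinates for each read ID.
    FILENAME == ARGV[1] { start[$1] = $2; stop[$1] = $3; next }
    # Header line: extract the read ID (drop "@" and any description).
    FNR % 4 == 1 { id = substr($1, 2); keep = (id in start) }
    # Reads with no coordinates are dropped in this sketch.
    !keep { next }
    # Sequence (line 2) and quality (line 4) are cut identically,
    # so every retained base keeps its own quality score.
    FNR % 4 == 2 || FNR % 4 == 0 { print substr($0, start[id], stop[id] - start[id] + 1); next }
    # Header and "+" separator lines pass through unchanged.
    { print }
  ' "$coords" "$fastq"
}
```

For example, with a coordinate line `read1 3 6` and a record whose sequence is `AACCGGTT` and whose qualities are `ABCDEFGH`, the trimmed record contains `CCGG` with qualities `CDEF`.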

ITSxpress is also available as a QIIME 2 plugin.

Installation

ITSxpress can be installed from:

  1. Bioconda (preferred, because it handles dependencies):

conda install -c conda-forge -c bioconda itsxpress

  2. Pip (https://pypi.org/project/itsxpress/):

pip install itsxpress

  3. The GitHub repository (https://github.com/USDA-ARS-GBRU/itsxpress):

git clone https://github.com/USDA-ARS-GBRU/itsxpress.git

Dependencies

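ITSxpress requires VSEARCH, BBTools, HMMER >= 3.1b, and Biopython. Bioconda installs all of these automatically, which is why it is the preferred installation method. Pip installs only the Python dependencies; the command-line tools must be installed separately.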
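If you install with pip or from GitHub, a quick check that the external programs are on your PATH can save a failed run. This is a hypothetical helper, not part of ITSxpress. The executable names passed to it below (`vsearch`, `hmmsearch`, `bbmerge.sh`) are assumptions about how VSEARCH, HMMER, and BBTools are typically installed; verify them against your own setup. Biopython is a Python package and is not covered by this check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each program that cannot be found on PATH.
# Returns 0 if everything was found, otherwise the number of missing programs.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Usage (executable names assumed): `missing_tools vsearch hmmsearch bbmerge.sh || echo "install the missing dependencies first"`.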

Usage

-h, --help

Show this help message and exit.

--fastq

Input FASTQ file (.fastq, .fq, .fastq.gz, or .fq.gz), interleaved or not. Required.

--single_end

A flag specifying that the FASTQ file contains single-end (unpaired) reads. Default is false.

--fastq2

FASTQ file for read 2 (.fastq, .fq, .fastq.gz, or .fq.gz), if present. Optional.

--outfile

The trimmed FASTQ output file. If the file name ends in .gz, the output is gzipped.

--outfile2

The trimmed FASTQ output file for read 2. If the file name ends in .gz, the output is gzipped. If this option is used, reads are returned as unmerged pairs rather than merged.

--tempdir

Specify the temp file directory. Default is None.

--keeptemp

Should intermediate files be kept? Default is false.

--region

The region to keep: {ITS2, ITS1, ALL}. ALL keeps the entire ITS region, including the 5.8S rRNA gene.

--taxa

Select the taxonomic group sequenced: {Alveolata, Bryophyta, Bacillariophyta, Amoebozoa, Euglenozoa, Fungi, Chlorophyta, Rhodophyta, Phaeophyceae, Marchantiophyta, Metazoa, Oomycota, Haptophyceae, Raphidophyceae, Rhizaria, Synurophyceae, Tracheophyta, Eustigmatophyceae, All}. Default Fungi.

--cluster_id

The identity threshold for clustering reads, in the range [0.98, 1.0]. Set to 1 for exact dereplication. Default is 0.995.
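Setting --cluster_id to 1 means exact dereplication: identical sequences are collapsed so that each unique sequence needs to be searched only once. The pipeline below is a minimal illustration of that idea for the exact case only. ITSxpress performs this step with VSEARCH, and values below 1 (such as the default 0.995) also merge near-identical sequences, which this sketch does not do. The function name `dereplicate_fastq` is hypothetical, and the input is assumed to be an uncompressed FASTQ with 4 lines per record.

```bash
#!/usr/bin/env bash
# Illustrative sketch of exact dereplication (the cluster_id = 1 case only).
# Prints "count sequence" for every distinct sequence, most abundant first.
dereplicate_fastq() {
  awk 'NR % 4 == 2' "$1" |        # keep only the sequence line of each record
    sort | uniq -c |              # collapse identical sequences and count them
    sort -k1,1nr -k2,2            # order by abundance, then by sequence
}
```

For a file whose three reads have the sequences ACGT, TTTT, and ACGT, the output lists ACGT with a count of 2 followed by TTTT with a count of 1.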

--log

Log file. Default is ITSxpress.log.

--threads

Number of processor threads to use. Default is 1.

Examples

Use case 1: Trimming the ITS2 region from a fungal amplicon sequencing dataset with gzipped forward and reverse FASTQ files, using two CPU threads. Return a single merged file for use in Deblur.

itsxpress --fastq r1.fastq.gz --fastq2 r2.fastq.gz --region ITS2 \
--taxa Fungi --log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2

ITSxpress can read gzipped or uncompressed FASTQ files and can write either. It expects FASTQ file names to end in .fq, .fastq, .fq.gz, or .fastq.gz.
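Before launching a long run, the file-name rules above can be checked up front. The helper below is a hypothetical sketch written for this page, not ITSxpress's own validation code: it reports whether a name matches one of the accepted FASTQ extensions and whether it is a gzipped name.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented naming rules.
# Prints "gzip" or "plain" for an accepted FASTQ name; fails otherwise.
fastq_kind() {
  case "$1" in
    *.fq.gz|*.fastq.gz) echo gzip ;;
    *.fq|*.fastq)       echo plain ;;
    *) echo "unrecognized FASTQ extension: $1" >&2; return 1 ;;
  esac
}
```

For example, `fastq_kind r1.fastq.gz` prints `gzip`, while `fastq_kind reads.txt` prints an error and returns a non-zero status.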

Use case 2: Trimming the ITS2 region from a fungal amplicon sequencing dataset with gzipped forward and reverse FASTQ files, using two CPU threads. Return separate forward and reverse read files for use in DADA2.

itsxpress --fastq r1.fastq.gz --fastq2 r2.fastq.gz --region ITS2 \
--taxa Fungi --log logfile.txt --outfile r1_trimmed.fastq.gz \
--outfile2 r2_trimmed.fastq.gz --threads 2
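When a run contains many samples, the unmerged-pairs workflow can be repeated once per sample. The loop below is a sketch that uses only the options documented above (--fastq, --fastq2, --outfile, --outfile2, --region, --taxa, --log, --threads). The function name `run_itsxpress_batch`, the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming convention, and the `ITSXPRESS_CMD` override variable are assumptions made for this example; adapt them to your own file layout.

```bash
#!/usr/bin/env bash
# Hypothetical batch wrapper: one ITSxpress run per sample, reads kept unmerged.
# Assumes paired files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
# ITSXPRESS_CMD (hypothetical) lets you substitute e.g. "echo itsxpress" for a dry run.
run_itsxpress_batch() {
  local indir="$1" outdir="$2" threads="${3:-2}"
  local r1 r2 sample
  mkdir -p "$outdir"
  for r1 in "$indir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                      # glob matched nothing
    sample=$(basename "$r1" _R1.fastq.gz)
    r2="$indir/${sample}_R2.fastq.gz"
    [ -e "$r2" ] || { echo "no R2 file for $sample" >&2; continue; }
    # Left unquoted on purpose so a multi-word override such as "echo itsxpress" splits.
    ${ITSXPRESS_CMD:-itsxpress} --fastq "$r1" --fastq2 "$r2" \
      --region ITS2 --taxa Fungi \
      --outfile "$outdir/${sample}_R1.trimmed.fastq.gz" \
      --outfile2 "$outdir/${sample}_R2.trimmed.fastq.gz" \
      --log "$outdir/${sample}.log" --threads "$threads"
  done
}
```

Running `ITSXPRESS_CMD="echo itsxpress" run_itsxpress_batch raw trimmed 2` prints the commands without executing them, which is a cheap way to confirm the file pairing before a real run.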


Use case 3: Trimming the ITS2 region from a fungal amplicon sequencing dataset with an interleaved, gzipped FASTQ file, using two CPU threads. Return a single merged file for use in Deblur.

itsxpress --fastq interleaved.fastq.gz --region ITS2 --taxa Fungi \
--log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2

Use case 4: Trimming the ITS2 region from a fungal amplicon sequencing dataset with a single-end, gzipped FASTQ file, using two CPU threads.

itsxpress --fastq single-end.fastq.gz --single_end --region ITS2 --taxa Fungi \
--log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2

Single-end data are less common and may come from a dataset in which the paired reads have already been merged.

Use case 5: Trimming the ITS1 region from an Alveolata amplicon sequencing dataset with an interleaved, gzipped FASTQ file, using 40 CPU threads.

itsxpress --fastq interleaved.fastq.gz --region ITS1 --taxa Alveolata \
--log logfile.txt --outfile trimmed_reads.fastq.gz --threads 40

License information

This software is a work of the United States Department of Agriculture, Agricultural Research Service and is released under the Creative Commons CC0 public domain dedication.


Download files


Source Distribution

itsxpress-1.7.1.tar.gz (2.0 MB)

Built Distribution


itsxpress-1.7.1-py3-none-any.whl (1.6 MB)

File details

Details for the file itsxpress-1.7.1.tar.gz.

File metadata

  • Download URL: itsxpress-1.7.1.tar.gz
  • Size: 2.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.5.5

File hashes

Hashes for itsxpress-1.7.1.tar.gz
Algorithm Hash digest
SHA256 2e04503cb1c4261e75672dfb7c2999ce27fac539d8918ec5ebd53392216d2164
MD5 1afd7f6de76a1484e5e2abf86fd078cd
BLAKE2b-256 60b6ae28b83b30411da759b054355b97f46f22ae21cf453d046f5ead16408a4c


File details

Details for the file itsxpress-1.7.1-py3-none-any.whl.

File metadata

  • Download URL: itsxpress-1.7.1-py3-none-any.whl
  • Size: 1.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.5.5

File hashes

Hashes for itsxpress-1.7.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d28784f7b1d10a99f27a1dcdf9789f8f8e61a629e5be157020a6b14a9cc3a182
MD5 e2d172489dd44cca6c24b8558f992f28
BLAKE2b-256 1625e716865df5608de93db17a03ec45b280fcdea970d6bfdfe4aa9b330566f3
