Rapidly trim sequences down to their Internal Transcribed Spacer (ITS) regions
Introduction
The internal transcribed spacer (ITS) region lies between the highly conserved small subunit (SSU) and large subunit (LSU) rRNA genes. In eukaryotes it contains the 5.8S gene and two variable-length spacer regions. In amplicon sequencing studies it is common practice to trim off the conserved (SSU, 5.8S or LSU) regions. Bengtsson-Palme et al. (2013) published the software package ITSx to do this.
ITSxpress is designed to support the calling of exact sequence variants rather than OTUs. This newer method of sequence error-correction requires quality score data from each sequence, so each input sequence must be trimmed. ITSxpress makes this possible by taking FASTQ data, de-replicating the sequences, then identifying the start and stop sites using HMMSearch. Results are parsed and the trimmed files are returned. The ITS1, ITS2 or the entire ITS region including the 5.8S rRNA gene can be selected. ITSxpress uses the HMM model from ITSx, so results are comparable.
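The de-replicate-then-trim idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ITSxpress's actual code: `find_bounds` is a hypothetical callback standing in for the HMMSearch step, and reads are plain `(name, sequence, quality)` tuples rather than parsed FASTQ records.

```python
from collections import defaultdict


def trim_reads(reads, find_bounds):
    """Trim every read using positions computed once per unique sequence.

    reads: list of (name, sequence, quality) tuples.
    find_bounds: hypothetical stand-in for the HMM search; takes a sequence
        and returns (start, stop), or None if no region is found.
    """
    # De-replicate: group read indexes by identical sequence.
    groups = defaultdict(list)
    for index, (_, seq, _) in enumerate(reads):
        groups[seq].append(index)

    trimmed = []
    for seq, indexes in groups.items():
        bounds = find_bounds(seq)  # the costly search runs once per unique sequence
        if bounds is None:
            continue  # no region found: drop all reads with this sequence
        start, stop = bounds
        for index in indexes:
            name, _, qual = reads[index]
            # Slice sequence and quality identically so they stay aligned.
            trimmed.append((index, (name, seq[start:stop], qual[start:stop])))

    # Restore the original read order.
    return [record for _, record in sorted(trimmed)]
```

Because the positions are applied to every read that shares a sequence, each read keeps its own quality string, which is what exact sequence variant callers need.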
Installation
ITSxpress can be installed from:
Preferred method - Bioconda (to be done):
conda install itsxpress
Pip:
pip install itsxpress
The Github repository: https://github.com/USDA-ARS-GBRU/itsxpress
git clone https://github.com/USDA-ARS-GBRU/itsxpress.git
Dependencies
The software requires VSEARCH, BBTools, HMMER and Biopython. Bioconda takes care of these dependencies for you, so it is the preferred installation method.
Usage
- -h, --help
Show this help message and exit.
- --fastq
A .fastq, .fq, .fastq.gz or .fq.gz file. Interleaved or not.
- --single_end
A flag to specify that the fastq file is single-ended (not paired) rather than interleaved. Default is false.
- --fastq2
A .fastq, .fq, .fastq.gz or .fq.gz file representing read 2, optional.
- --outfile
The trimmed FASTQ file. If the name ends in .gz, the output will be gzipped.
- --tempdir
Specify the temp file directory.
- --keeptemp
Should intermediate files be kept?
- --region
Options : {ITS2, ITS1, ALL}
- --taxa
Select the taxonomic group sequenced: {Alveolata, Bryophyta, Bacillariophyta, Amoebozoa, Euglenozoa, Fungi, Chlorophyta, Rhodophyta, Phaeophyceae, Marchantiophyta, Metazoa, Microsporidia, Oomycota, Haptophyceae, Raphidophyceae, Rhizaria, Synurophyceae, Tracheophyta, Eustigmatophyceae, Apusozoa, Parabasalia}
- --log
Log file
- --threads
Number of processor threads to use
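When ITSxpress is called from a script, the options listed above can be assembled programmatically. The helper below is a hypothetical convenience wrapper (the name `build_itsxpress_command` is not part of ITSxpress); it only builds an argument list from the documented flags and does not run anything.

```python
import shlex


def build_itsxpress_command(fastq, outfile, region, taxa, fastq2=None,
                            single_end=False, log=None, threads=None):
    """Return an argument list using the itsxpress options documented above."""
    cmd = ["itsxpress", "--fastq", fastq, "--outfile", outfile,
           "--region", region, "--taxa", taxa]
    if fastq2 is not None:
        cmd += ["--fastq2", fastq2]
    if single_end:
        cmd.append("--single_end")
    if log is not None:
        cmd += ["--log", log]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


cmd = build_itsxpress_command("r1.fastq.gz", "trimmed_reads.fastq.gz", "ITS2", "Fungi",
                              fastq2="r2.fastq.gz", log="logfile.txt", threads=2)
# Print the command as a shell-quoted string for inspection.
print(shlex.join(cmd))
```

With ITSxpress installed, the list could then be passed to `subprocess.run(cmd, check=True)`.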
Examples
Use case 1: Trimming the ITS2 region from a fungal amplicon sequencing dataset with forward and reverse gzipped fastq files using two cpu threads.
itsxpress --fastq r1.fastq.gz --fastq2 r2.fastq.gz --region ITS2 --taxa Fungi \
--log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2
ITSxpress can read and write both gzipped and uncompressed fastq files. It expects fastq file names to end in .fq, .fastq, .fq.gz or .fastq.gz.
Use case 2: Trimming the ITS2 region from a fungal amplicon sequencing dataset with an interleaved gzipped fastq file using two cpu threads.
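As a rough illustration of extension-based handling like that described above, the following sketch opens a FASTQ file as text and picks gzip or plain I/O from the file name. The function name `open_fastq` and the decision to raise `ValueError` on other extensions are assumptions made for this example, not ITSxpress behavior.

```python
import gzip

# The four extensions named in the text above.
FASTQ_SUFFIXES = (".fq", ".fastq", ".fq.gz", ".fastq.gz")


def open_fastq(path, mode="rt"):
    """Open a FASTQ file in text mode, using gzip when the name ends in .gz."""
    if not path.endswith(FASTQ_SUFFIXES):
        raise ValueError(f"unexpected FASTQ extension: {path}")
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)
```

The same helper works for writing by passing `mode="wt"`.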
itsxpress --fastq interleaved.fastq.gz --region ITS2 --taxa Fungi \
--log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2
Use case 3: Trimming the ITS2 region from a fungal amplicon sequencing dataset with a single-ended gzipped fastq file using two cpu threads.
itsxpress --fastq single-end.fastq.gz --single_end --region ITS2 --taxa Fungi \
--log logfile.txt --outfile trimmed_reads.fastq.gz --threads 2
Single ended data is less common and may come from a dataset where the reads have already been merged.
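To make the difference between interleaved and single-ended input concrete: an interleaved file alternates read 1 and read 2 records, while a single-ended file has one record per fragment. The sketch below pairs up consecutive records from an interleaved file. It assumes simple four-line FASTQ records with no line wrapping, and the function names are hypothetical, not part of ITSxpress.

```python
def iter_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ lines, four lines per record."""
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _plus, qual = stripped[i:i + 4]
        yield header, seq, qual


def iter_pairs(records):
    """Pair consecutive records of an interleaved file as (read 1, read 2)."""
    it = iter(records)
    for read1 in it:
        read2 = next(it, None)
        if read2 is None:
            # An unpaired trailing record means the input was not interleaved.
            raise ValueError("odd number of records: input is not interleaved")
        yield read1, read2
```

Single-ended data would simply be consumed with `iter_fastq` alone, one record at a time.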
Use case 4: Trimming the ITS1 region from a Microsporidia amplicon sequencing dataset with an interleaved gzipped fastq file using 40 cpu threads.
itsxpress --fastq interleaved.fastq.gz --region ITS1 --taxa Microsporidia \
--log logfile.txt --outfile trimmed_reads.fastq.gz --threads 40
License information
This software is a work of the United States Department of Agriculture, Agricultural Research Service. 17 U.S.C. Section 105 states that “Copyright protection under this title is not available for any work of the United States Government”. While I anticipate that this work will be released under a CC0 public domain attribution, only the USDA ARS Office of Technology Transfer has the authority to make that determination.
Hashes for itsxpress-1.5.6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 76b8b8716250113c90dde647d63c2013cff5d1210b070b348533ebe6e0184e6b
MD5 | dfe377a0d5aa1ca563d8bc52bcb78618
BLAKE2b-256 | 34beb8f6eb2ebb9e1fea150ccdfa2236e607626d7b1e63ab0edf76de59973714