Utilities to work with paired sequence files
Copyright 2012 Lance Parsons <firstname.lastname@example.org>
BSD 2-Clause License http://www.opensource.org/licenses/BSD-2-Clause - See LICENSE.txt
Install BioPython version 1.57 or above (required for paired_sequence_match.py):
pip install BioPython
pip install paired_sequence_utils
Takes two sequence files as input and matches up paired sequences, outputting them separately from orphan sequences. Useful when paired reads are in two separate files and were filtered separately. By default, paired reads are output interleaved with one another (read 1 and read 2 of a pair, then read 1 and read 2 of the next pair, and so on). If the paired output file is specified twice, the first read of each pair is written to the first file and the second read to the second file.
Output paired reads interleaved to STDOUT and the single reads to STDERR:
paired_sequence_match.py read1.fastq read2.fastq > paired_reads.fastq 2>single_reads.fastq
Output paired reads to separate files:
paired_sequence_match.py read1.fastq read2.fastq -p read1_paired.fastq -p read2_paired.fastq -s single_reads.fastq
NOTE: This script requires BioPython (http://biopython.org) version 1.57 or above
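The pairing step described above can be sketched in plain Python. This is only an illustration of the idea, not the code used by paired_sequence_match.py: the function names (read_id, match_pairs, interleave) are hypothetical, reads are represented as simple (header, sequence) tuples rather than BioPython records, and mates are assumed to share a name that differs only by a trailing /1 or /2.

```python
def read_id(header):
    """Return a key shared by both mates of a pair.

    Assumption: mates are named like 'name/1' and 'name/2'; anything
    after the first whitespace in the header is ignored.
    """
    name = header.split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def match_pairs(reads1, reads2):
    """Split two lists of (header, sequence) tuples into pairs and singles."""
    # Index the second file by mate-independent name.
    index2 = {read_id(h): (h, s) for h, s in reads2}
    pairs, singles, matched = [], [], set()
    for header, seq in reads1:
        key = read_id(header)
        if key in index2:
            pairs.append(((header, seq), index2[key]))
            matched.add(key)
        else:
            singles.append((header, seq))
    # Reads from the second file whose mate never appeared are orphans too.
    singles.extend(rec for key, rec in index2.items() if key not in matched)
    return pairs, singles


def interleave(pairs):
    """Flatten pairs into read 1, read 2, read 1, read 2, ... order."""
    out = []
    for first, second in pairs:
        out.extend([first, second])
    return out
```

Holding one file's reads in a dictionary keeps the example short; for large files a real implementation would more likely use an on-disk index or streaming.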
Split multiple fastq files by matching barcodes in one of the sequence files. Barcodes in the tab-delimited barcodes.txt file are matched against the beginning of the specified index read. By default, barcodes must match exactly, but --mismatches can be set higher if desired. If input files are gzipped, the output is as well. Compression can be forced with the --gzip option.
Split an Illumina paired-end run where the index read is read 2, the forward read is read 1, and the reverse read is read 3:
barcode_splitter.py --bcfile barcodes.txt read1.fastq read2_index.fastq read3.fastq --idxread 2 --suffix .fastq
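As a rough illustration of matching a barcode against the beginning of an index read with a mismatch allowance, the sketch below may help. It is not taken from barcode_splitter.py: the names count_mismatches and assign_sample are hypothetical, barcodes are passed as a plain dict of sample name to barcode rather than read from barcodes.txt, and returning no match when two barcodes tie is an assumption made here for the example.

```python
def count_mismatches(barcode, prefix):
    """Count positions where two equal-length strings differ."""
    return sum(1 for a, b in zip(barcode, prefix) if a != b)


def assign_sample(index_seq, barcodes, max_mismatches=0):
    """Return the sample whose barcode best matches the start of index_seq.

    barcodes is a dict mapping sample name -> barcode (hypothetical layout).
    Returns None when no barcode is within max_mismatches, or when the
    best match is tied between barcodes (an assumption of this sketch).
    """
    best, best_mm, ambiguous = None, max_mismatches + 1, False
    for sample, barcode in barcodes.items():
        prefix = index_seq[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # index read is too short to hold this barcode
        mm = count_mismatches(barcode, prefix)
        if mm < best_mm:
            best, best_mm, ambiguous = sample, mm, False
        elif mm == best_mm and mm <= max_mismatches:
            ambiguous = True
    return None if ambiguous else best
```

With max_mismatches left at 0 this reduces to exact prefix matching, which mirrors the default described above.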