Divide amplicon sequences
Helper for joining and splitting FASTQ files.
Requires Python 3.6 or above.
    # Linux
    ## split
    python3 join_and_split.py split -m fastq_file
    ## join
    python3 join_and_split.py join -f forward.fastq -r reverse.fastq
    # Windows
    ## split
    python join_and_split.py split -m fastq_file
    ## join
    python join_and_split.py join -f forward.fastq -r reverse.fastq
Use -t to set the linker text. By default the program uses "JOINTEXT".
When splitting, "fastq_file" can be multiple files. Use "*.fastq" (including the quotation marks) to match all ".fastq" files in the current folder.
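Quoting the pattern keeps the shell from expanding it, so the program receives the literal text "*.fastq" and has to resolve it itself. A minimal sketch of how such a pattern could be expanded, assuming the standard glob module is used (the function name expand_inputs is hypothetical and not taken from join_and_split.py):

```python
import glob


def expand_inputs(patterns):
    """Expand wildcard patterns into an ordered list of unique file paths."""
    files = []
    for pattern in patterns:
        # glob.glob returns matches in arbitrary order, so sort them
        for path in sorted(glob.glob(pattern)):
            if path not in files:
                files.append(path)
    return files
```

Calling expand_inputs(["*.fastq"]) in a folder would then return every file ending in ".fastq" there, sorted by name.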
Divide NGS data by barcode and primer.
- Python 3.5 or above
- vsearch (Optional)
To install Biopython and regex, run as administrator:
pip install biopython regex
Support ambiguous bases.
Extend vsearch options. Improve output.
Use regex instead of BLAST. Faster and easier.
Parallel version, use BLAST.
Single core version. Use BLAST.
It can handle merged paired-end sequences, or sequences from just one direction.
Sequences are divided by barcode according to the given barcode file. If a barcode differs from the expected one by even a single base, the sequence is dropped.
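The exact-match rule can be illustrated with a short sketch. It assumes the barcode sits at the very start of the read and that the lookup table is keyed by the full barcode region; the function name assign_sample and the example barcodes are hypothetical and do not come from divide_seq itself:

```python
def assign_sample(read, barcode_to_sample, barcode_length):
    """Return the sample for a read, or None if its barcode is not an exact match."""
    barcode = read[:barcode_length].upper()
    # dict lookup only succeeds on an exact match, so one wrong base drops the read
    return barcode_to_sample.get(barcode)


barcodes = {"AACCGAACCG": "sample_1"}
kept = assign_sample("AACCGAACCGTTTTGGGG", barcodes, 10)
dropped = assign_sample("AACCGAACCTTTTTGGGG", barcodes, 10)
```

Here kept is "sample_1", while dropped is None because the last base of the barcode region differs.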
Some protocols add an extra sequence (adapter) between the barcode and the primer. If yours does not, set the adapter length to zero with "--adapter 0". The default value is 14.
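To make the role of the adapter length concrete, here is a minimal sketch that cuts the head of a read into barcode, adapter and remainder. It assumes the layout barcode, adapter, primer counted from the 5' end, a 10-base barcode region (the default "5*2" mode) and the default adapter length of 14; the function name split_head is hypothetical:

```python
def split_head(read, barcode_length=10, adapter_length=14):
    """Cut a read into (barcode, adapter, rest), counting from the 5' end."""
    barcode = read[:barcode_length]
    adapter_end = barcode_length + adapter_length
    adapter = read[barcode_length:adapter_end]
    # with adapter_length=0 the primer is expected right after the barcode
    rest = read[adapter_end:]
    return barcode, adapter, rest
```

With adapter_length=0, as set by "--adapter 0", the adapter part is empty and the remainder starts immediately after the barcode.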
Use "-m" to set the barcode mode. For example, "8*1" means a barcode of length 8 that appears only once. The default is "5*2", i.e., a 5-base barcode repeated twice.
Note that the forward and reverse barcodes may be different sequences, but they SHOULD FOLLOW THE SAME MODE!
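A mode string of the form "length*repeat" can be read as two integers. The sketch below shows one way to parse it and to compute how many bases the barcode region occupies; the function names are hypothetical and the validation rules are an assumption, not the program's documented behavior:

```python
def parse_barcode_mode(mode):
    """Parse a mode such as "5*2" into (barcode_length, repeat_count)."""
    length_text, repeat_text = mode.split("*")
    length, repeat = int(length_text), int(repeat_text)
    if length <= 0 or repeat <= 0:
        raise ValueError(f"invalid barcode mode: {mode!r}")
    return length, repeat


def barcode_region_length(mode):
    """Total number of bases covered by the repeated barcode."""
    length, repeat = parse_barcode_mode(mode)
    return length * repeat
```

Under this reading, "5*2" covers 10 bases and "8*1" covers 8 bases.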
Use "-s" or "--strict" to enable strict mode. If set, the program checks whether the barcodes at the head and tail match each other and whether the barcode at the tail (3') is correct. Otherwise, it only checks the barcode at the head (5') of the sequence.
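The difference between the two modes can be sketched as follows. This is an illustration only: it assumes both barcodes are compared literally against the ends of the read, whereas the real program may handle orientation (for example reverse complement at the 3' end) differently. The function name passes_barcode_check is hypothetical:

```python
def passes_barcode_check(read, forward_barcode, reverse_barcode, strict=False):
    """Check the 5' barcode always, and the 3' barcode only in strict mode."""
    head_ok = read.startswith(forward_barcode)
    if not strict:
        return head_ok
    # strict mode: the tail must also carry the barcode expected for this sample
    tail_ok = read.endswith(reverse_barcode)
    return head_ok and tail_ok
```

A read with a correct head barcode but a wrong tail barcode passes the default check and fails the strict one.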
The barcode file is a comma-separated table.
barcode-f is the barcode at the 5' end and barcode-r is the barcode at the 3' end. All sequences should be written in the forward direction.
If the forward and reverse barcodes are the same, you can omit the reverse barcode in the table.
To avoid potential errors, please do not use spaces in the sample info.
Note that fields are separated by an English comma (",") rather than a Chinese comma ("，").
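The rules for the barcode file can be checked with a small reader. The sketch below is built on assumptions that the text does not confirm: the columns are taken to be barcode-f, barcode-r and sample info in that order, the first line is treated as a header, and an omitted reverse barcode is taken to be an empty field. The function name read_barcode_table is hypothetical:

```python
import csv
import io


def read_barcode_table(text):
    """Parse barcode CSV text into (barcode_f, barcode_r, sample) tuples."""
    if "\uff0c" in text:
        raise ValueError("full-width comma found; use the English comma ','")
    rows = csv.reader(io.StringIO(text))
    next(rows, None)  # skip the first line (assumed to be a header)
    table = []
    for row in rows:
        fields = [field.strip() for field in row]
        if not any(fields):
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {row}")
        barcode_f, barcode_r, sample = fields
        if " " in sample:
            raise ValueError(f"sample info contains a space: {sample!r}")
        # an empty reverse barcode is assumed to mean "same as forward"
        table.append((barcode_f, barcode_r or barcode_f, sample))
    return table
```

Reading the file content with open(path, encoding="utf-8").read() and passing it to this function would surface full-width commas and spaces in sample info before the main run.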
The primer file is a comma-separated table as well.
You can use Microsoft Excel to prepare these two files and save them in CSV format, or use any text editor you prefer.
Make sure you do not leave out the first line.
If you use the PBS job submission system, you can use this script to submit the task. It covers the whole workflow, from combining the sequences of the two directions with FLASH and join_fastq.py to dividing them.