Separate masked and unmasked parts of sequences in FASTX files.
splitmasked
splitmasked splits sequence records in FAST(A/Q) files based on their masking status. What constitutes masking can be defined with the --maskchar option (e.g. N or lowercase). Both masked and unmasked parts can be retained and written to separate output files.
Installation
pip install splitmasked
Usage
splitmasked \
--maskchar lowercase \
--minlength_masked 100 \
--minlength_unmasked 20 \
--outfile_masked /dev/null \
--outfile_unmasked unmasked.fastq \
input.fastq
Examples
Input
@Seq1 comment1
aaaaaTTTTTTAAgatgatgatgAATGAA
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
@Seq2 comment2
ATGATAGAgagagtTTTATA
+
HHHHHHHHHHHHHHHHHHHH
Output
With --maskchar lowercase:
unmasked.fastq
@Seq1_part2 comment1
TTTTTTAA
+
AAAAAAAA
@Seq1_part4 comment1
AATGAA
+
AAAAAA
@Seq2_part1 comment2
ATGATAGA
+
HHHHHHHH
@Seq2_part3 comment2
TTTATA
+
HHHHHH
masked.fastq
@Seq1_part1 comment1
aaaaa
+
AAAAA
@Seq1_part3 comment1
gatgatgatg
+
AAAAAAAAAA
@Seq2_part2 comment2
gagagt
+
HHHHHH
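The splitting shown in the example can be sketched in a few lines of Python. This is only an illustration of the idea, not the package's actual implementation: the function names `is_masked` and `split_record` are hypothetical, and the handling of a literal mask character (exact match against a single character such as N) is an assumption.

```python
from itertools import groupby


def is_masked(base, maskchar="lowercase"):
    """Return True if a base counts as masked under the given rule.

    Assumption: "lowercase" means any lowercase letter is masked;
    any other value is treated as a single literal mask character.
    """
    if maskchar == "lowercase":
        return base.islower()
    return base == maskchar


def split_record(name, comment, seq, qual, maskchar="lowercase"):
    """Split one FASTQ record into consecutive masked/unmasked parts.

    Returns a list of (masked, header, sequence, quality) tuples, with
    parts numbered from 1 in their order along the sequence.
    """
    parts = []
    start = 0
    # groupby yields one group per run of bases sharing a masking status.
    runs = groupby(seq, key=lambda base: is_masked(base, maskchar))
    for part_no, (masked, run) in enumerate(runs, start=1):
        end = start + len(list(run))
        header = f"@{name}_part{part_no} {comment}"
        # Slice sequence and quality with the same coordinates so they stay aligned.
        parts.append((masked, header, seq[start:end], qual[start:end]))
        start = end
    return parts


# Reproduce the Seq1 record from the example input.
for masked, header, seq, qual in split_record(
    "Seq1", "comment1", "aaaaaTTTTTTAAgatgatgatgAATGAA", "A" * 29
):
    print("masked  " if masked else "unmasked", header, seq, qual)
```

Routing each tuple to one of two output files according to its `masked` flag would give the unmasked.fastq and masked.fastq contents shown above.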