Separate masked and unmasked parts of sequences in FASTX files.
splitmasked
splitmasked splits sequence records in FAST(A/Q) files based on their masking status. What constitutes masking can be defined with the --maskchar option (e.g. N or lowercase). Both masked and unmasked parts can be retained and written to separate output files.
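To make the idea of "masking status" concrete, here is a minimal Python sketch of how a sequence could be divided into alternating masked and unmasked runs. It is not splitmasked's actual implementation; the function names `is_masked` and `masked_runs` are hypothetical, and the sketch assumes that a mask definition is either the word `lowercase` or a single character such as `N`.

```python
from itertools import groupby


def is_masked(base, maskchar="lowercase"):
    """Return True if a single base counts as masked (assumed semantics)."""
    if maskchar == "lowercase":
        return base.islower()
    return base == maskchar


def masked_runs(seq, maskchar="lowercase"):
    """Return half-open (start, end, masked) intervals that cover seq."""
    runs = []
    pos = 0
    # groupby yields one group per maximal run of equal masking status
    for masked, group in groupby(seq, key=lambda b: is_masked(b, maskchar)):
        length = sum(1 for _ in group)
        runs.append((pos, pos + length, masked))
        pos += length
    return runs


print(masked_runs("ATGATAGAgagagtTTTATA"))
# [(0, 8, False), (8, 14, True), (14, 20, False)]
print(masked_runs("ACNNNNGT", maskchar="N"))
# [(0, 2, False), (2, 6, True), (6, 8, False)]
```

Returning intervals rather than substrings keeps the sketch usable for FASTQ input, where the same coordinates have to be applied to the quality string.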
Installation
pip install splitmasked
Usage
splitmasked \
--maskchar lowercase \
--minlength_masked 100 \
--minlength_unmasked 20 \
--outfile_masked /dev/null \
--outfile_unmasked unmasked.fastq \
input.fastq
Examples
Input
@Seq1 comment1
aaaaaTTTTTTAAgatgatgatgAATGAA
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
@Seq2 comment2
ATGATAGAgagagtTTTATA
+
HHHHHHHHHHHHHHHHHHHH
Output
With --maskchar lowercase:
unmasked.fastq
@Seq1_part2 comment1
TTTTTTAA
+
AAAAAAAA
@Seq1_part4 comment1
AATGAA
+
AAAAAA
@Seq2_part1 comment2
ATGATAGA
+
HHHHHHHH
@Seq2_part3 comment2
TTTATA
+
HHHHHH
masked.fastq
@Seq1_part1 comment1
aaaaa
+
AAAAA
@Seq1_part3 comment1
gatgatgatg
+
AAAAAAAAAA
@Seq2_part2 comment2
gagagt
+
HHHHHH
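The example output above can be reproduced with a short Python sketch. This is an illustration only, not the tool's code: the function `split_record` is hypothetical, the `_partN` naming scheme is inferred from the example headers, and no minimum-length filtering is applied.

```python
import re


def split_record(name, comment, seq, qual):
    """Split one FASTQ record into (masked, header, seq, qual) parts.

    Assumes lowercase masking; parts are numbered in order of appearance.
    """
    parts = []
    # Each match is a maximal run of lowercase or of non-lowercase characters
    for i, match in enumerate(re.finditer(r"[a-z]+|[^a-z]+", seq), start=1):
        start, end = match.span()
        header = f"@{name}_part{i} {comment}"
        masked = seq[start].islower()
        # The quality string is sliced with the same coordinates as the sequence
        parts.append((masked, header, seq[start:end], qual[start:end]))
    return parts


record = ("Seq2", "comment2", "ATGATAGAgagagtTTTATA", "H" * 20)
for masked, header, s, q in split_record(*record):
    target = "masked.fastq" if masked else "unmasked.fastq"
    print(target, header, s, q)
# unmasked.fastq @Seq2_part1 comment2 ATGATAGA HHHHHHHH
# masked.fastq @Seq2_part2 comment2 gagagt HHHHHH
# unmasked.fastq @Seq2_part3 comment2 TTTATA HHHHHH
```

Note that part numbers are assigned before the parts are routed to their files, which is why unmasked.fastq in the example contains parts 2 and 4 of Seq1 rather than parts 1 and 2.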