Separate masked and unmasked parts of sequences in FASTX files.

splitmasked

splitmasked splits sequence records in FASTA/FASTQ files according to their masking status. The --maskchar option defines what counts as masked (e.g. N or lowercase). Both masked and unmasked parts can be kept and written to separate output files.

Installation

pip install splitmasked

Usage

splitmasked \
    --maskchar lowercase \
    --minlength_masked 100 \
    --minlength_unmasked 20 \
    --outfile_masked /dev/null \
    --outfile_unmasked unmasked.fastq \
    input.fastq

Examples

Input

@Seq1 comment1
aaaaaTTTTTTAAgatgatgatgAATGAA
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
@Seq2 comment2
ATGATAGAgagagtTTTATA
+
HHHHHHHHHHHHHHHHHHHH

Output

With --maskchar lowercase:

unmasked.fastq

@Seq1_part2 comment1
TTTTTTAA
+
AAAAAAAA
@Seq1_part4 comment1
AATGAA
+
AAAAAA
@Seq2_part1 comment2
ATGATAGA
+
HHHHHHHH
@Seq2_part3 comment2
TTTATA
+
HHHHHH

masked.fastq

@Seq1_part1 comment1
aaaaa
+
AAAAA
@Seq1_part3 comment1
gatgatgatg
+
AAAAAAAAAA
@Seq2_part2 comment2
gagagt
+
HHHHHH
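
The splitting shown above can be sketched in a few lines of Python. This is only an illustration of the idea, not splitmasked's actual implementation: the helper names `is_masked` and `split_record` are hypothetical, and the part numbering and header layout are inferred from the example output. It assumes that a mask definition is either the word lowercase or a single literal character such as N.

```python
from itertools import groupby


def is_masked(base, maskchar="lowercase"):
    """Return True if a base counts as masked (hypothetical helper)."""
    if maskchar == "lowercase":
        return base.islower()
    return base == maskchar


def split_record(name, comment, seq, qual, maskchar="lowercase"):
    """Split one FASTQ record into consecutive masked/unmasked parts.

    Returns (unmasked, masked), each a list of (header, seq, qual) tuples.
    Parts are numbered in order of appearance, as in the example output.
    """
    unmasked, masked = [], []
    start = 0
    # groupby yields one group per run of bases sharing the same masking status
    runs = groupby(seq, key=lambda base: is_masked(base, maskchar))
    for part_no, (masked_run, run) in enumerate(runs, start=1):
        end = start + len(list(run))
        header = f"{name}_part{part_no} {comment}"
        part = (header, seq[start:end], qual[start:end])
        (masked if masked_run else unmasked).append(part)
        start = end
    return unmasked, masked


unmasked, masked = split_record(
    "Seq1", "comment1",
    "aaaaaTTTTTTAAgatgatgatgAATGAA",
    "A" * 29,
)
for header, seq, qual in unmasked:
    print(f"@{header}\n{seq}\n+\n{qual}")
```

Run on Seq1, this prints the two Seq1 records listed under unmasked.fastq; the `masked` list holds the parts listed under masked.fastq. Length filtering (the --minlength options) is left out of the sketch.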
