Samfile long-read filtering script.
Project description
GERENUQ
A simple command-line tool and set of Python functions for filtering long reads from BAM, SAM and PAF read alignment files according to various user-defined parameters.
Installation
Using Conda
$ conda install -c abahcheli gerenuq
Using Pip
$ pip install gerenuq
Using Docker
$ docker pull abahcheli/gerenuq
Manual
$ git clone https://github.com/abahcheli/gerenuq
$ cd gerenuq
$ python setup.py install
Usage
gerenuq
Required inputs:
-i / --input <input raw samfile>
-o / --output <output filtered samfile>
Optional inputs:
-l / --length <minimum read length for cutoff (default 1000)>
-m / --matchlength <minimum sequence identity, i.e. the ratio of matches to read length (default 0.5)>
-s / --score <minimum score for the whole alignment (default 1)>
-q / --lengthscore <minimum ratio of length to score; may be thought of as the fraction of bases that have a positive score (default 2)>
-t / --threads <number of processes to run (default 1)>
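The options above can also be assembled from a Python script. The following is a minimal sketch, not part of gerenuq itself: the helper name build_gerenuq_command and the file names are hypothetical, and only the flags listed above are used. It builds the argument list and prints the equivalent shell command without running it.

```python
import shlex


def build_gerenuq_command(input_path, output_path, length=1000, matchlength=0.5,
                          score=1, lengthscore=2, threads=1):
    """Assemble the argument list for the gerenuq command-line tool.

    Hypothetical helper: defaults mirror the documented option defaults.
    """
    return [
        "gerenuq",
        "-i", str(input_path),
        "-o", str(output_path),
        "-l", str(length),
        "-m", str(matchlength),
        "-s", str(score),
        "-q", str(lengthscore),
        "-t", str(threads),
    ]


# Example file names are placeholders.
cmd = build_gerenuq_command("mapped.sam", "filtered.sam", length=2000, threads=4)
print(shlex.join(cmd))
# prints: gerenuq -i mapped.sam -o filtered.sam -l 2000 -m 0.5 -s 1 -q 2 -t 4
```

With gerenuq installed and on the PATH, the list could be passed to subprocess.run(cmd, check=True) to perform the filtering.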
gerenuq_filter_file(input_file, output_file, min_score = 1, min_len_to_score = 2, min_length = 1000, min_match_to_length = 0.5)
'''
Filters minimap2-mapped reads by mapping score, length, and the match-to-length and length-to-score ratios. PAF files are filtered only by the query cutoff.
Requires input_file in BAM, SAM or PAF format and output_file (written in the same format as the input).
'''
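To make the length and score cutoffs concrete, here is a rough sketch of how they could be checked on a single SAM line. This is an illustration only, not gerenuq's implementation: it assumes the read length is the length of the SEQ column and the score is the AS:i tag that minimap2 writes, and the function names are hypothetical. The two ratio filters are left out because their exact definitions are not given here.

```python
def sam_length_and_score(sam_line):
    """Return (read length, alignment score) for one SAM alignment line.

    Assumption: length comes from the SEQ column (10th field) and the
    score from the optional AS:i tag; score is None if the tag is absent.
    """
    fields = sam_line.rstrip("\n").split("\t")
    seq = fields[9]
    length = 0 if seq == "*" else len(seq)
    score = None
    for tag in fields[11:]:
        if tag.startswith("AS:i:"):
            score = int(tag[5:])
            break
    return length, score


def passes_basic_cutoffs(sam_line, min_length=1000, min_score=1):
    """Apply only the length and score cutoffs (ratio filters omitted)."""
    length, score = sam_length_and_score(sam_line)
    return length >= min_length and score is not None and score >= min_score


# A made-up 1200-base alignment record with an alignment score of 900.
example = "\t".join([
    "read1", "0", "chr1", "100", "60", "1200M", "*", "0", "0",
    "ACGT" * 300, "*", "NM:i:12", "AS:i:900",
])
print(passes_basic_cutoffs(example))                   # True
print(passes_basic_cutoffs(example, min_length=2000))  # False
```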
gerenuq_filter_read_list(read_list, format='sam', min_score = 1, min_len_to_score = 2, min_length = 1000, min_match_to_length = 0.5)
'''
Filters minimap2-mapped reads by mapping score, length, and the match-to-length and length-to-score ratios. PAF files are filtered only by the query cutoff.
Requires read_list as a list of mapped read lines from a SAM or PAF file (in TSV format). Returns a list of the reads, in SAM or PAF (TSV) format, that passed the filtering parameters. Headers are ignored and not returned.
'''
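Because headers are not returned, a script that wants a complete SAM file after filtering has to keep the header lines itself. The sketch below shows one way to do that; the helper names are hypothetical, and the call to gerenuq_filter_read_list is shown only as a comment because the import path and whether lines should carry trailing newlines are assumptions not confirmed here.

```python
def split_sam_lines(lines):
    """Separate SAM header lines (starting with '@') from alignment records."""
    headers, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            headers.append(line)
        else:
            records.append(line)
    return headers, records


def assemble_sam(headers, kept_records):
    """Join saved headers and the surviving records back into SAM text."""
    return "\n".join(headers + kept_records) + "\n"


# Made-up SAM content for illustration.
sam_text = [
    "@HD\tVN:1.6\tSO:unsorted\n",
    "@SQ\tSN:chr1\tLN:5000\n",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t*\tAS:i:8\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n",
]
headers, records = split_sam_lines(sam_text)

# With gerenuq installed, the records could then be filtered, for example:
#   kept = gerenuq_filter_read_list(records, format='sam')
# A stand-in is used here so the sketch runs without gerenuq.
kept = records[:1]

print(assemble_sam(headers, kept), end="")
```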
Download files
Source Distribution
gerenuq-0.2.2.tar.gz (6.9 kB)
Built Distribution
gerenuq-0.2.2-py3-none-any.whl (13.4 kB)