minimap2 PAF file reader
readpaf
readpaf is a fast parser for minimap2 PAF (Pairwise mApping Format) files. It is written in pure Python and has no required dependencies; pandas is needed only if you want the output as a DataFrame.
Installation
Minimal install:
pip install readpaf
With the optional pandas dependency:
pip install readpaf[pandas]
Direct download
As readpaf is a self-contained module, it can also be installed by downloading just the module file. The latest version is available from: https://raw.githubusercontent.com/alexomics/read-paf/main/readpaf.py
or a specific version can be downloaded from a release/tag like so:
https://raw.githubusercontent.com/alexomics/read-paf/v0.0.5/readpaf.py
PyPI is the recommended install method.
Usage
readpaf has only one user function, parse_paf, which accepts a file-like object. This is any object in Python that has a file-oriented API: sys.stdin, stdout from subprocess, io.StringIO, or files opened with gzip or open.
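To illustrate what "file-like" means in practice, the sketch below feeds the same PAF line through an in-memory text buffer and through a gzip stream opened in text mode. The helper query_names is a hypothetical stand-in written for this example, not part of readpaf; it only shows that both objects yield text lines in the same way.

```python
import gzip
import io

# A single made-up PAF line with the 12 mandatory columns.
PAF_LINE = "read1\t100\t0\t100\t+\tchr1\t1000\t10\t110\t95\t100\t60\n"


def query_names(file_like):
    """Hypothetical stand-in for a parser: read text lines from any file-like object."""
    return [line.split("\t")[0] for line in file_like if line.strip()]


# An in-memory text buffer.
from_string = query_names(io.StringIO(PAF_LINE))

# A gzip-compressed stream, opened in text mode ("rt") so it yields str lines.
compressed = io.BytesIO(gzip.compress(PAF_LINE.encode()))
with gzip.open(compressed, "rt") as handle:
    from_gzip = query_names(handle)

print(from_string, from_gzip)
```

Handles of either kind can be passed to parse_paf in place of the stand-in. Note that gzip files should be opened in text mode so that lines arrive as strings rather than bytes.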
The following script demonstrates how minimap2 output can be piped into readpaf:
from readpaf import parse_paf
from sys import stdin
for record in parse_paf(stdin):
    print(record.query_name, record.target_name)
readpaf can also generate a pandas DataFrame:
from readpaf import parse_paf
with open("test.paf", "r") as handle:
    df = parse_paf(handle, dataframe=True)
Functions
readpaf has a single user function
parse_paf
parse_paf(file_like=file_handle, fields=list, na_values=list, na_rep=numeric, dataframe=bool)
Parameters:
- file_like: A file-like object, such as sys.stdin, a file handle from open, or an io.StringIO object.
- fields: A list of 13 field names to use for the PAF file. These are based on the PAF specification. Default:
"query_name", "query_length", "query_start", "query_end", "strand", "target_name", "target_length", "target_start", "target_end", "residue_matches", "alignment_block_length", "mapping_quality", "tags"
- na_values: A list of values to interpret as NaN. This is only applied to numeric fields. Default: ["*"]
- na_rep: Value to use when a NaN value specified in na_values is found. This should ideally be 0 to match minimap2's output. Default: 0
- dataframe: bool. If True, return a pandas.DataFrame with the tags expanded into separate Series.
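The interaction between fields, na_values and na_rep can be sketched in plain Python. This is a minimal illustration under assumptions, not readpaf's actual implementation: split_paf_line and NUMERIC are hypothetical names, and the input line with "*" in numeric columns is made up for the example.

```python
# Default field names, as listed above.
FIELDS = [
    "query_name", "query_length", "query_start", "query_end", "strand",
    "target_name", "target_length", "target_start", "target_end",
    "residue_matches", "alignment_block_length", "mapping_quality", "tags",
]

# Assumption for this sketch: these are the columns treated as numeric.
NUMERIC = {
    "query_length", "query_start", "query_end", "target_length",
    "target_start", "target_end", "residue_matches",
    "alignment_block_length", "mapping_quality",
}


def split_paf_line(line, na_values=("*",), na_rep=0):
    """Hypothetical helper: split one PAF line into a dict keyed by field name."""
    cols = line.rstrip("\n").split("\t")
    record = {}
    for name, raw in zip(FIELDS[:12], cols[:12]):
        if name in NUMERIC:
            # Only numeric fields get the NaN replacement.
            record[name] = na_rep if raw in na_values else int(raw)
        else:
            record[name] = raw
    # Everything after the 12 mandatory columns is kept as raw tag strings.
    record["tags"] = cols[12:]
    return record


record = split_paf_line("read1\t100\t*\t*\t*\t*\t*\t*\t*\t*\t*\t*\ttp:A:P")
print(record["query_start"], record["strand"], record["tags"])
```

In this sketch the "*" in the strand and target_name columns is left untouched because those fields are not numeric, while every numeric "*" becomes na_rep.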
If used as an iterator, then each object returned is a named tuple representing a single line in the PAF file.
Each named tuple has field names as specified by the fields parameter.
The SAM-like tags are converted into their specified types and stored in a dictionary, with the tag name as the key and a named tuple with fields name, type, and value as the value.
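One way such a tag dictionary could be built is sketched below. The names Tag, CONVERTERS and parse_tags are hypothetical and chosen for this example; the sketch converts only the integer ("i") and float ("f") types and leaves every other type as a string, which may differ from what readpaf does internally.

```python
from collections import namedtuple

# Hypothetical named tuple mirroring the name/type/value fields described above.
Tag = namedtuple("Tag", ["name", "type", "value"])

# Assumption: only "i" and "f" are converted; all other types stay as str.
CONVERTERS = {"i": int, "f": float}


def parse_tags(raw_tags):
    """Turn strings like 'NM:i:5' into a dict of Tag tuples keyed by tag name."""
    tags = {}
    for raw in raw_tags:
        # Split at most twice so that values containing ':' stay intact.
        name, type_, value = raw.split(":", 2)
        tags[name] = Tag(name, type_, CONVERTERS.get(type_, str)(value))
    return tags


tags = parse_tags(["NM:i:5", "tp:A:P", "cg:Z:95M5I", "de:f:0.05"])
print(tags["NM"].value, tags["tp"].value)
```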
When print or str is called on a PAF record (named tuple), a formatted PAF string is returned, which is useful for writing records to a file.
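The idea of a named tuple that prints itself as a PAF line can be sketched as follows. PAFSketch and Tag are hypothetical names for this illustration, and the tab-separated layout with name:type:value tags is an assumption based on the PAF format rather than a copy of readpaf's code.

```python
from collections import namedtuple

FIELDS = [
    "query_name", "query_length", "query_start", "query_end", "strand",
    "target_name", "target_length", "target_start", "target_end",
    "residue_matches", "alignment_block_length", "mapping_quality", "tags",
]
Tag = namedtuple("Tag", ["name", "type", "value"])


class PAFSketch(namedtuple("PAFSketch", FIELDS)):
    """Hypothetical record type whose str() is a tab-separated PAF line."""

    __slots__ = ()

    def __str__(self):
        # The first 12 fields are the mandatory PAF columns.
        core = [str(value) for value in self[:12]]
        # Tags are written back in name:type:value form.
        tags = ["{}:{}:{}".format(*tag) for tag in self.tags.values()]
        return "\t".join(core + tags)


rec = PAFSketch(
    "read1", 100, 0, 100, "+", "chr1", 1000, 10, 110, 95, 100, 60,
    {"NM": Tag("NM", "i", 5)},
)
line = str(rec)
print(line)
```

With a record type like this, writing to a file reduces to calling print(rec, file=handle) for each record.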
The PAF record also has a method, blast_identity, which calculates the BLAST identity for that record.
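BLAST identity is commonly defined as the number of matching residues divided by the alignment block length, which correspond to columns 10 and 11 of a PAF line. The standalone function below is a sketch of that calculation under this assumption; it is written for illustration and takes the two values as arguments instead of being a method on the record.

```python
def blast_identity(residue_matches, alignment_block_length):
    """Sketch: fraction of alignment columns that are matching residues."""
    return residue_matches / alignment_block_length


# 95 matching residues over an alignment block of 100 columns.
identity = blast_identity(95, 100)
print(identity)
```

In this sketch, a record whose alignment block length is 0 (for example after NaN replacement) raises ZeroDivisionError, so such records would need to be filtered out first.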
If used to generate a pandas DataFrame, then each row represents a line in the PAF file and the SAM-like tags are expanded into individual series.