
minimap2 PAF file reader


readpaf


readpaf is a fast parser for minimap2 PAF (Pairwise mApping Format) files. It is written in pure Python and has no required dependencies; pandas is needed only if you want the output as a DataFrame.

Installation

Minimal install:

pip install readpaf

With optional pandas dependency:

pip install readpaf[pandas]
Direct download:

As readpaf is a self-contained module, it can also be installed by downloading just the module file. The latest version is available from:
https://raw.githubusercontent.com/alexomics/read-paf/main/readpaf.py

or a specific version can be downloaded from a release/tag like so:

https://raw.githubusercontent.com/alexomics/read-paf/v0.0.5/readpaf.py

PyPI is the recommended install method.

Usage

readpaf has a single user function, parse_paf, which accepts a file-like object: any Python object that has a file-oriented API (sys.stdin, stdout from a subprocess, io.StringIO, or a file opened with gzip or open).
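To make "file-like" concrete, here is a minimal sketch that uses only the standard library. The count_lines function is a hypothetical stand-in for a parser, not part of readpaf, and the PAF line contains made-up values. The gzip file is opened in text mode ("rt") on the assumption that the consumer expects str lines rather than bytes.

```python
import gzip
import io
import os
import tempfile

# One PAF line with the 12 mandatory tab-separated columns (made-up values).
PAF_LINE = "read1\t1000\t0\t1000\t+\tchr1\t50000\t100\t1100\t950\t1000\t60\n"


def count_lines(handle):
    """Hypothetical stand-in for a parser: consumes any text-mode file-like object."""
    return sum(1 for line in handle if line.strip())


# An in-memory handle.
in_memory = count_lines(io.StringIO(PAF_LINE * 2))

# A gzip-compressed file, opened in text mode so that lines are str, not bytes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.paf.gz")
    with gzip.open(path, "wt") as out:
        out.write(PAF_LINE * 3)
    with gzip.open(path, "rt") as handle:
        from_gzip = count_lines(handle)

print(in_memory, from_gzip)
```

Any handle that can be iterated line by line in this way should be usable in place of the stand-in.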

The following script demonstrates how minimap2 output can be piped into readpaf:

from readpaf import parse_paf
from sys import stdin

for record in parse_paf(stdin):
    print(record.query_name, record.target_name)

readpaf can also generate a pandas DataFrame:

from readpaf import parse_paf

with open("test.paf", "r") as handle:
    df = parse_paf(handle, dataframe=True)

Functions

readpaf has a single user function:

parse_paf

parse_paf(file_like=file_handle, fields=list, na_values=list, na_rep=numeric, dataframe=bool)

Parameters:

  • file_like: A file-like object, such as sys.stdin, a file handle from open, or an io.StringIO object
  • fields: A list of 13 field names to use for the PAF file, default:
    "query_name", "query_length", "query_start", "query_end", "strand",
    "target_name", "target_length", "target_start", "target_end",
    "residue_matches", "alignment_block_length", "mapping_quality", "tags"
    
    These are based on the PAF specification.
  • na_values: A list of values to interpret as NaN. This is only applied to numeric fields, default: ["*"]
  • na_rep: Value to use when a NaN value specified in na_values is found. This should ideally be 0 to match minimap2's output, default: 0
  • dataframe: bool, if True, return a pandas.DataFrame with the tags expanded into separate Series
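
The interaction between na_values and na_rep can be sketched as follows. This is an illustration of the behaviour described above, not readpaf's actual code; convert_numeric is a hypothetical helper and the input values are made up.

```python
def convert_numeric(value, na_values=("*",), na_rep=0):
    """Return na_rep for placeholder values, otherwise the value as an int."""
    if value in na_values:
        return na_rep
    return int(value)


# Raw text from three numeric columns, one of which holds the "*" placeholder.
raw = ["1000", "*", "60"]
cleaned = [convert_numeric(value) for value in raw]
print(cleaned)  # [1000, 0, 60]
```

Because the replacement is only applied to numeric fields, a "*" in a text column such as the strand or target name would be left unchanged.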

If used as an iterator, then each object returned is a named tuple representing a single line in the PAF file. Each named tuple has field names as specified by the fields parameter. The SAM-like tags are converted into their specified types and stored in a dictionary, with the tag name as the key and, as the value, a named tuple with the fields name, type, and value. When print or str is called on a PAF record (named tuple), a formatted PAF string is returned, which is useful for writing records to a file. The PAF record also has a method, blast_identity, which calculates the BLAST identity for that record.
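
The record structure described above can be approximated with the standard library alone. The following is a minimal sketch, not readpaf's implementation: Record, Tag, parse_tags and parse_line are hypothetical names, only the i and f tag types are converted (everything else is kept as a string), and BLAST identity is assumed to be residue matches divided by alignment block length (PAF columns 10 and 11).

```python
from collections import namedtuple

FIELDS = [
    "query_name", "query_length", "query_start", "query_end", "strand",
    "target_name", "target_length", "target_start", "target_end",
    "residue_matches", "alignment_block_length", "mapping_quality", "tags",
]
NUMERIC = set(FIELDS) - {"query_name", "strand", "target_name", "tags"}

# Each tag is stored as a (name, type, value) named tuple.
Tag = namedtuple("Tag", ["name", "type", "value"])

# SAM type codes mapped to Python converters; other codes stay as strings.
CONVERTERS = {"i": int, "f": float}


def parse_tags(raw_tags):
    """Turn ["NM:i:50", ...] into {"NM": Tag("NM", "i", 50), ...}."""
    tags = {}
    for raw in raw_tags:
        name, type_code, value = raw.split(":", 2)
        tags[name] = Tag(name, type_code, CONVERTERS.get(type_code, str)(value))
    return tags


class Record(namedtuple("Record", FIELDS)):
    __slots__ = ()

    def __str__(self):
        # Re-assemble the 12 mandatory columns followed by the tags.
        cols = [str(value) for value in self[:12]]
        cols += [f"{t.name}:{t.type}:{t.value}" for t in self.tags.values()]
        return "\t".join(cols)

    def blast_identity(self):
        # Assumed definition: matching residues over alignment block length.
        return self.residue_matches / self.alignment_block_length


def parse_line(line):
    cols = line.rstrip("\n").split("\t")
    values = [int(v) if f in NUMERIC else v for f, v in zip(FIELDS, cols[:12])]
    return Record(*values, parse_tags(cols[12:]))


line = "read1\t1000\t0\t1000\t+\tchr1\t50000\t100\t1100\t950\t1000\t60\tNM:i:50\ttp:A:P"
record = parse_line(line)
print(record.tags["NM"].value)   # 50
print(record.blast_identity())   # 0.95
print(str(record) == line)       # True
```

Overriding __str__ on a named tuple subclass is one way to get the round-trip behaviour the text mentions, so that print(record) writes a valid PAF line.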

If used to generate a pandas DataFrame, then each row represents a line in the PAF file and the SAM-like tags are expanded into individual series.
