
Minimap2 Python binding

Project description

Mappy provides a convenient interface to minimap2, a fast and accurate C program to align genomic and transcribed nucleotide sequences.

Installation

Mappy depends on zlib. It can be installed with pip:

pip install --user mappy

or from the minimap2 GitHub repository (Cython required):

git clone https://github.com/lh3/minimap2
cd minimap2
python setup.py install

Usage

The following Python script demonstrates the key functionality of mappy:

import mappy as mp

a = mp.Aligner("test/MT-human.fa")  # load or build index
if not a:
    raise Exception("ERROR: failed to load/build index")
for name, seq, qual in mp.fastx_read("test/MT-orang.fa"):  # read a FASTA/FASTQ sequence
    for hit in a.map(seq):  # traverse alignments
        print("{}\t{}\t{}\t{}".format(hit.ctg, hit.r_st, hit.r_en, hit.cigar_str))

APIs

Mappy implements two classes and one global function.

Class mappy.Aligner

mappy.Aligner(fn_idx_in, preset=None, ...)

This constructor accepts the following arguments:

  • fn_idx_in: index or sequence file name. Minimap2 automatically tests the file type. If a sequence file is provided, minimap2 builds an index. The sequence file can be optionally gzip’d.

  • preset: minimap2 preset. Currently, minimap2 supports the following presets: sr for single-end short reads; map-pb for PacBio read-to-reference mapping; map-ont for Oxford Nanopore read mapping; splice for long-read spliced alignment; asm5 for assembly-to-assembly alignment; asm10 for full genome alignment of closely related species. Note that the Python module does not support all-vs-all read overlapping.

  • k: k-mer length, no larger than 28

  • w: minimizer window size, no larger than 255

  • min_cnt: minimum number of minimizers on a chain

  • min_chain_score: minimum chaining score

  • bw: chaining and alignment band width

  • best_n: max number of alignments to return

  • n_threads: number of indexing threads; 3 by default

  • fn_idx_out: name of file to which the index is written
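As a rough illustration of the limits listed above, the sketch below checks a few options before they are handed to the constructor. The helper aligner_kwargs and the index file name MT-human.mmi are hypothetical and not part of mappy; only the keyword names, the preset names, and the bounds on k and w come from the list above.

```python
# Presets and limits copied from the argument list above.
KNOWN_PRESETS = {"sr", "map-pb", "map-ont", "splice", "asm5", "asm10"}


def aligner_kwargs(preset=None, k=None, w=None, **extra):
    """Hypothetical helper: validate a few options and return a kwargs dict."""
    if preset is not None and preset not in KNOWN_PRESETS:
        raise ValueError("unknown preset: {!r}".format(preset))
    if k is not None and k > 28:
        raise ValueError("k must be no larger than 28")
    if w is not None and w > 255:
        raise ValueError("w must be no larger than 255")
    kwargs = dict(extra)  # e.g. n_threads, best_n, fn_idx_out are passed through
    for key, value in (("preset", preset), ("k", k), ("w", w)):
        if value is not None:
            kwargs[key] = value
    return kwargs


# Possible use (requires mappy and a reference file; file names are examples):
#   import mappy as mp
#   opts = aligner_kwargs(preset="map-ont", n_threads=4, fn_idx_out="MT-human.mmi")
#   a = mp.Aligner("test/MT-human.fa", **opts)
```

Validating up front gives a clear Python error for an out-of-range value; how the constructor itself reacts to such values is not specified here.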

mappy.Aligner.map(seq)

This method aligns seq against the index. It is a generator, yielding a series of mappy.Alignment objects.
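Because map() is a generator, hits can be filtered while they are produced. A minimal sketch, assuming any object with a map(seq) method that yields objects carrying the is_primary and mapq properties described in this document; the function name and the default threshold of 20 are illustrative choices, not mappy defaults.

```python
def confident_primary_hits(aligner, seq, min_mapq=20):
    """Return primary alignments of seq with mapping quality >= min_mapq."""
    # aligner.map() yields hits lazily; the list comprehension consumes them.
    return [hit for hit in aligner.map(seq)
            if hit.is_primary and hit.mapq >= min_mapq]
```

With a real index this would be called as confident_primary_hits(a, seq), where a is a mappy.Aligner.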

Class mappy.Alignment

This class describes an alignment. An object of this class has the following properties:

  • ctg: name of the reference sequence the query is mapped to

  • ctg_len: total length of the reference sequence

  • r_st and r_en: start and end positions on the reference

  • q_st and q_en: start and end positions on the query

  • strand: +1 if on the forward strand; -1 if on the reverse strand

  • mapq: mapping quality

  • blen: length of the alignment, including both alignment matches and gaps but excluding ambiguous bases.

  • mlen: length of the matching bases in the alignment, excluding ambiguous base matches.

  • NM: number of mismatches, gaps and ambiguous positions in the alignment

  • trans_strand: transcript strand. +1 if on the forward strand; -1 if on the reverse strand; 0 if unknown

  • is_primary: whether the alignment is primary (typically the best and the first to be generated)

  • cigar_str: CIGAR string

  • cigar: CIGAR returned as an array of shape (n_cigar,2). The two numbers give the length and the operator of each CIGAR operation.
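The properties above can be combined in a few lines. The sketch below derives an identity value as mlen / blen and renders a cigar array back into a string. It assumes the operator codes follow the usual BAM numbering (0=M, 1=I, 2=D, and so on), which this document does not state; both function names are hypothetical.

```python
# Operator letters indexed by operator code (assumed BAM-style encoding).
CIGAR_OPS = "MIDNSHP=X"


def cigar_to_str(cigar):
    """Render (length, op_code) pairs as a CIGAR string."""
    return "".join("{}{}".format(length, CIGAR_OPS[op]) for length, op in cigar)


def identity(mlen, blen):
    """Fraction of matching bases over the alignment length (0.0 if blen is 0)."""
    return float(mlen) / blen if blen else 0.0
```

For a hit returned by map(), these would be called as cigar_to_str(hit.cigar) and identity(hit.mlen, hit.blen).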

An Alignment object can be converted to a string with str() in the following format:

q_st  q_en  strand  ctg  ctg_len  r_st  r_en  mlen  blen  mapq  cg:Z:cigar_str

It is effectively the PAF format without the QueryName and QueryLength columns (the first two columns in PAF).
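Since only the first two PAF columns are missing, a complete PAF-like line can be assembled by prepending the query name and query length. A minimal sketch, assuming str(hit) is tab-delimited as in PAF; the function name is hypothetical.

```python
def to_paf_line(name, seq, hit):
    """Prefix str(hit) with the query name and length to form a PAF-like line."""
    return "\t".join([name, str(len(seq)), str(hit)])
```

Inside the loop from the Usage section this would be print(to_paf_line(name, seq, hit)).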

Function mappy.fastx_read

mappy.fastx_read(fn)

This generator function opens a FASTA/FASTQ file and yields a (name,seq,qual) tuple for each sequence entry. The input file may be optionally gzip’d.
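The tuples can be consumed like any other iterable. The sketch below tallies basic statistics from (name, seq, qual) tuples; it assumes qual is empty or None for records without quality strings (for example FASTA input), which this document does not state. The function name is hypothetical.

```python
def summarize_records(records):
    """Count sequences, total bases, and records that carry a quality string."""
    n_seqs = total_bases = n_with_qual = 0
    for name, seq, qual in records:
        n_seqs += 1
        total_bases += len(seq)
        if qual:  # assumed to be None or empty when no qualities are present
            n_with_qual += 1
    return n_seqs, total_bases, n_with_qual
```

With mappy installed, this would be called as summarize_records(mp.fastx_read("test/MT-orang.fa")).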

