mappy·PyPI

Minimap2 python binding

Development Status
- 5 - Production/Stable
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- POSIX
Programming Language
- C
- Cython
- Python :: 2.7
- Python :: 3
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

Mappy provides a convenient Python interface to minimap2, a fast and accurate C program for aligning genomic and transcript nucleotide sequences.

Installation

Mappy depends on zlib. It can be installed with pip:

pip install --user mappy

or from the minimap2 github repo (Cython required):

git clone https://github.com/lh3/minimap2
cd minimap2
python setup.py install

Usage

The following Python script demonstrates the key functionality of mappy:

import mappy as mp
a = mp.Aligner("test/MT-human.fa")  # load or build index
if not a: raise Exception("ERROR: failed to load/build index")
s = a.seq("MT_human", 100, 200)     # retrieve a subsequence from the index
print(mp.revcomp(s))                # reverse complement
for name, seq, qual in mp.fastx_read("test/MT-orang.fa"): # read a fasta/q sequence
        for hit in a.map(seq): # traverse alignments
                print("{}\t{}\t{}\t{}".format(hit.ctg, hit.r_st, hit.r_en, hit.cigar_str))

APIs

Mappy implements two classes and two global functions.

Class mappy.Aligner

mappy.Aligner(fn_idx_in=None, preset=None, ...)

This constructor accepts the following arguments:

fn_idx_in: index or sequence file name. Minimap2 automatically tests the file type. If a sequence file is provided, minimap2 builds an index. The sequence file can be optionally gzip’d. This option has no effect if seq is set.
seq: a single sequence to index. The sequence name will be set to N/A.
preset: minimap2 preset. Currently, minimap2 supports the following presets: sr for single-end short reads; map-pb for PacBio read-to-reference mapping; map-ont for Oxford Nanopore read mapping; splice for long-read spliced alignment; asm5 for assembly-to-assembly alignment; asm10 for full genome alignment of closely related species. Note that the Python module does not support all-vs-all read overlapping.
k: k-mer length, no larger than 28
w: minimizer window size, no larger than 255
min_cnt: minimum number of minimizers on a chain
min_chain_score: minimum chaining score
bw: chaining and alignment band width (initial chaining and extension)
bw_long: chaining and alignment band width (RMQ-based rechaining and closing gaps)
best_n: max number of alignments to return
n_threads: number of indexing threads; 3 by default
extra_flags: additional flags defined in minimap.h
fn_idx_out: name of file to which the index is written. This parameter has no effect if seq is set.
scoring: scoring system. It is a tuple/list consisting of 4, 6 or 7 positive integers. The first 4 elements specify match scoring, mismatch penalty, gap open and gap extension penalty. The 5th and 6th elements, if present, set long-gap open and long-gap extension penalty. The 7th sets a mismatch penalty involving ambiguous bases.
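
The limits listed above can be checked before the constructor is called. The following is a minimal sketch, not part of mappy: check_aligner_args is a hypothetical helper, and the lower bound of 1 for k and w is an assumption (the text above only gives the upper bounds).

```python
def check_aligner_args(k=None, w=None, scoring=None):
    """Validate a few documented constraints before calling mappy.Aligner.

    Hypothetical helper, not provided by mappy. Returns a dict of keyword
    arguments with unset (None) values dropped.
    """
    # Documented upper bounds: k <= 28, w <= 255. The lower bound of 1 is assumed.
    if k is not None and not 1 <= k <= 28:
        raise ValueError("k must be between 1 and 28")
    if w is not None and not 1 <= w <= 255:
        raise ValueError("w must be between 1 and 255")
    if scoring is not None:
        scoring = tuple(scoring)
        # Documented: 4, 6 or 7 positive integers.
        if len(scoring) not in (4, 6, 7):
            raise ValueError("scoring must have 4, 6 or 7 elements")
        if any(not isinstance(x, int) or x <= 0 for x in scoring):
            raise ValueError("scoring elements must be positive integers")
    kwargs = {"k": k, "w": w, "scoring": scoring}
    return {key: val for key, val in kwargs.items() if val is not None}
```

With mappy installed, the result could be passed on as, for example, mp.Aligner("test/MT-human.fa", preset="map-ont", **check_aligner_args(k=15, w=10)).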

mappy.Aligner.map(seq, seq2=None, cs=False, MD=False)

This method aligns seq against the index. It is a generator, yielding a series of mappy.Alignment objects. If seq2 is present, mappy performs paired-end alignment, assuming the two ends are in the FR orientation. Alignments of the two ends can be distinguished by the read_num field (see Class mappy.Alignment below). Argument cs asks mappy to generate the cs tag; MD is similar. These two arguments might slightly degrade performance and are not enabled by default.
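
As an illustration of the paired-end mode, the sketch below groups the hits yielded by map() by their read_num field and drops hits below a mapping-quality cutoff. map_pair and min_mapq are hypothetical names introduced here; the function only relies on the map() signature and the read_num and mapq properties described in this document.

```python
def map_pair(aligner, seq1, seq2, min_mapq=0):
    """Group paired-end hits by read number.

    `aligner` is expected to behave like mappy.Aligner. Returns a dict
    mapping read_num (1 or 2) to the list of hits with mapq >= min_mapq.
    """
    by_read = {1: [], 2: []}
    for hit in aligner.map(seq1, seq2):
        if hit.mapq >= min_mapq:
            by_read.setdefault(hit.read_num, []).append(hit)
    return by_read
```

Because map() is a generator, the hits are consumed lazily; the lists are only needed here because both ends are inspected together.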

mappy.Aligner.seq(name, start=0, end=0x7fffffff)

This method retrieves a (sub)sequence from the index and returns it as a Python string. None is returned if name is not present in the index or the start/end coordinates are invalid.
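
Since a failed lookup returns None rather than raising, callers may want to turn it into an explicit error. A minimal sketch, assuming an object that behaves like mappy.Aligner; fetch_region is a hypothetical wrapper, not a mappy function.

```python
def fetch_region(aligner, name, start=0, end=0x7fffffff):
    """Return aligner.seq(name, start, end), raising instead of returning None."""
    seq = aligner.seq(name, start, end)
    if seq is None:
        # None means the name is unknown or the coordinates are invalid.
        raise LookupError(
            "cannot retrieve {}:{}-{} from the index".format(name, start, end)
        )
    return seq
```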

mappy.Aligner.seq_names

This property gives the array of sequence names in the index.

Class mappy.Alignment

This class describes an alignment. An object of this class has the following properties:

ctg: name of the reference sequence the query is mapped to
ctg_len: total length of the reference sequence
r_st and r_en: start and end positions on the reference
q_st and q_en: start and end positions on the query
strand: +1 if on the forward strand; -1 if on the reverse strand
mapq: mapping quality
blen: length of the alignment, including both alignment matches and gaps but excluding ambiguous bases.
mlen: length of the matching bases in the alignment, excluding ambiguous base matches.
NM: number of mismatches, gaps and ambiguous positions in the alignment
trans_strand: transcript strand. +1 if on the forward strand; -1 if on the reverse strand; 0 if unknown
is_primary: if the alignment is primary (typically the best and the first to generate)
read_num: read number that the alignment corresponds to; 1 for the first read and 2 for the second read
cigar_str: CIGAR string
cigar: CIGAR returned as an array of shape (n_cigar,2). The two numbers give the length and the operator of each CIGAR operation.
MD: the MD tag as in the SAM format. It is an empty string unless the MD argument is applied when calling mappy.Aligner.map().
cs: the cs tag.
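
The properties above are enough to derive simple summary statistics. The sketch below computes an alignment identity as mlen/blen and rebuilds a CIGAR string from the (length, operator) pairs. Both function names are hypothetical, and the operator numbering "MIDNSHP=XB" is an assumption taken from the BAM convention; it is not stated in this document.

```python
# Assumed operator numbering (BAM convention): 0=M, 1=I, 2=D, 3=N, ...
CIGAR_OPS = "MIDNSHP=XB"

def cigar_to_str(cigar):
    """Rebuild a CIGAR string from an iterable of (length, operator) pairs."""
    return "".join("{}{}".format(length, CIGAR_OPS[op]) for length, op in cigar)

def identity(hit):
    """Fraction of matching bases over the alignment length (mlen / blen)."""
    if hit.blen <= 0:
        return 0.0
    return float(hit.mlen) / hit.blen
```

For a hit returned by mappy.Aligner.map(), cigar_to_str(hit.cigar) would be expected to agree with hit.cigar_str if the assumed numbering holds.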

An Alignment object can be converted to a string with str() in the following format:

q_st  q_en  strand  ctg  ctg_len  r_st  r_en  mlen  blen  mapq  cg:Z:cigar_str

It is effectively the PAF format without the QueryName and QueryLength columns (the first two columns in PAF).
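
A line in this format can be parsed back into named fields, or turned into a full PAF line by prepending the two missing columns. The sketch below assumes the fields are tab-separated, as in PAF, and treats every column after the tenth as a TAG:TYPE:VALUE tag. parse_alignment_str and to_paf are hypothetical helpers.

```python
FIELDS = ("q_st", "q_en", "strand", "ctg", "ctg_len",
          "r_st", "r_en", "mlen", "blen", "mapq")
INT_FIELDS = {"q_st", "q_en", "ctg_len", "r_st", "r_en", "mlen", "blen", "mapq"}

def parse_alignment_str(line):
    """Parse str(alignment) into a dict; optional tags go under 'tags'."""
    cols = line.rstrip("\n").split("\t")  # tab separation is assumed
    if len(cols) < len(FIELDS):
        raise ValueError("expected at least {} columns".format(len(FIELDS)))
    rec = {}
    for key, val in zip(FIELDS, cols):
        rec[key] = int(val) if key in INT_FIELDS else val
    tags = {}
    for col in cols[len(FIELDS):]:
        tag, _type, value = col.split(":", 2)  # e.g. cg:Z:100M
        tags[tag] = value
    rec["tags"] = tags
    return rec

def to_paf(query_name, query_len, line):
    """Prepend the query name and length to obtain a PAF-style line."""
    return "\t".join([query_name, str(query_len), line])
```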

Miscellaneous Functions

mappy.fastx_read(fn, read_comment=False)

This generator function opens a FASTA/FASTQ file and yields a (name,seq,qual) tuple for each sequence entry. The input file may be optionally gzip’d. If read_comment is True, this generator yields a (name,seq,qual,comment) tuple instead.
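
The tuples yielded by this generator can be written back out as FASTA or FASTQ text. The sketch below is a hypothetical formatter, not a mappy function; it assumes that qual is None for records without quality values (FASTA input), which this document does not state explicitly.

```python
def format_record(name, seq, qual=None, comment=None):
    """Format one (name, seq, qual[, comment]) tuple as FASTA or FASTQ text."""
    header = name if not comment else "{} {}".format(name, comment)
    if qual is None:
        # Assumed: no quality string means the record came from FASTA.
        return ">{}\n{}\n".format(header, seq)
    return "@{}\n{}\n+\n{}\n".format(header, seq, qual)
```

With mappy installed, each tuple from mp.fastx_read(fn, read_comment=True) could be passed as format_record(*rec).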

mappy.revcomp(seq)

Return the reverse complement of DNA string seq. This function recognizes IUB code and preserves the letter cases. Uracil U is complemented to A.
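
To make the described behaviour concrete, here is a pure-Python reference sketch of a reverse complement that handles the IUPAC ambiguity codes, preserves case and maps U to A. It is an illustration for Python 3 only, not mappy's implementation, and revcomp_py is a hypothetical name.

```python
# Complement table for A, C, G, T, U and the IUPAC ambiguity codes.
_COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVNacgturyswkmbdhvn",
    "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn",
)

def revcomp_py(seq):
    """Return the reverse complement of seq, leaving unknown characters as-is."""
    return seq.translate(_COMPLEMENT)[::-1]
```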


Release history

Version	Release date
2.30 (this version)	Jun 15, 2025
2.29	Apr 18, 2025
2.28	Mar 27, 2024
2.27	Mar 12, 2024
2.26	Apr 29, 2023
2.25	Apr 27, 2023
2.24	Dec 26, 2021
2.23	Nov 18, 2021
2.22	Aug 7, 2021
2.21	Jul 7, 2021
2.20	May 27, 2021
2.19	May 27, 2021
2.18	Apr 9, 2021
2.17	May 5, 2019
2.16	Feb 28, 2019
2.15	Jan 10, 2019
2.14	Nov 6, 2018
2.13	Oct 11, 2018
2.12	Aug 6, 2018
2.11	Jun 21, 2018
2.10	Mar 27, 2018
2.9	Feb 24, 2018
2.8	Feb 1, 2018
2.7	Jan 9, 2018
2.6	Dec 12, 2017
2.5	Nov 11, 2017
2.4	Nov 6, 2017
2.3	Oct 23, 2017
2.2	Sep 18, 2017
2.2rc1 (pre-release)	Sep 17, 2017
2.2rc0 (pre-release)	Sep 17, 2017

Download files

Source Distribution

mappy-2.30.tar.gz (143.8 kB)

Uploaded Jun 15, 2025 Source

File details

Details for the file mappy-2.30.tar.gz.

File metadata

Download URL: mappy-2.30.tar.gz
Upload date: Jun 15, 2025
Size: 143.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for mappy-2.30.tar.gz
Algorithm	Hash digest
SHA256	`a25448004558a28cb0d74fb1e55b6ffe9a78aa15dd6b2763630fbbabbaa97a27`
MD5	`29efd3180bcb9eb2c31cf28809dba7ab`
BLAKE2b-256	`782521f1816701c7366cc2d1318c37fe374edd242e966ec3ad152f1742cef7a2`
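
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch; sha256_matches is a hypothetical helper and the file path is whatever location the archive was saved to.

```python
import hashlib

def sha256_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the SHA256 digest of the file at `path` equals expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For the source distribution, the call would look like sha256_matches("mappy-2.30.tar.gz", "a25448004558a28cb0d74fb1e55b6ffe9a78aa15dd6b2763630fbbabbaa97a27").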
