Minimap2 Python binding
Mappy provides a convenient Python interface to minimap2, a fast and accurate C program for aligning genomic and transcript nucleotide sequences.
Installation
Mappy depends on zlib. It can be installed with pip:
pip install --user mappy
or from the minimap2 GitHub repo (Cython required):
git clone https://github.com/lh3/minimap2
cd minimap2
python setup.py install
Usage
The following Python script demonstrates the key functionality of mappy:
import mappy as mp
a = mp.Aligner("test/MT-human.fa")  # load or build index
if not a: raise Exception("ERROR: failed to load/build index")
s = a.seq("MT_human", 100, 200)     # retrieve a subsequence from the index
print(mp.revcomp(s))                # reverse complement
for name, seq, qual in mp.fastx_read("test/MT-orang.fa"):  # read a fasta/q sequence
    for hit in a.map(seq):  # traverse alignments
        print("{}\t{}\t{}\t{}".format(hit.ctg, hit.r_st, hit.r_en, hit.cigar_str))
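The script above prints every alignment. When only one alignment per read is wanted, one possible pattern is to keep the primary hit with the highest mapping quality. The following is a minimal sketch: best_hit is a hypothetical helper, not part of mappy, and it relies only on the is_primary and mapq properties described under Class mappy.Alignment.

```python
def best_hit(hits):
    """Return the primary hit with the highest mapq, or None if there is none.

    `hits` is any iterable of objects exposing `is_primary` and `mapq`,
    for example the generator returned by Aligner.map(seq).
    best_hit is a hypothetical helper and not part of mappy.
    """
    primary = [h for h in hits if h.is_primary]
    return max(primary, key=lambda h: h.mapq, default=None)
```

Inside the loop above it could be called as best_hit(a.map(seq)); a read with no primary alignment yields None.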
APIs
Mappy implements two classes and two global functions.
Class mappy.Aligner
mappy.Aligner(fn_idx_in=None, preset=None, ...)
This constructor accepts the following arguments:
fn_idx_in: index or sequence file name. Minimap2 automatically tests the file type. If a sequence file is provided, minimap2 builds an index. The sequence file can be optionally gzip’d. This option has no effect if seq is set.
seq: a single sequence to index. The sequence name will be set to N/A.
preset: minimap2 preset. Currently, minimap2 supports the following presets: sr for single-end short reads; map-pb for PacBio read-to-reference mapping; map-ont for Oxford Nanopore read mapping; splice for long-read spliced alignment; asm5 for assembly-to-assembly alignment; asm10 for full genome alignment of closely related species. Note that the Python module does not support all-vs-all read overlapping.
k: k-mer length, no larger than 28
w: minimizer window size, no larger than 255
min_cnt: minimum number of minimizers on a chain
min_chain_score: minimum chaining score
bw: chaining and alignment band width (initial chaining and extension)
bw_long: chaining and alignment band width (RMQ-based rechaining and closing gaps)
best_n: max number of alignments to return
n_threads: number of indexing threads; 3 by default
extra_flags: additional flags defined in minimap.h
fn_idx_out: name of file to which the index is written. This parameter has no effect if seq is set.
scoring: scoring system. It is a tuple/list consisting of 4, 6 or 7 positive integers. The first 4 elements specify the match score, mismatch penalty, gap open penalty and gap extension penalty. The 5th and 6th elements, if present, set the long-gap open and long-gap extension penalties. The 7th, if present, sets the mismatch penalty involving ambiguous bases.
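Several of the arguments above carry documented limits (k no larger than 28, w no larger than 255, scoring with 4, 6 or 7 positive integers). As a rough sketch, these limits could be checked before building an index. check_aligner_kwargs is a hypothetical helper, not part of mappy, and the lower bound of 1 for k and w is an assumption made here for illustration.

```python
def check_aligner_kwargs(k=None, w=None, scoring=None):
    """Validate a few constructor arguments against the documented limits.

    Hypothetical helper, not part of mappy. Returns a dict of the
    arguments that were set, or raises ValueError.
    The lower bound of 1 for k and w is an assumption.
    """
    kwargs = {}
    if k is not None:
        if not 1 <= k <= 28:
            raise ValueError("k must be between 1 and 28")
        kwargs["k"] = k
    if w is not None:
        if not 1 <= w <= 255:
            raise ValueError("w must be between 1 and 255")
        kwargs["w"] = w
    if scoring is not None:
        scoring = tuple(scoring)
        if len(scoring) not in (4, 6, 7):
            raise ValueError("scoring must have 4, 6 or 7 elements")
        if any(not isinstance(x, int) or x <= 0 for x in scoring):
            raise ValueError("scoring elements must be positive integers")
        kwargs["scoring"] = scoring
    return kwargs
```

The returned dict could then be unpacked into the constructor, for example mp.Aligner("test/MT-human.fa", preset="map-ont", **kwargs).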
mappy.Aligner.map(seq, seq2=None, cs=False, MD=False)
This method aligns seq against the index. It is a generator, yielding a series of mappy.Alignment objects. If seq2 is present, mappy performs paired-end alignment, assuming the two ends are in the FR orientation. Alignments of the two ends can be distinguished by the read_num field (see Class mappy.Alignment below). Argument cs asks mappy to generate the cs tag; MD is similar. These two arguments might slightly degrade performance and are not enabled by default.
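For paired-end input, the hits of both ends come out of the same generator, so they usually need to be separated by read_num. A minimal sketch of that step follows; split_by_read_num is a hypothetical helper, not part of mappy.

```python
def split_by_read_num(hits):
    """Split alignments into (first_end, second_end) lists using read_num.

    `hits` is any iterable of objects with a `read_num` attribute equal
    to 1 or 2, such as the result of a paired-end Aligner.map() call.
    split_by_read_num is a hypothetical helper and not part of mappy.
    """
    first, second = [], []
    for h in hits:
        (first if h.read_num == 1 else second).append(h)
    return first, second
```

With an Aligner a and two mate sequences seq1 and seq2 (names chosen here for illustration), it could be used as first, second = split_by_read_num(a.map(seq1, seq2)).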
mappy.Aligner.seq(name, start=0, end=0x7fffffff)
This method retrieves a (sub)sequence from the index and returns it as a Python string. None is returned if name is not present in the index or the start/end coordinates are invalid.
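Because a failed lookup returns None rather than raising, callers may want to turn that case into an explicit error. The wrapper below is a sketch under that assumption; fetch_region is a hypothetical name, not part of mappy.

```python
def fetch_region(aligner, name, start=0, end=0x7fffffff):
    """Return aligner.seq(name, start, end), raising if the lookup fails.

    fetch_region is a hypothetical wrapper and not part of mappy.
    `aligner` is any object with a seq(name, start, end) method that
    returns None for an unknown name or invalid coordinates.
    """
    s = aligner.seq(name, start, end)
    if s is None:
        raise LookupError(
            "no sequence for {!r} in [{}, {})".format(name, start, end)
        )
    return s
```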
mappy.Aligner.seq_names
This property gives the array of sequence names in the index.
Class mappy.Alignment
This class describes an alignment. An object of this class has the following properties:
ctg: name of the reference sequence the query is mapped to
ctg_len: total length of the reference sequence
r_st and r_en: start and end positions on the reference
q_st and q_en: start and end positions on the query
strand: +1 if on the forward strand; -1 if on the reverse strand
mapq: mapping quality
blen: length of the alignment, including both alignment matches and gaps but excluding ambiguous bases.
mlen: length of the matching bases in the alignment, excluding ambiguous base matches.
NM: number of mismatches, gaps and ambiguous positions in the alignment
trans_strand: transcript strand. +1 if on the forward strand; -1 if on the reverse strand; 0 if unknown
is_primary: whether the alignment is primary (typically the best and the first to generate)
read_num: read number that the alignment corresponds to; 1 for the first read and 2 for the second read
cigar_str: CIGAR string
cigar: CIGAR returned as an array of shape (n_cigar,2). The two numbers give the length and the operator of each CIGAR operation.
MD: the MD tag as in the SAM format. It is an empty string unless the MD argument is applied when calling mappy.Aligner.map().
cs: the cs tag.
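Two small computations on these properties come up often: rendering the cigar array as text and deriving an identity estimate from mlen and blen. The sketch below assumes the operator codes follow the SAM/BAM numbering (0 for M, 1 for I, 2 for D, and so on); both function names are hypothetical and not part of mappy.

```python
# Operator letters indexed by code, following the SAM/BAM numbering.
# That the cigar array uses this numbering is an assumption.
CIGAR_OPS = "MIDNSHP=X"

def cigar_to_str(cigar):
    """Render an array of (length, operator_code) pairs as a CIGAR string."""
    return "".join("{}{}".format(length, CIGAR_OPS[op]) for length, op in cigar)

def identity(mlen, blen):
    """Fraction of matching bases over the alignment length (0.0 if blen is 0)."""
    return mlen / blen if blen else 0.0
```

For a hit object this would be called as cigar_to_str(hit.cigar) and identity(hit.mlen, hit.blen).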
An Alignment object can be converted to a string with str() in the following format:
q_st q_en strand ctg ctg_len r_st r_en mlen blen mapq cg:Z:cigar_str
It is effectively the PAF format without the QueryName and QueryLength columns (the first two columns in PAF).
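Since only the first two PAF columns are missing, a full PAF-style line can be assembled by prepending the query name and length. This is a minimal sketch that assumes the fields of str(hit) are tab-separated, as in PAF; to_paf is a hypothetical helper and the sample line in the comment is made up for illustration.

```python
def to_paf(query_name, query_len, hit_str):
    """Prepend QueryName and QueryLength to the string form of an alignment.

    to_paf is a hypothetical helper and not part of mappy.
    Assumes `hit_str` is tab-separated, as PAF is.
    """
    return "{}\t{}\t{}".format(query_name, query_len, hit_str)

# Made-up example of a str(hit) value, for illustration only:
# "0\t150\t+\tMT_human\t16569\t100\t250\t148\t150\t60\tcg:Z:150M"
```

In the usage script above, this would be to_paf(name, len(seq), str(hit)).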
Miscellaneous Functions
mappy.fastx_read(fn, read_comment=False)
This generator function opens a FASTA/FASTQ file and yields a (name,seq,qual) tuple for each sequence entry. The input file may be optionally gzip’d. If read_comment is True, this generator yields a (name,seq,qual,comment) tuple instead.
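Because the records are plain tuples with the sequence in the second position, they can be filtered lazily before mapping. The generator below is a sketch of that idea; filter_by_length is a hypothetical helper, not part of mappy, and it works for both the 3-tuple and the 4-tuple form.

```python
def filter_by_length(records, min_len):
    """Yield only the records whose sequence has at least min_len bases.

    `records` is any iterable of (name, seq, qual) or
    (name, seq, qual, comment) tuples, such as the output of fastx_read.
    filter_by_length is a hypothetical helper and not part of mappy.
    """
    for rec in records:
        if len(rec[1]) >= min_len:
            yield rec
```

It could wrap the reader directly, for example filter_by_length(mp.fastx_read("test/MT-orang.fa"), 1000).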
mappy.revcomp(seq)
Return the reverse complement of DNA string seq. This function recognizes IUB code and preserves the letter cases. Uracil U is complemented to A.
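To make the behaviour concrete, here is a pure-Python sketch of a reverse complement with the same stated properties (IUB codes, case preserved, U complemented to A). It is an illustration only, not mappy's implementation, and revcomp_sketch is a hypothetical name.

```python
# Each base and its complement, position by position; U maps to A.
_BASES = "ACGTURYSWKMBDHVN"
_COMPL = "TGCAAYRSWMKVHDBN"
# Cover both upper and lower case so letter case is preserved.
_TABLE = str.maketrans(_BASES + _BASES.lower(), _COMPL + _COMPL.lower())

def revcomp_sketch(seq):
    """Complement every base, then reverse the string.

    Illustrative sketch only; characters outside the table are kept as-is.
    """
    return seq.translate(_TABLE)[::-1]
```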