Minimap2 Python binding

Project description

Mappy provides a convenient interface to minimap2, a fast and accurate C program to align genomic and transcribed nucleotide sequences.

Installation

Mappy depends on zlib. It can be installed with pip:

pip install --user mappy

or from the minimap2 github repo (Cython required):

git clone https://github.com/lh3/minimap2
cd minimap2
python setup.py install

Usage

The following Python script demonstrates the key functionality of mappy:

import mappy as mp
a = mp.Aligner("test/MT-human.fa")  # load or build index
if not a: raise Exception("ERROR: failed to load/build index")
s = a.seq("MT_human", 100, 200)     # retrieve a subsequence from the index
print(mp.revcomp(s))                # reverse complement
for name, seq, qual in mp.fastx_read("test/MT-orang.fa"): # read a fasta/q sequence
    for hit in a.map(seq): # traverse alignments
        print("{}\t{}\t{}\t{}".format(hit.ctg, hit.r_st, hit.r_en, hit.cigar_str))

APIs

Mappy implements two classes and two global functions.

Class mappy.Aligner

mappy.Aligner(fn_idx_in=None, preset=None, ...)

This constructor accepts the following arguments:

  • fn_idx_in: index or sequence file name. Minimap2 automatically tests the file type. If a sequence file is provided, minimap2 builds an index. The sequence file can be optionally gzip’d. This option has no effect if seq is set.

  • seq: a single sequence to index. The sequence name will be set to N/A.

  • preset: minimap2 preset. Currently, minimap2 supports the following presets: sr for single-end short reads; map-pb for PacBio read-to-reference mapping; map-ont for Oxford Nanopore read mapping; splice for long-read spliced alignment; asm5 for assembly-to-assembly alignment; asm10 for full genome alignment of closely related species. Note that the Python module does not support all-vs-all read overlapping.

  • k: k-mer length, no larger than 28

  • w: minimizer window size, no larger than 255

  • min_cnt: minimum number of minimizers on a chain

  • min_chain_score: minimum chaining score

  • bw: chaining and alignment band width (initial chaining and extension)

  • bw_long: chaining and alignment band width (RMQ-based rechaining and closing gaps)

  • best_n: max number of alignments to return

  • n_threads: number of indexing threads; 3 by default

  • extra_flags: additional flags defined in minimap.h

  • fn_idx_out: name of file to which the index is written. This parameter has no effect if seq is set.

  • scoring: scoring system. It is a tuple/list consisting of 4, 6 or 7 positive integers. The first 4 elements specify the match score, mismatch penalty, gap open penalty and gap extension penalty. The 5th and 6th elements, if present, set the long-gap open and long-gap extension penalties. The 7th, if present, sets the mismatch penalty involving ambiguous bases.
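The limits stated in this list can be checked before an index is built. The following is a minimal sketch, not part of mappy: build_aligner_kwargs is a hypothetical helper name, and the requirement that k and w be positive is an assumption (the list above only gives upper bounds).

```python
def build_aligner_kwargs(preset=None, k=None, w=None, scoring=None):
    """Hypothetical helper (not part of mappy): validate a few options
    against the documented limits and return them as keyword arguments."""
    kwargs = {}
    if preset is not None:
        kwargs["preset"] = preset
    if k is not None:
        # Upper bound from the documentation; positivity is an assumption.
        if not 1 <= k <= 28:
            raise ValueError("k must be a positive integer no larger than 28")
        kwargs["k"] = k
    if w is not None:
        # Upper bound from the documentation; positivity is an assumption.
        if not 1 <= w <= 255:
            raise ValueError("w must be a positive integer no larger than 255")
        kwargs["w"] = w
    if scoring is not None:
        scoring = tuple(scoring)
        if len(scoring) not in (4, 6, 7):
            raise ValueError("scoring must have 4, 6 or 7 elements")
        if any(not isinstance(x, int) or x <= 0 for x in scoring):
            raise ValueError("scoring elements must be positive integers")
        kwargs["scoring"] = scoring
    return kwargs
```

The result could then be passed on as mp.Aligner("ref.fa", **build_aligner_kwargs(preset="map-ont", scoring=(2, 4, 4, 2))), where "ref.fa" is a placeholder file name and the scoring values are arbitrary.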

mappy.Aligner.map(seq, seq2=None, cs=False, MD=False)

This method aligns seq against the index. It is a generator, yielding a series of mappy.Alignment objects. If seq2 is present, mappy performs paired-end alignment, assuming the two ends are in the FR orientation. Alignments of the two ends can be distinguished by the read_num field (see Class mappy.Alignment below). Setting cs=True asks mappy to generate the cs tag; MD=True does the same for the MD tag. Both options may slightly degrade performance and are disabled by default.
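For paired-end input, the hits of both ends arrive through the same generator, so a caller usually has to separate them by read_num. A minimal sketch, assuming aligner behaves like a mappy.Aligner (split_pairs is a hypothetical helper name, not a mappy function):

```python
def split_pairs(aligner, seq1, seq2):
    """Map a read pair and group the hits by the end they belong to.

    `aligner` is assumed to expose map(seq, seq2=...) as described above
    and to yield objects with a read_num attribute (1 or 2).
    """
    hits = {1: [], 2: []}
    for hit in aligner.map(seq1, seq2=seq2):
        hits[hit.read_num].append(hit)
    return hits[1], hits[2]
```

Both returned lists keep the order in which the generator produced the hits.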

mappy.Aligner.seq(name, start=0, end=0x7fffffff)

This method retrieves a (sub)sequence from the index and returns it as a Python string. None is returned if name is not present in the index or the start/end coordinates are invalid.
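Because a missing name and invalid coordinates both produce None rather than an exception, it can be convenient to turn that case into an explicit error. A minimal sketch, assuming aligner behaves like a mappy.Aligner (fetch_region is a hypothetical wrapper, not part of mappy):

```python
def fetch_region(aligner, name, start=0, end=0x7fffffff):
    """Return a (sub)sequence from the index, or raise if it is unavailable.

    Wraps aligner.seq(), which returns None when `name` is not in the index
    or the coordinates are invalid.
    """
    seq = aligner.seq(name, start, end)
    if seq is None:
        raise LookupError(
            "cannot retrieve {}:{}-{} from the index".format(name, start, end)
        )
    return seq
```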

mappy.Aligner.seq_names

This property gives the array of sequence names in the index.

Class mappy.Alignment

This class describes an alignment. An object of this class has the following properties:

  • ctg: name of the reference sequence the query is mapped to

  • ctg_len: total length of the reference sequence

  • r_st and r_en: start and end positions on the reference

  • q_st and q_en: start and end positions on the query

  • strand: +1 if on the forward strand; -1 if on the reverse strand

  • mapq: mapping quality

  • blen: length of the alignment, including both alignment matches and gaps but excluding ambiguous bases.

  • mlen: length of the matching bases in the alignment, excluding ambiguous base matches.

  • NM: number of mismatches, gaps and ambiguous positions in the alignment

  • trans_strand: transcript strand. +1 if on the forward strand; -1 if on the reverse strand; 0 if unknown

  • is_primary: whether the alignment is primary (typically the best alignment and the first one generated)

  • read_num: read number that the alignment corresponds to; 1 for the first read and 2 for the second read

  • cigar_str: CIGAR string

  • cigar: CIGAR returned as an array of shape (n_cigar,2). The two numbers give the length and the operator of each CIGAR operation.

  • MD: the MD tag as in the SAM format. It is an empty string unless the MD argument is set when calling mappy.Aligner.map().

  • cs: the cs tag.
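The properties above can be combined into simple derived values. The sketch below shows two hypothetical helpers (neither is part of mappy): one rebuilds a CIGAR string from the cigar array, assuming the integer operator codes follow the SAM order MIDNSHP=X, and one computes the fraction of matching bases as mlen / blen.

```python
# SAM operator order; assumed to match the integer codes stored in hit.cigar.
CIGAR_OPS = "MIDNSHP=X"

def cigar_to_str(cigar):
    """Convert [[length, op], ...] into a CIGAR string such as '10M2I5M'."""
    return "".join("{}{}".format(length, CIGAR_OPS[op]) for length, op in cigar)

def identity(hit):
    """Fraction of matching bases over the alignment length (mlen / blen)."""
    return hit.mlen / hit.blen if hit.blen else 0.0
```

For a real alignment, cigar_to_str(hit.cigar) is expected to agree with hit.cigar_str, which makes it a cheap sanity check on the assumed operator order.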

An Alignment object can be converted to a string with str() in the following format:

q_st  q_en  strand  ctg  ctg_len  r_st  r_en  mlen  blen  mapq  cg:Z:cigar_str

It is effectively the PAF format without the QueryName and QueryLength columns (the first two columns in PAF).
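Since only the query name and query length are missing, a full PAF-like line can be assembled by prepending them. A minimal sketch, assuming the fields produced by str() are tab-delimited as in PAF (to_paf is a hypothetical helper name):

```python
def to_paf(name, seq, hit):
    """Prefix str(hit) with the query name and query length.

    Assumes str(hit) yields tab-delimited fields in the order shown above.
    """
    return "{}\t{}\t{}".format(name, len(seq), hit)
```

Inside a loop over mappy.fastx_read() and Aligner.map(), this would be called as print(to_paf(name, seq, hit)).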

Miscellaneous Functions

mappy.fastx_read(fn, read_comment=False)

This generator function opens a FASTA/FASTQ file and yields a (name,seq,qual) tuple for each sequence entry. The input file may be optionally gzip’d. If read_comment is True, this generator yields a (name,seq,qual,comment) tuple instead.
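Because the tuple length depends on read_comment, code that should work in both modes can slice the first three fields instead of unpacking the whole tuple. A minimal sketch (summarize is a hypothetical helper; it accepts any iterable of such tuples, for example the generator returned by mappy.fastx_read):

```python
def summarize(records):
    """Count records and total bases from (name, seq, qual[, comment]) tuples."""
    n_reads = 0
    n_bases = 0
    for rec in records:
        name, seq, qual = rec[:3]  # works for both 3- and 4-element tuples
        n_reads += 1
        n_bases += len(seq)
    return n_reads, n_bases
```

A call such as summarize(mp.fastx_read("reads.fq.gz")) would then give the number of reads and bases in the file; "reads.fq.gz" is a placeholder file name.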

mappy.revcomp(seq)

Returns the reverse complement of DNA string seq. This function recognizes IUB codes and preserves letter case. Uracil U is complemented to A.
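The behavior described here can be illustrated with a pure-Python reference. This is a sketch, not mappy's implementation: revcomp_py is a hypothetical name, the table assumes the standard IUB/IUPAC complement pairs, and characters outside the table are passed through unchanged, which may differ from mappy.

```python
# Standard IUB/IUPAC complement pairs (assumed); U is complemented to A.
_SRC = "ACGTRYSWKMBDHVN"
_DST = "TGCAYRSWMKVHDBN"
_COMPLEMENT = str.maketrans(
    _SRC + _SRC.lower() + "Uu",
    _DST + _DST.lower() + "Aa",
)

def revcomp_py(seq):
    """Reverse-complement a nucleotide string, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Comparing revcomp_py(s) with mp.revcomp(s) on a few test strings is one way to confirm that the assumed table matches the library.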
