Minimap2 python binding
Project description
Mappy provides a convenient interface to minimap2, a fast and accurate C program to align genomic and transcribe nucleotide sequences.
Installation
Mappy depends on zlib. It can be installed with pip:
pip install --user mappy
or from the minimap2 github repo (Cython required):
git clone https://github.com/lh3/minimap2
cd minimap2
python setup.py install
Usage
The following Python script demonstrates the key functionality of mappy:
import mappy as mp
a = mp.Aligner("test/MT-human.fa") # load or build index
if not a: raise Exception("ERROR: failed to load/build index")
for name, seq, qual in mp.fastx_read("test/MT-orang.fa"): # read a fasta/q sequence
for hit in a.map(seq): # traverse alignments
print("{}\t{}\t{}\t{}".format(hit.ctg, hit.r_st, hit.r_en, hit.cigar_str))
APIs
Mappy implements two classes and one global function.
Class mappy.Aligner
mappy.Aligner(fn_idx_in, preset=None, ...)
This constructor accepts the following arguments:
fn_idx_in: index or sequence file name. Minimap2 automatically tests the file type. If a sequence file is provided, minimap2 builds an index. The sequence file can be optionally gzip’d.
preset: minimap2 preset. Currently, minimap2 supports the following presets: sr for single-end short reads; map-pb for PacBio read-to-reference mapping; map-ont for Oxford Nanopore read mapping; splice for long-read spliced alignment; asm5 for assembly-to-assembly alignment; asm10 for full genome alignment of closely related species. Note that the Python module does not support all-vs-all read overlapping.
k: k-mer length, no larger than 28
w: minimizer window size, no larger than 255
min_cnt: mininum number of minimizers on a chain
min_chain_score: minimum chaing score
bw: chaining and alignment band width
best_n: max number of alignments to return
n_threads: number of indexing threads; 3 by default
fn_idx_out: name of file to which the index is written
mappy.Aligner.map(seq)
This method aligns seq against the index. It is a generator, yielding
a series of mappy.Alignment objects.
Class mappy.Alignment
This class describes an alignment. An object of this class has the following properties:
ctg: name of the reference sequence the query is mapped to
ctg_len: total length of the reference sequence
r_st and r_en: start and end positions on the reference
q_st and q_en: start and end positions on the query
strand: +1 if on the forward strand; -1 if on the reverse strand
mapq: mapping quality
NM: number of mismatches and gaps in the alignment
blen: length of the alignment, including both alignment matches and gaps
trans_strand: transcript strand. +1 if on the forward strand; -1 if on the reverse strand; 0 if unknown
is_primary: if the alignment is primary (typically the best and the first to generate)
cigar_str: CIGAR string
cigar: CIGAR returned as an array of shape
(n_cigar,2). The two numbers give the length and the operator of each CIGAR operation.
An Alignment object can be converted to a string with str() in
the following format:
q_st q_en strand ctg ctg_len r_st r_en blen-NM blen mapq cg:Z:cigar_str
It is effectively the PAF format without the QueryName and QueryLength columns (the first two columns in PAF).
Function mappy.fastx_read
mappy.fastx_read(fn)
This generator function opens a FASTA/FASTQ file and yields a
(name,seq,qual) tuple for each sequence entry. The input file may be
optionally gzip’d.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file mappy-2.2.tar.gz.
File metadata
- Download URL: mappy-2.2.tar.gz
- Upload date:
- Size: 123.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3e997012313e32317719b60ee3092e64d8e29d132f21ac9821f187523e409f1d
|
|
| MD5 |
dfc2aefe98376124beb81ce7dcefeccb
|
|
| BLAKE2b-256 |
588bfb2bd61a688b23eceb48b3706ec858a2c903a8cfca3ba9d1a85fd86481f6
|