A multithreaded python wrapper for rust bindings of minimap2.
Mappy-rs
A multi-threaded minimap2 aligner for Python. Built for readfish compatibility.
Heavily leaning on and inspired by Joseph Guhlin's minimap2-rs repository. They also have a more fully featured Python client, which also provides multithreaded alignment. This client provides a simpler streaming interface for use in pipelines.
pip install mappy-rs
Developers
Start with some docs on PyO3 - https://pyo3.rs/latest/
If you wish to contribute, have a look at CONTRIBUTING.md
In order to build an importable module:
python -m venv .env
source .env/bin/activate
pip install ".[tests]"
To run the tests:
# Python
pytest
# Rust
cargo t --no-default-features
Then in your python shell of choice:
import mappy_rs
aligner = mappy_rs.Aligner("resources/test/test.mmi")
The current iteration of mappy-rs serves as a drop-in replacement for mappy, implementing all the same methods. However, if this is your use case, you may well be better off using mappy, as the extra layer of Rust between your Python and the underlying C library may make it slightly slower.
Multithreading
In order to use multithreading, it must first be enabled.
import mappy_rs
aligner = mappy_rs.Aligner("resources/test/test.mmi")
# Use 10 threads
aligner.enable_threading(10)
Enabling threading makes the map_batch method available.
This method requires a list or iterable of dictionaries. Each dictionary can have any number of keys and any depth of nesting, but its top level must contain the key seq with a string value.
Currently, the maximum batch size to be iterated in one call is 20000.
For example:
import mappy_rs
aligner = mappy_rs.Aligner("resources/test/test.mmi")
aligner.enable_threading(10)
seqs = [
    {"seq": "ACGTAGCATCGAGACTACGA", "Other_random_key": "banter"},
    {"seq": "ACGTAGCATCGAGACTACGA", "Other_random_key": "banter"},
]
# Each result pairs the mappings with the original input dictionary
for (mapping, data) in aligner.map_batch(seqs):
    print(list(mapping))
    print(data)
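Because a single map_batch call is limited to 20000 records, a longer stream of reads has to be split up before it is submitted. The following is a minimal sketch of one way to do that in plain Python. The names iter_batches and MAX_BATCH are hypothetical helpers for illustration and are not part of mappy_rs. The sketch also checks the "seq" requirement described above.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

# Limit quoted in the text above; the constant name is ours,
# not an attribute exposed by mappy_rs.
MAX_BATCH = 20000


def iter_batches(
    records: Iterable[Dict[str, Any]], size: int = MAX_BATCH
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` records, each with a string "seq"."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(records)
    while True:
        # Pull the next chunk lazily so generators are never fully loaded
        batch = list(islice(it, size))
        if not batch:
            return
        for rec in batch:
            if not isinstance(rec.get("seq"), str):
                raise ValueError('every record needs a top-level "seq" string')
        yield batch
```

Each yielded list could then be passed to aligner.map_batch in turn, so that no single call exceeds the limit.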
Benchmarks
A simple benchmark comparing classic mappy with mappy_rs at increasing thread counts, run on a 2018 MacBook Pro.
Device
Property | Value |
---|---|
Model Name | MacBook Pro |
Model Identifier | MacBookPro15,2 |
Processor Name | Quad-Core Intel Core i7 |
Processor Speed | 2.7 GHz |
Number of Processors | 1 |
Total Number of Cores | 4 |
L2 Cache (per Core) | 256 KB |
L3 Cache | 8 MB |
Hyper-Threading Technology | Enabled |
Memory | 16 GB |
Results
Name (time in s) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
---|---|---|---|---|---|---|---|---|---|---|
test_benchmark_multi[5] | 26.8900 (1.0) | 30.0969 (1.0) | 28.0622 (1.0) | 1.2614 (1.0) | 27.9017 (1.0) | 1.6081 (1.35) | 1;0 | 0.0356 (1.0) | 5 | 1 |
test_benchmark_multi[4] | 28.5573 (1.06) | 43.4543 (1.44) | 32.3371 (1.15) | 6.2815 (4.98) | 29.7480 (1.07) | 5.2148 (4.37) | 1;1 | 0.0309 (0.87) | 5 | 1 |
test_benchmark_multi[3] | 31.6497 (1.18) | 36.9986 (1.23) | 33.5103 (1.19) | 2.0542 (1.63) | 32.8415 (1.18) | 1.9576 (1.64) | 1;0 | 0.0298 (0.84) | 5 | 1 |
test_benchmark_multi[2] | 43.2616 (1.61) | 86.3859 (2.87) | 53.8572 (1.92) | 18.3339 (14.53) | 45.9328 (1.65) | 14.6382 (12.26) | 1;1 | 0.0186 (0.52) | 5 | 1 |
test_classic_mappy[mappy_al] | 78.5566 (2.92) | 82.8876 (2.75) | 79.6177 (2.84) | 1.8343 (1.45) | 78.8350 (2.83) | 1.1938 (1.0) | 1;1 | 0.0126 (0.35) | 5 | 1 |
test_classic_mappy[mappy_al_rs] | 83.7239 (3.11) | 87.9675 (2.92) | 85.4424 (3.04) | 1.6806 (1.33) | 85.6335 (3.07) | 2.3310 (1.95) | 2;0 | 0.0117 (0.33) | 5 | 1 |
test_benchmark_multi[1] | 84.8418 (3.16) | 94.0907 (3.13) | 86.7404 (3.09) | 4.1096 (3.26) | 84.8749 (3.04) | 2.4310 (2.04) | 1;1 | 0.0115 (0.32) | 5 | 1 |
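The bracketed figures in the table appear to be ratios against the best value in each column. To read the results against classic mappy instead, the mean times can be divided directly. The short sketch below does this with the means copied from the table; the variable names are ours. On these numbers, five threads come out roughly 2.8 times faster than classic mappy, while a single mappy_rs thread is slightly slower.

```python
# Mean wall-clock times in seconds, copied from the results table above.
mean_seconds = {
    "test_benchmark_multi[5]": 28.0622,
    "test_benchmark_multi[4]": 32.3371,
    "test_benchmark_multi[3]": 33.5103,
    "test_benchmark_multi[2]": 53.8572,
    "test_classic_mappy[mappy_al]": 79.6177,
    "test_classic_mappy[mappy_al_rs]": 85.4424,
    "test_benchmark_multi[1]": 86.7404,
}

baseline = mean_seconds["test_classic_mappy[mappy_al]"]

# Speed-up relative to classic mappy: values above 1 are faster than mappy.
speedup = {name: baseline / secs for name, secs in mean_seconds.items()}

# Print fastest first
for name, factor in sorted(speedup.items(), key=lambda kv: -kv[1]):
    print(f"{name:34s} {factor:.2f}x")
```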
Changelog
0.0.6
- Lowered backoff time for map_batch to 50 milliseconds, with 6 attempts. Each attempt will double the previous back off time.
- Improved error handling for map_batch, now will raise a RuntimeError if the backoff time is exceeded. Also prevented logging "Internal error returning data, the receiver iterator has finished. sending on a disconnected channel 2870" to stderr excessively.
- Added tests for mapping more than 50000 reads, using back_off=True and back_off=False
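To make the backoff figures in the 0.0.6 entry concrete, here is a small sketch of the waits they imply. It assumes the first wait is 50 milliseconds and that each later attempt doubles the previous one; the actual Rust implementation may differ in detail, and backoff_schedule is a hypothetical name used only for illustration.

```python
from typing import List


def backoff_schedule(initial_ms: int = 50, attempts: int = 6) -> List[int]:
    """Return the wait in milliseconds before each attempt, doubling each time."""
    return [initial_ms * 2**i for i in range(attempts)]


waits = backoff_schedule()
print(waits)       # [50, 100, 200, 400, 800, 1600]
print(sum(waits))  # 3150
```

Under these assumptions, map_batch would wait a little over three seconds in total before raising the RuntimeError.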