A multithreaded Python wrapper for Rust bindings of minimap2.
Mappy-rs
A multi-threaded minimap2 aligner for Python. Built for readfish compatibility.
Leans heavily on, and is inspired by, Joseph Guhlin's minimap2-rs repository. They also have a more fully featured Python client; this one simply multithreads and maps.
pip install mappy-rs
Developers
Start with the PyO3 docs: https://pyo3.rs/latest/
If you wish to contribute, have a look at CONTRIBUTING.md
In order to build an importable module:
python -m venv .env
source .env/bin/activate
pip install ".[tests]"
To run the tests:
# Python
pytest
# Rust
cargo t --no-default-features
Then in your python shell of choice:
import mappy_rs
aligner = mappy_rs.Aligner("resources/test/test.mmi")
The current iteration of mappy-rs serves as a drop-in replacement for mappy, implementing all the same methods. However, if that is your only use case, you may well be better off using mappy itself, as the extra layer of Rust between your Python and the underlying C code may make it slightly slower.
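Since the interface is described as mirroring mappy, code written against a mappy-style aligner should in principle work with either library. The following is a minimal sketch under that assumption. `summarise_hits` is a hypothetical helper, not part of mappy-rs, and the attribute names `ctg`, `r_st`, `r_en` and `mapq` are those of mappy's alignment objects; that mappy-rs exposes identical names is assumed from the drop-in claim rather than verified here.

```python
def summarise_hits(aligner, seq):
    """Return (contig, ref_start, ref_end, mapq) for every hit of `seq`.

    `aligner` is any object with a mappy-style `map(seq)` method that
    yields hits carrying `ctg`, `r_st`, `r_en` and `mapq` attributes
    (mappy's names; assumed, not confirmed, to match in mappy-rs).
    """
    return [(hit.ctg, hit.r_st, hit.r_en, hit.mapq) for hit in aligner.map(seq)]


# Usage sketch (requires mappy_rs and the test index from this repository):
#
#   import mappy_rs
#   aligner = mappy_rs.Aligner("resources/test/test.mmi")
#   for row in summarise_hits(aligner, "ACGTAGCATCGAGACTACGA"):
#       print(row)
```

Because the helper only relies on the `map` method, swapping `mappy_rs.Aligner` for `mappy.Aligner` is one way to compare the two libraries on the same reads.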
Multithreading
To use multithreading, first enable it on the aligner.
import mappy_rs
aligner = mappy_rs.Aligner("resources/test/test.mmi")
# Use 10 threads
aligner.enable_threading(10)
Enabling threading makes the map_batch method available.
This method requires a list or iterable of dictionaries. Each dictionary can have any number of keys and any depth of nesting, but its top level must contain the key seq with a string value.
Currently, the maximum batch size that can be iterated in one call is 20000.
For example:
import mappy_rs
aligner = mappy_rs.Aligner("resources/test/test.mmi")
aligner.enable_threading(10)
seqs = [
    {"seq": "ACGTAGCATCGAGACTACGA", "Other_random_key": "banter"},
    {"seq": "ACGTAGCATCGAGACTACGA", "Other_random_key": "banter"},
]
for (mapping, data) in aligner.map_batch(seqs):
    print(list(mapping))
    print(data)
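With more reads than the 20000 limit, one option is to split the input into chunks before calling `map_batch`. The sketch below is plain Python and makes no claims about mappy-rs internals. `checked` and `batched` are hypothetical helpers introduced for illustration: the first enforces the `seq` requirement described above, the second yields lists of at most `MAX_BATCH` reads.

```python
from itertools import islice

MAX_BATCH = 20000  # per-call limit stated in this README


def checked(reads):
    """Yield reads unchanged, failing early if one lacks a string `seq`."""
    for i, read in enumerate(reads):
        if not isinstance(read.get("seq"), str):
            raise ValueError(f"read {i} has no top-level string 'seq' key")
        yield read


def batched(reads, size=MAX_BATCH):
    """Split any iterable of read dicts into lists of at most `size` items."""
    it = iter(reads)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Usage sketch (assumes `aligner.enable_threading(...)` was already called
# and `all_reads` is your own iterable of read dictionaries):
#
#   for batch in batched(checked(all_reads)):
#       for mapping, data in aligner.map_batch(batch):
#           print(list(mapping), data)
```

Both helpers are generators, so a large input is consumed lazily and only one batch is held in memory at a time.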
Benchmarks
A simple benchmark comparing classic mappy with mappy_rs at increasing thread counts, run on a 2018 MacBook Pro.
Device
Property | Value |
---|---|
Model Name | MacBook Pro |
Model Identifier | MacBookPro15,2 |
Processor Name | Quad-Core Intel Core i7 |
Processor Speed | 2.7 GHz |
Number of Processors | 1 |
Total Number of Cores | 4 |
L2 Cache (per Core) | 256 KB |
L3 Cache | 8 MB |
Hyper-Threading Technology | Enabled |
Memory | 16 GB |
Results
Name (time in s) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
---|---|---|---|---|---|---|---|---|---|---|
test_benchmark_multi[5] | 26.8900 (1.0) | 30.0969 (1.0) | 28.0622 (1.0) | 1.2614 (1.0) | 27.9017 (1.0) | 1.6081 (1.35) | 1;0 | 0.0356 (1.0) | 5 | 1 |
test_benchmark_multi[4] | 28.5573 (1.06) | 43.4543 (1.44) | 32.3371 (1.15) | 6.2815 (4.98) | 29.7480 (1.07) | 5.2148 (4.37) | 1;1 | 0.0309 (0.87) | 5 | 1 |
test_benchmark_multi[3] | 31.6497 (1.18) | 36.9986 (1.23) | 33.5103 (1.19) | 2.0542 (1.63) | 32.8415 (1.18) | 1.9576 (1.64) | 1;0 | 0.0298 (0.84) | 5 | 1 |
test_benchmark_multi[2] | 43.2616 (1.61) | 86.3859 (2.87) | 53.8572 (1.92) | 18.3339 (14.53) | 45.9328 (1.65) | 14.6382 (12.26) | 1;1 | 0.0186 (0.52) | 5 | 1 |
test_classic_mappy[mappy_al] | 78.5566 (2.92) | 82.8876 (2.75) | 79.6177 (2.84) | 1.8343 (1.45) | 78.8350 (2.83) | 1.1938 (1.0) | 1;1 | 0.0126 (0.35) | 5 | 1 |
test_classic_mappy[mappy_al_rs] | 83.7239 (3.11) | 87.9675 (2.92) | 85.4424 (3.04) | 1.6806 (1.33) | 85.6335 (3.07) | 2.3310 (1.95) | 2;0 | 0.0117 (0.33) | 5 | 1 |
test_benchmark_multi[1] | 84.8418 (3.16) | 94.0907 (3.13) | 86.7404 (3.09) | 4.1096 (3.26) | 84.8749 (3.04) | 2.4310 (2.04) | 1;1 | 0.0115 (0.32) | 5 | 1 |
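To read the table as speedups rather than raw times, the Mean column can be divided into a baseline. The sketch below copies the mean values from the table and treats `test_classic_mappy[mappy_al]` as the classic mappy baseline, which is an assumption based on the test name. On these figures the five-thread run comes out at roughly 2.8 times faster than the baseline, while the single-threaded mappy_rs runs are slightly slower than it.

```python
# Mean wall-clock times in seconds, copied from the Mean column above.
mean_seconds = {
    "test_benchmark_multi[5]": 28.0622,
    "test_benchmark_multi[4]": 32.3371,
    "test_benchmark_multi[3]": 33.5103,
    "test_benchmark_multi[2]": 53.8572,
    "test_classic_mappy[mappy_al]": 79.6177,
    "test_classic_mappy[mappy_al_rs]": 85.4424,
    "test_benchmark_multi[1]": 86.7404,
}

# Assumed baseline: the classic mappy run (inferred from the test name).
baseline = mean_seconds["test_classic_mappy[mappy_al]"]

# Speedup > 1 means faster than the baseline, < 1 means slower.
speedup = {name: baseline / t for name, t in mean_seconds.items()}

for name, factor in sorted(speedup.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:34s} {factor:5.2f}x")
```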