PyO3 bindings and Python interface to lightmotif, a library for platform-accelerated biological motif scanning using position weight matrices.
🎼🧬 lightmotif
A lightweight platform-accelerated library for biological motif scanning using position weight matrices.
🗺️ Overview
Motif scanning with position weight matrices (also known as position-specific scoring matrices) is a robust method for identifying motifs of fixed length inside a biological sequence. It can be used to identify transcription factor binding sites in DNA, or protease cleavage sites in polypeptides. Position weight matrices are often visualized as sequence logos.
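As a rough illustration of the scoring itself, the sketch below builds a log-odds matrix from aligned sequences and scores every window of a target sequence in pure Python. It is not how lightmotif is implemented: the helper names `build_pssm` and `scan`, the base-2 logarithm and the uniform background are assumptions made for the example.

```python
import math

ALPHABET = "ACGT"


def build_pssm(sequences, pseudocount=0.1, background=None):
    """Build a log-odds matrix (one dict per motif column) from aligned sequences."""
    if background is None:
        # Assumption: uniform background frequencies
        background = {s: 1.0 / len(ALPHABET) for s in ALPHABET}
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    pssm = []
    for j in range(length):
        # Count each symbol in column j, starting from the pseudocount
        counts = {s: pseudocount for s in ALPHABET}
        for seq in sequences:
            counts[seq[j]] += 1.0
        total = sum(counts.values())
        # Log-odds of the column frequency against the background frequency
        pssm.append(
            {s: math.log2(counts[s] / total / background[s]) for s in ALPHABET}
        )
    return pssm


def scan(pssm, sequence):
    """Score every window of len(pssm) symbols in the sequence."""
    m = len(pssm)
    return [
        sum(pssm[j][sequence[i + j]] for j in range(m))
        for i in range(len(sequence) - m + 1)
    ]


pssm = build_pssm(["GTTGACCTTATCAAC", "GTTGATCCAGTCAAC"])
target = "ATGTCCCAACAACGATACCCCGAGCCCATCGCCGTCATCGGCTCGGCATGCAGATTCCCAGGCG"
scores = scan(pssm, target)
best = max(range(len(scores)), key=scores.__getitem__)
print(f"best window starts at {best} with score {scores[best]:.2f}")
```

Each window costs one table look-up per motif column, which is the inner loop that an accelerated implementation aims to speed up.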
The lightmotif library provides a Python module to run very efficient searches for a motif encoded in a position weight matrix. The position scanning combines several techniques to allow high-throughput processing of sequences:
- Compile-time definition of alphabets and matrix dimensions.
- Sequence symbol encoding for fast table look-ups, as implemented in HMMER[1] or MEME[2]
- Striped sequence matrices to process several positions in parallel, inspired by Michael Farrar[3].
- Vectorized matrix row look-up using permute instructions of AVX2.
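The symbol encoding and striped layout listed above can be pictured with a small pure-Python sketch. It only illustrates the general idea of a Farrar-style striped matrix; the alphabet order, the padding symbol, the number of columns and the helper names `encode`, `stripe` and `score_row` are assumptions, not lightmotif's actual memory layout or API.

```python
# Assumption: a DNA alphabet with "N" used as the padding symbol
ALPHABET = "ACGTN"
SYMBOL_TO_INDEX = {symbol: index for index, symbol in enumerate(ALPHABET)}


def encode(sequence):
    """Replace each symbol by its index so it can address a table directly."""
    return [SYMBOL_TO_INDEX[symbol] for symbol in sequence]


def stripe(encoded, columns=4, pad=SYMBOL_TO_INDEX["N"]):
    """Lay the sequence out column by column in a rows x columns matrix.

    Position i ends up at row i % rows, column i // rows, so one row holds
    `columns` positions that are `rows` symbols apart in the sequence.
    """
    rows = -(-len(encoded) // columns)  # ceiling division
    matrix = [[pad] * columns for _ in range(rows)]
    for i, code in enumerate(encoded):
        matrix[i % rows][i // rows] = code
    return matrix


def score_row(weights, striped_row):
    """Scalar stand-in for a vectorized look-up: one table read per lane."""
    return [weights[code] for code in striped_row]


striped = stripe(encode("ATGTCCCAACAA"), columns=4)
for row in striped:
    print(row)

# Hypothetical weights of one motif column, indexed by encoded symbol
column_weights = [1.8, -2.6, -2.6, 0.9, 0.0]
print(score_row(column_weights, striped[0]))
```

With this layout, a single row is a natural unit of work for a SIMD register: every lane holds a different sequence position, and the same matrix row is looked up for all lanes at once.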
🔧 Installing
lightmotif can be installed directly from PyPI, which hosts some pre-built wheels for most mainstream platforms, as well as the code required to compile from source with Rust:
$ pip install lightmotif
In the event you have to compile the package from source, all the required Rust libraries are vendored in the source distribution, and a Rust compiler will be set up automatically if there is none on the host machine.
💡 Example
The motif interface should be mostly compatible with the Bio.motifs module from Biopython. The notable difference is that the calculate method of PSSM objects expects a striped sequence instead of a plain sequence.
import lightmotif
# Create a count matrix from an iterable of sequences
motif = lightmotif.create(["GTTGACCTTATCAAC", "GTTGATCCAGTCAAC"])
# Create a PSSM with 0.1 pseudocounts and uniform background frequencies
pwm = motif.counts.normalize(0.1)
pssm = pwm.log_odds()
# Encode the target sequence into a striped matrix
seq = "ATGTCCCAACAACGATACCCCGAGCCCATCGCCGTCATCGGCTCGGCATGCAGATTCCCAGGCG"
encoded = lightmotif.EncodedSequence(seq)
striped = encoded.stripe()
# Compute scores using the fastest backend implementation for the host machine
scores = pssm.calculate(striped)
⏱️ Benchmarks
Benchmarks use the MX000001 motif from PRODORIC[4], and the complete genome of an Escherichia coli K12 strain. Benchmarks were run on an i7-10710U CPU running @1.10GHz, compiled with --target-cpu=native.
lightmotif (avx2): 26,528,740 ns/iter (+/- 14,817,953) = 166.9 MiB/s
lightmotif (generic): 654,599,309 ns/iter (+/- 81,292,868) = 6.8 MiB/s
Bio.motifs: 526,309,061 ns/iter (+/- 45,603,991) = 8.4 MiB/s
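To put these timings in perspective, the short sketch below derives the relative run times from the ns/iter figures reported above. Only the three measured values come from the benchmark; the variable names are illustrative.

```python
# Mean run time per iteration, in nanoseconds, as reported above
NS_PER_ITER = {
    "lightmotif (avx2)": 26_528_740,
    "lightmotif (generic)": 654_599_309,
    "Bio.motifs": 526_309_061,
}

# Express every run time as a multiple of the AVX2 run time
baseline = NS_PER_ITER["lightmotif (avx2)"]
speedups = {name: ns / baseline for name, ns in NS_PER_ITER.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x the AVX2 run time")
```

Note that the reported deviations are large relative to the means, so these ratios are best read as orders of magnitude rather than precise factors.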
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
📋 Changelog
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
⚖️ License
This library is provided under the open-source MIT license.
This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
📚 References
- [1] Eddy, Sean R. ‘Accelerated Profile HMM Searches’. PLOS Computational Biology 7, no. 10 (20 October 2011): e1002195. doi:10.1371/journal.pcbi.1002195.
- [2] Grant, Charles E., Timothy L. Bailey, and William Stafford Noble. ‘FIMO: Scanning for Occurrences of a given Motif’. Bioinformatics 27, no. 7 (1 April 2011): 1017–18. doi:10.1093/bioinformatics/btr064.
- [3] Farrar, Michael. ‘Striped Smith–Waterman Speeds Database Searches Six Times over Other SIMD Implementations’. Bioinformatics 23, no. 2 (15 January 2007): 156–61. doi:10.1093/bioinformatics/btl582.
- [4] Dudek, Christian-Alexander, and Dieter Jahn. ‘PRODORIC: State-of-the-Art Database of Prokaryotic Gene Regulation’. Nucleic Acids Research 50, no. D1 (7 January 2022): D295–302. doi:10.1093/nar/gkab1110.