PyO3 bindings and Python interface to lightmotif, a library for platform-accelerated biological motif scanning using position weight matrices.
🎼🧬 lightmotif
A lightweight platform-accelerated library for biological motif scanning using position weight matrices.
🗺️ Overview
Motif scanning with position weight matrices (also known as position-specific scoring matrices) is a robust method for identifying motifs of fixed length inside a biological sequence. These matrices can be used to identify transcription factor binding sites in DNA, or protease cleavage sites in polypeptides. Position weight matrices are often viewed as sequence logos.
The lightmotif library provides a Python module to run very efficient searches for a motif encoded in a position weight matrix. The position scanning combines several techniques to allow high-throughput processing of sequences:
- Compile-time definition of alphabets and matrix dimensions.
- Sequence symbol encoding for fast table look-ups, as implemented in HMMER[1] or MEME[2].
- Striped sequence matrices to process several positions in parallel, inspired by Michael Farrar[3].
- Vectorized matrix row look-up using permute instructions of AVX2.
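The symbol encoding and striped layout listed above can be illustrated with a plain-Python sketch. This is not the library's implementation: the function names, the 4-column layout and the weight table are hypothetical, and the inner loop over columns only stands in for what a single SIMD instruction would do on real hardware.

```python
ALPHABET = "ACGTN"  # N doubles as the padding symbol in this sketch


def encode(seq):
    """Replace each symbol by its index in ALPHABET so scoring is a table look-up."""
    index = {symbol: i for i, symbol in enumerate(ALPHABET)}
    return [index[symbol] for symbol in seq]


def stripe(encoded, columns):
    """Store position i at row i % rows, column i // rows, padding with N."""
    rows = -(-len(encoded) // columns)  # ceiling division
    matrix = [[ALPHABET.index("N")] * columns for _ in range(rows)]
    for i, code in enumerate(encoded):
        matrix[i % rows][i // rows] = code
    return matrix


def scan_naive(encoded, weights):
    """Score every full-length window, one position at a time."""
    m = len(weights)
    return [
        sum(weights[j][encoded[i + j]] for j in range(m))
        for i in range(len(encoded) - m + 1)
    ]


def scan_striped(matrix, weights):
    """Score all columns of a row together; the loop over c mimics one SIMD operation."""
    rows, columns = len(matrix), len(matrix[0])
    scores = [[0] * columns for _ in range(rows)]
    for r in range(rows):
        for j, row_weights in enumerate(weights):
            # Position (r + j) of a column may wrap into the next column.
            src_row, shift = (r + j) % rows, (r + j) // rows
            for c in range(columns - shift):
                scores[r][c] += row_weights[matrix[src_row][c + shift]]
    return scores


# Hypothetical weight table for a motif of length 3 (one entry per symbol of ALPHABET).
weights = [
    [2, -1, -1, -1, 0],
    [-1, 2, -1, -1, 0],
    [-1, -1, 2, -1, 0],
]
seq = "TTACGACGTA"
encoded = encode(seq)
striped = stripe(encoded, columns=4)
scores = scan_striped(striped, weights)

# Read the striped scores back in sequence order, keeping full-length windows only.
rows = len(striped)
flat = [scores[i % rows][i // rows] for i in range(len(seq) - len(weights) + 1)]
print(flat)
```

Because each row of the striped matrix holds positions that are `rows` apart in the sequence, one pass over a row updates as many windows as there are columns, which is the property a vectorized backend can exploit.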
This is the Python version; a Rust crate is available as well.
🔧 Installing
lightmotif can be installed directly from PyPI, which hosts some pre-built wheels for most mainstream platforms, as well as the code required to compile from source with Rust:
$ pip install lightmotif
If you have to compile the package from source, all the required Rust libraries are vendored in the source distribution, and a Rust compiler will be set up automatically if there is none on the host machine.
💡 Example
The motif interface should be mostly compatible with the Bio.motifs module from Biopython. The notable difference is that the calculate method of PSSM objects expects a striped sequence instead.
import lightmotif
# Create a count matrix from an iterable of sequences
motif = lightmotif.create(["GTTGACCTTATCAAC", "GTTGATCCAGTCAAC"])
# Create a PSSM with 0.1 pseudocounts and uniform background frequencies
pwm = motif.counts.normalize(0.1)
pssm = pwm.log_odds()
# Encode the target sequence into a striped matrix
seq = "ATGTCCCAACAACGATACCCCGAGCCCATCGCCGTCATCGGCTCGGCATGCAGATTCCCAGGCG"
encoded = lightmotif.EncodedSequence(seq)
striped = encoded.stripe()
# Compute scores using the fastest backend implementation for the host machine
scores = pssm.calculate(striped)
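For readers who want to see what the count matrix, pseudocounts and log-odds steps compute, here is a dependency-free sketch of the same pipeline. It assumes the conventions of Bio.motifs (pseudocount added to every cell, uniform background, base-2 logarithm); the standalone functions below are hypothetical helpers, and lightmotif itself may differ in details such as the handling of ambiguous symbols.

```python
import math

ALPHABET = "ACGT"


def count_matrix(sequences):
    """Count each symbol at each motif position (one dict per position)."""
    length = len(sequences[0])
    return [
        {s: sum(seq[j] == s for seq in sequences) for s in ALPHABET}
        for j in range(length)
    ]


def normalize(counts, pseudocount):
    """Turn counts into frequencies after adding a pseudocount to every cell."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(ALPHABET)
        pwm.append({s: (column[s] + pseudocount) / total for s in ALPHABET})
    return pwm


def log_odds(pwm, background=None):
    """Convert frequencies into log2 odds against background frequencies."""
    background = background or {s: 1 / len(ALPHABET) for s in ALPHABET}
    return [{s: math.log2(col[s] / background[s]) for s in ALPHABET} for col in pwm]


def calculate(pssm, seq):
    """Sum the per-position log-odds for every full-length window of seq."""
    m = len(pssm)
    return [sum(pssm[j][seq[i + j]] for j in range(m)) for i in range(len(seq) - m + 1)]


counts = count_matrix(["GTTGACCTTATCAAC", "GTTGATCCAGTCAAC"])
pwm = normalize(counts, 0.1)
pssm = log_odds(pwm)

seq = "ATGTCCCAACAACGATACCCCGAGCCCATCGCCGTCATCGGCTCGGCATGCAGATTCCCAGGCG"
scores = calculate(pssm, seq)

# Report the best-scoring window and the subsequence it covers.
best = max(range(len(scores)), key=scores.__getitem__)
print(best, seq[best:best + len(pssm)], round(scores[best], 2))
```

The result holds one score per window start, so its length is the sequence length minus the motif length plus one.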
⏱️ Benchmarks
Benchmarks use the MX000001 motif from PRODORIC[4], and the complete genome of an Escherichia coli K12 strain. Benchmarks were run on an i7-10710U CPU running at 1.10 GHz, compiled with --target-cpu=native.
lightmotif (avx2): 9,125,495 ns/iter (+/- 6,392,241) = 485.1 MiB/s
Bio.motifs: 284,696,651 ns/iter (+/- 6,454,945) = 15.5 MiB/s
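The two timings can be compared directly with a little arithmetic. The snippet below uses only the numbers reported above; the implied input size assumes the throughput figure counts one byte per sequence position.

```python
# Timings reported above, in nanoseconds per iteration.
lightmotif_ns = 9_125_495
biopython_ns = 284_696_651

# Ratio of the two run times.
speedup = biopython_ns / lightmotif_ns

# Input size implied by the reported lightmotif throughput (485.1 MiB/s).
MIB = 1024 ** 2
implied_bytes = lightmotif_ns * 1e-9 * 485.1 * MIB

print(f"speedup: {speedup:.1f}x")
print(f"implied input size: {implied_bytes / 1e6:.2f} MB")
```

On these figures the AVX2 backend is roughly 31 times faster than Bio.motifs, and the implied input size of about 4.6 MB is consistent with a complete E. coli genome.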
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
📋 Changelog
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
⚖️ License
This library is provided under the open-source MIT license.
This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
📚 References
- [1] Eddy, Sean R. ‘Accelerated Profile HMM Searches’. PLOS Computational Biology 7, no. 10 (20 October 2011): e1002195. doi:10.1371/journal.pcbi.1002195.
- [2] Grant, Charles E., Timothy L. Bailey, and William Stafford Noble. ‘FIMO: Scanning for Occurrences of a given Motif’. Bioinformatics 27, no. 7 (1 April 2011): 1017–18. doi:10.1093/bioinformatics/btr064.
- [3] Farrar, Michael. ‘Striped Smith–Waterman Speeds Database Searches Six Times over Other SIMD Implementations’. Bioinformatics 23, no. 2 (15 January 2007): 156–61. doi:10.1093/bioinformatics/btl582.
- [4] Dudek, Christian-Alexander, and Dieter Jahn. ‘PRODORIC: State-of-the-Art Database of Prokaryotic Gene Regulation’. Nucleic Acids Research 50, no. D1 (7 January 2022): D295–302. doi:10.1093/nar/gkab1110.