PyO3 bindings and Python interface to lightmotif, a library for platform-accelerated biological motif scanning using position weight matrices.

🎼🧬 lightmotif

A lightweight platform-accelerated library for biological motif scanning using position weight matrices.

🗺️ Overview

Motif scanning with position weight matrices (also known as position-specific scoring matrices) is a robust method for identifying motifs of fixed length inside a biological sequence. These matrices can be used to identify transcription factor binding sites in DNA, or protease cleavage sites in polypeptides. Position weight matrices are often viewed as sequence logos:

[Figure: sequence logo, MX000274.svg]
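To make the scoring step concrete, here is a minimal pure-Python sketch of what scanning with a position weight matrix computes. It is not how lightmotif is implemented; the helper names `build_pssm` and `scan` are hypothetical, and the use of a base-2 logarithm with a uniform background of 0.25 is an assumption chosen to mirror the convention of Bio.motifs.

```python
import math

ALPHABET = "ACGT"

def build_pssm(sequences, pseudocount=0.1, background=0.25):
    """Build a log-odds matrix (one dict per motif column) from aligned sequences."""
    length = len(sequences[0])
    pssm = []
    for j in range(length):
        # Count each symbol in column j, starting from the pseudocount.
        counts = {symbol: pseudocount for symbol in ALPHABET}
        for seq in sequences:
            counts[seq[j]] += 1
        total = sum(counts.values())
        # Log2 ratio between the observed frequency and the background frequency.
        pssm.append(
            {symbol: math.log2(counts[symbol] / total / background) for symbol in ALPHABET}
        )
    return pssm

def scan(pssm, sequence):
    """Score every window of the sequence that has the same length as the motif."""
    m = len(pssm)
    return [
        sum(pssm[j][sequence[i + j]] for j in range(m))
        for i in range(len(sequence) - m + 1)
    ]

sites = ["GTTGACCTTATCAAC", "GTTGATCCAGTCAAC"]
pssm = build_pssm(sites)
target = "ATGTCCCAACAACGATACCCCGAGCCCATCGCCGTCATCGGCTCGGCATGCAGATTCCCAGGCG"
scores = scan(pssm, target)

# Report the best-scoring window start and its score.
best = max(range(len(scores)), key=scores.__getitem__)
print(best, round(scores[best], 3))
```

Each window costs one table look-up per motif column, so the naive scan is proportional to the sequence length times the motif length. That inner loop is the part that an accelerated implementation targets.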

The lightmotif library provides a Python module to run very efficient searches for a motif encoded in a position weight matrix. The position scanning combines several techniques to allow high-throughput processing of sequences:

  • Compile-time definition of alphabets and matrix dimensions.
  • Sequence symbol encoding for fast table look-ups, as implemented in HMMER[1] or MEME[2].
  • Striped sequence matrices to process several positions in parallel, inspired by Michael Farrar[3].
  • Vectorized matrix row look-up using permute instructions of AVX2.
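The encoding and striping ideas in the list above can be sketched in pure Python. This is only an illustration under stated assumptions, not the library's Rust implementation: the function names are hypothetical, `N` is used here as a padding symbol with a score of zero, and 4 columns are used for readability where a real vector register would hold many more lanes.

```python
ALPHABET = "ACGTN"  # N doubles as the padding symbol in this sketch
PAD = ALPHABET.index("N")

def encode(sequence):
    """Map each symbol to its index in the alphabet for direct table look-ups."""
    index = {symbol: i for i, symbol in enumerate(ALPHABET)}
    return [index[symbol] for symbol in sequence]

def stripe(encoded, columns):
    """Striped layout: position i goes to row i % rows, column i // rows."""
    rows = -(-len(encoded) // columns)  # ceiling division
    matrix = [[PAD] * columns for _ in range(rows)]
    for i, code in enumerate(encoded):
        matrix[i % rows][i // rows] = code
    return matrix

def score_striped(matrix, weights):
    """Score all windows; weights[j][code] scores symbol `code` at motif column j."""
    rows, columns = len(matrix), len(matrix[0])
    scores = [[0.0] * columns for _ in range(rows)]
    for r in range(rows):
        for j, column_weights in enumerate(weights):
            # Position r + j moves `shift` columns to the right once it runs past the last row.
            shift, source_row = divmod(r + j, rows)
            # A SIMD backend would handle this whole row of lanes in one step.
            for c in range(columns):
                code = matrix[source_row][c + shift] if c + shift < columns else PAD
                scores[r][c] += column_weights[code]
    return scores

def unstripe(scores, count):
    """Read striped scores back in sequence order, keeping the first `count` positions."""
    rows = len(scores)
    return [scores[i % rows][i // rows] for i in range(count)]

# Toy 3-column motif "ACG": +1 for the expected symbol, -1 otherwise, 0 for padding.
weights = [
    [1.0, -1.0, -1.0, -1.0, 0.0],
    [-1.0, 1.0, -1.0, -1.0, 0.0],
    [-1.0, -1.0, 1.0, -1.0, 0.0],
]
sequence = "TTACGTACGGA"
encoded = encode(sequence)
striped = stripe(encoded, columns=4)
scores = unstripe(score_striped(striped, weights), len(sequence) - len(weights) + 1)
print(scores)
```

Because consecutive positions sit in consecutive rows of the same column, moving one step along the motif means moving one row down, and every column advances together. That is what lets a single vector instruction score several positions at once.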

This is the Python version; a Rust crate is available as well.

🔧 Installing

lightmotif can be installed directly from PyPI, which hosts pre-built wheels for most mainstream platforms, as well as the code required to compile from source with Rust:

$ pip install lightmotif

If you have to compile the package from source, all the required Rust libraries are vendored in the source distribution, and a Rust compiler will be set up automatically if there is none on the host machine.

💡 Example

The motif interface should be mostly compatible with the Bio.motifs module from Biopython. The notable difference is that the calculate method of PSSM objects expects a striped sequence instead.

import lightmotif

# Create a count matrix from an iterable of sequences
motif = lightmotif.create(["GTTGACCTTATCAAC", "GTTGATCCAGTCAAC"])

# Create a PSSM with 0.1 pseudocounts and uniform background frequencies
pwm = motif.counts.normalize(0.1)
pssm = pwm.log_odds()

# Encode the target sequence into a striped matrix
seq = "ATGTCCCAACAACGATACCCCGAGCCCATCGCCGTCATCGGCTCGGCATGCAGATTCCCAGGCG"
striped = lightmotif.stripe(seq)

# Compute scores using the fastest backend implementation for the host machine
scores = pssm.calculate(striped)

⏱️ Benchmarks

Benchmarks use the MX000001 motif from PRODORIC[4] and the complete genome of an Escherichia coli K12 strain. They were run on an i7-10710U CPU running at 1.10 GHz, with the code compiled with --target-cpu=native.

lightmotif (avx2):      5,479,884 ns/iter    (+/- 3,370,523) = 807.8 MiB/s
Bio.motifs:           334,359,765 ns/iter   (+/- 11,045,456) =  13.2 MiB/s
MOODS.scan:           182,710,624 ns/iter    (+/- 9,459,257) =  24.2 MiB/s
pymemesuite.fimo:     239,694,118 ns/iter    (+/- 7,444,620) =  18.5 MiB/s
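As a quick reading aid, the relative timings can be derived from the figures above with a few lines of arithmetic. The sketch below only reuses the reported numbers; the implied input size is computed from the reported throughput and time, not taken from the benchmark itself.

```python
# Mean time per iteration (ns) and throughput (MiB/s), copied from the table above.
results = {
    "lightmotif (avx2)": (5_479_884, 807.8),
    "Bio.motifs": (334_359_765, 13.2),
    "MOODS.scan": (182_710_624, 24.2),
    "pymemesuite.fimo": (239_694_118, 18.5),
}

# Time of each tool relative to the lightmotif AVX2 backend.
baseline_ns, _ = results["lightmotif (avx2)"]
relative_time = {name: ns / baseline_ns for name, (ns, _) in results.items()}

# Input size implied by each row: throughput (bytes/s) times elapsed time (s).
implied_bytes = {
    name: mib_per_s * 2**20 * ns * 1e-9 for name, (ns, mib_per_s) in results.items()
}

for name in results:
    print(f"{name:<20} {relative_time[name]:6.1f}x   ~{implied_bytes[name] / 1e6:.2f} MB")
```

On these numbers the other tools take roughly 33 to 61 times as long as the AVX2 backend, and every row implies an input of about 4.6 MB, which is consistent with a single bacterial genome being scanned.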

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker to report it or ask a question. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the open-source MIT license.

This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

📚 References

  • [1] Eddy, Sean R. ‘Accelerated Profile HMM Searches’. PLOS Computational Biology 7, no. 10 (20 October 2011): e1002195. doi:10.1371/journal.pcbi.1002195.
  • [2] Grant, Charles E., Timothy L. Bailey, and William Stafford Noble. ‘FIMO: Scanning for Occurrences of a given Motif’. Bioinformatics 27, no. 7 (1 April 2011): 1017–18. doi:10.1093/bioinformatics/btr064.
  • [3] Farrar, Michael. ‘Striped Smith–Waterman Speeds Database Searches Six Times over Other SIMD Implementations’. Bioinformatics 23, no. 2 (15 January 2007): 156–61. doi:10.1093/bioinformatics/btl582.
  • [4] Dudek, Christian-Alexander, and Dieter Jahn. ‘PRODORIC: State-of-the-Art Database of Prokaryotic Gene Regulation’. Nucleic Acids Research 50, no. D1 (7 January 2022): D295–302. doi:10.1093/nar/gkab1110.
