Rust re-implementation of the MinCED algorithm to Detect Instances of CRISPRs in Environmental Data.

🔪🧅 Diced

A Rust re-implementation of the MinCED algorithm to Detect Instances of CRISPRs in Environmental Data.

🗺️ Overview

MinCED is a method developed by Connor T. Skennerton to identify Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) in isolate and metagenomic-assembled genomes. It was derived from the CRISPR Recognition Tool [1]. It uses a fast scanning algorithm to identify candidate repeats, combined with an extension step to find maximally spanning regions of the genome that feature a CRISPR repeat.

Diced is a Rust reimplementation of the MinCED method, using the original Java code as a reference. It produces exactly the same results as MinCED, corrects some bugs, and is much faster. The Diced implementation is available as a Rust library for convenience.

This is the Python version; a Rust crate is available as well.

📋 Features

  • library interface: The Rust implementation is written as a library to facilitate reuse in other projects. It is used to implement a Python library, with PyO3 generating the native extension.
  • single dependency: Diced is distributed as a Python package, so you can add it as a dependency to your project, and stop worrying about the Java Virtual Machine being present on the end-user machine.
  • zero-copy: The Scanner which iterates over candidate CRISPRs is zero-copy if provided with a simple &str reference, but it also supports data behind smart pointers such as Rc<str> or Arc<str>. The original Python string and its substrings are never copied.
  • fast string matching: The Java implementation uses a handwritten implementation of the Boyer-Moore algorithm[2], while the Rust implementation uses the str::find method of the standard library, which implements the Two-way algorithm[3]. In addition, the memchr crate can be used as a fast SIMD-capable implementation of the memmem function.
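
To give a feel for why substring search matters here, the following is a toy Python sketch, not Diced's actual algorithm: it lists every occurrence of a known repeat with the built-in `str.find` and reports the distance between consecutive hits. The helper names `find_occurrences` and `spacings`, and the example sequence, are hypothetical and exist only for illustration.

```python
from typing import List


def find_occurrences(sequence: str, pattern: str) -> List[int]:
    """Return the start offset of every (possibly overlapping) occurrence."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are kept.
        start = sequence.find(pattern, start + 1)
    return positions


def spacings(positions: List[int]) -> List[int]:
    """Return the distance between each pair of consecutive offsets."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Made-up sequence: a 5-nt repeat separated by two 7-nt spacers.
toy_sequence = "GTTCC" + "ACACACA" + "GTTCC" + "TGATGAT" + "GTTCC"
hits = find_occurrences(toy_sequence, "GTTCC")
print(hits)            # [0, 12, 24]
print(spacings(hits))  # [12, 12]
```

Regularly spaced hits like these are the kind of signal a CRISPR detector looks for; the real method additionally has to discover the repeat itself and extend the candidate region.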

💡 Example

Diced supports any sequence in string format.

import Bio.SeqIO
import diced

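# Read a single-record FASTA file and convert the sequence to a plain string.
record = Bio.SeqIO.read("diced/tests/data/Aquifex_aeolicus_VF5.fna", "fasta")
sequence = str(record.seq)

# Print the coordinates, repeat count, and first repeat of each CRISPR found.
for crispr in diced.scan(sequence):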
    print(
        crispr.start,
        crispr.end,
        len(crispr.repeats),
        crispr.repeats[0],
    )
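
If you would rather collect the results than print them, a small helper along these lines may be convenient. This is a minimal sketch that relies only on the attributes used in the example above (`start`, `end`, `repeats`); the `summarize` function is hypothetical and not part of Diced, and the stand-in object below only mimics a result so the snippet runs without Diced installed.

```python
from types import SimpleNamespace


def summarize(crisprs):
    """Collect (start, end, repeat count, first repeat) for each CRISPR."""
    rows = []
    for crispr in crisprs:
        rows.append(
            (
                crispr.start,
                crispr.end,
                len(crispr.repeats),
                str(crispr.repeats[0]),
            )
        )
    return rows


# Stand-in for a scan result, with made-up values for illustration only.
fake_crispr = SimpleNamespace(start=10, end=80, repeats=["GTTCC", "GTTCC", "GTTCC"])
print(summarize([fake_crispr]))  # [(10, 80, 3, 'GTTCC')]
```

With Diced installed, the same helper would be called as `summarize(diced.scan(sequence))`.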

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker to report it or ask a question. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the open-source GPLv3 license, or later. The code for this implementation was derived from the MinCED source code, which is available under the GPLv3 as well.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original MinCED authors. It was developed by Martin Larralde during his PhD project at the Leiden University Medical Center in the Zeller team.

📚 References

  • [1] Bland, C., Ramsey, T. L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N. C., & Hugenholtz, P. (2007). 'CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats'. BMC bioinformatics, 8, 209. PMID:17577412 doi:10.1186/1471-2105-8-209.
  • [2] Boyer, R. S. & Moore, J. S. (1977). 'A fast string searching algorithm'. Commun. ACM 20, 10, 762–772. doi:10.1145/359842.359859
  • [3] Crochemore, M. & Perrin, D. (1991). 'Two-way string-matching'. J. ACM 38, 3, 650–674. doi:10.1145/116825.116845

Download files

Source Distribution

  • diced-0.1.3.tar.gz (2.2 MB): Source

Built Distributions

  • diced-0.1.3-cp38-abi3-win_amd64.whl (643.6 kB): CPython 3.8+, Windows x86-64
  • diced-0.1.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (798.6 kB): CPython 3.8+, manylinux (glibc 2.17+), x86-64
  • diced-0.1.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (782.3 kB): CPython 3.8+, manylinux (glibc 2.17+), ARM64
  • diced-0.1.3-cp38-abi3-macosx_11_0_arm64.whl (752.5 kB): CPython 3.8+, macOS 11.0+, ARM64
  • diced-0.1.3-cp38-abi3-macosx_10_12_x86_64.whl (762.6 kB): CPython 3.8+, macOS 10.12+, x86-64
