🔪🧅 Diced

A Rust re-implementation of the MinCED algorithm to Detect Instances of CRISPRs in Environmental Data.

🗺️ Overview

MinCED is a method developed by Connor T. Skennerton to identify Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) in isolate and metagenomic-assembled genomes. It was derived from the CRISPR Recognition Tool [1]. It uses a fast scanning algorithm to identify candidate repeats, combined with an extension step to find maximally spanning regions of the genome that feature a CRISPR repeat.
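The scan-then-extend idea can be illustrated with a short, simplified sketch. This is not the MinCED or Diced implementation: the function name `scan_repeats`, its parameters, and the default thresholds are assumptions chosen for illustration, and only rightward extension of exact repeats is shown.

```python
def scan_repeats(sequence, k=8, min_period=20, max_period=60,
                 min_copies=3, max_length=50):
    """Find k-mers recurring at regular intervals, then extend them.

    Hypothetical helper: returns a list of (positions, repeat_length).
    """
    arrays = []
    i = 0
    while i + k <= len(sequence):
        seed = sequence[i:i + k]
        positions = [i]
        # Scanning step: look for the next exact copy of the seed whose
        # start lies between min_period and max_period after the last one.
        while True:
            last = positions[-1]
            hit = sequence.find(seed, last + min_period, last + max_period + k)
            if hit == -1:
                break
            positions.append(hit)
        if len(positions) < min_copies:
            i += 1
            continue
        # Extension step: grow the repeat to the right while every copy
        # is followed by the same character.
        length = k
        while length < max_length and positions[-1] + length < len(sequence):
            if len({sequence[p + length] for p in positions}) != 1:
                break
            length += 1
        arrays.append((positions, length))
        # Resume scanning after the last repeat of this array.
        i = positions[-1] + length
    return arrays


# Toy input: four copies of a 12-nt repeat separated by 20-nt spacers.
repeat = "GTTCAGACGAAC"
toy = repeat + "A" * 20 + repeat + "C" * 20 + repeat + "T" * 20 + repeat
print(scan_repeats(toy))  # → [([0, 32, 64, 96], 12)]
```

The real method handles details that this sketch ignores, such as extension in both directions and filtering of candidate arrays.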

Diced is a Rust reimplementation of the MinCED method, using the original Java code as a reference. It produces exactly the same results as MinCED, corrects some bugs, and is much faster. The Diced implementation is available as a Rust library for convenience.

This is the Python version; a Rust crate is available as well.

📋 Features

  • library interface: The Rust implementation is written as a library to facilitate reuse in other projects. It is used to implement a Python library, using PyO3 to generate a native extension.
  • single dependency: Diced is distributed as a Python package, so you can add it as a dependency to your project, and stop worrying about the MinCED binary being present on the end-user machine.
  • zero-copy: The Scanner which iterates over candidate CRISPRs is zero-copy if provided with a simple &str reference, but it also supports data behind smart pointers such as Rc<str> or Arc<str>.
  • fast string matching: The Java implementation uses a handwritten implementation of the Boyer-Moore algorithm[2], while the Rust implementation uses the str::find method of the standard library, which implements the Two-way algorithm[3]. In addition, the memchr crate can be used as a fast SIMD-capable implementation of the memmem function.

💡 Example

Diced supports any sequence in string format.

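import Bio.SeqIO
import diced

# Load a single-record FASTA file and convert the sequence to a plain string
record = Bio.SeqIO.read("diced/tests/data/Aquifex_aeolicus_VF5.fna", "fasta")
sequence = str(record.seq)

# Print the start, end, number of repeats and first repeat of each CRISPR
for crispr in diced.scan(sequence):
    print(
        crispr.start,
        crispr.end,
        len(crispr.repeats),
        crispr.repeats[0],
    )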
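Since only a plain string is needed, Biopython is optional. Below is a minimal sketch of a dependency-free alternative, assuming a plain-text FASTA file. The helpers `read_fasta` and `scan_fasta` are hypothetical and not part of the diced API. Only `diced.scan` and the `start`, `end` and `repeats` attributes shown above are used.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def scan_fasta(path):
    """Yield (header, start, end, repeat count) for each CRISPR found."""
    import diced  # imported lazily so read_fasta works without diced installed

    with open(path) as handle:
        for header, sequence in read_fasta(handle):
            for crispr in diced.scan(sequence):
                yield header, crispr.start, crispr.end, len(crispr.repeats)


# read_fasta works on any text handle, for example an in-memory buffer.
demo = io.StringIO(">seq1 test\nACGT\nTTAA\n\n>seq2\nGG\n")
print(list(read_fasta(demo)))
```

Unlike `Bio.SeqIO.read`, which expects exactly one record, this sketch iterates over every record in the file.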

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the open-source GPLv3 license, or later. The code for this implementation was derived from the MinCED source code, which is available under the GPLv3 as well.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original MinCED authors. It was developed by Martin Larralde during his PhD project at the Leiden University Medical Center in the Zeller team.

📚 References

  • [1] Bland, C., Ramsey, T. L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N. C., & Hugenholtz, P. (2007). 'CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats'. BMC bioinformatics, 8, 209. PMID:17577412 doi:10.1186/1471-2105-8-209.
  • [2] Boyer, R. S. & Moore, J. S. (1977). 'A fast string searching algorithm'. Commun. ACM 20, 10, 762–772. doi:10.1145/359842.359859
  • [3] Crochemore, M. & Perrin, D. (1991). 'Two-way string-matching'. J. ACM 38, 3, 650–674. doi:10.1145/116825.116845
