A Genomic Language Model for Chimera Artifact Detection in Nanopore Direct RNA Sequencing

🧬 DeepChopper uses a language model to accurately detect and chop artificial sequences that may cause chimeric reads, yielding higher-quality and more reliable sequencing results. It integrates with existing workflows and is intended for researchers and bioinformaticians working with Nanopore direct RNA sequencing data.

🚀 Quick Start: Try DeepChopper Online

Experience DeepChopper instantly through our user-friendly web interface. No installation required! Simply click the button below to launch the web application and start exploring DeepChopper's capabilities:

Open in Hugging Face Spaces

What you can do online:

  • 📤 Upload your sequencing data
  • 🔬 Run DeepChopper's analysis
  • 📊 Visualize results
  • 🎛️ Experiment with different parameters

Perfect for quick tests or demonstrations! However, for extensive analyses or custom workflows, we recommend installing DeepChopper locally.

⚠️ Note: The online version is limited to one FASTQ record at a time and may not be suitable for large-scale projects.
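
Because the online version accepts one record at a time, it can help to pull a single read out of a larger file before uploading. The following is a minimal sketch, not part of DeepChopper: the function name `first_fastq_record` is hypothetical, and it assumes the common four-line FASTQ layout (header, sequence, `+`, qualities) with no line-wrapped sequences.

```python
from itertools import islice
from pathlib import Path


def first_fastq_record(src: str, dst: str) -> str:
    """Copy the first four-line FASTQ record from src to dst; return its read ID.

    Hypothetical helper: assumes each record spans exactly four lines.
    """
    with open(src, "r", encoding="utf-8") as handle:
        record = list(islice(handle, 4))  # read only the first four lines

    looks_valid = (
        len(record) == 4
        and record[0].startswith("@")
        and record[2].startswith("+")
    )
    if not looks_valid:
        raise ValueError(f"{src} does not start with a four-line FASTQ record")

    Path(dst).write_text("".join(record), encoding="utf-8")

    # The read ID is the first whitespace-delimited token after the leading "@".
    fields = record[0][1:].split()
    return fields[0] if fields else ""
```

Calling `first_fastq_record("reads.fq", "one_read.fq")` would write a one-record file suitable for a quick test upload; the file names here are placeholders.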

📦 Installation

DeepChopper can be installed using pip, the Python package installer. Follow these steps to install:

  1. Ensure you have Python 3.10 or later installed on your system.

  2. Create a virtual environment (recommended):

    python -m venv deepchopper_env
    source deepchopper_env/bin/activate  # On Windows use `deepchopper_env\Scripts\activate`
    
  3. Install DeepChopper:

    pip install deepchopper
    
  4. Verify the installation:

    deepchopper --help
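
If `deepchopper --help` fails, a quick look at the environment can narrow down the cause. The sketch below uses only the Python standard library; the function name `check_environment` is hypothetical. It checks the Python 3.10 minimum from step 1, whether a package named `deepchopper` is importable, and whether a `deepchopper` executable is on `PATH`.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in step 1


def check_environment(version_info=sys.version_info) -> dict:
    """Return three booleans describing the current environment (hypothetical helper)."""
    return {
        # Compare only (major, minor) against the documented minimum.
        "python_ok": tuple(version_info[:2]) >= MIN_PYTHON,
        # find_spec locates the package without importing it.
        "package_found": importlib.util.find_spec("deepchopper") is not None,
        # The console script should be on PATH once the virtual environment is active.
        "cli_on_path": shutil.which("deepchopper") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A `package_found` of yes with `cli_on_path` of no usually suggests the virtual environment is not activated in the current shell.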
    

Compatibility and Support

DeepChopper is designed to work across various platforms and Python versions. The compatibility matrix for PyPI installations is shown below:

PyPI Support

Python Version | Linux x86_64 | macOS Intel | macOS Apple Silicon | Windows x86_64
3.10           | ✅           | ✅          | ✅                  | ✅
3.11           | ✅           | ✅          | ✅                  | ✅
3.12           | ✅           | ✅          | ✅                  | ✅

🆘 Trouble installing? Check our Troubleshooting Guide or open an issue.

🛠️ Usage

For a comprehensive guide, check out our full tutorial. Here's a quick overview:

Command-Line Interface

DeepChopper offers three main commands: encode, predict, and chop.

  1. Encode your input data:

    deepchopper encode <input.fq>
    
  2. Predict chimera artifacts:

    deepchopper predict <input.parquet> --output predictions
    

    Using GPUs? Add the --gpus flag:

    deepchopper predict <input.parquet> --output predictions --gpus 2
    
  3. Chop chimera artifacts:

    deepchopper chop <predictions> raw.fq
    

Want a GUI? Launch the web interface (note: limited to one FASTQ record at a time):

deepchopper web
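
The three CLI steps above can also be driven from a script. The following is a minimal sketch that only assembles the commands shown in this section and, optionally, runs them with `subprocess`. The helper names `build_commands` and `run_pipeline` are hypothetical. The Parquet path and the predictions path passed to `chop` are supplied by the caller, because this overview does not state where `encode` and `predict` write their output.

```python
import shlex
import subprocess


def build_commands(fastq, parquet, predictions, gpus=None, chop_predictions=None):
    """Assemble the encode, predict, and chop commands as argument lists.

    Hypothetical helper. `parquet` is assumed to be the file produced by
    `encode`; `chop_predictions` defaults to the `--output` value, which is
    also an assumption.
    """
    predict = ["deepchopper", "predict", parquet, "--output", predictions]
    if gpus is not None:
        predict += ["--gpus", str(gpus)]  # optional flag from step 2

    return [
        ["deepchopper", "encode", fastq],
        predict,
        ["deepchopper", "chop", chop_predictions or predictions, fastq],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # Dry run: prints the three commands without executing them.
    run_pipeline(build_commands("raw.fq", "raw.parquet", "predictions", gpus=2))
```

Keeping `dry_run=True` as the default makes it easy to inspect the exact commands before committing to a long prediction run.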

Python Library

Integrate DeepChopper into your Python scripts:

import deepchopper

model = deepchopper.DeepChopper.from_pretrained("yangliz5/deepchopper")
# Your analysis code here

📚 Cite

If DeepChopper aids your research, please cite our paper:

@article {Li2024.10.23.619929,
        author = {Li, Yangyang and Wang, Ting-You and Guo, Qingxiang and Ren, Yanan and Lu, Xiaotong and Cao, Qi and Yang, Rendong},
        title = {A Genomic Language Model for Chimera Artifact Detection in Nanopore Direct RNA Sequencing},
        elocation-id = {2024.10.23.619929},
        year = {2024},
        doi = {10.1101/2024.10.23.619929},
        publisher = {Cold Spring Harbor Laboratory},
        abstract = {Chimera artifacts in nanopore direct RNA sequencing (dRNA-seq) data can confound transcriptome analyses, yet no existing tools are capable of detecting and removing them due to limitations in basecalling models. We present DeepChopper, a genomic language model that accurately identifies and eliminates adapter sequences within base-called dRNA-seq reads, effectively removing chimeric read artifacts. DeepChopper significantly improves critical downstream analyses, including transcript annotation and gene fusion detection, enhancing the reliability and utility of nanopore dRNA-seq for transcriptomics research. Competing Interest Statement: The authors have declared no competing interest.},
        URL = {https://www.biorxiv.org/content/early/2024/10/25/2024.10.23.619929},
        eprint = {https://www.biorxiv.org/content/early/2024/10/25/2024.10.23.619929.full.pdf},
        journal = {bioRxiv}
}

🤝 Contribution

We welcome contributions! Here's how to set up your development environment:

Build Environment

git clone https://github.com/ylab-hi/DeepChopper.git
cd DeepChopper
conda env create -f environment.yaml
conda activate deepchopper

Install Dependencies

pip install pipx
pipx install --suffix @master git+https://github.com/python-poetry/poetry.git@master
poetry@master install

🎉 Ready to contribute? Check out our Contribution Guidelines to get started!

📬 Support

Need help? Have questions?


DeepChopper is developed with ❤️ by the YLab team. Happy sequencing! 🧬🔬

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • deepchopper-1.2.6.tar.gz (69.2 MB): Source

Built Distributions

  • deepchopper-1.2.6-cp310-abi3-win_amd64.whl (4.4 MB): CPython 3.10+, Windows x86-64
  • deepchopper-1.2.6-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB): CPython 3.10+, manylinux (glibc 2.17+), x86-64
  • deepchopper-1.2.6-cp310-abi3-macosx_11_0_arm64.whl (3.9 MB): CPython 3.10+, macOS 11.0+, ARM64
  • deepchopper-1.2.6-cp310-abi3-macosx_10_12_x86_64.whl (4.5 MB): CPython 3.10+, macOS 10.12+, x86-64

File details

Details for the file deepchopper-1.2.6.tar.gz.

File metadata

  • Download URL: deepchopper-1.2.6.tar.gz
  • Upload date:
  • Size: 69.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.7.4

File hashes

Hashes for deepchopper-1.2.6.tar.gz
Algorithm Hash digest
SHA256 f264d1d451a9ad28073af0c2be95f177b71c39fa63a726408592bf412f56e4fe
MD5 632b5c21f6967120a9da787479c7cea4
BLAKE2b-256 3bce1016be3b635fbca32d8436f323cf5e60ec0b92d4211263752b92cac6304a

See more details on using hashes here.
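
The SHA256 digests listed in this section can be checked against a downloaded file before installing it. The sketch below uses only the standard-library `hashlib`; the function names `sha256_of` and `matches` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("deepchopper-1.2.6.tar.gz", "<SHA256 from the table above>")` should return `True` for an intact download of the source distribution.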

File details

Details for the file deepchopper-1.2.6-cp310-abi3-win_amd64.whl.

File hashes

Hashes for deepchopper-1.2.6-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 cc20ce47f966f4a057462f57af195ead9477bf4f7d23e344dbce72818e3d4afc
MD5 8e86c34decc78cd322b5524e263f41cf
BLAKE2b-256 50039bd75607c4daa1f8a6cdaed66b77e4ab0001b5737e5399d2b859b0c4d92a

File details

Details for the file deepchopper-1.2.6-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for deepchopper-1.2.6-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8106e9acf0abfac4a5b00cc0996fcf4c24d49ddd4bc09d456fa2717287fda242
MD5 b9b62a549eabf8684dba440edaa74275
BLAKE2b-256 8a6388aecc1284b9d24735a5a6a6538f1fb1b4cb4afb23b63d203c836e8a6db8

File details

Details for the file deepchopper-1.2.6-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for deepchopper-1.2.6-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 757fbb91c5f513ea458982b88be5701763178eb8af2203de49f730ca88681509
MD5 4513a1f079063b713a9d3b708b624d0d
BLAKE2b-256 4c7862ffd6275977786e40e485b8af6de39bfde512574ec3b556aab8ed661f8d

File details

Details for the file deepchopper-1.2.6-cp310-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for deepchopper-1.2.6-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ff2d2a504d140e7c6f92940887ceebc0fce60596db00a3926320f2f763602f27
MD5 9d11d1fd86eaa961c138fe1c5c78679b
BLAKE2b-256 5799e0f87deb3dc75b24bee00fb59958136dc5486b56273f526caa881e900dd9
