
DeepRM: Deep Learning for RNA Modification Detection using Nanopore Direct RNA Sequencing


DeepRM

Deep learning for RNA Modification


deeprm.png


✨ Introduction

DeepRM is a deep learning-based framework for RNA modification detection using Nanopore direct RNA sequencing. This repository contains the source code for training and running DeepRM.

🎯 Key Features

  • High accuracy: Achieves state-of-the-art accuracy in RNA modification detection and stoichiometry measurement.
  • Single-molecule resolution: Provides single-molecule level predictions for RNA modifications.
  • End-to-end pipeline: Easy-to-use pipeline from raw reads to site-level predictions.
  • Customizable: Supports training of custom models.

📦 Installation

Prerequisites

  • Linux x86_64
  • Python 3.9+
  • PyTorch 2.3+

Optional

  • Torchmetrics 0.9.0+ (only for training)

    • python -m pip install torchmetrics
      
  • Dorado 0.7.3+ (optional, for basecalling)

  • SAMtools 1.16.1+ (optional, for BAM file processing)

  • Python package requirements are listed in requirements.txt and will be installed automatically when you install DeepRM.

Installation options

  • Estimated time: ~10 minutes
  1. Install via pip (recommended)
python -m pip install deeprm
  2. Install from source (GitHub)
git clone https://github.com/vadanamu/deeprm
cd deeprm
python -m pip install -U pip
python -m pip install -e .
  • If installation fails on an older OS (e.g., CentOS 7) due to NumPy, try installing an older NumPy version first:
  •  python -m pip install "numpy<2.3.0,>2.0.0"
     python -m pip install -e .
    

Verify Installation

deeprm --version
deeprm check
  • If everything is installed correctly, you should see the DeepRM version and a message indicating that the installation succeeded.
  • If you encounter CUDA or torch-related errors, make sure you have installed the correct version of PyTorch with CUDA support.
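  • The optional-dependency checks can also be scripted. The following is a minimal sketch (not a DeepRM command) that reports whether the optional packages are importable, without actually importing them:

```python
# Illustrative environment check (not part of DeepRM): report which
# optional dependencies are present in the current environment.
import importlib.util

for pkg in ["torch", "torchmetrics"]:
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'found' if found else 'missing'}")
```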

Build from Source

  • DeepRM can use a C++-based preprocessing tool for acceleration, which is provided both as a precompiled binary and as source code.
  • Depending on your system configuration, you may need to build the C++ preprocessing tool from source, located in the cpp directory of the DeepRM repository.
  • Please refer to the cpp/README.md page for detailed build instructions.

🚀 Quickstart

  • For demonstration purposes, you can use the example POD5 and BAM files provided in the examples directory of the repository.
  • You can also use your own POD5 and BAM files.

RNA Modification Detection

  • Estimated time: ~1 hour

1️⃣ Prepare data

deeprm call prep -p inference_example.pod5 -b inference_example.bam -o <prep_dir>
  • (Alternative) To supply your own POD5 file:
    dorado basecaller --reference <ref_fasta> --min-qscore 0 --emit-moves rna004_130bps_sup@v5.0.0 <pod5_dir> \
    | tee >(samtools sort -@ <threads> -O BAM -o <bam_path> - && samtools index -@ <threads> <bam_path>) \
    | deeprm call prep -p <pod5_dir> -b - -o <prep_dir>
    
    • If Dorado fails due to "illegal memory access", try adding the --chunksize <chunk_size> option (e.g., chunk_size=12000).

2️⃣ Run inference

deeprm call run -b inference_example.bam -i <prep_dir> -o <pred_dir> -s 1000
  • Adjust the -s (batch size) parameter according to your GPU memory capacity (default: 10000).
  • Expected output file:
    • Site-level detection result file (.bed)
    • Molecule-level detection result file (.npz)
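  • As a sketch of how the two output levels relate, molecule-level scores can be aggregated into a per-site modification fraction. The arrays and the 0.5 cutoff below are illustrative assumptions, not DeepRM's documented internals:

```python
import numpy as np

# Illustrative sketch: aggregate molecule-level modification scores
# (the `pred` array of the NPZ output, one value in [0, 1] per molecule)
# into a per-site modified fraction. The 0.5 threshold is an assumption
# for illustration, not DeepRM's documented cutoff.
label_id = np.array([1, 1, 1, 2, 2])        # toy site identifiers
pred = np.array([0.9, 0.8, 0.1, 0.2, 0.3])  # toy modification scores

for site in np.unique(label_id):
    scores = pred[label_id == site]
    frac = np.mean(scores > 0.5)
    print(f"site {site}: {frac:.2f} modified ({scores.size} molecules)")
# site 1: 0.67 modified (3 molecules)
# site 2: 0.00 modified (2 molecules)
```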

Model Training

  • Estimated time: ~1 hour

1️⃣ Prepare unmodified & modified training data

deeprm train prep -p training_a_example.pod5 -b training_a_example.bam -o <prep_dir>/a
deeprm train prep -p training_m6a_example.pod5 -b training_m6a_example.bam -o <prep_dir>/m6a

2️⃣ Compile training data

deeprm train compile -n <prep_dir>/a/data -p <prep_dir>/m6a/data -o <prep_dir>/compiled

3️⃣ Run training

deeprm train run -d <prep_dir>/compiled -o <output_dir> --batch 64
  • Adjust the --batch parameter according to your GPU memory capacity (default: 1024).
  • Expected output file:
    • Trained DeepRM model file (.pt)

💻 Usage

Inference usage

deeprm_inference_pipeline.png

Prepare Data

Accelerated preparation (recommended, default)
  • This method uses the precompiled C++ binary to accelerate the preprocessing step.
    dorado basecaller --reference <ref_fasta> --min-qscore 0 --emit-moves rna004_130bps_sup@v5.0.0 <pod5_dir> \
    | tee >(samtools sort -@ <threads> -O BAM -o <bam_path> - && samtools index -@ <threads> <bam_path>) \
    | deeprm call prep -p <pod5_dir> -b - -o <prep_dir>
    
  • If Dorado fails due to "illegal memory access", try adding the --chunksize <chunk_size> option (e.g., chunk_size=12000).
  • If the precompiled binary does not work on your system, please refer to the cpp/README.md page for detailed build instructions.
  • Adjust the -g (--filter-flag) parameter according to your needs. If using a genomic reference, you may want to use -g 260.
Sequential preparation
  • This method is slower than the accelerated preparation method, but is supported for cases such as:

    • The POD5 files are already basecalled to BAM files with move tags.
    • You want to run basecalling and preprocessing in separate machines.
  • Basecall the POD5 files to BAM files with move tags (skip if already done):

    • If Dorado fails due to "illegal memory access", try adding the --chunksize <chunk_size> option (e.g., chunk_size=12000).
dorado basecaller --reference <reference_path> --min-qscore 0 --emit-moves rna004_130bps_sup@v5.0.0 <pod5_dir> > <raw_bam_path>
  • Filter, sort, and index the BAM files:
    • Adjust the -F parameter according to your needs. If using a genomic reference, you may want to use -F 260.
samtools view -@ <threads> -bh -F 276 -o <filtered_bam_path> <raw_bam_path>
samtools sort -@ <threads> -o <bam_path> <filtered_bam_path>
samtools index -@ <threads> <bam_path>
  • To preprocess the inference data (transcriptome), run the following command:
deeprm call prep -p <input_POD5_dir> -b <bam_path> -o <prep_dir>
  • This will create the npz files for inference.
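  • The SAM FLAG values used above (-F 276, -F 260) are bitmasks. This illustrative sketch decodes them, showing why -F 260, which keeps reverse-strand reads, suits genomic references:

```python
# Decode the SAM FLAG bitmasks used with samtools view -F.
# 276 = 4 + 16 + 256; 260 = 4 + 256 (bit meanings per the SAM spec).
FLAGS = {4: "unmapped", 16: "reverse strand", 256: "secondary alignment"}

def decode_flag(f):
    return [name for bit, name in FLAGS.items() if f & bit]

print(decode_flag(276))  # ['unmapped', 'reverse strand', 'secondary alignment']
print(decode_flag(260))  # ['unmapped', 'secondary alignment']
```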

Run Inference

  • A trained DeepRM model file is included in the repository at weight/deeprm_weights.pt.
  • For inference, run the following command:
    • Adjust the -s (batch size) parameter according to your GPU memory capacity (default: 10000).
deeprm call run --model <model_file> --data <data_dir> --output <prediction_dir> --gpu-pool <gpu_pool>
  • This will create a directory with the site-level and molecule-level result files.
  • Optionally, if you used a transcriptomic reference for alignment, you can convert the results to genomic coordinates by supplying an annotation file in refFlat/genePred/refGene format (--annot <annotation_file>).

Site-level BED file format

Molecule-level BAM file format

Molecule-level NPZ file format (advanced usage)

  • The output NPZ file contains the following arrays:
    1. read_id
    2. label_id
    3. pred: modification score (between 0 and 1)
  • Read ID specification:
    • The UUID4-format read ID (128 bits) is split into two 64-bit integers for NumPy compatibility.
    • You can convert the two 64-bit integers back to UUID4 using the following Python code:
      import numpy as np
      import uuid

      def int_to_uuid(high, low):
          # Join the two 64-bit halves back into the 16-byte UUID.
          return uuid.UUID(bytes=b"".join([high.tobytes(), low.tobytes()]))
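    • To sanity-check the conversion, here is an illustrative round trip that splits a UUID into two 64-bit integers as NumPy stores them and reassembles it:

```python
import numpy as np
import uuid

def int_to_uuid(high, low):
    # Join the two 64-bit halves back into the 16-byte UUID.
    return uuid.UUID(bytes=b"".join([high.tobytes(), low.tobytes()]))

# Round-trip check (illustrative): split a UUID into two 64-bit
# integers via np.frombuffer, then reassemble the original UUID.
u = uuid.uuid4()
high = np.frombuffer(u.bytes[:8], dtype=np.uint64)[0]
low = np.frombuffer(u.bytes[8:], dtype=np.uint64)[0]
assert int_to_uuid(high, low) == u
```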
      
  • Label ID specification:
    • Label ID contains the reference, position, and strand information.
    • You can decode the label ID using the following Python code:
    import numpy as np

    def decode_label_id(label_id, label_div=10**9):
        strand = np.sign(label_id)           # sign encodes the strand
        label_id_abs = np.abs(label_id) - 1  # undo the 1-based offset
        ref_id = label_id_abs // label_div   # index into the BAM header references
        pos = label_id_abs % label_div       # 0-based position on the reference
        return ref_id, pos, strand
    
    • Reference ID is extracted from the input BAM file header.
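    • For illustration, a hypothetical inverse encoder (inferred from decode_label_id, not part of the DeepRM API) makes the packing explicit and lets you round-trip a label ID:

```python
import numpy as np

def decode_label_id(label_id, label_div=10**9):
    strand = np.sign(label_id)
    label_id_abs = np.abs(label_id) - 1
    ref_id = label_id_abs // label_div
    pos = label_id_abs % label_div
    return ref_id, pos, strand

# Hypothetical inverse, inferred from decode_label_id: the absolute value
# packs ref_id * label_div + pos (offset by 1 so zero never occurs), and
# the sign encodes the strand.
def encode_label_id(ref_id, pos, strand, label_div=10**9):
    return strand * (ref_id * label_div + pos + 1)

label_id = encode_label_id(np.int64(2), np.int64(100), 1)
assert decode_label_id(label_id) == (2, 100, 1)
```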

Training usage

deeprm_train_pipeline.png

Prepare Data

  • You can skip this step if your POD5 files are already basecalled to BAM files with move tags.
dorado basecaller --min-qscore 0 --emit-moves rna004_130bps_sup@v5.0.0 <pod5_dir> > <bam_path>
samtools index -@ <threads> <bam_path>
  • To preprocess the training data (synthetic oligonucleotides), run the following command:
deeprm train prep -p <input_POD5_dir> -b <bam_path> -o <prep_dir>
  • This will create:
    • Training dataset: /block
  • To compile the unmodified and modified training datasets, run the following command:
deeprm train compile -n <unmod_prep_dir>/data -p <mod_prep_dir>/data -o <compiled_dir>
  • This will create:
    • Compiled training dataset: /block

Run Training

  • To train the model, run the following command:
deeprm train run --model deeprm_model --data <data_dir> --output <output_dir> --gpu-pool <gpu_pool>
  • Adjust the --batch parameter according to your GPU memory capacity (default: 1024).
  • This will create a directory with the trained model file.

🔧 Troubleshooting

  • If installation fails on an older OS (e.g., CentOS 7) due to a NumPy-related error, try installing an older NumPy version first:
    python -m pip install "numpy<2.3.0,>2.0.0"
    python -m pip install -e .
    
  • If you encounter CUDA or torch-related errors, make sure the installed PyTorch build matches your CUDA version.
  • If Dorado fails due to "illegal memory access", try adding the --chunksize <chunk_size> option (e.g., chunk_size=12000).
  • If DeepRM call fails due to memory error, try reducing the batch size (-s option, default: 10000).
  • If DeepRM train fails due to memory error, try reducing the batch size (--batch option, default: 1024).
  • If DeepRM call preprocessing fails due to a "libssl.so.1.1 not found" error on newer versions of Ubuntu, try installing the libssl1.1 package:
    wget <libssl_file>
    sudo dpkg -i <libssl_file>
    
  • If DeepRM call preprocess fails due to memory error, try reducing the number of threads (-t option), the preprocessing batch size (-n option), or the output chunk size (-k option).
  • If DeepRM train does not output training-related metrics, try installing torchmetrics package:
    python -m pip install torchmetrics
    

📐 Architecture

deeprm_architecture.png

📝 Citation

If you use DeepRM in your research, please cite the following paper:

@article{kang2025deeprm,
  title={Comprehensive single-molecule resolution discovery of m6A RNA modification sites in the human transcriptome},
  author={Kang, Gihyeon and Hwang, Hyeonseo and Jeon, Hyeonseong and Choi, Heejin and Chang, Hee Ryung and Yeo, Nagyeong and Park, Junehee and Son, Narae and Jeon, Eunkyeong and Lim, Jungmin and Yun, Jaeung and Choi, Wook and Jo, Jae-Yoon and Kim, Jong-Seo and Park, Sangho and Kim, Yoon Ki and Baek, Daehyun},
  journal={Nature Communications},
  year={2025},
  volume={In press},
  publisher={Springer Nature},
  doi={10.1038/s41467-025-67417-w}
}

The article is fully open access and available at https://doi.org/10.1038/s41467-025-67417-w

📝 License

Creative Commons License
DeepRM is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License by Seoul National University R&DB Foundation and Genome4me Inc.

See the LICENSE file for details.

🏛️ Contributors

This repository is developed and maintained by the following organization:

  • Laboratory of Computational Biology, School of Biological Sciences, Seoul National University
    • Principal Investigator: Prof. Daehyun Baek
  • Genome4me, Inc., Seoul, Republic of Korea

🏛️ Acknowledgements

This study was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT, Republic of Korea (MSIT) (RS-2019-NR037866, RS-2020-NR049252, RS-2020-NR049538, and RS-2022-NR067483), by a grant of Korean ARPA-H Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (RS-2025-25422732), by Artificial Intelligence Industrial Convergence Cluster Development Project funded by MSIT and Gwangju Metropolitan City, by National IT Industry Promotion Agency (NIPA) funded by MSIT, and by Korea Research Environment Open Network (KREONET) managed and operated by Korea Institute of Science and Technology Information (KISTI).
