DeepMiRT: miRNA target prediction using RNA foundation models and cross-attention
Project description
DeepMiRT
miRNA Target Prediction with RNA Foundation Models
DeepMiRT predicts miRNA-target interactions using RNA-FM embeddings and cross-attention, ranking #1 on eCLIP benchmarks among 12 methods.
Why DeepMiRT?
Existing miRNA target prediction tools rely on hand-crafted thermodynamic rules or shallow sequence features, struggling to capture the full complexity of miRNA-target recognition. DeepMiRT addresses this by leveraging RNA-FM, a foundation model pre-trained on 23 million non-coding RNAs, as a shared encoder for both miRNA and target. A cross-attention mechanism then lets the target "read" the miRNA to learn complementarity patterns beyond simple seed matching. The result: state-of-the-art performance, ranking #1 among 12 methods on eCLIP benchmarks and achieving 0.96 AUROC on a held-out test set of 813K samples.
Key Results at a Glance
| 0.96 AUROC |
#1 / 12 eCLIP Benchmark |
813K Test Samples |
3-line Python API |
Quick Start
pip install deepmirt
from deepmirt import predict
probs = predict(
mirna_seqs=["UGAGGUAGUAGGUUGUAUAGUU"],
target_seqs=["ACUGCAGCAUAUCUACUAUUUGCUACUGUAACCAUUGAUCU"],
)
print(f"Interaction probability: {probs[0]:.4f}")
Model weights are automatically downloaded from Hugging Face Hub on first use (~495 MB).
Architecture
graph LR
A["miRNA<br/>(18-25 nt)"] --> B["RNA-FM<br/>Encoder"]
C["Target<br/>(40 nt)"] --> D["RNA-FM<br/>Encoder"]
B --> E["miRNA Embedding<br/>(B, M, 640)"]
D --> F["Target Embedding<br/>(B, T, 640)"]
E --> G["Cross-Attention<br/>(2 layers, 8 heads)"]
F --> G
G --> H["Masked Mean Pool"]
H --> I["MLP Head<br/>640 → 256 → 64 → 1"]
I --> J["Probability"]
style B fill:#4a90d9,color:#fff
style D fill:#4a90d9,color:#fff
style G fill:#e67e22,color:#fff
style I fill:#27ae60,color:#fff
The model uses a shared RNA-FM encoder for both sequences, ensuring they lie in the same representation space. Cross-attention lets the target "read" the miRNA to capture complementarity and binding signals. Two-phase training first optimizes the classifier (frozen backbone), then fine-tunes the top 3 RNA-FM layers.
ASCII diagram (fallback)
miRNA (18-25 nt) ──→ [RNA-FM Encoder] ──→ miRNA embedding (B, M, 640) ──────────┐
(shared weights) ↓
Target (40 nt) ──→ [RNA-FM Encoder] ──→ target embedding (B, T, 640) → [Cross-Attention]
↓
Masked Mean Pool
↓
[MLP Head]
640 → 256 → 64 → 1
↓
probability
Genome-Wide Target Scanning
DeepMiRT can scan entire 3'UTR or transcript sequences to identify binding sites genome-wide -- similar to miRanda, but powered by the deep learning model.
Python API
from deepmirt import scan_targets
results = scan_targets(
mirna_fasta="mirnas.fa", # or dict: {"let-7": "UGAGGUAGUAGGUUGUAUAGUU"}
target_fasta="3utrs.fa",
output_prefix="results/scan", # writes _details.txt, _hits.tsv, _summary.tsv
device="cuda",
scan_mode="hybrid", # "seed" | "hybrid" | "exhaustive"
prob_threshold=0.5,
)
for r in results:
for hit in r.hits:
print(f"{r.target_id} pos={hit.position} prob={hit.probability:.3f} ({hit.seed_type})")
Command Line
# Scan with a FASTA of miRNAs against target 3'UTRs
deepmirt-predict scan \
--mirna-fasta mirnas.fa \
--target-fasta 3utrs.fa \
--output results/scan \
--device cuda \
--scan-mode hybrid \
--threshold 0.5
# Scan with a single miRNA sequence
deepmirt-predict scan \
--mirna UGAGGUAGUAGGUUGUAUAGUU \
--mirna-id dme-let-7 \
--target-fasta 3utrs.fa \
--output results/scan \
--device cpu
Scanning Modes
| Mode | Strategy | Speed | Use Case |
|---|---|---|---|
seed |
Only score seed match positions (8mer/7mer/6mer) | Fastest | Quick screen, high-confidence sites |
hybrid |
Seed matches + sliding window (stride=20) to fill gaps | Default | Balanced: catches seed + non-canonical sites |
exhaustive |
Sliding window across entire target | Slowest | Comprehensive, catches everything |
Output Files
| File | Description |
|---|---|
{prefix}_details.txt |
Human-readable report with ASCII alignment for each hit |
{prefix}_hits.tsv |
Per-hit table: position, probability, seed type, window sequence |
{prefix}_summary.tsv |
Per-target summary: number of hits, max/mean probability |
Example alignment from _details.txt:
Scanning: dme-miR-1-3p vs FBTR0082186_3UTR (1247 nt)
Hits found: 2
Hit at position 617, Prob: 0.8923, Seed: 7mer-m8
miRNA 3' ...uauCCGCGGCCggg... 5'
||||||||::
Target 5' ...aGGCGCCGGAact... 3'
Installation
From PyPI
pip install deepmirt
From source (for development)
git clone https://github.com/zichengll/DeepMiRT.git
cd DeepMiRT
pip install -e ".[dev,eval]"
Requirements
- Python >= 3.9
- PyTorch >= 1.12
- RNA-FM (
rna-fmpackage) - ~495 MB disk space for model weights (auto-downloaded)
Usage
Python API
from deepmirt import predict
# Single pair
probs = predict(
mirna_seqs=["UGAGGUAGUAGGUUGUAUAGUU"],
target_seqs=["ACUGCAGCAUAUCUACUAUUUGCUACUGUAACCAUUGAUCU"],
)
# Multiple pairs
probs = predict(
mirna_seqs=["UGAGGUAGUAGGUUGUAUAGUU", "UAGCAGCACGUAAAUAUUGGCG"],
target_seqs=["ACUGCAGCAUAUCUACUAUUUGCUACUGUAACCAUUGAUCU",
"GCAAUGUUUUCCACAGUGCUUACACAGAAAUAGCAACUUUA"],
device="cuda", # use GPU if available
)
CSV Batch Prediction
from deepmirt.predict import predict_from_csv
df = predict_from_csv(
csv_path="input.csv", # must have mirna_seq and target_seq columns
output_path="results.csv",
device="cpu",
)
Command Line
# Single pair
deepmirt-predict single --mirna UGAGGUAGUAGGUUGUAUAGUU \
--target ACUGCAGCAUAUCUACUAUUUGCUACUGUAACCAUUGAUCU
# Batch from CSV
deepmirt-predict batch --input data.csv --output results.csv --device cuda
Input Format
- miRNA sequences: 18-25 nt, DNA (T) or RNA (U) format
- Target sequences: 40 nt recommended (the model was trained on 40-nt target site fragments)
- Sequences are automatically converted to RNA format internally
Web Demo
Try DeepMiRT without installation: huggingface.co/spaces/liuliu2333/deepmirt
The demo supports single-pair prediction with pre-loaded examples and batch CSV upload.
Benchmark Results
Standard Benchmark: miRBench eCLIP Datasets
DeepMiRT ranks #1 on both eCLIP benchmark datasets from miRBench (fair comparison -- all methods evaluated on the same held-out data by the benchmark authors):
| Method | Type | eCLIP Klimentova 2022 | eCLIP Manakov 2022 |
|---|---|---|---|
| DeepMiRT (ours) | DL + LM | 0.7511 | 0.7543 |
| TargetScanCnn | CNN | 0.7138 | 0.7205 |
| miRBind | DL | 0.7004 | -- |
| miRNA_CNN | CNN | 0.6981 | -- |
| RNACofold | Thermo. | -- | 0.6841 |
Standard Benchmark: miRBench CLASH Dataset
On the CLASH dataset, DeepMiRT ranks #5 (honest reporting -- CLASH captures different biology):
| Method | AUROC |
|---|---|
| miRBind | 0.7649 |
| miRNA_CNN | 0.7614 |
| InteractionAwareModel | 0.7510 |
| RNACofold | 0.7455 |
| DeepMiRT (ours) | 0.6952 |
Our Test Set (813K samples)
| Method | Type | AUROC | AUPRC | F1 | MCC |
|---|---|---|---|---|---|
| DeepMiRT (ours) | DL + LM | 0.9606 | 0.9669 | 0.8949 | 0.8111 |
| TargetScanCnn | CNN | 0.8856 | 0.8999 | 0.8449 | 0.7197 |
| Seed Match | Rule | 0.8817 | 0.8585 | 0.8719 | 0.7726 |
| miRanda | Complement + MFE | 0.7688 | 0.7168 | 0.7813 | 0.5858 |
| miRBind | DL | 0.7641 | 0.7446 | 0.7012 | 0.3723 |
| RNAhybrid | MFE | 0.7230 | 0.7079 | 0.6673 | 0.3406 |
Full comparison table (16 methods)
| Method | Type | AUROC | AUPRC | F1 | MCC | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| DeepMiRT (ours) | DL + LM | 0.9606 | 0.9669 | 0.8949 | 0.8111 | 0.8396 | 0.9641 |
| TargetScanCnn | CNN | 0.8856 | 0.8999 | 0.8449 | 0.7197 | 0.7944 | 0.9182 |
| Seed Match | Rule | 0.8817 | 0.8585 | 0.8719 | 0.7726 | 0.8098 | 0.9535 |
| Seed6mer | Rule | 0.8670 | 0.8241 | 0.8592 | 0.7374 | 0.8263 | 0.9077 |
| Seed7mer | Rule | 0.7802 | 0.7619 | 0.7268 | 0.6118 | 0.5865 | 0.9740 |
| miRanda | Complement + MFE | 0.7688 | 0.7168 | 0.7813 | 0.5858 | 0.7420 | 0.8410 |
| miRBind | DL | 0.7641 | 0.7446 | 0.7012 | 0.3723 | 0.7659 | 0.6021 |
| miRNA_CNN | CNN | 0.7299 | 0.7200 | 0.6555 | 0.3543 | 0.6296 | 0.7229 |
| RNAhybrid | MFE | 0.7230 | 0.7079 | 0.6673 | 0.3406 | 0.6582 | 0.6823 |
| Seed8mer | Rule | 0.6624 | 0.6785 | 0.4534 | 0.4254 | 0.2963 | 0.9938 |
| Seed6merBulge | Rule | 0.6455 | 0.6313 | 0.6050 | 0.2878 | 0.5718 | 0.7147 |
| CnnMirTarget | CNN | 0.5830 | 0.5793 | 0.5362 | 0.0910 | 0.5025 | 0.5879 |
| TargetNet | DL | 0.4972 | 0.4980 | 0.4712 | -0.0080 | 0.4583 | 0.5333 |
| Random | -- | 0.5000 | 0.5005 | 0.4896 | -0.0014 | 0.4960 | 0.5026 |
Note: DeepMiRT excels at target site identification (eCLIP-type tasks) but is less suited for distinguishing competitive binding among multiple miRNAs at the same target site (CLASH-type tasks).
Training your own model
Training
# Phase 1: Frozen backbone
python deepmirt/training/train.py \
--config deepmirt/configs/default.yaml
# Phase 2: Unfreeze top 3 layers (edit config or use overrides)
python deepmirt/training/train.py \
--config deepmirt/configs/default.yaml \
--override unfreezing.enabled=true model.freeze_backbone=false \
--ckpt checkpoints/best-phase1.ckpt
See deepmirt/configs/default.yaml for all configurable hyperparameters.
Project structure
Project Structure
DeepMiRT/
├── deepmirt/
│ ├── model/ # Neural network modules
│ │ ├── mirna_target_model.py # Full model: RNA-FM + CrossAttn + MLP
│ │ ├── rnafm_encoder.py # RNA-FM wrapper with freeze/unfreeze
│ │ ├── cross_attention.py # Multi-layer cross-attention block
│ │ └── classifier.py # MLP classification head
│ ├── training/ # PyTorch Lightning training
│ │ ├── lightning_module.py # LightningModule with metrics
│ │ ├── train.py # Training entry point
│ │ └── callbacks.py # Staged unfreezing callback
│ ├── data_module/ # Data loading and preprocessing
│ ├── evaluation/ # 9-step evaluation pipeline
│ ├── scanning/ # Genome-wide target site scanning
│ │ ├── scanner.py # Core TargetScanner class
│ │ ├── site_finder.py # Seed match finder
│ │ └── output_formatter.py # TXT/TSV output formatters
│ ├── configs/ # YAML configuration files
│ ├── predict.py # Public prediction & scanning API
│ └── tests/ # Unit tests
├── app.py # Gradio web demo
├── examples/ # Usage examples
└── docs/ # Figures and documentation
Contributing
Contributions are welcome. Please open an issue first to discuss what you would like to change.
- Fork the repository
- Create a feature branch (
git checkout -b feature/my-feature) - Install dev dependencies:
pip install -e ".[dev]" - Run tests:
pytest deepmirt/tests/ - Open a pull request
Citation
@software{liu2026deepmirt,
title={DeepMiRT: miRNA Target Prediction with RNA Foundation Models},
author={Liu, Zicheng},
year={2026},
url={https://github.com/zichengll/DeepMiRT}
}
License
This project is licensed under the MIT License.
Acknowledgments
- RNA-FM -- pre-trained RNA foundation model (Chen et al., 2022)
- miRBench -- standardized benchmark framework (Gresova et al.)
- PyTorch Lightning -- training framework
- Hugging Face -- model hosting and demo platform
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file deepmirt-1.0.0.tar.gz.
File metadata
- Download URL: deepmirt-1.0.0.tar.gz
- Upload date:
- Size: 159.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cc26fd3833579980297dd326819760c43a6470a17de67a630c1b3b1512bf4a4c
|
|
| MD5 |
4dcf34ea52e07876348715a58d7bc1c1
|
|
| BLAKE2b-256 |
9e768e518900476e0bab266d5bdd415db6833d87fd32da29033ce0634f7dd818
|
File details
Details for the file deepmirt-1.0.0-py3-none-any.whl.
File metadata
- Download URL: deepmirt-1.0.0-py3-none-any.whl
- Upload date:
- Size: 187.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e630b9b6f141e3ac9a5e49dee4ea9d0806275a09f9f238d688dd02fc4bb76257
|
|
| MD5 |
81aab13c0a6f464ec07fd5623eb95330
|
|
| BLAKE2b-256 |
eebd98c9f0c6f42ccb277da1a301902bcccc6249ddd3a4cb9d5e69a549539e6b
|