ncRNA language model

ncRNABert: Deciphering the landscape of non-coding RNA using language model

Model details

Model      # of parameters  Hidden size  Pretraining dataset  # of ncRNAs  Model download
ncRNABert  303M             1024         RNAcentral           26M          Download

Install

As a prerequisite, you must have PyTorch installed to use this repository.

Install either the latest development version from GitHub or the stable release from PyPI:

# latest version
pip install git+https://github.com/wangleiofficial/ncRNABert

# stable version
pip install ncRNABert
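
Before running the usage example, it can help to confirm that both the PyTorch prerequisite and the package itself are visible to the interpreter. The snippet below is a minimal sketch using only the standard library; the helper name `check_installed` is hypothetical and not part of ncRNABert, and the import name `ncRNABert` is taken from the usage example in this README.

```python
import importlib.util
from typing import Dict, Iterable


def check_installed(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found in this environment.

    Hypothetical helper for illustration; it only looks the module up and does
    not import it, so it says nothing about whether the install is functional.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# "torch" is the stated prerequisite; "ncRNABert" is the import name used below.
status = check_installed(["torch", "ncRNABert"])
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If `torch` is reported as missing, install PyTorch first and then rerun one of the pip commands above.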

Usage

ncRNA sequence embedding

from ncRNABert.pretrain import load_ncRNABert
from ncRNABert.utils import BatchConverter
import torch

data = [
    ("ncRNA1", "ACGGAGGATGCGAGCGTTATCCGGATTTACTGGGCG"),
    ("ncRNA2", "AGGTTTTTAATCTAATTAAGATAGTTGA"),
]

ids, batch_token, lengths = BatchConverter(data)
model = load_ncRNABert()
with torch.no_grad():
    results = model(batch_token, lengths, repr_layers=[24])
# Generate per-sequence representations by averaging the token embeddings
# from layer 24. Note: mean(0) averages over every position of the batch
# row, so rows for shorter sequences presumably include padded positions.
token_representations = results["representations"][24]
sequence_representations = []
for i in range(len(data)):
    sequence_representations.append(token_representations[i].mean(0))
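
Because the two input sequences differ in length, averaging a whole batch row can mix padded positions into the embedding of the shorter sequence. The sketch below illustrates length-aware mean pooling on toy data with plain Python lists, so it runs without PyTorch. The helper `mean_pool` is hypothetical, and whether the ncRNABert tokenizer adds special start or end tokens is not stated here, so the exact slice to use with real model output is an assumption to verify.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_vectors: Sequence[Sequence[float]], length: int) -> Vector:
    """Average the first `length` token vectors, ignoring the padded rows after them."""
    if not 0 < length <= len(token_vectors):
        raise ValueError("length must be between 1 and the number of rows")
    kept = token_vectors[:length]
    dim = len(kept[0])
    return [sum(row[d] for row in kept) / length for d in range(dim)]


# Toy batch: two "sequences" padded to 3 positions, 2-dimensional embeddings.
batch = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],  # true length 3
    [[2.0, 2.0], [4.0, 6.0], [9.0, 9.0]],  # true length 2; last row stands in for padding
]
true_lengths = [3, 2]

pooled = [mean_pool(vectors, n) for vectors, n in zip(batch, true_lengths)]
print(pooled)  # [[3.0, 4.0], [3.0, 4.0]]
```

With tensors, the same idea amounts to slicing each row to its true length before calling `.mean(0)`.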

Comprehensive benchmarking of Large Language Models

The tables below compare ncRNABert with other RNA language models in terms of F1 score. In the first table, ncRNABert achieves the highest average F1 score (0.595), slightly ahead of RiNALMo (0.584) and ERNIE-RNA (0.583), and it has the best score on the 16s, 23s, and telomerase families. In the second table, ERNIE-RNA scores highest on both bpRNA (0.628) and bpRNA-new (0.601), while ncRNABert is close behind RiNALMo on bpRNA (0.595 vs. 0.599) and ranks second on bpRNA-new (0.572).

Methods     16s    23s    5s     RNaseP  grp1   srp    tRNA   telomerase  tmRNA  Average
ERNIE-RNA   0.539  0.580  0.820  0.687   0.317  0.610  0.841  0.151       0.700  0.583
RNA-FM      0.152  0.193  0.555  0.324   0.136  0.277  0.763  0.121       0.293  0.313
RNA-MSM     0.133  0.223  0.264  0.207   0.189  0.151  0.338  0.072       0.240  0.202
RNABERT     0.144  0.167  0.211  0.171   0.144  0.152  0.458  0.101       0.152  0.189
RNAErnie    0.191  0.227  0.536  0.198   0.170  0.164  0.795  0.071       0.259  0.290
RiNALMo     0.473  0.596  0.796  0.667   0.566  0.548  0.845  0.093       0.669  0.584
one-hot     0.155  0.188  0.279  0.169   0.149  0.174  0.452  0.132       0.175  0.208
ncRNABert   0.573  0.733  0.773  0.629   0.423  0.589  0.789  0.161       0.688  0.595

Methods     bpRNA  bpRNA-new
ERNIE-RNA   0.628  0.601
RNA-FM      0.522  0.423
RNA-MSM     0.426  0.393
RNABERT     0.357  0.358
RNAErnie    0.442  0.387
RiNALMo     0.599  0.446
one-hot     0.351  0.383
ncRNABert   0.595  0.572
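
The Average column of the first table appears to be the unweighted mean of the nine per-family scores. The sketch below checks this for the three highest-ranked models, using values copied from the table; the unweighted-mean interpretation is an assumption inferred from the numbers, not something the table states.

```python
from statistics import mean

# Per-family scores copied from the first table, in column order:
# 16s, 23s, 5s, RNaseP, grp1, srp, tRNA, telomerase, tmRNA.
scores = {
    "ERNIE-RNA": [0.539, 0.580, 0.820, 0.687, 0.317, 0.610, 0.841, 0.151, 0.700],
    "RiNALMo":   [0.473, 0.596, 0.796, 0.667, 0.566, 0.548, 0.845, 0.093, 0.669],
    "ncRNABert": [0.573, 0.733, 0.773, 0.629, 0.423, 0.589, 0.789, 0.161, 0.688],
}

# Unweighted mean per model, rounded to three decimals like the table.
averages = {model: round(mean(values), 3) for model, values in scores.items()}

# Rank models from best to worst average.
ranking = sorted(averages, key=averages.get, reverse=True)

for model in ranking:
    print(f"{model}: {averages[model]:.3f}")
```

The recomputed values match the reported averages (0.595, 0.584, and 0.583), which also shows how narrow the margin between the top three models is.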

License

This source code is licensed under the Apache-2.0 license found in the LICENSE file in the root directory of this source tree.
