ncRNA language model

Project description

ncRNABert: Deciphering the landscape of non-coding RNA using language model

Model details

Model      Parameters  Hidden size  Pretraining dataset  ncRNAs  Model download
ncRNABert  303M        1024         RNAcentral           26M     Download
ncRNABert  303M        1024         RNAcentral + nt      -       Download

Install

As a prerequisite, you must have PyTorch installed to use this repository.

Install either the latest development version from GitHub or the stable release from PyPI with a single pip command:

# latest version
pip install git+https://github.com/wangleiofficial/ncRNABert

# stable version
pip install ncRNABert
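
Because PyTorch is a prerequisite, it can help to confirm that both packages are visible to the interpreter before running the usage example. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of ncRNABert.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(distributions: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # "torch" is the PyPI distribution name for PyTorch.
    for name, version in installed_versions(["torch", "ncRNABert"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If either line reports "not installed", install the missing package before continuing.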

Usage

ncRNA sequence embedding

from ncRNABert.pretrain import load_ncRNABert, load_ncRNABert_ex
from ncRNABert.utils import BatchConverter
import torch

data = [
    ("ncRNA1", "ACGGAGGATGCGAGCGTTATCCGGATTTACTGGGCG"),
    ("ncRNA2", "AGGTTTTTAATCTAATTAAGATAGTTGA"),
]

# Convert the (identifier, sequence) pairs into a token batch and per-sequence lengths
ids, batch_token, lengths = BatchConverter(data)

# Load both pretrained models
model = load_ncRNABert()
model_ex = load_ncRNABert_ex()

# Run inference without tracking gradients, requesting the layer-24 representations
with torch.no_grad():
    results = model(batch_token, lengths, repr_layers=[24])
    results_ex = model_ex(batch_token, lengths, repr_layers=[24])

# Generate per-sequence representations via averaging
token_representations = results["representations"][24]
token_representations_ex = results_ex["representations"][24]
sequence_representations = []
sequence_representations_ex = []
batch_lens = [len(item[1]) for item in data]
for i, tokens_len in enumerate(batch_lens):
    # Average positions 1 .. tokens_len - 2 of sequence i over the token axis
    sequence_representations.append(token_representations[i, 1 : tokens_len - 1].mean(0))
    sequence_representations_ex.append(token_representations_ex[i, 1 : tokens_len - 1].mean(0))
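
To make the averaging step concrete, and to show one common way of comparing the resulting per-sequence vectors, here is a small pure-Python sketch on toy data. It does not call ncRNABert: `mean_pool` and `cosine_similarity` are hypothetical helpers, and the toy vectors stand in for real token representations (a tensor could be converted with `.tolist()` first).

```python
import math
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]], start: int, stop: int) -> List[float]:
    """Average the token vectors at positions start .. stop - 1, per dimension."""
    window = token_vectors[start:stop]
    if not window:
        raise ValueError("empty pooling window")
    dim = len(window[0])
    return [sum(vec[d] for vec in window) / len(window) for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 2-dimensional "token representations" for one sequence of 4 positions.
toy_tokens = [[9.0, 9.0], [1.0, 0.0], [3.0, 0.0], [9.0, 9.0]]

# The slice 1:3 keeps positions 1 and 2 and skips the first and last vectors.
pooled = mean_pool(toy_tokens, 1, 3)
print(pooled)
print(cosine_similarity(pooled, [1.0, 0.0]))
```

The same pattern applies to the vectors in `sequence_representations` and `sequence_representations_ex`, for example to compare how the two models embed the same sequence.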

License

This source code is licensed under the Apache-2.0 license found in the LICENSE file in the root directory of this source tree.
