ncRNA language model
ncRNABert: Deciphering the landscape of non-coding RNA using language model
Model details
Model | # of parameters | Hidden size | Pretraining dataset | # of ncRNAs | Model download |
---|---|---|---|---|---|
ncRNABert | 303M | 1024 | RNAcentral | 26M | Download |
ncRNABert | 303M | 1024 | RNAcentral + nt | - | Download |
Install
As a prerequisite, you must have PyTorch installed to use this repository.
Install with one of the following one-liners, either the latest version from GitHub or the stable release from PyPI:
# latest version
pip install git+https://github.com/wangleiofficial/ncRNABert
# stable version
pip install ncRNABert
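Since PyTorch is a prerequisite, it can help to confirm that both packages are importable before running the usage example. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of ncRNABert.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "torch" is the stated prerequisite; "ncRNABert" is the package installed above.
missing = missing_packages(["torch", "ncRNABert"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

An empty list means both packages were found; it does not check versions or GPU support.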
Usage
ncRNA sequence embedding
from ncRNABert.pretrain import load_ncRNABert, load_ncRNABert_ex
from ncRNABert.utils import BatchConverter
import torch

# (name, sequence) pairs to embed
data = [
    ("ncRNA1", "ACGGAGGATGCGAGCGTTATCCGGATTTACTGGGCG"),
    ("ncRNA2", "AGGTTTTTAATCTAATTAAGATAGTTGA"),
]

ids, batch_token, lengths = BatchConverter(data)
model = load_ncRNABert()
model_ex = load_ncRNABert_ex()

# Inference only, so disable gradient tracking
with torch.no_grad():
    results = model(batch_token, lengths, repr_layers=[24])
    results_ex = model_ex(batch_token, lengths, repr_layers=[24])

# Generate per-sequence representations via averaging
token_representations = results["representations"][24]
token_representations_ex = results_ex["representations"][24]
sequence_representations = []
sequence_representations_ex = []
batch_lens = [len(item[1]) for item in data]
for i, tokens_len in enumerate(batch_lens):
    sequence_representations.append(token_representations[i, 1 : tokens_len - 1].mean(0))
    sequence_representations_ex.append(token_representations_ex[i, 1 : tokens_len - 1].mean(0))
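The averaging step can be illustrated without the model. The sketch below is plain Python on toy vectors and is not ncRNABert code: the helper `mean_pool` and the assumed token layout (one start token, the residues, one end token, then padding) are hypothetical. Which slice bounds are right for ncRNABert depends on which special tokens `BatchConverter` adds, which this README does not state.

```python
def mean_pool(token_vectors, start, end):
    """Average the vectors token_vectors[start:end] element-wise."""
    rows = token_vectors[start:end]
    if not rows:
        raise ValueError("empty slice: check start/end against the sequence length")
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


# Toy per-token vectors for one padded sequence (assumed layout, see above).
tokens = [
    [9.0, 9.0],  # hypothetical start token, excluded from the average
    [1.0, 2.0],  # residue 1
    [3.0, 4.0],  # residue 2
    [5.0, 6.0],  # residue 3
    [9.0, 9.0],  # hypothetical end token, excluded from the average
    [0.0, 0.0],  # padding, excluded from the average
]
seq_len = 3  # number of residues in the raw sequence

# With one leading special token, the residues occupy positions 1..seq_len.
pooled = mean_pool(tokens, 1, seq_len + 1)
print(pooled)  # [3.0, 4.0]
```

Under this assumed layout, the bounds exclude the special tokens and padding, so the pooled vector reflects only the residues.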
License
This source code is licensed under the Apache-2.0 license found in the LICENSE file in the root directory of this source tree.