
ncRNA language model


ncRNABert: Deciphering the landscape of non-coding RNA using language model


Model details

Model       # of parameters   Hidden size   Pretraining dataset   # of ncRNAs   Model download
ncRNABert   303M              1024          RNAcentral            26M           Download
ncRNABert   303M              1024          RNAcentral + nt       -             Download

Install

As a prerequisite, you must have PyTorch installed to use this repository.

Install either the latest version from GitHub or the stable release from PyPI with a single command:

# latest version
pip install git+https://github.com/wangleiofficial/ncRNABert

# stable version
pip install ncRNABert
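
Because PyTorch is a prerequisite, it can help to confirm that it is importable before installing or running ncRNABert. The following is a minimal sketch using only the Python standard library; `is_installed` is a hypothetical helper name introduced here for illustration and is not part of ncRNABert.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it.

    Hypothetical helper for illustration; not part of ncRNABert.
    """
    return importlib.util.find_spec(module_name) is not None


if not is_installed("torch"):
    print("PyTorch was not found; install it before using ncRNABert.")
```

The check only locates the module and does not import it, so it stays fast even when PyTorch is present.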

Usage

ncRNA sequence embedding

from ncRNABert.pretrain import load_ncRNABert, load_ncRNABert_ex
from ncRNABert.utils import BatchConverter
import torch

# Each entry is a (name, sequence) pair
data = [
    ("ncRNA1", "ACGGAGGATGCGAGCGTTATCCGGATTTACTGGGCG"),
    ("ncRNA2", "AGGTTTTTAATCTAATTAAGATAGTTGA"),
]

# Convert the batch into model inputs
ids, batch_token, lengths = BatchConverter(data)

# Load both pretrained models
model = load_ncRNABert()
model_ex = load_ncRNABert_ex()

# Extract layer-24 representations without tracking gradients
with torch.no_grad():
    results = model(batch_token, lengths, repr_layers=[24])
    results_ex = model_ex(batch_token, lengths, repr_layers=[24])

# Generate per-sequence representations via averaging
token_representations = results["representations"][24]
token_representations_ex = results_ex["representations"][24]
sequence_representations = []
sequence_representations_ex = []
batch_lens = [len(item[1]) for item in data]
for i, tokens_len in enumerate(batch_lens):
    sequence_representations.append(token_representations[i, 1 : tokens_len - 1].mean(0))
    sequence_representations_ex.append(token_representations_ex[i, 1 : tokens_len - 1].mean(0))
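
The example above hard-codes `data` as a list of (name, sequence) tuples. In practice sequences often live in a FASTA file, so the sketch below shows one way to build the same structure and split it into smaller batches. `read_fasta` and `iter_batches` are hypothetical helpers written for this illustration, not part of ncRNABert, and the batch size is an arbitrary assumption.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA-formatted lines into (name, sequence) tuples.

    Hypothetical helper for illustration; not part of ncRNABert.
    """
    records: List[Record] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            # Use the first whitespace-delimited word of the header as the name.
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def iter_batches(records: List[Record], batch_size: int) -> Iterator[List[Record]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


fasta_text = """>ncRNA1
ACGGAGGATGCGAGCG
TTATCCGGATTTACTGGGCG
>ncRNA2
AGGTTTTTAATCTAATTAAGATAGTTGA
"""

data = read_fasta(fasta_text.splitlines())
batches = list(iter_batches(data, batch_size=1))
# Each batch has the same shape as `data` in the example above,
# so it can be passed to BatchConverter in the same way.
```

To read from disk instead, pass an open file handle to `read_fasta`, since a text file iterates line by line. Smaller batches keep memory use bounded when embedding many sequences.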

License

This source code is licensed under the Apache-2.0 license found in the LICENSE file in the root directory of this source tree.
