ProtFlash: A lightweight protein language model
Install
As a prerequisite, you must have PyTorch installed to use this repository.
Install ProtFlash with one of the following one-liners:
# latest version
pip install git+https://github.com/isyslab-hust/ProtFlash
# stable version
pip install ProtFlash
Usage
import torch

from ProtFlash.pretrain import load_prot_flash_base
from ProtFlash.utils import batchConverter

data = [
    ("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"),
    ("protein2", "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE"),
]
ids, batch_token, lengths = batchConverter(data)
model = load_prot_flash_base()
with torch.no_grad():
    token_embedding = model(batch_token, lengths)

# Generate per-sequence representations by averaging each sequence's token embeddings
sequence_representations = []
for i, (_, seq) in enumerate(data):
    sequence_representations.append(token_embedding[i, 0: len(seq) + 1].mean(0))
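The averaging loop above reduces a padded batch of per-token embeddings to one fixed-size vector per sequence. A minimal, framework-agnostic sketch of that pooling step, using NumPy and dummy data in place of real model output (`pool_sequences` and the toy batch are illustrative, not part of the ProtFlash API; the slice to `length + 1` assumes the model prepends one special token, as in the snippet above):

```python
import numpy as np

def pool_sequences(token_embedding, seq_lengths):
    # token_embedding: [batch, max_len, dim] padded batch of token embeddings.
    # For each sequence, average only its own positions (special token + residues),
    # ignoring the padding that follows.
    reps = []
    for i, n in enumerate(seq_lengths):
        reps.append(token_embedding[i, 0:n + 1].mean(axis=0))
    return np.stack(reps)

# Dummy batch: 2 sequences, max length 10, embedding dim 4.
batch = np.zeros((2, 10, 4))
batch[0, :6] = 1.0   # sequence 0: special token + 5 residues
batch[1, :4] = 2.0   # sequence 1: special token + 3 residues

reps = pool_sequences(batch, [5, 3])
print(reps.shape)    # (2, 4)
```

Because padding positions are excluded from the mean, the pooled vectors are unaffected by how much a sequence was padded, which keeps representations comparable across batches of different maximum length.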
License
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.
Download files
Source distribution: ProtFlash-0.1.0.tar.gz (5.9 kB)
Built distribution: ProtFlash-0.1.0-py3-none-any.whl
Hashes for ProtFlash-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 05bf98433e9b77a241619b5742c79635d2abfd4d6b299388cd10b754b28fc893
MD5 | 94ec2a4f8138c77d829931d426aa0ed3
BLAKE2b-256 | 709b23c7c71bc3156ca1622c5c83391b2cdaa49440722fe953826da9a4504e90