Wrapper on top of the ESM/ProtBert models to make it easy to work with protein embeddings
Description
bio-transformers is a wrapper on top of the ESM/ProtBert models, which are trained on millions of proteins and used to predict embeddings. The package also provides other functionality, such as computing the log-likelihood of a protein and computing embeddings on multiple GPUs.
Installation
It is recommended to work in a conda environment in order to manage the package's specific dependencies.
conda create --name bio-transformers python=3.7 -y
conda activate bio-transformers
pip install bio-transformers
How it works
The main class, BioTransformers, allows developers to use the ProtBert and ESM backends. To list the available backends:
from biotransformers import BioTransformers
BioTransformers.list_backend()
Embeddings
Choose a backend and pass a list of amino-acid sequences to compute the embeddings.
By default, the compute_embeddings function returns the <CLS> token embedding.
You can also pass a pooling_list argument to additionally compute the mean of the token embeddings.
from biotransformers import BioTransformers
sequences = [
"MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
"KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE",
]
bio_trans = BioTransformers(model_dir="Rostlab/prot_bert")
embeddings = bio_trans.compute_embeddings(sequences, pooling_list=['mean'])
cls_emb = embeddings['cls']
mean_emb = embeddings['mean']
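To make the difference between the two pooling modes concrete, here is a minimal NumPy sketch of what "cls" and "mean" pooling do to a matrix of per-token embeddings. It is not the package's implementation: the helper name pool_token_embeddings is hypothetical, and the sketch assumes that row 0 holds the <CLS> token and that the mean is taken over the remaining rows only. The package may handle special tokens differently.

```python
import numpy as np


def pool_token_embeddings(token_embeddings, pooling_list=("cls", "mean")):
    """Pool a (num_tokens, hidden_dim) array of token embeddings.

    Hypothetical helper for illustration only. Assumes row 0 is the <CLS> token.
    """
    pooled = {}
    if "cls" in pooling_list:
        # Keep only the embedding of the (assumed) leading <CLS> token.
        pooled["cls"] = token_embeddings[0]
    if "mean" in pooling_list:
        # Average over the residue tokens, skipping the assumed <CLS> row.
        pooled["mean"] = token_embeddings[1:].mean(axis=0)
    return pooled


# Toy input: one <CLS> row followed by three residue rows, hidden size 2.
toy = np.array([[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
pooled = pool_token_embeddings(toy)
```

Both pooled vectors have the hidden dimension of the model, so either one can be used as a fixed-size representation of a sequence regardless of its length.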
Loglikelihood
Choose a backend and pass a list of amino-acid sequences to compute the log-likelihood.
from biotransformers import BioTransformers
sequences = [
"MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
"KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE",
]
bio_trans = BioTransformers(model_dir="Rostlab/prot_bert")
loglikelihood = bio_trans.compute_loglikelihood(sequences)
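The description above does not spell out how the score is defined. One common definition, assumed here for illustration, is the sum over positions of the log-probability that the model assigns to the observed amino acid. The sketch below applies that definition to a plain probability matrix; the helper name sequence_loglikelihood, the 20-letter alphabet ordering, and the input layout are assumptions, not the package's API.

```python
import numpy as np

# Assumed alphabet ordering for the columns of the probability matrix.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_loglikelihood(sequence, probabilities):
    """Sum log p(residue) over positions.

    Hypothetical helper: `probabilities` has shape (len(sequence), 20), where
    row i is a distribution over AMINO_ACIDS at position i.
    """
    indices = [AMINO_ACIDS.index(residue) for residue in sequence]
    # Pick, at each position, the probability of the residue actually observed.
    per_position = probabilities[np.arange(len(sequence)), indices]
    return float(np.log(per_position).sum())


# Toy input: a uniform distribution over the 20 amino acids at each position.
uniform = np.full((3, len(AMINO_ACIDS)), 1.0 / len(AMINO_ACIDS))
score = sequence_loglikelihood("MKT", uniform)
```

Under this definition, scores are always non-positive and longer sequences tend to score lower, so comparisons are most meaningful between sequences of similar length.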
Built Distribution
Hashes for bio_transformers-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5ec2ee2c370afa69ce0a6e4b495f3d066ec649710d5791b1d82d34198ae64223
MD5 | 505e8c4f25625871ca10b1e54b86f593
BLAKE2b-256 | e0f024ae3c7f320368ff726db651ddc419d0c04a6a1119ac2bce58e09053341e