Wrapper on top of the ESM/ProtBert models to easily work with protein embeddings
Description
bio-transformers is a wrapper on top of the ESM and ProtBert models, which are trained on millions of proteins and used to compute embeddings. The package also provides other functionality, such as computing the log-likelihood of a protein sequence or computing embeddings on multiple GPUs.
Installation
It is recommended to work in a conda environment to manage the package's specific dependencies.
conda create --name bio-transformers python=3.7 -y
conda activate bio-transformers
pip install bio-transformers
How it works
The main class, BioTransformers, allows the developer to use the ProtBert and ESM backends. To list the available backends:
from biotransformers import BioTransformers
BioTransformers.list_backend()
Embeddings
Choose a backend and pass a list of amino acid sequences to compute their embeddings.
By default, the compute_embeddings function returns the <CLS> token embedding. You can also pass a pooling_list to compute additional poolings, such as the mean of the token embeddings.
from biotransformers import BioTransformers
sequences = [
"MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
"KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE",
]
bio_trans = BioTransformers(model_dir="Rostlab/prot_bert")
embeddings = bio_trans.compute_embeddings(sequences, pooling_list=['mean'])
cls_emb = embeddings['cls']
mean_emb = embeddings['mean']
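To make the difference between the two outputs concrete, here is a minimal pure-Python sketch of CLS pooling versus mean pooling over one sequence's per-token embeddings. It assumes the <CLS> token sits at row 0, as in BERT-style models. The function names and toy numbers are hypothetical and do not come from bio-transformers. How the library treats special tokens (for example a trailing end-of-sequence token) when averaging is not stated here, so `skip_first` is only an illustrative choice.

```python
from typing import List

# One row per token, one column per embedding dimension.
Matrix = List[List[float]]


def cls_pooling(token_embeddings: Matrix) -> List[float]:
    """Return the embedding of the first token (assumed to be <CLS>)."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings: Matrix, skip_first: bool = True) -> List[float]:
    """Average the token embeddings dimension by dimension.

    skip_first=True leaves the assumed <CLS> row out of the average
    (a hypothetical choice for this sketch).
    """
    rows = token_embeddings[1:] if skip_first else token_embeddings
    if not rows:
        raise ValueError("no token embeddings to average")
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


if __name__ == "__main__":
    # Made-up values for a 2-residue sequence plus a <CLS> token.
    toy = [
        [0.5, -1.0, 2.0],  # <CLS>
        [1.0, 0.0, 3.0],   # residue 1
        [3.0, 2.0, 1.0],   # residue 2
    ]
    print(cls_pooling(toy))   # [0.5, -1.0, 2.0]
    print(mean_pooling(toy))  # [2.0, 1.0, 2.0]
```

The CLS embedding is a single learned summary token. The mean embedding weights every residue equally, which is why the two vectors generally differ.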
Loglikelihood
Choose a backend and pass a list of amino acid sequences to compute their log-likelihood.
from biotransformers import BioTransformers
sequences = [
"MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
"KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE",
]
bio_trans = BioTransformers(model_dir="Rostlab/prot_bert")
loglikelihood = bio_trans.compute_loglikelihood(sequences)
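As an illustration of what a sequence log-likelihood means, here is a minimal pure-Python sketch. It sums, over positions, the log-probability that the model assigns to the observed amino acid. It assumes you already have one row of logits per residue over a toy 20-letter vocabulary. This README does not say how compute_loglikelihood obtains those probabilities (for masked models such as ProtBert, a common variant masks each position in turn). Treat this as one common definition, not the library's implementation. `AMINO_ACIDS`, `log_softmax`, and `sequence_log_likelihood` are hypothetical names.

```python
import math
from typing import List, Sequence

# Toy vocabulary: the 20 standard amino acids (real tokenizers also contain
# special and rare-residue tokens).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_log_likelihood(sequence: str, logits: Sequence[Sequence[float]]) -> float:
    """Sum log p(observed residue) over positions, one logits row per residue."""
    if len(sequence) != len(logits):
        raise ValueError("need exactly one row of logits per residue")
    total = 0.0
    for residue, row in zip(sequence, logits):
        total += log_softmax(row)[AMINO_ACIDS.index(residue)]
    return total


if __name__ == "__main__":
    seq = "MKTV"
    uniform = [[0.0] * len(AMINO_ACIDS) for _ in seq]
    # With uniform logits each residue has probability 1/20,
    # so the result is 4 * log(1/20), about -11.98.
    print(sequence_log_likelihood(seq, uniform))
```

Higher (less negative) values mean the model finds the sequence more plausible. Because the score is a sum over positions, longer sequences tend to get lower totals, so comparisons are most meaningful between sequences of similar length.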