Wrapper on top of the ESM/Protbert models to make it easy to work with protein embeddings
Description
bio-transformers is a wrapper on top of the ESM/Protbert models, which are trained on millions of proteins and used to predict embeddings. The package also provides other functionality, such as computing the loglikelihood of a protein or computing embeddings on multiple GPUs.
Installation
It is recommended to work with conda environments in order to manage the specific dependencies of the package.
conda create --name bio-transformers python=3.7 -y
conda activate bio-transformers
pip install bio-transformers
How it works
The main class, BioTransformers, allows the developer to use the Protbert and ESM backends.
from biotransformers import BioTransformers
BioTransformers.list_backend()
Embeddings
Choose a backend and pass a list of amino acid sequences to compute the embeddings.
By default, the compute_embeddings function returns the <CLS> token embedding.
You can also pass a pooling_list to compute additional embeddings, such as the mean of the token embeddings.
from biotransformers import BioTransformers

# Two example protein sequences, given as strings of amino acid letters
sequences = [
    "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
    "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE",
]

# Load the Protbert backend and request mean pooling in addition to the default <CLS> embedding
bio_trans = BioTransformers(model_dir="Rostlab/prot_bert")
embeddings = bio_trans.compute_embeddings(sequences, pooling_list=['mean'])

cls_emb = embeddings['cls']
mean_emb = embeddings['mean']
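To make the difference between the two pooling modes concrete, here is a minimal pure-Python sketch of the idea. It is an illustration, not the package's implementation: the helper names cls_pooling and mean_pooling are hypothetical, the <CLS> vector is assumed to sit at position 0, and the sketch averages over every vector it is given (whether bio-transformers includes special tokens in its mean is not stated here).

```python
from typing import List

Vector = List[float]


def cls_pooling(token_embeddings: List[Vector]) -> Vector:
    """Return the first token's vector (assumed here to be the <CLS> token)."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings: List[Vector]) -> Vector:
    """Average the token vectors element-wise over all tokens provided."""
    n_tokens = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n_tokens for d in range(dim)]


# Toy per-token embeddings for one sequence: a <CLS> vector plus two residue vectors
toy_tokens = [
    [1.0, 0.0],  # hypothetical <CLS> vector
    [0.0, 2.0],  # hypothetical vector for residue 1
    [2.0, 4.0],  # hypothetical vector for residue 2
]

cls_vec = cls_pooling(toy_tokens)
mean_vec = mean_pooling(toy_tokens)
```

Either way, each sequence ends up with one fixed-size vector regardless of its length, which is what makes the embeddings convenient as input to downstream models.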
Loglikelihood
Choose a backend and pass a list of amino acid sequences to compute the loglikelihood.
from biotransformers import BioTransformers

# Two example protein sequences, given as strings of amino acid letters
sequences = [
    "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
    "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE",
]

# Load the Protbert backend and compute the loglikelihood of the sequences
bio_trans = BioTransformers(model_dir="Rostlab/prot_bert")
loglikelihood = bio_trans.compute_loglikelihood(sequences)
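The exact definition used by compute_loglikelihood is not given here. As a rough intuition only, one common way to score a sequence is to sum the log of the probability a model assigns to the observed amino acid at each position. The sketch below assumes that definition; the function sequence_loglikelihood and the toy probabilities are hypothetical and are not part of the package.

```python
import math
from typing import Dict, List


def sequence_loglikelihood(sequence: str, position_probs: List[Dict[str, float]]) -> float:
    """Sum log-probabilities of the observed amino acid at each position.

    position_probs[i] maps amino acid letters to the (assumed) model
    probability of seeing that letter at position i.
    """
    if len(sequence) != len(position_probs):
        raise ValueError("sequence and position_probs must have the same length")
    return sum(math.log(probs[aa]) for aa, probs in zip(sequence, position_probs))


# Toy per-position distributions over a two-letter alphabet (made-up values)
toy_probs = [
    {"M": 0.5, "K": 0.5},
    {"M": 0.25, "K": 0.75},
]

toy_ll = sequence_loglikelihood("MK", toy_probs)  # log(0.5) + log(0.75)
```

Under this definition the score is always zero or negative, and values closer to zero mean the model finds the sequence more plausible.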
File details
Details for the file bio-transformers-0.0.2.tar.gz.
File metadata
- Download URL: bio-transformers-0.0.2.tar.gz
- Upload date:
- Size: 11.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 591e589446c9b8054372c06e00912d8675bd4e69418fbcf8f037e97caf7a925e
MD5 | dadf3372df49b7a75c9ca12aac5413ed
BLAKE2b-256 | f5f1bcd2171b99b120c783855a0d16d233ab720ee611676544bc1a9a14f23dc5
File details
Details for the file bio_transformers-0.0.2-py3-none-any.whl.
File metadata
- Download URL: bio_transformers-0.0.2-py3-none-any.whl
- Upload date:
- Size: 18.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | c50bcf162c6e541a8038f99db050be4af89e3e356efc2b439145094b88a2e3b1
MD5 | 5c043733496ca69c9da0864e84efd385
BLAKE2b-256 | 3c86dea4bcf7cc95c5d118a3ea653b76f3c9497436b29e56491f13842ff16aed