
Wrapper on top of the ESM/Protbert models to make it easy to work with protein embeddings


Description

bio-transformers is a wrapper on top of the ESM/Protbert models, which are trained on millions of proteins and used to predict embeddings. The package also provides other functionality, such as computing the log-likelihood of a protein or computing embeddings on multiple GPUs.

Installation

It is recommended to work in a conda environment in order to manage the specific dependencies of the package.

  conda create --name bio-transformers python=3.7 -y 
  conda activate bio-transformers
  pip install bio-transformers

How it works

The main class BioTransformers allows the developer to use the Protbert and ESM backends.

from biotransformers import BioTransformers
BioTransformers.list_backend()

Embeddings

Choose a backend and pass a list of amino acid sequences to compute the embeddings. By default, the compute_embeddings function returns the <CLS> token embedding. You can also pass a pooling_list to additionally compute the mean of the token embeddings.

from biotransformers import BioTransformers

sequences = [
    "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
    "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE",
]

bio_trans = BioTransformers(model_dir="Rostlab/prot_bert")
embeddings = bio_trans.compute_embeddings(sequences, pooling_list=['mean'])

cls_emb = embeddings['cls']
mean_emb = embeddings['mean']
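
To make the difference between the two pooling modes concrete, here is a minimal sketch on toy numbers. It does not call bio-transformers: the helper names cls_pooling and mean_pooling and the toy_tokens data are hypothetical, and whether the library includes special tokens when it averages is not stated here, so this sketch simply averages every row.

```python
# Illustrative sketch only: toy numbers, not output from bio-transformers.

def cls_pooling(token_embeddings):
    """Return the embedding of the first token (assumed to be the <CLS> position)."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings):
    """Average the token embeddings dimension by dimension."""
    n_tokens = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n_tokens for d in range(dim)]


# Hypothetical sequence of 3 tokens with 2-dimensional embeddings; row 0 plays the role of <CLS>.
toy_tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

cls_vec = cls_pooling(toy_tokens)    # first row only
mean_vec = mean_pooling(toy_tokens)  # column-wise average over all rows
print(cls_vec, mean_vec)
```

Both modes reduce a variable-length sequence to one fixed-size vector, which is why the results for sequences of different lengths can be compared or stacked.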

Loglikelihood

Choose a backend and pass a list of amino acid sequences to compute the log-likelihood.

from biotransformers import BioTransformers

sequences = [
    "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
    "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE",
]

bio_trans = BioTransformers(model_dir="Rostlab/prot_bert")
loglikelihood = bio_trans.compute_loglikelihood(sequences)
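
This description does not say how compute_loglikelihood derives its value. As a rough illustration of the general idea, a common definition sums the log-probabilities that a model assigns to each observed residue. The sketch below assumes that definition and uses hypothetical names (sequence_loglikelihood, toy_probs) with made-up probabilities; it is not the library's implementation.

```python
import math

# Illustrative sketch only: assumes log-likelihood = sum of per-position log-probabilities.
# The probabilities below are invented, not produced by Protbert or ESM.

def sequence_loglikelihood(sequence, position_probs):
    """Sum log p(residue) over positions, given one probability dict per position."""
    if len(sequence) != len(position_probs):
        raise ValueError("need exactly one probability distribution per residue")
    return sum(math.log(probs[aa]) for aa, probs in zip(sequence, position_probs))


# Hypothetical two-residue sequence over a two-letter alphabet.
toy_sequence = "MK"
toy_probs = [
    {"M": 0.5, "K": 0.5},    # distribution at position 0
    {"M": 0.25, "K": 0.75},  # distribution at position 1
]

toy_ll = sequence_loglikelihood(toy_sequence, toy_probs)
print(toy_ll)
```

Under this definition the value is always zero or negative, and values closer to zero mean the model finds the sequence more plausible.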

