AntiBERTy

Official repository for AntiBERTy, an antibody-specific transformer language model pre-trained on 558M natural antibody sequences, as described in "Deciphering antibody affinity maturation with language models and weakly supervised learning" (Ruffolo, Gray, and Sulam, 2021).

Setup

To use AntiBERTy, install via pip:

```bash
pip install antiberty
```

Alternatively, you can clone this repository and install the package locally:

```bash
$ git clone git@github.com:jeffreyruffolo/AntiBERTy.git
$ pip install ./AntiBERTy
```

Usage

Embeddings

To generate sequence embeddings with AntiBERTy, use the embed function. The output is a list of embedding tensors, one for each input sequence. Each embedding has shape [(Length + 2) x 512]: one 512-dimensional vector per position, where the two extra positions correspond to the special tokens added at the start and end of the sequence.

```python
from antiberty import AntiBERTyRunner

antiberty = AntiBERTyRunner()

sequences = [
    "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK",
]
# One tensor per sequence, each of shape [(Length + 2) x 512]
embeddings = antiberty.embed(sequences)
```
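Downstream tasks often need one fixed-length vector per sequence rather than one vector per position. A common (but not the only) choice is to average over the residue positions. The sketch below is a minimal pure-Python illustration, assuming each embedding has been converted to nested lists (for example with `tensor.tolist()`) and assuming the first and last positions are the two special tokens accounted for by the "+ 2" in the shape. `mean_pool_residues` is a hypothetical helper, not part of the antiberty package.

```python
from typing import List


def mean_pool_residues(embedding: List[List[float]]) -> List[float]:
    """Average per-residue vectors into one sequence-level vector.

    Assumes `embedding` has shape [(Length + 2) x dim], where the first and
    last rows correspond to the special start and end tokens.
    """
    residues = embedding[1:-1]  # drop the two special-token positions
    if not residues:
        raise ValueError("embedding contains no residue positions")
    n = len(residues)
    # zip(*residues) iterates over feature columns
    return [sum(column) / n for column in zip(*residues)]


# Toy example: 3 residues + 2 special tokens, 2-dimensional features
toy = [
    [9.0, 9.0],  # start token (ignored)
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
    [9.0, 9.0],  # end token (ignored)
]
print(mean_pool_residues(toy))  # [3.0, 4.0]
```

With real AntiBERTy output this would be applied as `[mean_pool_residues(e.tolist()) for e in embeddings]`, giving one 512-dimensional list per sequence. For large batches the same reduction is cheaper on the tensors themselves (for example `e[1:-1].mean(dim=0)` in PyTorch).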

To access the attention matrices, pass return_attention=True to the embed function. It then returns both the embeddings and a list of attention tensors, one for each input sequence. Each attention tensor has shape [Layers x Heads x (Length + 2) x (Length + 2)].

```python
from antiberty import AntiBERTyRunner

antiberty = AntiBERTyRunner()

sequences = [
    "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK",
]
# attentions[i] has shape [Layers x Heads x (Length + 2) x (Length + 2)]
embeddings, attentions = antiberty.embed(sequences, return_attention=True)
```
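Because each attention tensor holds one map per layer and head, a single residue-by-residue summary is sometimes obtained by averaging over all of them. This is a lossy simplification, and individual heads may carry more specific information. The sketch below assumes the attention tensor for one sequence has been converted to nested lists with `.tolist()` and that the first and last positions are the special tokens counted in "Length + 2"; `average_attention` is a hypothetical helper, not part of the antiberty package.

```python
from typing import List

Matrix = List[List[float]]


def average_attention(attention: List[List[Matrix]], trim_special: bool = True) -> Matrix:
    """Average a [Layers x Heads x N x N] attention array into one N x N map.

    If trim_special is True, the first and last rows and columns (assumed to
    be the special start and end tokens) are removed, leaving Length x Length.
    """
    maps = [head for layer in attention for head in layer]  # flatten layers and heads
    if not maps:
        raise ValueError("attention contains no heads")
    n = len(maps[0])
    avg = [[sum(m[i][j] for m in maps) / len(maps) for j in range(n)] for i in range(n)]
    if trim_special:
        avg = [row[1:-1] for row in avg[1:-1]]
    return avg


# Toy example: 2 layers x 1 head, N = 4 (2 residues + 2 special tokens)
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]
toy_attention = [[identity], [ones]]
print(average_attention(toy_attention))  # [[1.0, 0.5], [0.5, 1.0]]
```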

The embed function can also be used with masked sequences. Masked residues should be indicated with underscores.
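Masked inputs are ordinary strings in which the hidden residues are replaced by underscores, so they can be derived from an unmasked sequence with simple string manipulation. The hypothetical helper below (not part of the antiberty package) masks a list of 0-based positions; its output is the kind of string that embed, classify, and fill_masks are described as accepting.

```python
from typing import Iterable


def mask_positions(sequence: str, positions: Iterable[int], mask_char: str = "_") -> str:
    """Return a copy of `sequence` with the given 0-based positions masked."""
    residues = list(sequence)
    for pos in positions:
        if not 0 <= pos < len(residues):
            raise IndexError(f"position {pos} is out of range for length {len(residues)}")
        residues[pos] = mask_char
    return "".join(residues)


# First 25 residues of the heavy-chain example sequence
heavy = "EVQLVQSGPEVKKPGTSVKVSCKAS"
print(mask_positions(heavy, [0, 1, 2, 3]))  # ____VQSGPEVKKPGTSVKVSCKAS
```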

Classification

To predict the species and chain type of sequences with AntiBERTy, use the classify function. The output is two lists, containing the predicted species and the predicted chain type for each input sequence, respectively.

```python
from antiberty import AntiBERTyRunner

antiberty = AntiBERTyRunner()

sequences = [
    "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK",
]
# One species prediction and one chain-type prediction per input sequence
species_preds, chain_preds = antiberty.classify(sequences)
```

The classify function can also be used with masked sequences. Masked residues should be indicated with underscores.

Mask prediction

To use AntiBERTy to predict the identity of masked residues, use the fill_masks function. Masked residues should be indicated with underscores. The output is a list of filled sequences, corresponding to the input masked sequences.

```python
from antiberty import AntiBERTyRunner

antiberty = AntiBERTyRunner()

# Underscores mark the residues to be predicted
sequences = [
    "____VQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGN_NYAQKFQERVTITRDM__STAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFD____GTMVTVS",
    "DVVMTQTPFSLPV__GDQASISCRSSQSLVHSNGNTY_HWYLQKPGQSPKLLIYKVSNRFSGVPDRFSG_GSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGG__KLEIK",
]
filled_sequences = antiberty.fill_masks(sequences)
```
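To see which residues were predicted, each filled sequence can be compared with its masked input position by position. This sketch assumes fill_masks returns sequences of the same length as their inputs, with only the underscores replaced; `filled_residues` is a hypothetical helper, and the "filled" string in the example is written by hand for illustration, not produced by the model.

```python
from typing import List, Tuple


def filled_residues(masked: str, filled: str, mask_char: str = "_") -> List[Tuple[int, str]]:
    """List (0-based position, residue) pairs for every masked position."""
    if len(masked) != len(filled):
        raise ValueError("masked and filled sequences differ in length")
    return [(i, filled[i]) for i, c in enumerate(masked) if c == mask_char]


masked_example = "DVVMTQTPFSLPV__GDQ"
filled_example = "DVVMTQTPFSLPVSLGDQ"  # hand-written, not a model prediction
print(filled_residues(masked_example, filled_example))  # [(13, 'S'), (14, 'L')]
```

In practice this would be applied pairwise, e.g. `[filled_residues(m, f) for m, f in zip(sequences, filled_sequences)]`.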

Pseudo log-likelihood

To use AntiBERTy to calculate the pseudo log-likelihood of a sequence, use the pseudo_log_likelihood function. The pseudo log-likelihood of a sequence is calculated as the average of per-residue masked log-likelihoods. The output is a list of pseudo log-likelihoods, corresponding to the input sequences.

```python
from antiberty import AntiBERTyRunner

antiberty = AntiBERTyRunner()

sequences = [
    "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "DVVMTQSSTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK",
]

# One pseudo log-likelihood per input sequence
pll = antiberty.pseudo_log_likelihood(sequences, batch_size=16)
```
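Written out, for a sequence x = x_1 ... x_L the pseudo log-likelihood is PLL(x) = (1/L) Σ_i log p(x_i | x_\i), where x_\i is x with position i masked. The sketch below illustrates only this definition: it masks one position at a time and averages the scores returned by a stand-in scoring function. `pll_from_scorer`, its scoring-function signature, and the uniform toy scorer are all hypothetical; with AntiBERTy the per-position scores come from the model, and the package's pseudo_log_likelihood function performs the whole computation.

```python
import math
from typing import Callable

# Hypothetical scorer: receives the sequence with one position masked ("_"),
# the masked index, and the true residue; returns log p(residue | context).
LogProbFn = Callable[[str, int, str], float]


def pll_from_scorer(sequence: str, log_prob: LogProbFn) -> float:
    """Average of per-residue masked log-likelihoods."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    total = 0.0
    for i, residue in enumerate(sequence):
        masked = sequence[:i] + "_" + sequence[i + 1:]  # mask exactly one position
        total += log_prob(masked, i, residue)
    return total / len(sequence)


def uniform_log_prob(masked: str, index: int, residue: str) -> float:
    """Toy scorer: all 20 standard amino acids equally likely."""
    return math.log(1 / 20)


pll = pll_from_scorer("EVQLVQSG", uniform_log_prob)
print(round(pll, 4))  # -2.9957
```

Because each of the L positions needs its own masked copy of the sequence, computing the PLL requires L model evaluations per sequence, which is why batching these masked copies matters for long inputs.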

Citing this work

```bibtex
@article{ruffolo2021deciphering,
    title = {Deciphering antibody affinity maturation with language models and weakly supervised learning},
    author = {Ruffolo, Jeffrey A and Gray, Jeffrey J and Sulam, Jeremias},
    journal = {arXiv preprint arXiv:2112.07782},
    year = {2021}
}
```

Download files

Source Distribution

antiberty-0.1.3.tar.gz (96.6 MB)

Built Distribution

antiberty-0.1.3-py3-none-any.whl (96.6 MB)

File details

Details for the file antiberty-0.1.3.tar.gz.

File metadata

  • Download URL: antiberty-0.1.3.tar.gz
  • Size: 96.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.13

File hashes

Hashes for antiberty-0.1.3.tar.gz
Algorithm Hash digest
SHA256 899a401e8b0ef9586d27713b4867aa26149ec0b63387d0be55164f458b6c3bad
MD5 d3f2c92a3d79f5395f6faab5569c3f02
BLAKE2b-256 a13b2cf48ec21956252fdc5c5dd1b7f8bb8b12f5208bd3eaaad412ced3ed0ff5

File details

Details for the file antiberty-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: antiberty-0.1.3-py3-none-any.whl
  • Size: 96.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.13

File hashes

Hashes for antiberty-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 30d910992b190013871bac49cdc032e01a19339f7d2b958ab99b0eb44638352a
MD5 d2c4ad0cd64116b2ffa38736ebe83356
BLAKE2b-256 9769ef028f0b04dde139c4656ea81b398fd238800c770c372ad4ffb780eec973
