Skip to main content

No project description provided

Project description

IgLM

Official repository for IgLM: Generative Language Modeling for Antibody Design

The code and pre-trained models from this work are made available for non-commercial use under the terms of the JHU Academic Software License Agreement.

Setup

To use IgLM, install via pip:

pip install iglm

Alternatively, you can clone this repository and install the package locally:

Command line usage

IgLM supports sequence infilling, sequence generation (with prompting), and sequence evaluation from the command line.

Re-design spans of an antibody sequence

To use IgLM to re-design spans of an antibody sequence, supply the fasta file, the fasta record ID corresponding to the sequence to design, the start index of the span (0-indexed), and the end index of the span (0-indexed, exclusive).

To generate 100 unique sequences of the anti-tissue factor antibody (1JPT) heavy chain with an IgLM-designed CDR3:

iglm_infill data/antibodies/1jpt/1jpt.fasta :H 98 106 --chain_token [HEAVY] --species_token [HUMAN] --num_seqs 100 

Full antibody sequence generation

IgLM can be used to generate full antibody sequences while conditioning on the chain type and species-of-origin. See Appendix A.5 for starting tokens and sampling temperatures used for the results in the paper.

To generate 100 unique human heavy chain sequences starting with EVQ:

iglm_generate --prompt_sequence EVQ --chain_token [HEAVY] --species_token [HUMAN] --num_seqs 100 

To generate 100 unique nanobody sequences starting with QVQ:

iglm_generate --prompt_sequence QVQ --chain_token [HEAVY] --species_token [CAMEL] --num_seqs 100 

Sequence evaluation

IgLM can be used to calculate the log likelihood of a sequence given a chain type and species-of-origin.

Full sequence log likelihood calculation:

iglm_evaluate data/antibodies/1jpt/1jpt.fasta :H --chain_token [HEAVY] --species_token [HUMAN]

Infilled sequence log likelihood calculation:

iglm_evaluate data/antibodies/1jpt/1jpt.fasta :H --start 98 --end 106 --chain_token [HEAVY] --species_token [HUMAN]

Package usage

IgLM may also be used as a Python package, enabling the above use cases and more flexible usage.

Re-design spans of an antibody sequence

To use IgLM to re-design spans of an antibody sequence, supply the fasta file, the fasta record ID corresponding to the sequence to design, the start index of the span (0-indexed), and the end index of the span (0-indexed, exclusive).

To generate 100 unique sequences of the anti-tissue factor antibody (1JPT) heavy chain with an IgLM-designed CDR3:

from iglm import IgLM

iglm = IgLM()

parent_sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
infill_range = (98, 106)
num_seqs = 100

generated_seqs = iglm.infill(
    parent_sequence,
    chain_token,
    species_token,
    infill_range=infill_range,
    num_to_generate=num_seqs,
)

Full antibody sequence generation

IgLM can be used to generate full antibody sequences while conditioning on the chain type and species-of-origin. See Appendix A.5 for starting tokens and sampling temperatures used for the results in the paper.

To generate 100 unique human heavy chain sequences starting with EVQ:

from iglm import IgLM

iglm = IgLM()

prompt_sequence = "EVQ"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
num_seqs = 100

generated_seqs = iglm.generate(
    chain_token,
    species_token,
    prompt_sequence=prompt_sequence,
    num_to_generate=num_seqs,
)

To generate 100 unique nanobody sequences starting with QVQ:

from iglm import IgLM

iglm = IgLM()

prompt_sequence = "QVQ"
chain_token = "[HEAVY]"
species_token = "[CAMEL]"
num_seqs = 100

generated_seqs = iglm.generate(
    chain_token,
    species_token,
    prompt_sequence=prompt_sequence,
    num_to_generate=num_seqs,
)

Sequence evaluation

IgLM can be used to calculate the log likelihood of a sequence given a chain type and species-of-origin.

Full sequence log likelihood calculation:

import math
from iglm import IgLM

iglm = IgLM()

sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"

log_likelihood = iglm.log_likelihood(
    sequence,
    chain_token,
    species_token,
    infill_range=infill_range,
)
perplexity = math.exp(-ll)

Infilled sequence log likelihood calculation:

import math
from iglm import IgLM

iglm = IgLM()

sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
infill_range = (98, 106)

log_likelihood = iglm.log_likelihood(
    sequence,
    chain_token,
    species_token,
    infill_range=infill_range,
)
perplexity = math.exp(-ll)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

iglm-0.0.1.tar.gz (48.0 MB view details)

Uploaded Source

Built Distribution

iglm-0.0.1-py3-none-any.whl (48.0 MB view details)

Uploaded Python 3

File details

Details for the file iglm-0.0.1.tar.gz.

File metadata

  • Download URL: iglm-0.0.1.tar.gz
  • Upload date:
  • Size: 48.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.4

File hashes

Hashes for iglm-0.0.1.tar.gz
Algorithm Hash digest
SHA256 5e7cdef8f5d981cf8a229ea9f17506a2271cd25066dfa216e50cb543f175e49c
MD5 d42b25fc9644827523cdc9a31122121a
BLAKE2b-256 9af022744ed90e26368bb5b25ba06d0a9f01ceaa1b55af9af883301696589c27

See more details on using hashes here.

File details

Details for the file iglm-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: iglm-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 48.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.4

File hashes

Hashes for iglm-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 eb1cb74c2d9ab1342358bef2d3d3b511cf5d8813177ddca5b413397daa825790
MD5 38dd03d3d958e45b5efb5f942d67b8d9
BLAKE2b-256 448f555f06c2e4e5afcbd3776e3d276a02ef69583cce0027a3e76c5fb9ee9859

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page