No project description provided

Project description

IgLM

Official repository for IgLM: Generative Language Modeling for Antibody Design

The code and pre-trained models from this work are made available for non-commercial use under the terms of the JHU Academic Software License Agreement.

Setup

To use IgLM, install via pip:

pip install iglm

Alternatively, you can clone this repository and install the package locally:

$ git clone git@github.com:Graylab/IgLM.git 
$ pip install IgFold

Command line usage

IgLM supports sequence infilling, sequence generation (with prompting), and sequence evaluation from the command line.

Re-design spans of an antibody sequence

To use IgLM to re-design spans of an antibody sequence, supply the fasta file, the fasta record ID corresponding to the sequence to design, the start index of the span (0-indexed), and the end index of the span (0-indexed, exclusive).

To generate 100 unique sequences of the anti-tissue factor antibody (1JPT) heavy chain with an IgLM-designed CDR3:

iglm_infill data/antibodies/1jpt/1jpt.fasta :H 98 106 --chain_token [HEAVY] --species_token [HUMAN] --num_seqs 100

Full antibody sequence generation

IgLM can be used to generate full antibody sequences while conditioning on the chain type and species-of-origin. See Appendix A.5 for starting tokens and sampling temperatures used for the results in the paper.

To generate 100 unique human heavy chain sequences starting with EVQ:

iglm_generate --prompt_sequence EVQ --chain_token [HEAVY] --species_token [HUMAN] --num_seqs 100

To generate 100 unique nanobody sequences starting with QVQ:

iglm_generate --prompt_sequence QVQ --chain_token [HEAVY] --species_token [CAMEL] --num_seqs 100

Sequence evaluation

IgLM can be used to calculate the log likelihood of a sequence given a chain type and species-of-origin.

Full sequence log likelihood calculation:

iglm_evaluate data/antibodies/1jpt/1jpt.fasta :H --chain_token [HEAVY] --species_token [HUMAN]

Infilled sequence log likelihood calculation:

iglm_evaluate data/antibodies/1jpt/1jpt.fasta :H --start 98 --end 106 --chain_token [HEAVY] --species_token [HUMAN]

Package usage

IgLM may also be used as a Python package, enabling the above use cases and more flexible usage.

Re-design spans of an antibody sequence

To generate 100 unique sequences of the anti-tissue factor antibody (1JPT) heavy chain with an IgLM-designed CDR3:

from iglm import IgLM

iglm = IgLM()

parent_sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
infill_range = (98, 106)
num_seqs = 100

generated_seqs = iglm.infill(
    parent_sequence,
    chain_token,
    species_token,
    infill_range=infill_range,
    num_to_generate=num_seqs,
)

Full antibody sequence generation

To generate 100 unique human heavy chain sequences starting with EVQ:

from iglm import IgLM

iglm = IgLM()

prompt_sequence = "EVQ"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
num_seqs = 100

generated_seqs = iglm.generate(
    chain_token,
    species_token,
    prompt_sequence=prompt_sequence,
    num_to_generate=num_seqs,
)

To generate 100 unique nanobody sequences starting with QVQ:

from iglm import IgLM

iglm = IgLM()

prompt_sequence = "QVQ"
chain_token = "[HEAVY]"
species_token = "[CAMEL]"
num_seqs = 100

generated_seqs = iglm.generate(
    chain_token,
    species_token,
    prompt_sequence=prompt_sequence,
    num_to_generate=num_seqs,
)

Sequence evaluation

IgLM can be used to calculate the log likelihood of a sequence given a chain type and species-of-origin.

Full sequence log likelihood calculation:

import math
from iglm import IgLM

iglm = IgLM()

sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"

log_likelihood = iglm.log_likelihood(
    sequence,
    chain_token,
    species_token,
    infill_range=infill_range,
)
perplexity = math.exp(-log_likelihood)

Infilled sequence log likelihood calculation:

import math
from iglm import IgLM

iglm = IgLM()

sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
infill_range = (98, 106)

log_likelihood = iglm.log_likelihood(
    sequence,
    chain_token,
    species_token,
    infill_range=infill_range,
)
perplexity = math.exp(-log_likelihood)

Project details

Release history Release notifications | RSS feed

This version

0.1.0

Feb 17, 2023

0.0.2

Oct 30, 2022

0.0.1

Oct 29, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

iglm-0.1.0.tar.gz (53.3 MB view details)

Uploaded Feb 17, 2023 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

iglm-0.1.0-py3-none-any.whl (53.4 MB view details)

Uploaded Feb 17, 2023 Python 3

File details

Details for the file iglm-0.1.0.tar.gz.

File metadata

Download URL: iglm-0.1.0.tar.gz
Upload date: Feb 17, 2023
Size: 53.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.4

File hashes

Hashes for iglm-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`b6fde5ae285ff96b3e972791e6d5f1abf4d8739c84bca8c4fc764168cc65f361`
MD5	`6278f0665066888766245badf61e9a39`
BLAKE2b-256	`e4aa2058efb6b5205c6c80490ffdefe141a1652afa2855f2e6a7877898d6dd7e`

See more details on using hashes here.

File details

Details for the file iglm-0.1.0-py3-none-any.whl.

File metadata

Download URL: iglm-0.1.0-py3-none-any.whl
Upload date: Feb 17, 2023
Size: 53.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.4

File hashes

Hashes for iglm-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6bb7b65c531583b10d3723dbfbc06e95b2cb30e3cc8cbf2fbc7eb22945e1c2be`
MD5	`11ed0cd595d447236a76a709b42b2544`
BLAKE2b-256	`457bab3b58795bd34e686bb25fc1449de499e4cb34cd701b4ec291f31519d4ec`

See more details on using hashes here.

iglm 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project description

IgLM

Setup

Command line usage

Re-design spans of an antibody sequence

Full antibody sequence generation

Sequence evaluation

Package usage

Re-design spans of an antibody sequence

Full antibody sequence generation

Sequence evaluation

Project details

Verified details

Maintainers

Unverified details

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes