IgLM
Official repository for IgLM: Generative Language Modeling for Antibody Design
The code and pre-trained models from this work are made available for non-commercial use under the terms of the JHU Academic Software License Agreement.
Setup
To use IgLM, install via pip:
pip install iglm
Alternatively, you can clone this repository and install the package locally:
$ git clone git@github.com:Graylab/IgLM.git
$ pip install ./IgLM
Command line usage
IgLM supports sequence infilling, sequence generation (with prompting), and sequence evaluation from the command line.
Re-design spans of an antibody sequence
To use IgLM to re-design spans of an antibody sequence, supply the fasta file, the fasta record ID corresponding to the sequence to design, the start index of the span (0-indexed), and the end index of the span (0-indexed, exclusive).
To generate 100 unique sequences of the anti-tissue factor antibody (1JPT) heavy chain with an IgLM-designed CDR3:
iglm_infill data/antibodies/1jpt/1jpt.fasta :H 98 106 --chain_token [HEAVY] --species_token [HUMAN] --num_seqs 100
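Because the span start is 0-indexed and the end is exclusive, the indices follow Python slice semantics. The sketch below is a plain-string illustration, not part of the IgLM API. It uses the 1JPT heavy-chain sequence from the package examples and shows which residues `98 106` selects (the CDR3 between `CAR` and `WGQG`).

```python
# 1JPT heavy-chain sequence (same string as in the package usage examples)
parent_sequence = (
    "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATI"
    "SADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
)
start, end = 98, 106  # 0-indexed start, exclusive end

prefix = parent_sequence[:start]   # residues kept before the span
span = parent_sequence[start:end]  # residues that IgLM re-designs
suffix = parent_sequence[end:]     # residues kept after the span

print(span)       # DTAAYFDY
print(len(span))  # 8
print(suffix)     # WGQGTLVTVS
```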
Full antibody sequence generation
IgLM can be used to generate full antibody sequences while conditioning on the chain type and species-of-origin. See Appendix A.5 for starting tokens and sampling temperatures used for the results in the paper.
To generate 100 unique human heavy chain sequences starting with EVQ:
iglm_generate --prompt_sequence EVQ --chain_token [HEAVY] --species_token [HUMAN] --num_seqs 100
To generate 100 unique nanobody sequences starting with QVQ:
iglm_generate --prompt_sequence QVQ --chain_token [HEAVY] --species_token [CAMEL] --num_seqs 100
Sequence evaluation
IgLM can be used to calculate the log likelihood of a sequence given a chain type and species-of-origin.
Full sequence log likelihood calculation:
iglm_evaluate data/antibodies/1jpt/1jpt.fasta :H --chain_token [HEAVY] --species_token [HUMAN]
Infilled sequence log likelihood calculation:
iglm_evaluate data/antibodies/1jpt/1jpt.fasta :H --start 98 --end 106 --chain_token [HEAVY] --species_token [HUMAN]
Package usage
IgLM can also be used as a Python package, which supports the use cases above along with more flexible workflows.
Re-design spans of an antibody sequence
To use IgLM to re-design spans of an antibody sequence from Python, pass the parent sequence, the chain and species tokens, and the span as infill_range=(start, end), where start is 0-indexed and end is 0-indexed and exclusive.
To generate 100 unique sequences of the anti-tissue factor antibody (1JPT) heavy chain with an IgLM-designed CDR3:
from iglm import IgLM
iglm = IgLM()
parent_sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
infill_range = (98, 106)
num_seqs = 100
generated_seqs = iglm.infill(
    parent_sequence,
    chain_token,
    species_token,
    infill_range=infill_range,
    num_to_generate=num_seqs,
)
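To look only at the re-designed region, you can strip the unchanged flanks from each result. The helper below is a hypothetical sketch. It assumes that `iglm.infill` returns full-length strings that keep the residues before `infill_range[0]` and after `infill_range[1]` from the parent. A designed span may not match the original span length, so the helper trims by flank length rather than by the original end index.

```python
from typing import List, Sequence, Tuple


def extract_designed_spans(
    parent_sequence: str,
    infill_range: Tuple[int, int],
    generated_seqs: Sequence[str],
) -> List[str]:
    """Hypothetical helper: return the re-designed region of each generated sequence.

    Assumes every generated sequence keeps parent_sequence[:start] and
    parent_sequence[end:] unchanged around the designed span.
    """
    start, end = infill_range
    prefix = parent_sequence[:start]
    suffix = parent_sequence[end:]
    spans = []
    for seq in generated_seqs:
        flanks_kept = (
            seq.startswith(prefix)
            and seq.endswith(suffix)
            and len(seq) >= len(prefix) + len(suffix)
        )
        if not flanks_kept:
            raise ValueError(f"sequence does not preserve the parent flanks: {seq}")
        # Trim by flank length, so designed spans of any length are handled
        spans.append(seq[len(prefix):len(seq) - len(suffix)])
    return spans
```

With the variables from the example above, `extract_designed_spans(parent_sequence, infill_range, generated_seqs)` would return the designed CDR3 strings. This only holds if the flank assumption matches what `iglm.infill` actually returns.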
Full antibody sequence generation
IgLM can be used to generate full antibody sequences while conditioning on the chain type and species-of-origin. See Appendix A.5 for starting tokens and sampling temperatures used for the results in the paper.
To generate 100 unique human heavy chain sequences starting with EVQ:
from iglm import IgLM
iglm = IgLM()
prompt_sequence = "EVQ"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
num_seqs = 100
generated_seqs = iglm.generate(
    chain_token,
    species_token,
    prompt_sequence=prompt_sequence,
    num_to_generate=num_seqs,
)
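You may want to save sequences returned by the package for downstream tools. The stdlib-only sketch below writes them to a FASTA file. It assumes `generated_seqs` is a list of sequence strings. The function names and the `>{prefix}_{i}` record-ID scheme are illustrative and are not part of IgLM.

```python
from pathlib import Path
from typing import Iterable


def to_fasta(seqs: Iterable[str], prefix: str = "seq", width: int = 60) -> str:
    """Hypothetical helper: format sequences as FASTA text, wrapping at `width` residues."""
    lines = []
    for i, seq in enumerate(seqs):
        lines.append(f">{prefix}_{i}")
        # Wrap long sequences so no line exceeds `width` residues
        lines.extend(seq[j:j + width] for j in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def write_fasta(seqs: Iterable[str], path: str, prefix: str = "seq", width: int = 60) -> None:
    """Hypothetical helper: write the FASTA text produced by to_fasta() to `path`."""
    Path(path).write_text(to_fasta(seqs, prefix=prefix, width=width))
```

For example, `write_fasta(generated_seqs, "human_heavy_EVQ.fasta", prefix="EVQ")` would store the 100 generated heavy chains. Formatting and writing are kept separate so the FASTA text can be checked without touching the filesystem.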
To generate 100 unique nanobody sequences starting with QVQ:
from iglm import IgLM
iglm = IgLM()
prompt_sequence = "QVQ"
chain_token = "[HEAVY]"
species_token = "[CAMEL]"
num_seqs = 100
generated_seqs = iglm.generate(
    chain_token,
    species_token,
    prompt_sequence=prompt_sequence,
    num_to_generate=num_seqs,
)
Sequence evaluation
IgLM can be used to calculate the log likelihood of a sequence given a chain type and species-of-origin.
Full sequence log likelihood calculation:
import math
from iglm import IgLM
iglm = IgLM()
sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
# No infill_range: score the full sequence
log_likelihood = iglm.log_likelihood(
    sequence,
    chain_token,
    species_token,
)
perplexity = math.exp(-log_likelihood)
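The last line converts a log likelihood into perplexity. Perplexity is the exponential of the negative mean per-token log probability (natural log). The `math.exp(-log_likelihood)` line therefore presumes that `log_likelihood` is already averaged over tokens. The sketch below shows the arithmetic on hypothetical per-token probabilities and does not call IgLM.

```python
import math
from typing import Sequence


def mean_log_likelihood(token_probs: Sequence[float]) -> float:
    """Average natural-log probability over tokens."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity = exp(-mean log likelihood)."""
    return math.exp(-mean_log_likelihood(token_probs))


# A model that is uniform over the 20 standard amino acids has perplexity 20
uniform = [1 / 20] * 10
print(round(perplexity(uniform), 6))  # 20.0

# Higher per-token probabilities give lower perplexity
confident = [0.9, 0.8, 0.95, 0.7]
print(perplexity(confident) < perplexity(uniform))  # True
```

Lower perplexity means the model finds the sequence more natural for the given chain and species tokens. A perplexity of 1.0 would mean every token was predicted with probability 1.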
Infilled sequence log likelihood calculation:
import math
from iglm import IgLM
iglm = IgLM()
sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
infill_range = (98, 106)
log_likelihood = iglm.log_likelihood(
    sequence,
    chain_token,
    species_token,
    infill_range=infill_range,
)
perplexity = math.exp(-log_likelihood)
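A common follow-up to infilling is to score the designs and keep the most likely ones. The `rank_by_score` helper below is a hypothetical sketch. It takes the scoring function as a parameter so it can be tested without loading the model. With IgLM, one could pass `lambda s: iglm.log_likelihood(s, chain_token, species_token)` to score full sequences, as in the full-sequence evaluation. Passing a fixed `infill_range` instead would only make sense if every candidate keeps the span at the same indices.

```python
from typing import Callable, List, Sequence, Tuple


def rank_by_score(
    candidates: Sequence[str],
    score: Callable[[str], float],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the top_k candidates by descending score.

    Duplicate sequences are scored once; ties keep their first-seen order.
    """
    unique = dict.fromkeys(candidates)  # order-preserving de-duplication
    scores = {seq: score(seq) for seq in unique}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Toy scorer for illustration: penalize each Y residue
print(rank_by_score(["AAA", "AYA", "AAA", "YYY"], score=lambda s: -s.count("Y"), top_k=2))
# [('AAA', 0), ('AYA', -1)]
```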
File details
Source distribution: iglm-0.1.0.tar.gz
- Size: 53.3 MB
- Tags: Source
- Uploaded via: twine/4.0.1 CPython/3.10.4
Algorithm | Hash digest
---|---
SHA256 | b6fde5ae285ff96b3e972791e6d5f1abf4d8739c84bca8c4fc764168cc65f361
MD5 | 6278f0665066888766245badf61e9a39
BLAKE2b-256 | e4aa2058efb6b5205c6c80490ffdefe141a1652afa2855f2e6a7877898d6dd7e
Built distribution: iglm-0.1.0-py3-none-any.whl
- Size: 53.4 MB
- Tags: Python 3
- Uploaded via: twine/4.0.1 CPython/3.10.4
Algorithm | Hash digest
---|---
SHA256 | 6bb7b65c531583b10d3723dbfbc06e95b2cb30e3cc8cbf2fbc7eb22945e1c2be
MD5 | 11ed0cd595d447236a76a709b42b2544
BLAKE2b-256 | 457bab3b58795bd34e686bb25fc1449de499e4cb34cd701b4ec291f31519d4ec