IgFold
Official repository for IgFold: Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies.
The code, data, and weights for this work are made available for non-commercial use (including at commercial entities) under the terms of the JHU Academic Software License Agreement. For commercial inquiries, please contact jruffolo[at]jhu.edu.
Install
For easiest use, install IgFold via PyPI:
$ pip install igfold
To access the latest version of the code, clone and install the repository:
$ git clone git@github.com:Graylab/IgFold.git
$ pip install IgFold
Note: To predict refined, full-atom antibody structures, PyRosetta should be installed following the instructions here.
Usage
Note: The first time IgFoldRunner is initialized, it will download the pre-trained weights. This may take a few minutes and requires a network connection.
Antibody structure prediction from sequence
Paired antibody sequences can be provided as a dictionary of sequences, where the keys are chain names and the values are the sequences.
from igfold import IgFoldRunner, init_pyrosetta
init_pyrosetta()
sequences = {
"H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
"L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}
pred_pdb = "my_antibody.pdb"
igfold = IgFoldRunner()
igfold.fold(
pred_pdb, # Output PDB file
sequences=sequences, # Antibody sequences
do_refine=True, # Refine the antibody structure with PyRosetta
do_renum=True, # Send predicted structure to AbNum server for Chothia renumbering
)
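Because the sequences are passed as a plain dictionary, it can be useful to sanity-check them before starting a prediction. The sketch below is not part of IgFold: `validate_sequences` is a hypothetical helper, and it assumes each chain should contain only the 20 standard one-letter amino acid codes.

```python
# Hypothetical pre-flight check; this helper is NOT part of the IgFold API.
# Assumption: chains should contain only the 20 standard amino acid letters.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def validate_sequences(sequences):
    """Return a dict mapping chain name -> sorted list of unexpected characters.

    An empty dict means every chain is a non-empty string of standard residues.
    """
    problems = {}
    for chain, seq in sequences.items():
        if not seq:
            problems[chain] = ["<empty>"]
            continue
        bad = sorted(set(seq.upper()) - STANDARD_AA)
        if bad:
            problems[chain] = bad
    return problems


example = {"H": "EVQLVQSG", "L": "DVVMTQXP"}
print(validate_sequences(example))  # {'L': ['X']}
```

If the returned dictionary is empty, the same `sequences` dictionary can be passed to `igfold.fold` as shown above.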
To predict a nanobody structure (or an individual heavy or light chain), simply provide one sequence:
from igfold import IgFoldRunner, init_pyrosetta
init_pyrosetta()
sequences = {
"H": "QVQLQESGGGLVQAGGSLTLSCAVSGLTFSNYAMGWFRQAPGKEREFVAAITWDGGNTYYTDSVKGRFTISRDNAKNTVFLQMNSLKPEDTAVYYCAAKLLGSSRYELALAGYDYWGQGTQVTVS"
}
pred_pdb = "my_nanobody.pdb"
igfold = IgFoldRunner()
igfold.fold(
pred_pdb, # Output PDB file
sequences=sequences, # Nanobody sequence
do_refine=True, # Refine the antibody structure with PyRosetta
do_renum=True, # Send predicted structure to AbNum server for Chothia renumbering
)
To predict a structure without PyRosetta refinement, set do_refine=False:
from igfold import IgFoldRunner
sequences = {
"H": "QVQLQESGGGLVQAGGSLTLSCAVSGLTFSNYAMGWFRQAPGKEREFVAAITWDGGNTYYTDSVKGRFTISRDNAKNTVFLQMNSLKPEDTAVYYCAAKLLGSSRYELALAGYDYWGQGTQVTVS"
}
pred_pdb = "my_nanobody.pdb"
igfold = IgFoldRunner()
igfold.fold(
pred_pdb, # Output PDB file
sequences=sequences, # Nanobody sequence
do_refine=False, # Skip PyRosetta refinement
do_renum=True, # Send predicted structure to AbNum server for Chothia renumbering
)
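Since refinement depends on an optional PyRosetta installation, a script can choose the `do_refine` value at run time. The following is a minimal sketch using only the standard library; `module_available` is a hypothetical helper, not an IgFold function.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


# Refine only when PyRosetta is importable; otherwise keep the unrefined prediction.
do_refine = module_available("pyrosetta")
print(f"do_refine={do_refine}")
```

The resulting boolean can be passed as `do_refine=do_refine` to `igfold.fold`. When it is True, `init_pyrosetta()` still needs to be called first, as in the earlier examples.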
Predicted RMSD for antibody structures
RMSD estimates are calculated per-residue and recorded in the B-factor column of the output PDB file. These values are also returned from the fold method.
from igfold import IgFoldRunner, init_pyrosetta
init_pyrosetta()
sequences = {
"H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
"L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}
pred_pdb = "my_antibody.pdb"
igfold = IgFoldRunner()
out = igfold.fold(
pred_pdb, # Output PDB file
sequences=sequences, # Antibody sequences
do_refine=True, # Refine the antibody structure with PyRosetta
do_renum=True, # Send predicted structure to AbNum server for Chothia renumbering
)
out.prmsd # Predicted RMSD for each residue's N, CA, C, CB atoms (dim: 1, L, 4)
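The estimates can also be recovered later from the saved PDB file without rerunning the model. The sketch below reads the B-factor field (columns 61-66 of the fixed-width PDB format) from each CA atom. `read_prmsd_from_pdb` is a hypothetical helper, the sample records are made-up values for illustration, and using the CA atom as the per-residue representative is an assumption rather than something the IgFold documentation specifies.

```python
def read_prmsd_from_pdb(pdb_lines):
    """Collect the B-factor of each CA atom.

    Keys are (chain ID, residue number, insertion code) so that renumbered
    files with insertion codes do not collide.
    """
    prmsd = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":  # atom name, columns 13-16
            continue
        key = (line[21], int(line[22:26]), line[26].strip())
        prmsd[key] = float(line[60:66])  # B-factor, columns 61-66
    return prmsd


# Made-up ATOM records for illustration only (not real IgFold output).
sample_lines = [
    "ATOM      1  N   GLU H   1      10.000   5.000  -6.000  1.00  0.91",
    "ATOM      2  CA  GLU H   1      11.104   6.134  -6.504  1.00  0.85",
    "ATOM      6  CA  VAL H   2      12.500   7.250  -5.125  1.00  1.20",
    "ATOM    900  CA  ASP L   1      20.000   1.000   3.000  1.00  0.60",
]

for residue, value in read_prmsd_from_pdb(sample_lines).items():
    print(residue, value)
```

For a real prediction, the lines would come from the output file, for example `open("my_antibody.pdb").read().splitlines()`.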
Antibody sequence embedding
Representations from IgFold may be useful as input features for machine learning models. The embed method can be used to surface a variety of antibody representations from the model:
from igfold import IgFoldRunner
sequences = {
"H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
"L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}
igfold = IgFoldRunner()
emb = igfold.embed(
sequences=sequences, # Antibody sequences
)
emb.bert_embs # Embeddings from AntiBERTy final hidden layer (dim: 1, L, 512)
emb.gt_embs # Embeddings after graph transformer layers (dim: 1, L, 64)
emb.structure_embs # Embeddings after template incorporation IPA (dim: 1, L, 64)
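Each embedding has a single length dimension L covering all chains, so downstream models that need per-chain features have to split it. The sketch below computes the index range each chain would occupy. `chain_slices` is a hypothetical helper, and it assumes that chains are concatenated along L in dictionary order with no separator tokens; that layout should be verified against the installed IgFold version.

```python
def chain_slices(sequences):
    """Map each chain name to the slice it would occupy along the L dimension.

    Assumption: chains are concatenated in dictionary order without
    separator tokens, so L equals the sum of the chain lengths.
    """
    slices = {}
    start = 0
    for chain, seq in sequences.items():
        slices[chain] = slice(start, start + len(seq))
        start += len(seq)
    return slices


toy_sequences = {"H": "EVQLVQSG", "L": "DVVMT"}
print(chain_slices(toy_sequences))
# {'H': slice(0, 8, None), 'L': slice(8, 13, None)}
```

Under that assumption, the heavy-chain portion of an embedding could be selected with `emb.bert_embs[:, chain_slices(sequences)["H"], :]`.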
Bug reports
If you run into any problems while using IgFold, please create a GitHub issue with a description of the problem and the steps to reproduce it.
Citing this work
@article{ruffolo2021deciphering,
title = {Deciphering antibody affinity maturation with language models and weakly supervised learning},
author = {Ruffolo, Jeffrey A and Gray, Jeffrey J and Sulam, Jeremias},
journal = {arXiv preprint arXiv:2112.07782},
year= {2021}
}
@article{ruffolo2022fast,
title = {Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies},
author = {Ruffolo, Jeffrey A and Chu, Lee-Shin and Mahajan, Sai Pooja and Gray, Jeffrey J},
journal = {bioRxiv},
year= {2022}
}