IgFold
Official repository for IgFold: Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies.
The code and pre-trained models from this work are made available for non-commercial use (including at commercial entities) under the terms of the JHU Academic Software License Agreement. For commercial inquiries, please contact Johns Hopkins Tech Ventures at dmalon11@jhu.edu.
Try antibody structure prediction in Google Colab.
Updates
Updating to IgFold v0.3.0 is strongly recommended to speed up predictions.
- Version 0.3.0
- Reduce runtime by refactoring embedding code
- Remove need for external download to access model checkpoints
- Version 0.2.3
- Remove dependence on pytorch3d.
- Fix Colab notebook.
- Version 0.2.0
- Add gradient-based refinement to resolve clashes and improve backbone geometry.
- Updated predicted structures to reduce occurrence of clashes and cis-peptides.
Install
For easiest use, create a conda environment and install IgFold via PyPI:
$ pip install igfold
To access the latest version of the code, clone and install the repository:
$ git clone git@github.com:Graylab/IgFold.git
$ pip install ./IgFold
Refinement
Two refinement methods are supported for IgFold predictions. To follow the manuscript, install PyRosetta following its official installation instructions. If PyRosetta is not installed, IgFold will attempt refinement with OpenMM instead. For this option, OpenMM must be installed and configured before running IgFold as follows:
$ conda install -c conda-forge openmm==7.7.0 pdbfixer
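Before running a prediction, it can help to confirm which refinement backend is importable in the current environment. The sketch below mirrors the fallback order described above (PyRosetta first, then OpenMM). It is not IgFold's internal logic, and `available_refiner` is a hypothetical helper name.

```python
import importlib.util


def available_refiner(find_spec=importlib.util.find_spec):
    """Return "pyrosetta", "openmm", or None, preferring PyRosetta.

    `find_spec` is injectable so the check can be exercised without
    either package installed (hypothetical helper, not part of IgFold).
    """
    for module_name in ("pyrosetta", "openmm"):
        if find_spec(module_name) is not None:
            return module_name
    return None


if __name__ == "__main__":
    refiner = available_refiner()
    print(f"Refinement backend found: {refiner}")
```

If the result is None, neither package can be imported, and predictions would need to be run with do_refine=False.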
Renumbering
Antibody renumbering requires installation of AbNumber. To install AbNumber, run the following command:
$ conda install -c bioconda abnumber
Usage
Note: The first time IgFoldRunner is initialized, it will download the pre-trained weights. This may take a few minutes and will require a network connection.
Antibody structure prediction from sequence
Paired antibody sequences can be provided as a dictionary of sequences, where the keys are chain names and the values are the sequences.
from igfold import IgFoldRunner
from igfold.refine.pyrosetta_ref import init_pyrosetta
init_pyrosetta()
sequences = {
"H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
"L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}
pred_pdb = "my_antibody.pdb"
igfold = IgFoldRunner()
igfold.fold(
pred_pdb, # Output PDB file
sequences=sequences, # Antibody sequences
do_refine=True, # Refine the antibody structure with PyRosetta
do_renum=True, # Renumber predicted antibody structure (Chothia)
)
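Since the input is a plain dictionary mapping chain names to sequences, a quick sanity check before folding can catch stray whitespace or non-amino-acid characters. This is a minimal sketch, assuming only the 20 standard one-letter amino acid codes are wanted. `check_sequences` is a hypothetical helper and not part of the IgFold API.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_sequences(sequences):
    """Return a list of problems found in a {chain_name: sequence} dict.

    Assumption: only the 20 standard one-letter codes are acceptable.
    """
    problems = []
    if not sequences:
        problems.append("no sequences provided")
    for chain, seq in sequences.items():
        if not seq:
            problems.append(f"chain {chain}: empty sequence")
            continue
        # Collect any characters outside the standard alphabet.
        bad = sorted(set(seq.upper()) - STANDARD_AMINO_ACIDS)
        if bad:
            problems.append(f"chain {chain}: unexpected characters {bad}")
    return problems
```

An empty list means the dictionary passed the check and can be handed to the fold call shown above.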
To predict a nanobody structure (or an individual heavy or light chain), simply provide one sequence:
from igfold import IgFoldRunner
from igfold.refine.pyrosetta_ref import init_pyrosetta
init_pyrosetta()
sequences = {
"H": "QVQLQESGGGLVQAGGSLTLSCAVSGLTFSNYAMGWFRQAPGKEREFVAAITWDGGNTYYTDSVKGRFTISRDNAKNTVFLQMNSLKPEDTAVYYCAAKLLGSSRYELALAGYDYWGQGTQVTVS"
}
pred_pdb = "my_nanobody.pdb"
igfold = IgFoldRunner()
igfold.fold(
pred_pdb, # Output PDB file
sequences=sequences, # Nanobody sequence
do_refine=True, # Refine the antibody structure with PyRosetta
do_renum=True, # Renumber predicted antibody structure (Chothia)
)
To predict a structure without refinement, set do_refine=False:
from igfold import IgFoldRunner
sequences = {
"H": "QVQLQESGGGLVQAGGSLTLSCAVSGLTFSNYAMGWFRQAPGKEREFVAAITWDGGNTYYTDSVKGRFTISRDNAKNTVFLQMNSLKPEDTAVYYCAAKLLGSSRYELALAGYDYWGQGTQVTVS"
}
pred_pdb = "my_nanobody.pdb"
igfold = IgFoldRunner()
igfold.fold(
pred_pdb, # Output PDB file
sequences=sequences, # Nanobody sequence
do_refine=False, # Skip structure refinement
do_renum=True, # Renumber predicted antibody structure (Chothia)
)
Predicted RMSD for antibody structures
RMSD estimates are calculated per-residue and recorded in the B-factor column of the output PDB file. These values are also returned from the fold method.
from igfold import IgFoldRunner
from igfold.refine.pyrosetta_ref import init_pyrosetta
init_pyrosetta()
sequences = {
"H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
"L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}
pred_pdb = "my_antibody.pdb"
igfold = IgFoldRunner()
out = igfold.fold(
pred_pdb, # Output PDB file
sequences=sequences, # Antibody sequences
do_refine=True, # Refine the antibody structure with PyRosetta
do_renum=True, # Renumber predicted antibody structure (Chothia)
)
out.prmsd # Predicted RMSD for each residue's N, CA, C, CB atoms (dim: 1, L, 4)
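Because the estimates are also written to the B-factor column, they can be read back from a saved PDB file without rerunning the model. The sketch below parses the fixed-width ATOM records (B-factor in columns 61-66) using only the standard library. The helper names, the choice of the CA atom to represent each residue, and the 1.0 cutoff are illustrative assumptions, not values from the IgFold authors.

```python
def read_prmsd(pdb_text, atom_name="CA"):
    """Return {(chain, residue_id): b_factor} for one atom per residue.

    Assumption: the B-factor of the chosen atom represents the residue.
    """
    values = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != atom_name:
            continue
        chain = line[21]
        residue_id = line[22:27].strip()  # residue number plus insertion code
        values[(chain, residue_id)] = float(line[60:66])
    return values


def residues_above(values, cutoff=1.0):
    """List residues whose recorded value exceeds `cutoff` (arbitrary default)."""
    return [key for key, value in values.items() if value > cutoff]
```

For example, `residues_above(read_prmsd(open("my_antibody.pdb").read()))` would list the residues with the highest predicted error in the file written by the example above.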
Antibody sequence embedding
Representations from IgFold may be useful as features for machine learning models. The embed method can be used to surface a variety of antibody representations from the model:
from igfold import IgFoldRunner
sequences = {
"H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
"L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}
igfold = IgFoldRunner()
emb = igfold.embed(
sequences=sequences, # Antibody sequences
)
emb.bert_embs # Embeddings from AntiBERTy final hidden layer (dim: 1, L, 512)
emb.gt_embs # Embeddings after graph transformer layers (dim: 1, L, 64)
emb.structure_embs # Embeddings after template incorporation IPA (dim: 1, L, 64)
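Downstream models often need one fixed-length vector per antibody rather than one vector per residue. A common way to get there is mean pooling over the length dimension. The sketch below shows it on a nested list of shape (1, L, D), which is an assumption about how the embeddings are exported (for a PyTorch tensor, `.tolist()` would produce this layout). `mean_pool` is a hypothetical helper, not part of IgFold.

```python
def mean_pool(embs):
    """Average per-residue embeddings of shape (1, L, D) into a length-D list."""
    per_residue = embs[0]  # drop the leading batch dimension
    length = len(per_residue)
    if length == 0:
        raise ValueError("cannot pool an empty sequence")
    dim = len(per_residue[0])
    # Average each embedding dimension across all residues.
    return [sum(row[d] for row in per_residue) / length for d in range(dim)]
```

Assuming the embeddings are PyTorch tensors, the same result is available directly as `emb.bert_embs.mean(dim=1)`.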
Extra options
Refinement with OpenMM can be prioritized over PyRosetta by setting use_openmm=True.
from igfold import IgFoldRunner
sequences = {
"H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
"L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}
pred_pdb = "my_antibody.pdb"
igfold = IgFoldRunner()
igfold.fold(
pred_pdb, # Output PDB file
sequences=sequences, # Antibody sequences
do_refine=True, # Refine the antibody structure
use_openmm=True, # Use OpenMM for refinement
do_renum=True, # Renumber predicted antibody structure (Chothia)
)
Synthetic antibody structures
To demonstrate the capabilities of IgFold for large-scale prediction of antibody structures, we applied the model to two sets of natural paired antibody sequences.
The first set contains 104K non-redundant paired antibody sequences from the Observed Antibody Space database. These predicted structures are made available for use online.
$ wget https://data.graylab.jhu.edu/OAS_paired.tar.gz
The second set contains 1.3M unique paired antibodies from four human donors, collected by Jaffe et al. These predicted structures are made available for use online.
$ wget https://data.graylab.jhu.edu/Jaffe2022.tar.gz
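After downloading, the archives can be inspected without unpacking them fully. The sketch below lists structure files in a .tar.gz archive using the standard library. It assumes the archives contain files ending in .pdb, which the text above does not state, and `list_pdb_members` is a hypothetical helper name.

```python
import tarfile


def list_pdb_members(archive_path, limit=None):
    """Return names of .pdb members in a .tar.gz archive, up to `limit`.

    Assumption: predicted structures are stored as files ending in .pdb.
    """
    names = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:  # streams through the archive entry by entry
            if member.isfile() and member.name.lower().endswith(".pdb"):
                names.append(member.name)
                if limit is not None and len(names) >= limit:
                    break
    return names
```

For example, `list_pdb_members("OAS_paired.tar.gz", limit=5)` would return up to five member names. Setting `limit` avoids scanning a very large archive to the end.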
Bug reports
If you run into any problems while using IgFold, please create a GitHub issue with a description of the problem and the steps to reproduce it.
Citing this work
@article{ruffolo2021deciphering,
title = {Deciphering antibody affinity maturation with language models and weakly supervised learning},
author = {Ruffolo, Jeffrey A and Gray, Jeffrey J and Sulam, Jeremias},
journal = {arXiv},
year= {2021}
}
@article{ruffolo2022fast,
title = {Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies},
author = {Ruffolo, Jeffrey A and Chu, Lee-Shin and Mahajan, Sai Pooja and Gray, Jeffrey J},
journal = {bioRxiv},
year= {2022}
}