Skip to main content

No project description provided

Project description

IgFold

Official repository for IgFold: Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies.

The code, data, and weights for this work are made available for non-commercial use (including at commercial entities) under the terms of the JHU Academic Software License Agreement. For commercial inquiries, please contact jruffolo[at]jhu.edu.

Install

For easiest use, install IgFold via PyPI:

$ pip install igfold

To access the latest version of the code, clone and install the repository:

$ git clone git@github.com:Graylab/IgFold.git 
$ pip install IgFold

Note: To predict refined, full-atom antibody structures, PyRosetta should be installed following the instructions here.

Usage

Note: The first time IgFoldRunner is initialized, it will download the pre-trained weights. This may take a few minutes and will require a network connection.

Antibody structure prediction from sequence

Paired antibody sequences can be provided as a dictionary of sequences, where the keys are chain names and the values are the sequences.

from igfold import IgFoldRunner, init_pyrosetta

init_pyrosetta()

sequences = {
    "H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}
pred_pdb = "my_antibody.pdb"

igfold = IgFoldRunner()
igfold.fold(
    pred_pdb, # Output PDB file
    sequences=sequences, # Antibody sequences
    do_refine=True, # Refine the antibody structure with PyRosetta
    do_renum=True, # Send predicted structure to AbNum server for Chothia renumbering
)

To predict a nanobody structure (or an individual heavy or light chain), simply provide one sequence:

from igfold import IgFoldRunner, init_pyrosetta

init_pyrosetta()

sequences = {
    "H": "QVQLQESGGGLVQAGGSLTLSCAVSGLTFSNYAMGWFRQAPGKEREFVAAITWDGGNTYYTDSVKGRFTISRDNAKNTVFLQMNSLKPEDTAVYYCAAKLLGSSRYELALAGYDYWGQGTQVTVS"
}
pred_pdb = "my_nanobody.pdb"

igfold = IgFoldRunner()
igfold.fold(
    pred_pdb, # Output PDB file
    sequences=sequences, # Nanobody sequence
    do_refine=True, # Refine the antibody structure with PyRosetta
    do_renum=True, # Send predicted structure to AbNum server for Chothia renumbering
)

To predict a structure without PyRosetta refinement, set do_refine=False:

from igfold import IgFoldRunner

sequences = {
    "H": "QVQLQESGGGLVQAGGSLTLSCAVSGLTFSNYAMGWFRQAPGKEREFVAAITWDGGNTYYTDSVKGRFTISRDNAKNTVFLQMNSLKPEDTAVYYCAAKLLGSSRYELALAGYDYWGQGTQVTVS"
}
pred_pdb = "my_nanobody.pdb"

igfold = IgFoldRunner()
igfold.fold(
    pred_pdb, # Output PDB file
    sequences=sequences, # Nanobody sequence
    do_refine=False, # Refine the antibody structure with PyRosetta
    do_renum=True, # Send predicted structure to AbNum server for Chothia renumbering
)

Predicted RMSD for antibody structures

RMSD estimates are calculated per-residue and recorded in the B-factor column of the output PDB file. These values are also returned from the fold method.

from igfold import IgFoldRunner, init_pyrosetta

init_pyrosetta()

sequences = {
    "H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}
pred_pdb = "my_antibody.pdb"

igfold = IgFoldRunner()
out = igfold.fold(
    pred_pdb, # Output PDB file
    sequences=sequences, # Antibody sequences
    do_refine=True, # Refine the antibody structure with PyRosetta
    do_renum=True, # Send predicted structure to AbNum server for Chothia renumbering
)

out.prmsd # Predicted RMSD for each residue's N, CA, C, CB atoms (dim: 1, L, 4)

Antibody sequence embedding

Features from IgFold may be useful as a feature for machine learning models. The embed method can be used to surface a variety of antibody representations from the model:

from igfold import IgFoldRunner

sequences = {
    "H": "EVQLVQSGPEVKKPGTSVKVSCKASGFTFMSSAVQWVRQARGQRLEWIGWIVIGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAAPYCSSISCNDGFDIWGQGTMVTVS",
    "L": "DVVMTQTPFSLPVSLGDQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPYTFGGGTKLEIK"
}

igfold = IgFoldRunner()
emb = igfold.embed(
    sequences=sequences, # Antibody sequences
)

emb.bert_embs # Embeddings from AntiBERTy final hidden layer (dim: 1, L, 512)
emb.gt_embs # Embeddings after graph transformer layers (dim: 1, L, 64)
emb.strucutre_embs # Embeddings after template incorporation IPA (dim: 1, L, 64)

Bug reports

If you run into any problems while using IgFold, please create a Github issue with a description of the problem and the steps to reproduce it.

Citing this work

@article{ruffolo2021deciphering,
    title = {Deciphering antibody affinity maturation with language models and weakly supervised learning},
    author = {Ruffolo, Jeffrey A and Gray, Jeffrey J and Sulam, Jeremias},
    journal = {arXiv preprint arXiv:2112.07782},
    year= {2021}
}
@article{ruffolo2021deciphering,
    title = {Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies},
    author = {Ruffolo, Jeffrey A, Lee-Shin Chu, Sai Pooja Mahajan, and Gray, Jeffrey J},
    journal = {bioRxiv},
    year= {2022}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

igfold-0.0.4.tar.gz (23.9 kB view details)

Uploaded Source

Built Distribution

igfold-0.0.4-py3-none-any.whl (28.6 kB view details)

Uploaded Python 3

File details

Details for the file igfold-0.0.4.tar.gz.

File metadata

  • Download URL: igfold-0.0.4.tar.gz
  • Upload date:
  • Size: 23.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.7.11

File hashes

Hashes for igfold-0.0.4.tar.gz
Algorithm Hash digest
SHA256 9b1bb765e68f011edd6fc5bb796366b5c939e969022470a7872960494e3d3d5b
MD5 d0ed4433bce78404e0457351fc027d5c
BLAKE2b-256 a7e7f2e8dbc943ee691e58b429b5e2c5b8a67bd96d5760d690677883663953d2

See more details on using hashes here.

File details

Details for the file igfold-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: igfold-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 28.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.7.11

File hashes

Hashes for igfold-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 7516afdb3fc80f9d88793b2d75b11f6955eb3528a260dbbdbb9382e1c23dac70
MD5 293327b067eca61ba7752b2593bfd11d
BLAKE2b-256 e40b2b3a7402630e7831cf0e6bfedc731d4a0be8dbda2fb27541fed05f71257a

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page