A Numpy-based PDB structure manipulation package

Afpdb - An Efficient Protein Structure Manipulation Tool

The emergence of AlphaFold and subsequent protein AI models has revolutionized protein design. To maximize the probability of success, the AI-driven protein design process involves analyzing thousands of protein structures. This includes handling structure file read/write operations, aligning structures and measuring structural deviations, standardizing chain/residue labels, extracting residues, identifying mutations, and creating visualizations. However, existing programming packages do not fully address these challenges. To bridge this gap, we introduce the Afpdb module. Built upon AlphaFold’s NumPy architecture and leveraging the intuitive contig syntax proposed by RFDiffusion, Afpdb streamlines structure analyses. It supplements Biopython with dozens of methods that are commonly used in protein AI design but not readily available elsewhere, and it offers a user-friendly interface that integrates with PyMOL’s visualization capabilities.

Tutorial

The tutorial book is available in PDF.

The best way to learn and practice Afpdb is to open the Tutorial Notebook in Google Colab.

Table of Contents

  1. Demo
  2. Fundamental Concepts
    • Internal Data Structure
    • Contig
  3. Selection
    • Atom Selection
    • Residue Selection
    • Residue List
  4. Read/Write
  5. Sequence & Chain
  6. Geometry, Measurement, & Visualization
    • Select Neighboring Residues
    • Display
    • B-factors
    • PyMOL Interface
    • RMSD
    • Solvent-Accessible Surface Area (SASA)
    • Secondary Structures - DSSP
    • Internal Coordinates
  7. Object Manipulation
    • Move Objects
    • Align
    • Split & Merge Objects

AI Use Cases

Interested in applying Afpdb to AI protein design? Open the AI Use Case Notebook in Google Colab.

Table of Contents

  • Example AI Protein Design Use Cases
    • Handle Missing Residues in AlphaFold Prediction
    • Structure Prediction with ESMFold
    • Create Side Chains for de novo Designed Proteins
    • Compute Binding Scores in EvoPro

Developer's Note

Open the Developer Notebook in Google Colab.

Install

pip install git+https://github.com/data2code/afpdb.git

or

git clone https://github.com/data2code/afpdb.git
cd afpdb
pip install .

To import the package use:

from afpdb.afpdb import Protein,RS,RL,ATS

Demo

Structure Read & Summary

# load the ab-ag complex structure 5CIL from PDB
p=Protein("5cil")
# show key statistics summary of the structure
p.summary().display()

Output

    Chain    Sequence                    Length    #Missing Residues    #Insertion Code    First Residue Name    Last Residue Name
--  -------  -----------------------  ---------  -------------------  -----------------  --------------------  -------------------
 0  H        VQLVQSGAEVKRPGSSVTVS...        220                   20                 14                     2                  227
 1  L        EIVLTQSPGTQSLSPGERAT...        212                    0                  1                     1                  211
 2  P        NWFDITNWLWYIK                   13                    0                  0                   671                  683

Residue Relabeling

print("Old P chain residue numbering:", p.rs("P").name(), "\n")

Output:
Old P chain residue numbering: ['671', '672', '673', '674', '675', '676', '677', '678', '679', '680', '681', '682', '683'] 

p.renumber("RESTART", inplace=True)
print("New P chain residue numbering:", p.rs("P").name(), "\n")

Output:
New P chain residue numbering: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13'] 

p.summary()

Output

    Chain    Sequence                    Length    #Missing Residues    #Insertion Code    First Residue Name    Last Residue Name
--  -------  -----------------------  ---------  -------------------  -----------------  --------------------  -------------------
 0  H        VQLVQSGAEVKRPGSSVTVS...        220                   20                 14                     1                  226
 1  L        EIVLTQSPGTQSLSPGERAT...        212                    0                  1                     1                  211
 2  P        NWFDITNWLWYIK                   13                    0                  0                     1                   13
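
The renumbering shown above can be illustrated with a small standalone sketch. It assumes that "RESTART" shifts each chain's residue numbers by a constant offset so that the first residue becomes 1, keeping gaps and insertion codes intact. That assumption is consistent with the H chain going from 2..227 to 1..226 in the two summaries, but it is an inference from the output, not Afpdb's actual implementation. The function name `restart_numbering` is hypothetical.

```python
import re

def restart_numbering(names):
    """Shift residue names so that the first residue is numbered 1.

    Only the integer part is shifted; an insertion code such as the 'E'
    in '99E' is carried over unchanged, and numbering gaps are preserved.
    (Illustrative assumption, not Afpdb's code.)
    """
    parsed = []
    for name in names:
        m = re.fullmatch(r"(-?\d+)([A-Za-z]?)", name)
        if m is None:
            raise ValueError(f"unrecognized residue name: {name!r}")
        parsed.append((int(m.group(1)), m.group(2)))
    if not parsed:
        return []
    offset = parsed[0][0] - 1
    return [f"{num - offset}{icode}" for num, icode in parsed]

# P chain residue names as printed before renumbering
old_names = [str(i) for i in range(671, 684)]
new_names = restart_numbering(old_names)
print(new_names)
```

For the P chain this yields the names '1' through '13'. A list with a gap and an insertion code, such as ['2', '3', '5', '5A'], would become ['1', '2', '4', '4A'] under the same assumption.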

Replace Missing Residues for AI Prediction

print("Sequence for AlphaFold modeling, with missing residues replaced by Glycine:")
print(">5cil\n"+p.seq(gap="G")+"\n")

Output

Sequence for AlphaFold modeling, with missing residues replaced by Glycine:
>5cil
VQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGDGDLGKPIGAFAHWGQGTLVTVSSASTKGPSVFPLAPSGGGGGGGGGTAALGCLVKDYFPEPVTVGSWGGGGNSGALTSGGVHTFPAVLQSGSGLYSLSSVVTVPSSSLGTGGQGTYICNVNHKPSNTKVDKKGGVEP:EIVLTQSPGTQSLSPGERATLSCRASQSVGNNKLAWYQQRPGQAPRLLIYGASSRPSGVADRFSGSGSGTDFTLTISRLEPEDFAVYYCQQYGQSLSTFGQGTKVEVKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNR:NWFDITNWLWYIK
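
The output above joins the chains with ":". When a downstream tool expects one record per chain instead, the string can be split with plain Python. The sketch below is independent of Afpdb; the helper names `split_chains` and `per_chain_fasta` and the `>5cil_P` header style are assumptions made for illustration, and the L chain input is truncated to keep the example short.

```python
def split_chains(joined_seq, chain_ids):
    """Map chain IDs to their sequences from a ':'-joined string."""
    seqs = joined_seq.split(":")
    if len(seqs) != len(chain_ids):
        raise ValueError("number of chain IDs does not match number of sequences")
    return dict(zip(chain_ids, seqs))

def per_chain_fasta(name, chains):
    """Format one FASTA record per chain, with headers such as '>5cil_P'."""
    return "\n".join(f">{name}_{cid}\n{seq}" for cid, seq in chains.items())

# Toy input: a truncated L chain plus the full P chain from the output above
joined = "EIVLTQSPG:NWFDITNWLWYIK"
chains = split_chains(joined, ["L", "P"])
fasta = per_chain_fasta("5cil", chains)
print(fasta)
```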

Interface Computing

# identify H and L chain residues within 4 Å of the antigen P chain
rs_binder, rs_seed, df_dist=p.rs_around("P", dist=4)

# show the distance of binder residues to antigen P chain
df_dist[:5].display()

Output

     chain_a      resn_a    resn_i_a    resi_a  res_a    chain_b    resn_b      resn_i_b    resi_b  res_b       dist  atom_a    atom_b
---  ---------  --------  ----------  --------  -------  ---------  --------  ----------  --------  -------  -------  --------  --------
408  P                 6           6       437  T        H          94                94        97  E        2.63625  OG1       OE2
640  P                 4           4       435  D        L          32                32       252  K        2.81482  OD1       NZ
807  P                 2           2       433  W        L          94                94       314  S        2.91194  N         OG
767  P                 1           1       432  N        L          91                91       311  Y        2.9295   ND2       O
526  P                 7           7       438  N        H          99E               99       107  K        3.03857  ND2       CE
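
Conceptually, an interface search like the one above keeps every candidate residue whose closest atom lies within the cutoff of any atom in the seed selection. The following is a minimal pure-Python sketch of that idea on toy coordinates. It is not how Afpdb computes the result (the package is NumPy-based and presumably vectorized), and the function name `residues_within`, the residue labels, and the coordinates are all made up for illustration.

```python
from math import dist

def residues_within(seed_atoms, candidate_atoms, cutoff):
    """Find candidate residues with at least one atom within `cutoff` of a seed atom.

    Both inputs map a residue label to {atom name: (x, y, z)}.
    Returns {candidate residue: (min distance, candidate atom, seed residue, seed atom)}.
    """
    hits = {}
    for res, atoms in candidate_atoms.items():
        best = None
        for a_name, a_xyz in atoms.items():
            for s_res, s_atoms in seed_atoms.items():
                for s_name, s_xyz in s_atoms.items():
                    d = dist(a_xyz, s_xyz)
                    if best is None or d < best[0]:
                        best = (d, a_name, s_res, s_name)
        if best is not None and best[0] <= cutoff:
            hits[res] = best
    return hits

# Toy coordinates (hypothetical, not taken from 5CIL)
seed = {"P6": {"OG1": (0.0, 0.0, 0.0)}}
candidates = {
    "H94": {"OE2": (0.0, 0.0, 2.5), "CA": (0.0, 0.0, 6.0)},
    "H10": {"CA": (10.0, 0.0, 0.0)},
}
hits = residues_within(seed, candidates, cutoff=4.0)
print(hits)
```

Only "H94" is reported here, because its OE2 atom is 2.5 Å from the seed atom while "H10" is 10 Å away. The nested loops cost one distance evaluation per atom pair, which is why array-based implementations are preferred for real structures.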

Residue Selection & Boolean Operations

# create a new Protein object containing only the antigen and binder residues
p=p.extract(rs_binder | "P")

Structure I/O

# save the new structure into a local PDB file
p.save("binders.pdb")

Structure Display within Jupyter Notebook

# display the PDB structure; the default view shows ribbons colored by chain
p.show(show_sidechains=True)

Output (an interactive 3D view when run within a Jupyter Notebook)
