A NumPy-based PDB structure manipulation package
Afpdb - An Efficient Protein Structure Manipulation Tool
The emergence of AlphaFold and subsequent protein AI models has revolutionized protein design. To maximize the probability of success, an AI-driven protein design process typically involves analyzing thousands of protein structures. This includes reading and writing structure files, aligning structures and measuring structural deviations, standardizing chain/residue labels, extracting residues, identifying mutations, and creating visualizations. However, existing programming packages predate the recent AI breakthroughs, which leads to extra manual coding and slow code execution. To bridge this gap, we introduce the Afpdb module. Built upon AlphaFold’s NumPy architecture, Afpdb offers a high-performance computing core. By leveraging the intuitive contig syntax proposed by RFDiffusion, Afpdb makes code succinct and readable. By offering a user-friendly interface that seamlessly integrates with PyMOL, Afpdb automates visual quality control. Providing over 180 methods commonly used in protein AI design but not readily available elsewhere, Afpdb enables users to write less but faster code for protein structure analyses.
Tutorial
The tutorial book is available in PDF.
The best way to learn and practice Afpdb is to open the Tutorial Notebook in Google Colab.
Table of Contents
- Demo
- Fundamental Concepts
  - Internal Data Structure
  - Contig
- Selection
  - Atom Selection
  - Residue Selection
  - Residue List
- Read/Write
- Sequence & Chain
- Geometry, Measurement, & Visualization
  - Select Neighboring Residues
  - Display
  - B-factors
  - PyMOL Interface
  - RMSD
  - Solvent-Accessible Surface Area (SASA)
  - Secondary Structures - DSSP
  - Internal Coordinates
- Object Manipulation
  - Move Objects
  - Align
  - Split & Merge Objects
- Parsers for AI Models
AI Use Cases
Interested in applying Afpdb to AI protein design? Open the AI Use Case Notebook in Google Colab.
Table of Contents
- Example AI Protein Design Use Cases
- Handle Missing Residues in AlphaFold Prediction
- Structure Prediction with ESMFold
- Create Side Chains for de novo Designed Proteins
- Compute Binding Scores in EvoPro
Developer's Note
Open the Developer Notebook in Google Colab.
Install
Stable version:
pip install afpdb
or
conda install bioconda::afpdb
Development version:
pip install git+https://github.com/data2code/afpdb.git
or
git clone https://github.com/data2code/afpdb.git
cd afpdb
pip install .
To import the package, use:
from afpdb.afpdb import Protein, RS, RL, ATS
Demo
Structure Read & Summary
# load the antibody-antigen complex structure 5CIL from the PDB
p=Protein("5cil")
# show key statistics summary of the structure
p.summary().display()
Output
    Chain  Sequence                 Length  #Missing Residues  #Insertion Code  First Residue Name  Last Residue Name
--  -----  -----------------------  ------  -----------------  ---------------  ------------------  -----------------
0   H      VQLVQSGAEVKRPGSSVTVS...  220     20                 14               2                   227
1   L      EIVLTQSPGTQSLSPGERAT...  212     0                  1                1                   211
2   P      NWFDITNWLWYIK            13      0                  0                671                 683
Residue Relabeling
print("Old P chain residue numbering:", p.rs("P").name(), "\n")
Output:
Old P chain residue numbering: ['671', '672', '673', '674', '675', '676', '677', '678', '679', '680', '681', '682', '683']
p.renumber("RESTART", inplace=True)
print("New P chain residue numbering:", p.rs("P").name(), "\n")
Output:
New P chain residue numbering: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13']
p.summary()
Output
    Chain  Sequence                 Length  #Missing Residues  #Insertion Code  First Residue Name  Last Residue Name
--  -----  -----------------------  ------  -----------------  ---------------  ------------------  -----------------
0   H      VQLVQSGAEVKRPGSSVTVS...  220     20                 14               1                   226
1   L      EIVLTQSPGTQSLSPGERAT...  212     0                  1                1                   211
2   P      NWFDITNWLWYIK            13      0                  0                1                   13
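Judging from the two summaries above, "RESTART" appears to shift each chain so that its first residue becomes 1 while keeping gaps and insertion codes (chain H goes from 2..227 to 1..226). The following is a minimal pure-Python sketch of that idea, not Afpdb's implementation; the helper name `restart_numbering` and its inputs are hypothetical.

```python
import re

def restart_numbering(chains, names):
    """Shift residue names so every chain starts at 1.

    Gaps and insertion codes (e.g. '99E') are preserved; only the
    integer part of each residue name is shifted.
    """
    offsets = {}
    renamed = []
    for chain, name in zip(chains, names):
        match = re.fullmatch(r"(-?\d+)([A-Za-z]?)", name)
        number, icode = int(match.group(1)), match.group(2)
        # The first residue seen in a chain fixes that chain's offset.
        offsets.setdefault(chain, number - 1)
        renamed.append(f"{number - offsets[chain]}{icode}")
    return renamed

chains = ["H", "H", "H", "P", "P", "P"]
names = ["2", "3", "5", "671", "672", "673"]
print(restart_numbering(chains, names))
```

In this toy input the gap between residues 3 and 5 on chain H survives the shift, which matches the unchanged missing-residue count in the summary.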
Replace Missing Residues for AI Prediction
print("Sequence for AlphaFold modeling, with missing residues replaced by Glycine:")
print(">5cil\n"+p.seq(gap="G")+"\n")
Output
Sequence for AlphaFold modeling, with missing residues replaced by Glycine:
>5cil
VQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGDGDLGKPIGAFAHWGQGTLVTVSSASTKGPSVFPLAPSGGGGGGGGGTAALGCLVKDYFPEPVTVGSWGGGGNSGALTSGGVHTFPAVLQSGSGLYSLSSVVTVPSSSLGTGGQGTYICNVNHKPSNTKVDKKGGVEP:EIVLTQSPGTQSLSPGERATLSCRASQSVGNNKLAWYQQRPGQAPRLLIYGASSRPSGVADRFSGSGSGTDFTLTISRLEPEDFAVYYCQQYGQSLSTFGQGTKVEVKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNR:NWFDITNWLWYIK
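The gap-filling step can be illustrated without Afpdb. The sketch below assumes a single chain with plain integer residue numbers (no insertion codes) and pads each numbering gap with a placeholder residue; `seq_with_gaps` is a hypothetical helper written for illustration and is not how `p.seq(gap="G")` is necessarily implemented.

```python
def seq_with_gaps(residue_numbers, sequence, gap="G"):
    """Return the sequence with one `gap` letter per missing residue number."""
    pieces = []
    previous = None
    for number, amino_acid in zip(residue_numbers, sequence):
        if previous is not None and number - previous > 1:
            # Residues previous+1 .. number-1 are absent from the structure.
            pieces.append(gap * (number - previous - 1))
        pieces.append(amino_acid)
        previous = number
    return "".join(pieces)

# Residues 4, 5 and 6 are missing in this toy chain.
filled = seq_with_gaps([1, 2, 3, 7, 8], "VQLEP")
print(filled)
```

Residues missing at the chain termini cannot be inferred from numbering alone, so this sketch only fills internal gaps.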
Interface Computing
# identify H and L chain residues within 4 Å of the antigen P chain
rs_binder, rs_seed, df_dist = p.rs_around("P", dist=4)
# show the distance of binder residues to antigen P chain
df_dist[:5].display()
Output
     chain_a  resn_a  resn_i_a  resi_a  res_a  chain_b  resn_b  resn_i_b  resi_b  res_b  dist     atom_a  atom_b
---  -------  ------  --------  ------  -----  -------  ------  --------  ------  -----  -------  ------  ------
408  P        6       6         437     T      H        94      94        97      E      2.63625  OG1     OE2
640  P        4       4         435     D      L        32      32        252     K      2.81482  OD1     NZ
807  P        2       2         433     W      L        94      94        314     S      2.91194  N       OG
767  P        1       1         432     N      L        91      91        311     Y      2.9295   ND2     O
526  P        7       7         438     N      H        99E     99        107     K      3.03857  ND2     CE
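A distance-based neighbor search of this kind can be expressed with NumPy broadcasting. The sketch below is a generic illustration on made-up coordinates, not Afpdb's internal code; the function name `atoms_within`, the inclusive cutoff, and the toy arrays are assumptions.

```python
import numpy as np

def atoms_within(xyz_a, xyz_b, cutoff=4.0):
    """List (index_a, index_b, distance) for atom pairs within `cutoff`, closest first."""
    # Broadcast to an (n_a, n_b, 3) array of coordinate differences.
    diff = xyz_a[:, None, :] - xyz_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx_a, idx_b = np.nonzero(dist <= cutoff)
    order = np.argsort(dist[idx_a, idx_b])
    return [(int(i), int(j), float(dist[i, j]))
            for i, j in zip(idx_a[order], idx_b[order])]

# Toy coordinates in angstroms: two "antigen" atoms and three "binder" atoms.
antigen = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
binder = np.array([[3.0, 0.0, 0.0], [0.0, 0.0, 2.0], [20.0, 0.0, 0.0]])
pairs = atoms_within(antigen, binder, cutoff=4.0)
print(pairs)
```

The full pairwise matrix costs memory proportional to the product of the two atom counts, which is fine for a single complex; a spatial index would be a reasonable swap for very large systems.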
Residue Selection & Boolean Operations
# create a new structure containing only the antigen and binder residues
p=p.extract(rs_binder | "P")
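Conceptually, the `|` in the call above is a set union over residues. The sketch below shows the same kind of Boolean logic on plain NumPy index arrays; it is not Afpdb's residue-selection class, and the index values, array names, and the zero-filled coordinate array are illustrative assumptions.

```python
import numpy as np

# Hypothetical zero-based residue indices for a binder selection and an antigen chain.
binder = np.array([97, 107, 252, 311, 314])
antigen = np.arange(432, 445)

union = np.union1d(binder, antigen)        # residues in either selection
overlap = np.intersect1d(binder, antigen)  # residues in both selections
binder_only = np.setdiff1d(binder, antigen)

# Extracting a sub-structure then reduces to fancy indexing on per-residue arrays.
coords = np.zeros((445, 3))                # placeholder per-residue coordinates
sub_coords = coords[union]
print(union.size, overlap.size, sub_coords.shape)
```

Because the NumPy set routines return sorted unique indices, the extracted residues stay in their original order regardless of how the selections were combined.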
Structure I/O
# save the new structure into a local PDB file
p.save("binders.pdb")
Structure Display within Jupyter Notebook
# display the PDB structure; by default, it is shown as a ribbon colored by chain.
p.show(show_sidechains=True)
Output (It will be 3D interactive within Jupyter Notebook)