A 3D Protein Pocket Detection & Classification Pipeline using geometry and electrostatic methods
Project description
BSPDA: Binding-Site(s) Predictor by Diego and Alberto
Authors: Diego Maquieira Vidal and Alberto Domingo Gómez
Introduction
Welcome to BSPDA (our PYT-SBI project). The aim of this pipeline is to predict the position of binding sites within the 3D structure of a given protein model.
While the primary focus is geometric prediction (locating surface cavities where a ligand could bind), BSPDA also performs complex chemical analysis. Using Solvent Accessible Surface Area (SASA) and studying partial charges, the pipeline suggests the type of ligand involved and lists the specific amino acids lining the pocket.
PocketPicker tool was used as the main reference to build BSPDA [1].
This tutorial will walk you through the modular architecture of the pipeline.
Prerequisites
Before running this tutorial, ensure you have propperly installed the package.
# Import the packages needed for this markdown.
import numpy as np
from rdkit import Chem
# Import the modular classes of the BSPDA pipeline
from structure import ProteinStructure
from geometry import GridGenerator
from detector import PocketDetector
from classifier import PocketClassifier
from io import Exporter
Step 1: Setup and Data Preparation
First, it is needed to download the target protein and process it. PDB files often contain "noise" for the methods of this pipeline, such as crystallized water molecules, buffer ions or existing ligands.
If hetero-atoms are not eliminated, the geometric algorithm will view the pockets as "already full" and fail to detect them.
print("1. Initializing and Cleaning Protein Structure...")
# Define output directory and target PDB ID
output_dir = "eval_results/1bid"
pdb_target = "1bid"
# Initialize the Structure object
struct = ProteinStructure(pdb_id=pdb_target, work_dir=output_dir)
# Download the PDB and load it into RDKit
struct.downloadPDB()
# Strip water, ions, and co-crystallized ligands to leave empty cavities
struct.strip_htm()
# Calculate Van der Waals radii for all remaining protein atoms
ptable = Chem.GetPeriodicTable()
vdw_radii = np.array([ptable.GetRvdw(atom.GetAtomicNum()) for atom in struct.mol.GetAtoms()])
print(f"Successfully loaded {pdb_target} with {struct.coords.shape[0]} heavy atoms.")
Step 2: 3D Grid Generation & Flood Fill
Before searching pockets, empty spaces need to be defined.
The GridGenerator creates a 3D bounding box around the protein and fills it with a voxel grid (1.0 Å resolution). It then measures the distance from every voxel to the nearest protein atom.
Using a 3D flood-fill algorithm, it distinguishes between the infinite bulk solvent outside the protein and the deep clefts/internal cavities.
print("2. Generating 3D Grid and Isolating Surface Probes...")
# Initialize grid generator with protein coordinates and VDW radii
grid_gen = GridGenerator(struct.coords, vdw_radii, resolution=1.0, padding=8.0)
# Apply flood-fill to find points that are empty, connected to the surface, but near the protein
surface_probes = grid_gen.get_valid_surface_points()
print(f"Generated {len(surface_probes)} valid surface probe points.")
Step 3: Ray-Casting Scoring & Clustering
Not all surface space is a binding pocket. To determine how "deep" or "buried" a point is, the PocketDetector casts 30 uniformly distributed rays outward from each probe.
If a ray hits a protein atom, it counts as a hit. Probes with high hit counts (high "buriedness") are deep inside clefts. Deep probes are filtered and grouped into spatial clusters.
print("3. Scoring Buriedness and Clustering Deep Pockets...")
detector = PocketDetector(struct.coords)
# Calculate ray-casting scores for all surface probes
detector.score_probes(surface_probes)
# Filter probes that are deeply buried (Score > 18 out of 30)
deep_mask = detector.scores > 18
deep_probes = surface_probes[deep_mask]
deep_scores = detector.scores[deep_mask]
# Group neighboring deep probes into distinct clusters (Minimum volume: 40 ų)
clusters = detector.cluster_pockets(surface_probes)
print(f"Detected {len(clusters)} distinct binding pockets.")
Step 4: Pocket(s) Classification (SASA-Based)
Once the geometric boundaries of the pockets have been defined, their biochemistry will be studied.
The PocketClassifier utilizes freesasa to calculate the exposed surface area of the atoms lining the pocket. By combining this with RDKit's Gasteiger partial charges, one can calculate the percentage of unpolar, positive, and negative surface area.
Based on these established literature standards, pockets are classified regarding their biological and chemical roles. For instance, high localized percentages of negative or positive surface area dictate the binding behavior of specific co-factors, metal ions, or highly polar parts of larger ligands [2]. Furthermore, empirical NMR screening data highlights that the physical volume and hydrophobic properties of a cavity are the primary indicators of its druggability, as they are required to stabilize the aromatic rings common in pharmacology [4]. In this way, valid drug pockets must meet minimum volume thresholds (typically larger than 30-40 ų, with primary druggable clefts often exceeding 500 ų), while smaller or highly charged cavities tend to be related to the transient binding of ions or certain solvent molecules [4]. Ultimately, this study of spatial volume, surface area, and atomic partial charges tries to use the most important features used in recent methodologies [3].
print("4. Analyzing Surface Chemistry and Lining Residues...\n")
# Initialize the classifier (this automatically computes Gasteiger charges and SASA)
classifier = PocketClassifier(struct.mol)
# Analyze all geometric clusters
analyzed_pockets = classifier.analyze_clusters(deep_probes, deep_scores, clusters)
# Print out the detailed report for each pocket
for p in analyzed_pockets:
print(f"=== POCKET {p.pocket_id} ===")
print(f" Classification: {p.classification}")
print(f" Volume: {p.volume} A^3")
print(f" Chemical Makeup: {p.p_unp:.1f}% Unpolar | {p.p_pos:.1f}% Positive | {p.p_neg:.1f}% Negative")
print(f" Lining Residues: {', '.join(p.lining_residues)}\n")
Step 5: Output Generation & PyMOL Visualization
The Exporter class maps the deep probes back into a standard PDB file format, cleverly storing the "buriedness score" in the B-factor column so we can apply color gradients.
It also automatically generates four distinct .pml scripts to view the protein surface, electrostatics, and pocket grids in PyMOL.
print("5. Exporting PDB and PyMOL Visualization Scripts...")
pdb_filename = "scored_pockets.pdb"
pdb_out_path = f"{output_dir}/{pdb_filename}"
# Save the synthetic pocket probes into a unified PDB file
Exporter.save_pdb(struct.mol, analyzed_pockets, pdb_out_path)
# Generate the 4 PyMOL visualization permutations
Exporter.write_pymol_scripts(output_dir, pdb_filename, len(analyzed_pockets))
print("Pipeline Complete! Check your output directory for the PyMOL scripts.")
Visualizing Results
Navigate to your output directory and open any of the generated scripts in PyMOL:
1_grey_surface_gradient_dots.pml: standard view, pocket depth shown via heat map.2_grey_surface_discrete_dots.pml: standard view, pockets colored by unique ID.3_electro_surface_gradient_dots.pml: protein colored by charge (red/blue), depth heat map.4_electro_surface_discrete_dots.pml: protein colored by charge, pockets colored by ID.
1-Line Execution in the terminal
For batch validations (like the PocketPicker validation script), it is not needed to write step by step the steps above. One can utilize the orchestration function defined in __init__.py:
from pocket_finder import run_pipeline
# Runs steps 1-5 automatically and returns the list of PocketProperties dataclasses
pockets = run_pipeline(input_pdb="structure_files/pdb1bid.ent", output_dir="results_1bid")
References
[1] Weisel, M., Proschak, E., & Schneider, G. (2007). PocketPicker: Analysis of ligand binding-sites with shape descriptors. Chemistry Central Journal, 1(1), 7. https://doi.org/10.1186/1752-153X-1-7
[2] Dundas, J., Adamian, L., & Liang, J. (2011). Structural signatures of enzyme binding pockets from order-independent surface alignment: A study of metalloendopeptidase and NAD binding proteins. Journal of Molecular Biology, 406(5), 713–729. https://doi.org/10.1016/j.jmb.2010.12.005
[3] Wang, D. D., Wu, W., & Wang, R. (2024). Structure-based, deep-learning models for protein–ligand binding affinity prediction. Journal of Cheminformatics, 16, 2. https://doi.org/10.1186/s13321-023-00795-9
[4] Hajduk, P. J., Huth, J. R., & Fesik, S. W. (2005). Druggability indices for protein targets derived from NMR-based screening data. Journal of Medicinal Chemistry, 48(7), 2518–2525. https://doi.org/10.1021/jm049131r
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file bspda-0.1.0.tar.gz.
File metadata
- Download URL: bspda-0.1.0.tar.gz
- Upload date:
- Size: 17.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
627841580407e087f92024a40118ae4f709710cf93eecfab4661ecf123fb665c
|
|
| MD5 |
3b1b9183b75fa4cdc4b8c679f6bd3f94
|
|
| BLAKE2b-256 |
e79b51027a89382ea0252f5a23fe94e38a2e33536889de582d6073deb1aebea7
|
File details
Details for the file bspda-0.1.0-py3-none-any.whl.
File metadata
- Download URL: bspda-0.1.0-py3-none-any.whl
- Upload date:
- Size: 20.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0243b9f8d35bce274a782d9c08cab03b421addd5ae5a8b06062ddbabc361cbc2
|
|
| MD5 |
0ec95bfe2877f370454cefad2e77de50
|
|
| BLAKE2b-256 |
f54cc3535a44e5eee9902d55edbddce576b18438bd8d47cb3484a323876b9f25
|