Search the RCSB for occluded volumes like cavities, pockets, and pores.

Project description

volumizer discovers and annotates occluded volumes in proteins including:

cavities: Volumes within a protein that do not make any contacts with bulk solvent. Useful for e.g. carrying cargo.
pockets: Volumes on the protein surface that make a single contact with bulk solvent. Useful for e.g. ligand binding or catalysis.
pores: Volumes connecting two bulk solvent surfaces. Useful for e.g. filtering solutes.
hubs: Volumes connecting more than two bulk solvent surfaces. Useful for e.g. containing a small reaction volume.

Example Identified Volume

Shown here is an example pore identified and annotated (red) in PDB 4JPN (green).

The same pore volume annotated (red) shown as a slice through the protein (grey).

The dataframe output shows the volume and dimensions of each occluded volume in the structure:

	id	type	volume	x	y	z
0	0	pore	38286.0	108.221	38.574	36.310
1	0	pocket	189.0	8.214	5.628	0.000
2	1	pocket	162.0	6.635	3.843	2.840
3	2	pocket	162.0	7.298	4.002	2.557
4	3	pocket	162.0	10.757	2.701	0.000
5	4	pocket	135.0	6.000	6.000	0.000

Installation

The package is published on PyPI; install it with pip install volumizer.

Developing the Volumizer Package

If you want to develop the package, run poetry install within the repository. Run the tests with pytest from the repository base directory.

Usage

Using the test file tests/pdbs/4jpn.pdb, try out the following:

Volumize a PDB and Save the Volumized PDB and DataFrame

Performing end-to-end loading, cleaning, volumizing, and saving is done with a single convenience function:

from volumizer import volumizer

volumizer.volumize_pdb_and_save("my_input.pdb", "volumized_pdb.pdb", "volumized_df.json")

Load a PDB as a Biotite Structure, Clean, Volumize, and Save Output

If you want access to the individual end-products (the volume dataframe, the input structure after cleaning, and the structure of the volumes):

from volumizer import volumizer, pdb

pdb_structure = pdb.load_structure("my_input.pdb")
volumes_df, cleaned_structure, volumes_structure = volumizer.volumize_structure(pdb_structure)

# take the cleaned input and annotated volumes and convert them to a PDB format string and then save
#  modify the `deliminator` to suit your visualization preference
#  e.g. the default "END" allows Pymol to load the resulting PDB file as two separate objects, one for the cleaned input, and one for the volumes
pdb_lines = pdb.make_volumized_pdb_lines([cleaned_structure, volumes_structure], deliminator="END")
pdb.save_pdb_lines(pdb_lines, "volumized_pdb.pdb")

volumes_df.to_json("volumized_df.json")

Changing Resolution, Modifying PDB Cleaning, and Beyond

If you want additional control over the volumizing method, you can change the resolution of the voxels, skip cleaning, or modify which residues are kept or removed by the cleaning process.

Note: the default voxel resolution is 3.0 Angstroms, which gives sensible results in the majority of cases. Finer resolutions (smaller voxels), especially below 2.0 Angstroms, will often find small paths through a protein structure, making e.g. cavities look like pores. Coarser resolutions (larger voxels) are faster to compute, but may begin to under-estimate the true volume of solvent-occluded elements.
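To make the resolution trade-off concrete, here is a minimal arithmetic sketch. It assumes that a reported volume is simply the number of voxels multiplied by the volume of one cubic voxel; this is not confirmed here, but the example table above is consistent with it (every volume is a multiple of 27 at the default 3.0 Angstrom resolution). The helper names voxel_volume and voxel_count are hypothetical and not part of volumizer.

```python
def voxel_volume(resolution: float) -> float:
    """Volume of one cubic voxel whose edge length is `resolution` (Angstroms)."""
    return resolution ** 3


def voxel_count(total_volume: float, resolution: float) -> float:
    """Number of voxels implied by a total volume, ASSUMING volume = count * resolution^3."""
    return total_volume / voxel_volume(resolution)


# Volumes copied from the example table above (default 3.0 Angstrom resolution).
example_volumes = [38286.0, 189.0, 162.0, 135.0]
for volume in example_volumes:
    print(volume, voxel_count(volume, 3.0))
```

Under this assumption, the smallest pocket in the table (135.0) corresponds to only five voxels, which gives a feel for how coarse the default grid is.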

Note: by default all residues that make L- or D-peptide bonds are retained through cleaning (e.g. non-canonicals are kept, even if they are heteroatoms in the PDB structure). By contrast, all non-covalently attached residues are removed. Currently glycan residues are also removed as they make non-peptide bonds; an example of how one would retain glycans is shown below.

from volumizer import volumizer, pdb, utils

# specify whichever residues you are interested in keeping during cleaning, or having additionally removed
KEEP_RESIDUE_NAMES = {"GAL", "NAG", "MAN", "GLC"}  # some example sugar residues to keep when cleaning
REMOVE_RESIDUE_NAMES = {"SME"}  # some example NCAAs to remove when cleaning

utils.set_resolution(2.0)
utils.add_protein_components(KEEP_RESIDUE_NAMES)
utils.remove_protein_components(REMOVE_RESIDUE_NAMES)

pdb_structure = pdb.load_structure("my_input.pdb")
cleaned_structure = volumizer.prepare_pdb_structure(pdb_structure)  # skip this if you want to keep the exact input structure
volumes_df, volumes_structure = volumizer.annotate_structure_volumes(cleaned_structure)

# take the cleaned input and annotated volumes and convert them to a PDB format string and then save
#  modify the `deliminator` to suit your visualization preference
#  e.g. the default "END" allows Pymol to load the resulting PDB file as two separate objects, one for the cleaned input, and one for the volumes
pdb_lines = pdb.make_volumized_pdb_lines([cleaned_structure, volumes_structure], deliminator="END")
pdb.save_pdb_lines(pdb_lines, "volumized_pdb.pdb")

volumes_df.to_json("volumized_df.json")

How It Works

volumizer identifies hydrated volumes in a protein structure that are not fully solvent exposed, e.g. a binding pocket. It then computes the volume and dimensions of each one and outputs that information along with an annotated version of the input PDB showing where these volumes are (which can be visualized in e.g. PyMOL, ChimeraX, etc.).

Identifying hydrated volumes

A large voxel-grid is built around the atoms of the protein or other structure supplied.
All voxels within a van der Waals radius of a protein or other atom are flagged as being non-solvent.
The remaining solvent voxels are then split into two groups: bulk solvent and occluded volumes. This is done by tracing a vector along each ordinal axis from a given voxel; if 2 or more of these vectors would cross a non-solvent voxel, the query voxel is identified as an occluded volume to be analyzed further, otherwise it is considered bulk solvent. The intention is to identify points on the grid that are outside the protein as bulk solvent.
All occluded volume voxels are then grouped into a number of continuous volumes.
For each continuous volume, the number of distinct surfaces that contact bulk solvent voxels is computed and used to indicate the volume type:
0 surfaces interacting with solvent: cavity
1 surface interacting with solvent: pocket
2 surfaces interacting with solvent: pore
3+ surfaces interacting with solvent: hub
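The steps above can be sketched on a toy grid in plain Python. This is an illustrative sketch only, not volumizer's implementation. It assumes that "each ordinal axis" means the six directions +/-x, +/-y and +/-z, that voxels are connected through shared faces, and that a "surface" is a face-connected patch of occluded voxels touching bulk solvent. All function names are hypothetical.

```python
from collections import deque
from itertools import product

# ASSUMPTION: the six axis-aligned directions stand in for "each ordinal axis".
AXIS_STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def step(voxel, direction):
    return (voxel[0] + direction[0], voxel[1] + direction[1], voxel[2] + direction[2])


def in_grid(voxel, shape):
    return all(0 <= c < n for c, n in zip(voxel, shape))


def is_occluded(voxel, protein, shape):
    """True if rays along 2 or more axis directions hit a non-solvent voxel."""
    blocked = 0
    for direction in AXIS_STEPS:
        current = step(voxel, direction)
        while in_grid(current, shape):
            if current in protein:
                blocked += 1
                break
            current = step(current, direction)
    return blocked >= 2


def connected_groups(voxels):
    """Group voxels into face-connected components with a breadth-first search."""
    remaining, groups = set(voxels), []
    while remaining:
        seed = remaining.pop()
        group, queue = {seed}, deque([seed])
        while queue:
            current = queue.popleft()
            for direction in AXIS_STEPS:
                neighbour = step(current, direction)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    group.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups


def classify(n_surfaces):
    return {0: "cavity", 1: "pocket", 2: "pore"}.get(n_surfaces, "hub")


def annotate(protein, shape):
    """Return a (volume type, voxel count) pair for every occluded volume."""
    solvent = [v for v in product(*(range(n) for n in shape)) if v not in protein]
    occluded = {v for v in solvent if is_occluded(v, protein, shape)}
    bulk = set(solvent) - occluded
    results = []
    for group in connected_groups(occluded):
        surface = {v for v in group if any(step(v, d) in bulk for d in AXIS_STEPS)}
        results.append((classify(len(connected_groups(surface))), len(group)))
    return results


# Toy "protein": a solid block with a straight channel running through it along x.
shape = (7, 5, 5)
block = {(x, y, z) for x in range(1, 6) for y in range(5) for z in range(5)}
channel = {(x, 2, 2) for x in range(1, 6)}
print(annotate(block - channel, shape))  # channel open at both ends
```

Plugging one end of the channel turns the same volume into a pocket, and plugging both ends turns it into a cavity, which mirrors the surface-count rule listed above.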

Annotations

Pandas DataFrame

Annotations are given as a pandas data frame saved as a .json file. The annotation lists all hydrated volumes ordered by total volume, giving the type of volume, and dimensions.
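As a rough illustration of reading the saved annotations back, the sketch below parses a column-oriented JSON document using only the standard library. It assumes the file was written with pandas' default to_json layout (column name, then row label, then value), as in the usage examples above; the inline sample reuses three rows from the example table, and columns_to_records is a hypothetical helper. With pandas installed, pandas.read_json would give you a DataFrame directly.

```python
import json

# Three rows from the example table, in pandas' default "columns" JSON layout.
sample_json = """
{
  "id":     {"0": 0, "1": 0, "2": 1},
  "type":   {"0": "pore", "1": "pocket", "2": "pocket"},
  "volume": {"0": 38286.0, "1": 189.0, "2": 162.0},
  "x":      {"0": 108.221, "1": 8.214, "2": 6.635},
  "y":      {"0": 38.574, "1": 5.628, "2": 3.843},
  "z":      {"0": 36.310, "1": 0.000, "2": 2.840}
}
"""


def columns_to_records(columns):
    """Turn {column: {row_label: value}} into a list of per-row dictionaries."""
    row_labels = sorted(next(iter(columns.values())), key=int)
    return [{name: values[label] for name, values in columns.items()} for label in row_labels]


records = columns_to_records(json.loads(sample_json))

# Select only the pockets and add up their volumes.
pockets = [row for row in records if row["type"] == "pocket"]
total_pocket_volume = sum(row["volume"] for row in pockets)
print(len(pockets), total_pocket_volume)
```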

PDB File

The input PDB file will be annotated by adding atoms to represent the hydrated volumes. The ATOM entries contain several points of information about the volume from which they come:

Type of volume: the residue name encodes the type of volume in 3-letter code

OCC for occluded
CAV for cavity
POK for pocket
POR for pore
HUB for hub

Surface of the hydrated volume that interacts with bulk solvent: this is indicated by a B-factor of 50.0, whereas the remainder of the volume (that does not interact with the bulk solvent) has a value of 0.0.

All atoms of a particular volume are grouped under the same residue number.
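The encoding above can be read back with standard fixed-column PDB parsing. The following is a minimal sketch, not part of volumizer: it builds a few made-up ATOM records (the atom name, chain, and coordinates are invented for illustration) and then counts, per volume, how many atoms there are and how many carry the B-factor of 50.0 that marks contact with bulk solvent. The column positions follow the standard PDB ATOM record layout; the names make_atom_line and summarize_volumes are hypothetical.

```python
from collections import defaultdict

# Residue-name codes listed above.
VOLUME_TYPES = {"OCC": "occluded", "CAV": "cavity", "POK": "pocket", "POR": "pore", "HUB": "hub"}


def make_atom_line(serial, res_name, res_seq, xyz, b_factor):
    """Build a fixed-column PDB ATOM record (made-up sample data for this sketch)."""
    return "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
        "ATOM", serial, "O", "", res_name, "A", res_seq, "", *xyz, 1.0, b_factor
    )


def summarize_volumes(pdb_lines):
    """Count atoms and solvent-contacting atoms for each (volume type, residue number)."""
    summary = defaultdict(lambda: {"atoms": 0, "surface_atoms": 0})
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        res_name = line[17:20].strip()  # PDB columns 18-20: residue name
        if res_name not in VOLUME_TYPES:
            continue  # an ordinary atom of the input structure
        res_seq = int(line[22:26])  # PDB columns 23-26: residue number
        key = (VOLUME_TYPES[res_name], res_seq)
        summary[key]["atoms"] += 1
        if float(line[60:66]) == 50.0:  # PDB columns 61-66: B-factor
            summary[key]["surface_atoms"] += 1
    return dict(summary)


sample_lines = [
    make_atom_line(1, "POR", 0, (0.0, 0.0, 0.0), 50.0),
    make_atom_line(2, "POR", 0, (3.0, 0.0, 0.0), 0.0),
    make_atom_line(3, "POK", 0, (9.0, 9.0, 9.0), 50.0),
    make_atom_line(4, "ALA", 1, (1.0, 2.0, 3.0), 20.0),  # not a volume atom
]
print(summarize_volumes(sample_lines))
```

The summary is keyed on both the volume type and the residue number because, in the example table above, a pore and a pocket can share the same id.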

Release history

0.1.4

Aug 18, 2023

0.1.3

Aug 5, 2023

0.1.2

Jul 15, 2023

This version

0.1.1

Jul 15, 2023

0.1.0

Jul 9, 2023
