A tool to find and fill protein cavities with water molecules using KVFinder, Monte Carlo sampling, and RDKit
CaveFiller
A Python tool to find and fill protein cavities with water molecules using KVFinder, Monte Carlo sampling, and RDKit-based explicit water generation.
Features
- Cavity Detection: Uses pyKVFinder to detect cavities in protein structures
- Interactive Selection: Select specific cavities to fill with user-defined water counts
- Monte Carlo Sampling: Places water molecules using Monte Carlo sampling with clash detection
- Explicit Waters: Builds full H-O-H waters with RDKit (including hydrogens)
- CLI Interface: Easy-to-use command-line interface built with Typer
Installation
Prerequisites
- Python: Python 3.8 or higher
Install CaveFiller
# Clone the repository
git clone https://github.com/Desperadus/CaveFiller.git
cd CaveFiller
# Install the package
pip install -e .
Usage
Basic Usage
cavefiller protein.pdb
This will:
- Detect cavities in protein.pdb
- Display a list of found cavities with their volumes and areas
- Prompt you to select which cavities to fill
- Prompt you for the number of water molecules per cavity
- Place waters using Monte Carlo sampling with clash detection
- Build explicit RDKit H-O-H waters and export a combined PDB
- Save the output to ./output/protein_filled.pdb
Command-line Options
cavefiller [PROTEIN_FILE] [OPTIONS]
Arguments:
PROTEIN_FILE: Path to the protein PDB file (required)
Options:
- --output-dir PATH: Directory to save output files (default: ./output)
- --grid-step FLOAT: Grid spacing for cavity detection in Ångströms (default: 0.6)
- --probe-in FLOAT: Probe In radius for cavity detection in Ångströms (default: 1.4)
- --probe-out FLOAT: Probe Out radius for cavity detection in Ångströms (default: 4.0)
- --exterior-trim-distance FLOAT: Exterior trim distance in Ångströms (default: 2.4)
- --volume-cutoff FLOAT: Minimum cavity volume to consider in Å³ (default: 5.0)
- --auto-select: Automatically select all cavities without user interaction
- --cavity-ids TEXT: Comma-separated list of cavity IDs to fill (e.g., '1,2,3')
- --waters-per-cavity TEXT: Comma-separated list of water counts (e.g., '10,15,20'); must match the order of --cavity-ids
- --optimize-mmff94 / --no-optimize-mmff94: Enable/disable MMFF94 optimization with the protein held fixed (default: enabled)
- --mmff-max-iterations INTEGER: Maximum number of MMFF94 iterations (default: 300)
- --remove-after-optim / --no-remove-after-optim: After MMFF94, remove waters that fail post-checks (default: enabled)
  - Also accepted: --remove_after_optim / --no_remove_after_optim
Recommended usage:
- Prefer interactive/manual cavity and water-count selection over --auto-select. Auto-selection often overfills cavities with too many waters.
- Keep --optimize-mmff94 enabled (recommended) to refine water placement after Monte Carlo sampling.
- Use --no-remove-after-optim if you want to keep all waters after MMFF94, even if they clash or move out of cavity bounds.
Examples
Interactive cavity and water selection:
cavefiller protein.pdb --output-dir results
Auto-select all cavities with default water counts (not generally recommended):
cavefiller protein.pdb --auto-select
Fill specific cavities with specific water counts:
cavefiller protein.pdb --cavity-ids "1,3,5" --waters-per-cavity "10,15,20"
Custom cavity detection parameters:
cavefiller protein.pdb --grid-step 0.6 --probe-in 1.4 --probe-out 4.0 --exterior-trim-distance 2.4 --volume-cutoff 5.0
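The positional pairing between --cavity-ids and --waters-per-cavity can be illustrated with a short Python sketch. This is not CaveFiller's own parsing code; the function name pair_cavities_with_waters and its validation rules are hypothetical and only show how two comma-separated lists of equal length map onto each other.

```python
from typing import Dict


def pair_cavities_with_waters(cavity_ids: str, waters_per_cavity: str) -> Dict[int, int]:
    """Pair comma-separated cavity IDs with water counts by position.

    Hypothetical helper for illustration; not part of the CaveFiller API.
    """
    ids = [int(tok) for tok in cavity_ids.split(",") if tok.strip()]
    counts = [int(tok) for tok in waters_per_cavity.split(",") if tok.strip()]

    # The two lists must line up one-to-one, in the same order.
    if len(ids) != len(counts):
        raise ValueError(
            f"{len(ids)} cavity IDs but {len(counts)} water counts were given"
        )
    if len(set(ids)) != len(ids):
        raise ValueError("cavity IDs must be unique")
    if any(count < 0 for count in counts):
        raise ValueError("water counts must be non-negative")

    return dict(zip(ids, counts))


# Mirrors: --cavity-ids "1,3,5" --waters-per-cavity "10,15,20"
mapping = pair_cavities_with_waters("1,3,5", "10,15,20")
print(mapping)
```

With the values from the example above, cavity 1 receives 10 waters, cavity 3 receives 15, and cavity 5 receives 20.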
Workflow
- Cavity Detection: The tool uses pyKVFinder to detect cavities in the input protein structure
- Cavity Analysis: Displays information about detected cavities (ID, volume, surface area)
- Cavity Selection:
  - Interactive mode: User selects cavities and specifies water counts
  - Auto mode: All cavities are selected with automatic water count estimation
  - Command-line mode: Specific cavities and water counts are pre-selected
- Water Placement:
  - Monte Carlo sampling places waters randomly in the cavity
  - Clash detection validates each position against protein atoms and other waters
  - Uses Van der Waals radii for distance calculations
- RDKit Water Construction:
  - Explicit H-O-H waters are generated with ideal geometry
  - Waters include hydrogens and proper HOH residue records in the output PDB
Algorithm Details
Monte Carlo Sampling
- Samples around cavity grid points with small local jitter
- Validates position stays near cavity voxels (< 0.7 Å from a grid point)
- Checks for clashes with protein atoms (minimum distance based on VDW radii)
- Checks for clashes with other waters (minimum 2.7 Å separation)
- Attempts up to 500 placements per water molecule
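The sampling loop described above can be sketched in plain Python as follows. This is a simplified illustration, not CaveFiller's implementation: the function name place_waters, the jitter default of 0.5 Å, and the use of a single fixed water-protein cutoff (instead of per-element VDW radii) are assumptions made for the sketch. The 0.7 Å, 2.7 Å, and 500-attempt values are the ones listed above.

```python
import math
import random

MAX_VOXEL_DIST = 0.7      # Å, candidate must stay this close to a cavity grid point
WATER_WATER_MIN = 2.7     # Å, minimum separation between water oxygens
WATER_PROTEIN_MIN = 2.35  # Å, simplified fixed cutoff (assumption for this sketch)
MAX_ATTEMPTS = 500        # placement attempts per water


def place_waters(grid_points, protein_atoms, n_waters, jitter=0.5, seed=0):
    """Place up to n_waters oxygen positions inside a cavity.

    grid_points and protein_atoms are lists of (x, y, z) tuples in Å.
    jitter is a hypothetical per-axis displacement range in Å.
    """
    rng = random.Random(seed)
    placed = []
    for _ in range(n_waters):
        for _ in range(MAX_ATTEMPTS):
            # Sample around a random cavity grid point with small local jitter.
            base = rng.choice(grid_points)
            cand = tuple(c + rng.uniform(-jitter, jitter) for c in base)

            # Reject candidates that drift away from the cavity voxels.
            if min(math.dist(cand, g) for g in grid_points) >= MAX_VOXEL_DIST:
                continue
            # Reject clashes with protein atoms.
            if any(math.dist(cand, a) < WATER_PROTEIN_MIN for a in protein_atoms):
                continue
            # Reject clashes with waters that were already accepted.
            if any(math.dist(cand, w) < WATER_WATER_MIN for w in placed):
                continue

            placed.append(cand)
            break
        # If all attempts fail, this water is skipped and fewer are returned.
    return placed
```

Because a water is skipped once its attempts are exhausted, the function may return fewer positions than requested when the cavity is too small for the requested count.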
Clash Detection
- Uses Van der Waals radii for different atom types (H, C, N, O, S, P)
- Minimum water-protein distance: 2.35 Å
- Minimum water-water distance: 2.7 Å
- Tolerance of 0.5 Å for VDW overlap
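A minimal sketch of such a clash check is shown below. The radii are the commonly used Bondi values and are an assumption, as is the way the fixed 2.35 Å floor and the 0.5 Å tolerance are combined (the sketch simply takes the larger of the two thresholds); the tool's actual rule may differ. The function names are hypothetical.

```python
import math

# Bondi Van der Waals radii in Å (assumed values, not taken from CaveFiller).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "P": 1.80}
WATER_O_RADIUS = VDW_RADII["O"]
VDW_TOLERANCE = 0.5       # Å of allowed VDW overlap
MIN_WATER_PROTEIN = 2.35  # Å, hard floor for water-protein distance
MIN_WATER_WATER = 2.7     # Å, minimum water-water distance


def clashes_with_protein(water_o, atoms):
    """Return True if the water oxygen is too close to any protein atom.

    atoms is an iterable of (element, (x, y, z)) pairs.
    """
    for element, coord in atoms:
        # Unknown elements fall back to the carbon radius (assumption).
        radius = VDW_RADII.get(element.upper(), VDW_RADII["C"])
        # Assumed combination: larger of the hard floor and the VDW-based limit.
        allowed = max(MIN_WATER_PROTEIN, radius + WATER_O_RADIUS - VDW_TOLERANCE)
        if math.dist(water_o, coord) < allowed:
            return True
    return False


def waters_clash(water_a, water_b):
    """Return True if two water oxygens are closer than the minimum separation."""
    return math.dist(water_a, water_b) < MIN_WATER_WATER
```

Under these assumptions a carbon atom excludes water oxygens within about 2.72 Å, while a hydrogen atom is governed by the 2.35 Å floor.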
RDKit Water Geometry
- Creates proper H-O-H geometry for each water
- Writes explicit HOH residues (O, H1, H2) into output PDB
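To make the output format concrete, the sketch below builds one water from an oxygen position and formats it as three HOH records. It deliberately uses plain trigonometry rather than RDKit, so it is not how CaveFiller constructs waters. The O-H bond length (0.9572 Å), the H-O-H angle (104.52°), the fixed orientation in the xy-plane, the chain ID "W", and all function names are assumptions made for illustration.

```python
import math

OH_BOND = 0.9572        # Å, assumed ideal O-H bond length
HOH_ANGLE_DEG = 104.52  # degrees, assumed ideal H-O-H angle


def water_atoms(oxygen):
    """Return (O, H1, H2) coordinates with both hydrogens in the xy-plane."""
    ox, oy, oz = oxygen
    half = math.radians(HOH_ANGLE_DEG) / 2.0
    dx = OH_BOND * math.cos(half)
    dy = OH_BOND * math.sin(half)
    return (ox, oy, oz), (ox + dx, oy + dy, oz), (ox + dx, oy - dy, oz)


def hetatm_line(serial, name, resseq, xyz, element, chain="W"):
    """Format one fixed-column PDB HETATM record for an HOH residue."""
    x, y, z = xyz
    return (
        f"HETATM{serial:5d} {name:<4s} {'HOH':>3s} {chain:1s}{resseq:4d}{'':4s}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10s}{element:>2s}"
    )


def water_pdb_records(oxygen, serial, resseq):
    """Return the three HETATM lines (O, H1, H2) for one water."""
    o, h1, h2 = water_atoms(oxygen)
    return [
        hetatm_line(serial, " O  ", resseq, o, "O"),
        hetatm_line(serial + 1, " H1 ", resseq, h1, "H"),
        hetatm_line(serial + 2, " H2 ", resseq, h2, "H"),
    ]


for line in water_pdb_records((1.0, 2.0, 3.0), serial=1, resseq=1):
    print(line)
```

A real implementation would typically also randomize or optimize the hydrogen orientation; this sketch keeps it fixed for simplicity.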
Output
The tool generates the following files in the output directory:
protein_filled.pdb: Protein structure with explicit water molecules in selected cavities
Dependencies
- typer: CLI framework
- pyKVFinder: Cavity detection
- rdkit: Molecular manipulation and explicit water generation
- numpy: Numerical operations
- biopython: PDB file handling
Development
Running Tests
pip install -e ".[dev]"
pytest
Code Formatting
black cavefiller/
ruff check cavefiller/
Automated CI/CD and PyPI Publishing
This repository includes a GitHub Actions workflow at .github/workflows/ci-cd.yml that:
- Runs pytest on every push to main
- Runs pytest on every pull request targeting main
- Builds package distributions after tests pass
- Publishes to PyPI only on pushes to main where pyproject.toml project.version changed
One-time setup for automatic PyPI publishing
- Create a PyPI account at https://pypi.org and create your project once (or publish once manually so the name exists).
- In PyPI, open your project settings and add a Trusted Publisher:
  - Owner: your GitHub username/org
  - Repository: Desperadus/CaveFiller
  - Workflow name: ci-cd.yml (the workflow file name)
  - Environment: leave empty (unless you choose to use one)
- In GitHub, ensure Actions are enabled for the repository.
No PyPI API token secret is needed when using Trusted Publishing.
Releasing a new version
- Bump version in both:
  - pyproject.toml (project.version)
  - cavefiller/__init__.py (__version__)
- Commit and push to main.
- CI will publish that pushed version to PyPI automatically, but only if the pyproject.toml version changed versus the previous commit on main.
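Since the version has to be bumped in two places, a small pre-release check can catch a mismatch before pushing. The sketch below is not part of the repository; the function names are hypothetical, and it assumes the version is written as a plain quoted string in the [project] table of pyproject.toml and as a __version__ assignment in cavefiller/__init__.py.

```python
import re


def read_pyproject_version(pyproject_text):
    """Return the version string from the [project] table, or None."""
    in_project = False
    for line in pyproject_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are inside the [project] table.
            in_project = stripped == "[project]"
        elif in_project:
            match = re.match(r"version\s*=\s*[\"']([^\"']+)[\"']", stripped)
            if match:
                return match.group(1)
    return None


def read_init_version(init_text):
    """Return the __version__ string from a module's source, or None."""
    match = re.search(
        r"^__version__\s*=\s*[\"']([^\"']+)[\"']", init_text, re.MULTILINE
    )
    return match.group(1) if match else None


def versions_match(pyproject_text, init_text):
    """True only if both versions were found and are identical."""
    a = read_pyproject_version(pyproject_text)
    b = read_init_version(init_text)
    return a is not None and a == b
```

To use it, read both files with pathlib.Path(...).read_text() and pass the contents to versions_match before committing the release.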
License
See LICENSE file for details.
Citation
If you use CaveFiller in your research, please cite:
- pyKVFinder: Guerra et al. (2020) BMC Bioinformatics
- RDKit: RDKit: Open-source cheminformatics; http://www.rdkit.org
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.