deeprank-gnn-esm

Graph network for protein-protein interfaces including language model features.
For details, see our publication: https://academic.oup.com/bioinformaticsadvances/article/4/1/vbad191/7511844
For a detailed protocol on using the deeprank-gnn-esm software, see: https://arxiv.org/abs/2407.16375
Installation
pip install deeprank-gnn-esm
CPU only
To avoid downloading the heavy CUDA libraries (~3GB), install the CPU-only torch first:
pip install torch --extra-index-url https://download.pytorch.org/whl/cpu
pip install deeprank-gnn-esm
GPU support
GPU support is included automatically: the default PyPI torch wheel bundles CUDA.
If your system requires a specific CUDA version, install torch first:
# example for CUDA 12.1
pip install torch --extra-index-url https://download.pytorch.org/whl/cu121
pip install deeprank-gnn-esm
Check pytorch.org for the right CUDA version for your system.
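After installing, you can verify which torch build you ended up with. This is a minimal check of our own (cuda_status is an illustrative helper, not part of the package), written so it also runs cleanly when torch is absent:

```python
import importlib.util

def cuda_status() -> str:
    """Report whether torch is installed and, if so, whether CUDA is usable."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check works without torch installed
    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"

print(cuda_status())
```

If the last line reports `CUDA available: False` on a GPU machine, reinstall torch from the CUDA index URL matching your driver.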
Usage
As a scoring function
We provide a command-line interface for deeprank-gnn-esm that can be used to score protein-protein complexes:
$ deeprank-gnn-esm-predict -h
usage: deeprank-gnn-esm-predict [-h] pdb_file chain_id_1 chain_id_2 num_cores
positional arguments:
pdb_file Path to the PDB file.
chain_id_1 First chain ID.
chain_id_2 Second chain ID.
num_cores Number of cores
optional arguments:
-h, --help show this help message and exit
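If you prefer to drive the scorer from Python, the CLI can be wrapped with the standard library. A minimal sketch, assuming the positional-argument order shown in the help text above (score_complex is a hypothetical helper of our own, not part of the package):

```python
import shutil
import subprocess

def score_complex(pdb_file: str, chain1: str, chain2: str, cores: int = 1):
    """Invoke the deeprank-gnn-esm-predict CLI if it is on PATH.

    Returns the CompletedProcess, or None when the package is not installed
    in the current environment.
    """
    exe = shutil.which("deeprank-gnn-esm-predict")
    if exe is None:
        return None  # CLI not available here
    cmd = [exe, pdb_file, chain1, chain2, str(cores)]
    return subprocess.run(cmd, capture_output=True, text=True)

result = score_complex("1B6C.pdb", "A", "B", 1)
```

The predicted fnat then appears in the CLI's log output and in the generated CSV file, as shown in the example below.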
Example: score the 1B6C complex.
# download it
$ wget https://files.rcsb.org/view/1B6C.pdb -q
$ deeprank-gnn-esm-predict 1B6C.pdb A B 1
2023-06-28 06:08:21,889 predict:64 INFO - Setting up workspace - /home/deeprank-gnn-esm/1B6C-gnn_esm_pred_A_B
2023-06-28 06:08:21,945 predict:72 INFO - Renumbering PDB file.
2023-06-28 06:08:22,294 predict:104 INFO - Reading sequence of PDB 1B6C.pdb
2023-06-28 06:08:22,423 predict:131 INFO - Generating embedding for protein sequence.
2023-06-28 06:08:22,423 predict:132 INFO - ################################################################################
2023-06-28 06:08:32,447 predict:138 INFO - Transferred model to GPU
2023-06-28 06:08:32,450 predict:147 INFO - Read /home/1B6C-gnn_esm_pred_A_B/all.fasta with 2 sequences
2023-06-28 06:08:32,459 predict:157 INFO - Processing 1 of 1 batches (2 sequences)
2023-06-28 06:08:36,462 predict:200 INFO - ################################################################################
2023-06-28 06:08:36,470 predict:205 INFO - Generating graph, using 79 processors
Graphs added to the HDF5 file
Embedding added to the /home/1B6C-gnn_esm_pred_A_B/graph.hdf5 file file
2023-06-28 06:09:03,345 predict:220 INFO - Graph file generated: /home/deeprank-gnn-esm/1B6C-gnn_esm_pred_A_B/graph.hdf5
2023-06-28 06:09:03,345 predict:226 INFO - Predicting fnat of protein complex.
2023-06-28 06:09:03,345 predict:234 INFO - Using device: cuda:0
# ...
2023-06-28 06:09:07,794 predict:280 INFO - Predicted fnat for 1B6C between chainA and chainB: 0.359
2023-06-28 06:09:07,803 predict:290 INFO - Output written to /home/deeprank-gnn-esm/1B6C-gnn_esm_pred/GNN_esm_prediction.csv
From the output above you can see that the predicted fnat for the 1B6C
complex is 0.359; this value is also written to the
GNN_esm_prediction.csv file.
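The CSV output can be post-processed with the standard library. The sketch below assumes a simple two-column "model,fnat" layout; the actual columns of GNN_esm_prediction.csv may differ, so treat this purely as an illustration:

```python
import csv
import io

# Assumed, illustrative layout of the prediction CSV (real columns may differ).
SAMPLE = "model,fnat\n1B6C,0.359\n"

def best_model(csv_text: str):
    """Return the (model, fnat) pair with the highest predicted fnat."""
    rows = csv.DictReader(io.StringIO(csv_text))
    scored = [(row["model"], float(row["fnat"])) for row in rows]
    return max(scored, key=lambda pair: pair[1])

print(best_model(SAMPLE))  # ('1B6C', 0.359)
```

When scoring many docking models, ranking by fnat this way gives a quick shortlist of the best candidates.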
The command above will generate a folder in the current working directory, containing the following:
1B6C-gnn_esm_pred_A_B
├── 1B6C.pdb                 # input PDB file
├── all.fasta                # FASTA sequences for the PDB input
├── 1B6C.A.pt                # ESM-2 embedding for chain A of 1B6C
├── 1B6C.B.pt                # ESM-2 embedding for chain B of 1B6C
├── graph.hdf5               # input protein graph in HDF5 format
├── GNN_esm_prediction.hdf5  # prediction output in HDF5 format
└── GNN_esm_prediction.csv   # prediction output in CSV format
As a framework
Note about input PDB files
To ensure the mapping between interface residues and ESM-2 embeddings is correct, make sure that, for every chain, the residue numbering in the PDB file is continuous and starts at residue 1.
We provide a script (scripts/pdb_renumber.py) to do the renumbering.
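The renumbering requirement can be illustrated with a stdlib-only sketch; scripts/pdb_renumber.py remains the supported tool, and this toy function of our own only shows the idea on bare ATOM records:

```python
def renumber_atom_lines(lines):
    """Renumber residues per chain so numbering is continuous and starts at 1."""
    new_lines, mapping = [], {}
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            new_lines.append(line)
            continue
        chain = line[21]           # chain ID, PDB column 22
        key = (chain, line[22:26]) # original residue number, columns 23-26
        if key not in mapping:
            # next sequential number for this chain, starting at 1
            mapping[key] = 1 + sum(1 for (c, _) in mapping if c == chain)
        new_lines.append(line[:22] + f"{mapping[key]:>4}" + line[26:])
    return new_lines

atoms = [
    "ATOM      1  CA  ALA A   5      11.104  13.207   2.042  1.00 20.00           C",
    "ATOM      2  CA  GLY A   9      12.560  14.101   3.001  1.00 20.00           C",
]
for out in renumber_atom_lines(atoms):
    print(out[22:26])  # prints "   1" then "   2"
```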
Generate esm-2 embeddings for your protein
- To generate FASTA sequences from PDB files, use the script get_fasta.py:

  usage: get_fasta.py [-h] pdb_file_path chain_id1 chain_id2

  positional arguments:
    pdb_file_path  Path to the directory containing PDB files
    chain_id1      Chain ID for the first sequence
    chain_id2      Chain ID for the second sequence

  options:
    -h, --help     show this help message and exit

  $ python scripts/get_fasta.py tests/data/pdb/1ATN/ A B
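For intuition, here is a stdlib-only sketch of the kind of extraction get_fasta.py performs, reading one sequence per chain from CA ATOM records (the real script may differ in details; the residue map below is deliberately truncated):

```python
# Truncated three-letter to one-letter residue map, for illustration only.
THREE_TO_ONE = {"ALA": "A", "GLY": "G", "SER": "S", "THR": "T"}

def chain_sequence(pdb_lines, chain_id):
    """Collect the one-letter sequence of CA atoms for one chain."""
    seq = []
    for line in pdb_lines:
        if (line.startswith("ATOM")
                and line[21] == chain_id            # chain ID column
                and line[12:16].strip() == "CA"):   # alpha-carbon atoms only
            seq.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(seq)

pdb = [
    "ATOM      1  CA  ALA A   1      11.104  13.207   2.042  1.00 20.00           C",
    "ATOM      2  CA  GLY A   2      12.560  14.101   3.001  1.00 20.00           C",
]
print(f">chainA\n{chain_sequence(pdb, 'A')}")  # FASTA record with sequence AG
```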
- To generate embeddings in bulk from the combined FASTA file, use the extract.py script provided inside the esm-2 package:

  $ python esm_2_installation_location/scripts/extract.py \
      esm2_t33_650M_UR50D \
      all.fasta \
      tests/data/embedding/1ATN/ \
      --repr_layers 0 32 33 \
      --include mean per_tok
  Replace 'esm_2_installation_location' with your installation location, 'all.fasta' with the FASTA file generated above, and 'tests/data/embedding/1ATN/' with the output folder for the ESM embeddings.
Generate graph
- Example code to generate residue graphs in HDF5 format:

```python
from deeprank_gnn.GraphGenMP import GraphHDF5

pdb_path = "tests/data/pdb/1ATN/"
pssm_path = "tests/data/pssm/1ATN/"
embedding_path = "tests/data/embedding/1ATN/"
nproc = 20  # number of cores to use
outfile = "1ATN_residue.hdf5"

GraphHDF5(
    pdb_path=pdb_path,
    pssm_path=pssm_path,
    embedding_path=embedding_path,
    graph_type="residue",
    outfile=outfile,
    nproc=nproc,
    tmpdir="./tmpdir",
)
```
- Example code to add continuous or binary targets to the HDF5 file:

```python
import h5py
import random

hdf5_file = h5py.File("1ATN_residue.hdf5", "r+")
for mol in hdf5_file.keys():
    fnat = random.random()  # placeholder target; use the real fnat of each model
    bin_class = [1 if fnat > 0.3 else 0]
    hdf5_file.create_dataset(f"/{mol}/score/binclass", data=bin_class)
    hdf5_file.create_dataset(f"/{mol}/score/fnat", data=fnat)
hdf5_file.close()
```
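The binclass rule above can be factored into a small helper, which also connects back to the CLI example: with the 0.3 threshold used here, the predicted fnat of 0.359 for 1B6C falls in class 1 (binary_label is our own illustrative name, not part of the package):

```python
def binary_label(fnat: float, threshold: float = 0.3) -> int:
    """Binary quality label: 1 when fnat exceeds the threshold, else 0."""
    return 1 if fnat > threshold else 0

print(binary_label(0.359), binary_label(0.1))  # 1 0
```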
Use pre-trained models to predict
- Example code to use a pre-trained deeprank-gnn-esm model:

```python
from deeprank_gnn.ginet import GINet
from deeprank_gnn.NeuralNet import NeuralNet

database_test = "1ATN_residue.hdf5"
gnn = GINet
target = "fnat"
edge_attr = ["dist"]
threshold = 0.3
pretrained_model = "deeprank-GNN-esm/paper_pretrained_models/scoring_of_docking_models/gnn_esm/treg_yfnat_b64_e20_lr0.001_foldall_esm.pth.tar"
node_feature = ["type", "polarity", "bsa", "charge", "embedding"]
device_name = "cuda:0"
num_workers = 10

model = NeuralNet(
    database_test,
    gnn,
    device_name=device_name,
    edge_feature=edge_attr,
    node_feature=node_feature,
    target=target,
    num_workers=num_workers,
    pretrained_model=pretrained_model,
    threshold=threshold,
)
model.test(hdf5="tmpdir/GNN_esm_prediction.hdf5")
```
Project details

Download files

Source Distribution: deeprank_gnn_esm-1.0.1.tar.gz
Built Distribution: deeprank_gnn_esm-1.0.1-py3-none-any.whl
File details

Details for the file deeprank_gnn_esm-1.0.1.tar.gz.

File metadata

- Download URL: deeprank_gnn_esm-1.0.1.tar.gz
- Size: 635.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | bb4feea0762c3b0838d0676e69b51c52abd985d16fb2ff5931c18f42e66918ec |
| MD5 | c6e2ca01f5710e888091efc96bb69602 |
| BLAKE2b-256 | f41ea6191282bc3da30ae60ddd923898b74ab11fb2713132eecc0361eaed4165 |
Provenance

The following attestation bundles were made for deeprank_gnn_esm-1.0.1.tar.gz:

Publisher: publish.yml on haddocking/deeprank-gnn-esm

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: deeprank_gnn_esm-1.0.1.tar.gz
- Subject digest: bb4feea0762c3b0838d0676e69b51c52abd985d16fb2ff5931c18f42e66918ec
- Sigstore transparency entry: 1065668965
- Permalink: haddocking/deeprank-gnn-esm@351c27da6d8239e29e14894f21d2612416753977
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/haddocking
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@351c27da6d8239e29e14894f21d2612416753977
- Trigger Event: release
File details

Details for the file deeprank_gnn_esm-1.0.1-py3-none-any.whl.

File metadata

- Download URL: deeprank_gnn_esm-1.0.1-py3-none-any.whl
- Size: 631.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | e8a770760009463632d4e7a95cdb65294dcd5444ccd2bafc3a2f4784196461ce |
| MD5 | 6ec75d0e741d4ae5effadf8197a5d9ca |
| BLAKE2b-256 | 315d358462d3371e763436170c56d977df20fe64c76a62aa44a163e072d4c81e |

Provenance

The following attestation bundles were made for deeprank_gnn_esm-1.0.1-py3-none-any.whl:

Publisher: publish.yml on haddocking/deeprank-gnn-esm

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: deeprank_gnn_esm-1.0.1-py3-none-any.whl
- Subject digest: e8a770760009463632d4e7a95cdb65294dcd5444ccd2bafc3a2f4784196461ce
- Sigstore transparency entry: 1065668978
- Permalink: haddocking/deeprank-gnn-esm@351c27da6d8239e29e14894f21d2612416753977
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/haddocking
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@351c27da6d8239e29e14894f21d2612416753977
- Trigger Event: release