Graphormer-Based Protein Sequence Design Package: GPD
GPD
The Graphormer-based Protein Design (GPD) model applies a Transformer to a graph representation of 3D protein structures. It also adds Gaussian noise and a random sequence mask to the node features, which improves both sequence recovery and sequence diversity. On multiple independent test sets, GPD performed significantly better than the state-of-the-art ProteinMPNN model, especially in sequence diversity.
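The README does not show GPD's augmentation code. As a minimal sketch, assume each residue (graph node) has a float feature vector and an integer amino-acid token. Gaussian noise on the features and a random sequence mask on the tokens could then look like the following. `augment_node_features`, `noise_std`, `mask_rate`, and `MASK_TOKEN` are hypothetical names and values for illustration, not GPD's actual settings.

```python
import random

# Hypothetical token index for a masked residue (assumes the 20 standard
# amino acids occupy indices 0..19). Not taken from GPD's source.
MASK_TOKEN = 20


def augment_node_features(features, aa_tokens, noise_std=0.1, mask_rate=0.15, rng=None):
    """Return (noisy_features, masked_tokens) without modifying the inputs.

    features:  list of per-residue float feature vectors
    aa_tokens: list of per-residue integer amino-acid indices
    """
    if rng is None:
        rng = random.Random()
    # Add independent Gaussian noise (mean 0, std noise_std) to every feature value.
    noisy = [[x + rng.gauss(0.0, noise_std) for x in vec] for vec in features]
    # Replace each residue's token with MASK_TOKEN with probability mask_rate.
    masked = [MASK_TOKEN if rng.random() < mask_rate else tok for tok in aa_tokens]
    return noisy, masked
```

Noise perturbs the structural input, so the model cannot rely on exact coordinates. Masking hides part of the sequence context, which in this sketch is the mechanism intended to encourage more diverse designs.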
Install
Quick Start
Install the package directly with pip:
pip install fair-GPD
Install with conda
conda create -n GPD
conda activate GPD
conda install pytorch==1.12.1 -c pytorch
conda install -c conda-forge mdtraj==1.9.8
conda install -c anaconda networkx==3.1
GPD can also run on a GPU with CUDA. Install the cudatoolkit package version that matches your GPU and driver.
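A quick way to confirm that the installed PyTorch build can see your GPU is shown below. `pick_device` is a hypothetical helper, not part of GPD, and GPD's own scripts may select the device differently. It relies only on `torch.cuda.is_available()`, and falls back to the CPU if PyTorch is missing or no CUDA device is visible.

```python
def pick_device():
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU path is possible.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Using device:", pick_device())
```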
Alternatively, create the environment from the provided environment.yml file:
conda env create -f environment.yml
Install with pip
Install the dependencies from the provided requirements.txt file:
pip install -r requirements.txt
Example
cd example/
sh submit_example_2_fixed.sh (simple example)
sh submit_example_1.sh (fix some residue positions)
Example output:
outputs/example_1_outputs/1tca.fasta
> predicted model_0 acc: 0.3501577287066246 length: 317
APTGAAPPLTLPPATLRAQLAAKGASPEDLKNPVLILHGPGTDGAEDFAGFLVRLLKSKGYTPAYVDPDPN
ALDDIADDLEALALAAKYLAAGLGNKPFNVITHSLGGVALLTALAYHPELRDKIKRVVLVSPLPTGSDSLR
ALLAANTLRLLQFLSVKGSALDDAARKAGALTPLVPTTVIGHANDPLHYPTSLGSPASGAYVPDARVIDLY
SVYGPDFTVDHAEAVFSSLVRKALKAALTSSSGYARASDVGKSLRVSDPAKDLSAEQREAFLNLLAPAAAA
IANGKTGNACPPLPPEYLPAAPGAKGAGGVLTP
> predicted model_1 acc: 0.334384858044164 length: 317
APTGEPLPLLLPDATLLANVEADGADIDEVTNPVLLLHGLGSDGEEALGASLVALLKALGYTPLGVDPDPN
YTDDILDDAQALAAAARALAAGLGNKPLLVVGHSLGGVVVLLALRYNPALADLIASVILVAPAPRGSSEAR
PLIAAKILRPEDFLLLYGSALADALRAAGLDVPLVPTTVIDSADDPLHSPNALLSAESAAYVPGGTVVDLS
DIFGPDFTVSHAGAVLSPFLRKLLEAALASPTGVPREEDVGASLLDLDLAADLTAEERAAALNALAAYAAR
IAAGARFNAYPALPPELVPAAKGATDAAGTLKP
- acc is the sequence recovery: the fraction of positions at which the designed sequence has the same amino acid as the native sequence.
- length is the length of the designed sequence.
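The recovery definition above can be computed as follows. This is a minimal sketch that assumes the native and designed sequences are already aligned position by position (same length). The header parser assumes the `> predicted model_N acc: X length: L` format seen in the example output; both `sequence_recovery` and `parse_header` are hypothetical helpers, not GPD functions.

```python
def sequence_recovery(native, designed):
    """Fraction of positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("native and designed sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def parse_header(header):
    """Parse a header such as '> predicted model_0 acc: 0.35 length: 317'.

    Assumes the field layout of the example output shown above.
    """
    tokens = header.lstrip(">").split()
    return {
        "model": tokens[1],
        "acc": float(tokens[tokens.index("acc:") + 1]),
        "length": int(tokens[tokens.index("length:") + 1]),
    }
```

For example, `acc: 0.3501577287066246` with `length: 317` corresponds to 111 matching positions, since 111 / 317 ≈ 0.3502.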
Training the GPD model
Dataset
The GPD model was trained on the CATH 40% sequence-identity non-redundant dataset (v4.3.0), split into 29,868 training, 1,000 validation, and 103 test structures. We further evaluated GPD on 39 de novo proteins, including 14 whose structures differ substantially from proteins with natural folds.
- data/cath-dataset-nonredundant-S40-v4_3_0.pdb: the CATH 40% non-redundant dataset, downloaded from http://download.cathdb.info/cath/releases/all-releases/v4_3_0/non-redundant-data-sets/cath-dataset-nonredundant-S40-v4_3_0.pdb.tgz
- data/sc103: 103 single-chain proteins
- data/denovo39: 39 de novo proteins
- data/denovo14: 14 de novo proteins
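The CATH archive listed above is a gzipped tarball. As a sketch, assuming you have already downloaded it (for example with `urllib.request.urlretrieve(url, "cath-dataset-nonredundant-S40-v4_3_0.pdb.tgz")`), it can be unpacked with the standard `tarfile` module. `extract_cath_archive` is a hypothetical helper; the README does not say how the data directory was prepared or what layout the training script expects.

```python
import tarfile
from pathlib import Path


def extract_cath_archive(archive_path, dest_dir):
    """Extract a .tgz archive into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    dest_root = dest.resolve()
    with tarfile.open(archive_path, "r:gz") as tar:
        safe = []
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Skip members whose path would escape dest_dir (path traversal guard).
            target = (dest / member.name).resolve()
            if dest_root in target.parents:
                safe.append(member)
        tar.extractall(dest, members=safe)
    return sorted(str(dest / m.name) for m in safe)
```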
Training
The training script is train/train_encoder3.py. Training took 1 day on a single NVIDIA A100 (40 GB) GPU.
Download files
File details
Details for the file fair-GPD-0.0.2.tar.gz
File metadata
- Download URL: fair-GPD-0.0.2.tar.gz
- Size: 10.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | feae75051e9a16b41b3fc590e7039a48d5b7300eef2591959bef5b232d8ba20b
MD5 | a0f2c6ea8f33f31400bdf2a21ed870ca
BLAKE2b-256 | 0d388d8d2b3680eb938f1086904c65d45e76e7af8fccdbe144cc9be6f1879881
File details
Details for the file fair_GPD-0.0.2-py3-none-any.whl
File metadata
- Download URL: fair_GPD-0.0.2-py3-none-any.whl
- Size: 10.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | eae79256d9da93cd57cc5f92e2925a2df16e80cfa2019f71d6f4f7c213b73b0d
MD5 | d535d02206817c29030eecbad564f44f
BLAKE2b-256 | a98e50b9cce066c35e8ea983abc79e38f326d915e506c40011df7d438130f0a8