
Graphormer Based Protein Sequence Design Package: GPD

Project description

GPD

The Graphormer-based Protein Design (GPD) model applies a Transformer to a graph-based representation of 3D protein structures and supplements it with Gaussian noise and a random sequence mask applied to node features, thereby enhancing sequence recovery and diversity. GPD performed significantly better than the state-of-the-art model ProteinMPNN on multiple independent tests, especially in sequence diversity.
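The two augmentations mentioned above can be illustrated with a minimal sketch. This is not the GPD implementation: the function name perturb_node_features, the feature layout (one list of floats per residue), and the default values of noise_std and mask_prob are all assumptions chosen for illustration.

```python
import random


def perturb_node_features(features, seq_onehot, noise_std=0.1, mask_prob=0.15, seed=None):
    """Hypothetical sketch of Gaussian noise plus a random sequence mask.

    features   : per-residue structural features (list of float lists)
    seq_onehot : per-residue one-hot sequence encoding (list of float lists)
    noise_std, mask_prob : illustrative values, not taken from GPD
    """
    rng = random.Random(seed)
    # Add zero-mean Gaussian noise to every structural feature value.
    noisy = [[x + rng.gauss(0.0, noise_std) for x in row] for row in features]
    # Zero out the sequence encoding of randomly chosen residues.
    masked = [
        [0.0] * len(row) if rng.random() < mask_prob else list(row)
        for row in seq_onehot
    ]
    return noisy, masked
```

In a sketch like this, noise discourages the model from memorising exact coordinates, while masking forces it to predict residues without seeing their identity.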


Install

Quick Start

Install the package directly with pip:

pip install fair-GPD

Install with conda

conda create -n GPD
conda activate GPD
conda install pytorch==1.12.1 -c pytorch
conda install -c conda-forge mdtraj==1.9.8
conda install -c anaconda networkx==3.1

GPD can be used with CUDA; install the cudatoolkit package that matches your GPU and driver version. Alternatively, use the provided environment.yml file to create the environment:
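After installation, a short check can confirm whether PyTorch sees a CUDA device. The helper name cuda_status is hypothetical; torch.cuda.is_available() is the standard PyTorch call, and the sketch assumes PyTorch was installed as described above.

```python
import importlib.util


def cuda_status():
    """Report whether PyTorch is installed and whether it can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check also works without PyTorch

    return "cuda available" if torch.cuda.is_available() else "cpu only"


print(cuda_status())
```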

conda env create -f environment.yml

Install with pip

Use the provided requirements.txt file for pip installation:

pip install -r requirements.txt

Example

cd example/
sh submit_example_2_fixed.sh  # simple example
sh submit_example_1.sh  # fix some residue positions

Output example:

outputs/example_1_outputs/1tca.fasta

> predicted model_0	acc: 0.3501577287066246	length: 317
APTGAAPPLTLPPATLRAQLAAKGASPEDLKNPVLILHGPGTDGAEDFAGFLVRLLKSKGYTPAYVDPDPN
ALDDIADDLEALALAAKYLAAGLGNKPFNVITHSLGGVALLTALAYHPELRDKIKRVVLVSPLPTGSDSLR
ALLAANTLRLLQFLSVKGSALDDAARKAGALTPLVPTTVIGHANDPLHYPTSLGSPASGAYVPDARVIDLY
SVYGPDFTVDHAEAVFSSLVRKALKAALTSSSGYARASDVGKSLRVSDPAKDLSAEQREAFLNLLAPAAAA
IANGKTGNACPPLPPEYLPAAPGAKGAGGVLTP
> predicted model_1	acc: 0.334384858044164	length: 317
APTGEPLPLLLPDATLLANVEADGADIDEVTNPVLLLHGLGSDGEEALGASLVALLKALGYTPLGVDPDPN
YTDDILDDAQALAAAARALAAGLGNKPLLVVGHSLGGVVVLLALRYNPALADLIASVILVAPAPRGSSEAR
PLIAAKILRPEDFLLLYGSALADALRAAGLDVPLVPTTVIDSADDPLHSPNALLSAESAAYVPGGTVVDLS
DIFGPDFTVSHAGAVLSPFLRKLLEAALASPTGVPREEDVGASLLDLDLAADLTAEERAAALNALAAYAAR
IAAGARFNAYPALPPELVPAAKGATDAAGTLKP
  • acc is the sequence recovery: the proportion of positions at which the designed sequence has the same amino acid as the native sequence.
  • length is the length of the designed sequence.
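The recovery definition and the output format above can be sketched in a few lines of Python. The function names sequence_recovery and parse_gpd_fasta are hypothetical and not part of the package; the parser assumes that every header follows the "> name acc: value length: value" layout shown in the example output, with whitespace between the fields.

```python
import re

# Header layout assumed from the example output above.
HEADER_RE = re.compile(r"^>\s*(.*?)\s+acc:\s*([0-9.]+)\s+length:\s*(\d+)\s*$")


def sequence_recovery(native, designed):
    """Fraction of aligned positions with identical amino acids."""
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    if not native:
        return 0.0
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def parse_gpd_fasta(text):
    """Parse designed sequences and their header fields from FASTA text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = HEADER_RE.match(line)
            if match:
                name, acc, length = match.groups()
                record = {"name": name, "acc": float(acc), "length": int(length)}
            else:
                # Unknown header layout: keep the raw text, leave fields empty.
                record = {"name": line[1:].strip(), "acc": None, "length": None}
            record["sequence"] = ""
            records.append(record)
        elif records:
            # Sequence lines may be wrapped; join them into one string.
            records[-1]["sequence"] += line
    return records
```

Given the native sequence of the target structure, sequence_recovery(native, record["sequence"]) should reproduce the acc value in the header, provided both sequences cover the same residues.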

Training the GPD model

Dataset

The GPD model was trained on the CATH 40% sequence non-redundant dataset, split into 29868 training, 1000 validation, and 103 test structures. We further evaluated GPD on 39 de novo proteins, including 14 whose structures differ significantly from proteins belonging to natural folds.
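A split with these sizes could be produced as sketched below. The source does not state how the split was actually made, so the seeded random shuffle, the function name split_dataset, and its parameters are assumptions for illustration only.

```python
import random


def split_dataset(ids, n_val=1000, n_test=103, seed=0):
    """Shuffle ids reproducibly and split into train, validation, and test lists.

    Assumption: a plain random split; the split used for GPD may have been
    built differently (for example, by CATH classification).
    """
    ids = list(ids)
    if len(ids) < n_val + n_test:
        raise ValueError("not enough entries for the requested split sizes")
    random.Random(seed).shuffle(ids)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# 29868 + 1000 + 103 = 30971 entries in total.
train, val, test = split_dataset(range(30971))
print(len(train), len(val), len(test))
```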

Training the GPD model

The training script is train/train_encoder3.py. Training took 1 day on a single NVIDIA A100 GPU (40 GB).

Download files


Source Distribution

fair-GPD-0.0.2.tar.gz (10.2 MB)

Uploaded Source

Built Distribution

fair_GPD-0.0.2-py3-none-any.whl (10.2 MB)

Uploaded Python 3

File details

Details for the file fair-GPD-0.0.2.tar.gz.

File metadata

  • Download URL: fair-GPD-0.0.2.tar.gz
  • Upload date:
  • Size: 10.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for fair-GPD-0.0.2.tar.gz
Algorithm Hash digest
SHA256 feae75051e9a16b41b3fc590e7039a48d5b7300eef2591959bef5b232d8ba20b
MD5 a0f2c6ea8f33f31400bdf2a21ed870ca
BLAKE2b-256 0d388d8d2b3680eb938f1086904c65d45e76e7af8fccdbe144cc9be6f1879881
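A downloaded file can be checked against the published SHA256 digest with Python's standard hashlib module. This is a minimal sketch; the helper names sha256_of_file and matches_published_hash are hypothetical, and the path passed in is wherever the archive was saved locally.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("fair-GPD-0.0.2.tar.gz", "feae75051e9a16b41b3fc590e7039a48d5b7300eef2591959bef5b232d8ba20b") is expected to return True for an intact download of the source distribution.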


File details

Details for the file fair_GPD-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: fair_GPD-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 10.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for fair_GPD-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 eae79256d9da93cd57cc5f92e2925a2df16e80cfa2019f71d6f4f7c213b73b0d
MD5 d535d02206817c29030eecbad564f44f
BLAKE2b-256 a98e50b9cce066c35e8ea983abc79e38f326d915e506c40011df7d438130f0a8

