plm-cs

Protein chemical shift prediction based on Protein Language Model

Project description

PLM-CS

Predict protein chemical shifts from sequence

Train your model

If you want to train your own PLM-CS model, this repository provides all the tools and data.

Requirements

torch==2.5.0
torchaudio==2.5.0
torchvision==0.20.0
fair-esm==2.0.0
numpy==2.1.2
biopython==1.84
pandas==2.2.3

Train with RefDB dataset

To train with the data we provide and reproduce the results in the paper, follow the train_your_model notebook (an ipynb file), which already contains the complete process.

Training set

The complete training set is provided in the RefDB training dataset folder. Each file in this folder is in NMR-STAR format and corresponds to one protein. All proteins contained in the SHIFTX test set have been removed from it.

Training parameters

Different atom types use different optimizer strategies. You can modify the corresponding parameters in train.py according to the model you are training. The default number of training steps is 20,000, but you can reduce it to 5,000 to achieve very close performance with a shorter training time.

parameters	Cα	Cβ	C	Hα	H	N
learning rate	0.02	5e-4	0.002	0.01	5e-4	5e-4
optimizer	SGD	Adam	Adam	SGD	Adam	Adam
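The table above can be captured as a small lookup so that the right settings are picked for whichever atom type is being trained. This is a minimal sketch, not the contents of train.py: the names `OptimizerConfig`, `ATOM_CONFIGS`, and `get_config`, and the ASCII keys `CA`, `CB`, and `HA` (standing for Cα, Cβ, and Hα), are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizerConfig:
    """Optimizer name and learning rate for one atom type (hypothetical helper)."""
    optimizer: str
    learning_rate: float


# Values copied from the table above; the ASCII keys are an assumed naming.
ATOM_CONFIGS = {
    "CA": OptimizerConfig("SGD", 0.02),
    "CB": OptimizerConfig("Adam", 5e-4),
    "C": OptimizerConfig("Adam", 0.002),
    "HA": OptimizerConfig("SGD", 0.01),
    "H": OptimizerConfig("Adam", 5e-4),
    "N": OptimizerConfig("Adam", 5e-4),
}


def get_config(atom: str) -> OptimizerConfig:
    """Return the settings for an atom type, raising on unknown names."""
    try:
        return ATOM_CONFIGS[atom.upper()]
    except KeyError:
        raise ValueError(f"unknown atom type: {atom!r}") from None
```

With PyTorch installed, `getattr(torch.optim, cfg.optimizer)(model.parameters(), lr=cfg.learning_rate)` would build the matching optimizer for a hypothetical `model`, since both `torch.optim.SGD` and `torch.optim.Adam` accept an `lr` argument.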

Train with your own dataset

Training set processing

For convenience, inference with the ESM model is kept separate from the training of our regression model, so the data is first processed with ESM-650M. esm_process.py provides a transformation function for the ESM model. It takes three parameters: the protein sequence, the chemical shifts of the atom type of interest, and a mask sequence marking positions whose chemical shift label is missing. These three sequences must be of equal length. The function returns four processed outputs. Concatenate the outputs of multiple sequences along the batch dimension and save them as a TensorDataset in this manner:

dataset = TensorDataset(all_esm_vec, all_label, all_mask, all_padding_mask)

The final shapes of the four tensors should be b×512×1280, b×512×1, b×512×1, and b×512×1 respectively, where b is the number of sequences.
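To make the target layout concrete, the sketch below pads the labels and mask of one protein to 512 positions, which gives the 512×1 layout per sequence before stacking along the batch dimension. It does not reproduce esm_process.py. The helper names `pad_example` and `expected_shapes`, the padding values, and the convention that the padding mask is 1 at padded positions are all assumptions made for illustration.

```python
MAX_LEN = 512  # sequence dimension stated above


def pad_example(shifts, mask, max_len=MAX_LEN):
    """Pad per-residue labels and mask to max_len and build a padding mask.

    Assumptions (not taken from the PLM-CS source): padded positions get
    label 0.0 and mask 0, and the padding mask is 1 where a position is padding.
    """
    if len(shifts) != len(mask):
        raise ValueError("shifts and mask must have equal length")
    if len(shifts) > max_len:
        raise ValueError(f"sequence longer than {max_len} residues")
    n_real = len(shifts)
    n_pad = max_len - n_real
    labels = [[float(s)] for s in shifts] + [[0.0] for _ in range(n_pad)]
    label_mask = [[int(m)] for m in mask] + [[0] for _ in range(n_pad)]
    padding_mask = [[0] for _ in range(n_real)] + [[1] for _ in range(n_pad)]
    return labels, label_mask, padding_mask


def expected_shapes(b, max_len=MAX_LEN, esm_dim=1280):
    """Shapes of the four tensors for a batch of b sequences."""
    return [(b, max_len, esm_dim), (b, max_len, 1), (b, max_len, 1), (b, max_len, 1)]
```

Collecting the three padded lists for b proteins gives nested lists of shape b×512×1, which can then be converted to tensors and stored next to the b×512×1280 ESM embeddings.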

Train

Modify the path in train.py to your own path. Also, be aware that only one atom type can be trained at a time.

Use PLM-CS through the Python SDK

Install with pip

pip install plm-cs

Or install after git clone

After cloning the complete project locally, run the following command in the folder containing setup.py:

pip install .

Use plm-cs

Run a command like the one below, passing the protein sequence and the path where the result should be saved. This generates a CSV file of predicted chemical shifts at the specified location:

plm-cs YOURSEQUENCE -your_save_path
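Before passing a sequence to the command above, it can help to normalise it first. The sketch below is a hypothetical helper, not part of plm-cs: it assumes that only the 20 standard one-letter amino acid codes are wanted, which the package documentation does not state.

```python
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Uppercase a one-letter protein sequence, drop whitespace, and validate it."""
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    # Assumption: anything outside the 20 standard letters is rejected.
    bad = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if bad:
        raise ValueError(f"non-standard residue letters: {''.join(bad)}")
    return seq
```

For example, `clean_sequence("mkt ay")` returns a string that can be used in place of YOURSEQUENCE.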

Release history

Version	Release date
3.4	Jan 2, 2025
3.3	Jan 1, 2025
3.2	Dec 30, 2024
3.1	Dec 29, 2024
3.0	Dec 29, 2024
2.9	Dec 28, 2024
2.4	Dec 28, 2024
2.3	Dec 26, 2024
2.2	Dec 20, 2024
2.0	Dec 20, 2024
1.9	Dec 18, 2024
1.8	Dec 18, 2024
1.7	Dec 18, 2024
1.6 (this version)	Dec 18, 2024
1.5	Dec 17, 2024
1.4	Dec 17, 2024
1.3	Dec 17, 2024
1.1	Dec 17, 2024
0.3	Sep 3, 2024
0.1	Aug 29, 2024

Download files

Source Distribution

plm-cs-1.6.tar.gz (7.5 kB)

Uploaded Dec 18, 2024 Source

Built Distribution

plm_cs-1.6-py3-none-any.whl (8.0 kB)

Uploaded Dec 18, 2024 Python 3

File details

Details for the file plm-cs-1.6.tar.gz.

File metadata

Download URL: plm-cs-1.6.tar.gz
Upload date: Dec 18, 2024
Size: 7.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.10.14

File hashes

Hashes for plm-cs-1.6.tar.gz
Algorithm	Hash digest
SHA256	`7f4b05cafcf90fa5dc3c3335914ee7afe152ae0de41253189582156b921c808f`
MD5	`2390f0010ac5d7bee9ca28480ba7702d`
BLAKE2b-256	`3ce948e089c7de2f47e83207e768b031453a4959615ea7069c92ad6b50c503bf`

File details

Details for the file plm_cs-1.6-py3-none-any.whl.

File metadata

Download URL: plm_cs-1.6-py3-none-any.whl
Upload date: Dec 18, 2024
Size: 8.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.10.14

File hashes

Hashes for plm_cs-1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d58945e8c4fe874c4bbc94bafc7111c7b1fe3d2ca37f63fc58f19ea80aeb4d3c`
MD5	`4ee0f31aa0eb3ded1108a507b7697845`
BLAKE2b-256	`800e65674ddd2495ce27e2fe3f8093f4af37379c1e036dae3bd77c2a13bc5979`
