A tool for calculating protein scores in bulk from PDB files.

PDB-score

Click here for the Chinese documentation

Features

Computes GDT scores for large sets of predicted and experimental protein models, and calculates the RMSD after coordinate alignment. Both measures can be used to evaluate prediction models.

Installation

pip install PDB-score

Usage

psc [-h] -c CONTROL -t TREATMENT -o OUTPUT [-T THREAT] [-B BATCH] [-m {default,prealign}] [-d THRESHOLD] [-i MAX_ITERATIONS] [-s SAVE_LIMIT]
  • -c Directory where the experimental PDB files are stored.
  • -t Directory where the predicted PDB files are stored.
  • -o Directory for saving the output scores.
  • -T Number of CPU cores to use. Defaults to 4.
  • -B Batch size. Defaults to 5000.
  • -m Atomic alignment method, either default (the Biopython implementation) or prealign. Defaults to default.

The following parameters apply only in -m prealign mode:

  • -d Distance threshold (in Å) used to exclude atoms during each optimization iteration. Defaults to 1.0.
  • -i Maximum number of iterations. Defaults to 10.
  • -s Minimum proportion of atoms that must be retained. Defaults to 1.0.
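
As an illustration, the sketch below assembles a psc command line from Python using only the flags listed above. The directory names are placeholders, and build_psc_command is a hypothetical helper written for this example, not part of PDB-score.

```python
def build_psc_command(control, treatment, output, cores=4, batch=5000,
                      mode="default", threshold=None, max_iterations=None,
                      save_limit=None):
    """Assemble an argument list for the psc CLI (hypothetical helper)."""
    cmd = ["psc", "-c", control, "-t", treatment, "-o", output,
           "-T", str(cores), "-B", str(batch), "-m", mode]
    if mode == "prealign":
        # -d, -i and -s only apply in prealign mode, so add them only there.
        if threshold is not None:
            cmd += ["-d", str(threshold)]
        if max_iterations is not None:
            cmd += ["-i", str(max_iterations)]
        if save_limit is not None:
            cmd += ["-s", str(save_limit)]
    return cmd


# Placeholder directory names; replace them with real paths.
cmd = build_psc_command("experimental_pdbs", "predicted_pdbs", "scores",
                        mode="prealign", threshold=1.0,
                        max_iterations=10, save_limit=0.9)
print(" ".join(cmd))

# To actually run the command (requires PDB-score to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when a directory path contains spaces.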

Notes

  • -s should be given to two decimal places and must not exceed 1.0. When the two molecules differ in size, it applies to the smaller one.
  • -o should only specify the output directory and not the file name.
  • The output is a .csv file with a fixed name (protein_scores.csv), so take care not to overwrite earlier results.
  • Only .pdb and .ent files with the same name (excluding extensions) in the two input directories will be analyzed.
  • The prealign mode has not been optimized for performance and may run slowly.
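
The file-matching rule above can be sketched as follows. This is a minimal illustration of pairing by file stem, not the tool's actual code; the function names are hypothetical, and treating extensions case-insensitively is an assumption.

```python
from pathlib import Path

VALID_SUFFIXES = {".pdb", ".ent"}


def index_structures(directory):
    """Map file stem -> path for every .pdb/.ent file in a directory."""
    # If both name.pdb and name.ent exist, the one sorted last wins.
    return {p.stem: p
            for p in sorted(Path(directory).iterdir())
            if p.is_file() and p.suffix.lower() in VALID_SUFFIXES}


def pair_structures(control_dir, treatment_dir):
    """Return (name, control_path, treatment_path) for stems found in both."""
    control = index_structures(control_dir)
    treatment = index_structures(treatment_dir)
    shared = sorted(control.keys() & treatment.keys())
    return [(name, control[name], treatment[name]) for name in shared]
```

Running a check like this before a large job shows which files would be skipped because they have no counterpart in the other directory.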

Output

/path/to/output/protein_scores.csv

name      RMSD          1A     2A     ...  128A   Average
Protein1  rmsd (float)  Score  Score  ...  Score  Score
Protein2  rmsd (float)  Score  Score  ...  Score  Score
...       ...           ...    ...    ...  ...    ...
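
A minimal sketch of loading such a file with Python's standard csv module is shown below. It assumes the file is comma-separated with a header row whose first column is called name, as in the layout above. The sample row contains made-up numbers and only three of the threshold columns, purely for illustration.

```python
import csv
import io


def load_scores(csv_text):
    """Parse score rows into {protein name: {column: float value}}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["name"]: {key: float(value)
                      for key, value in row.items() if key != "name"}
        for row in reader
    }


# Made-up sample data; a real file would be read with open(path).read().
sample = "name,RMSD,1A,2A,Average\nProtein1,1.25,0.5,0.75,0.625\n"
scores = load_scores(sample)
print(scores["Protein1"]["Average"])
```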

Calculation Method

Default Mode:

  • Use Biopython for coordinate alignment.
  • Calculate scores using the GDT algorithm.
  • Remove all ligands and represent each residue by the coordinates of its central carbon atom.
  • If the two structures have different numbers of central carbon atoms, the surplus residues are simply ignored, regardless of the effect on accuracy.
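
To make the scoring step concrete, here is a minimal pure-Python sketch of RMSD and a GDT-style score for coordinates that have already been superimposed. It is not the package's implementation: the cutoffs (1, 2, 4, 8 Å) are an assumption, the scores are returned as fractions between 0 and 1, and the alignment itself (done with Biopython in the tool) is left out. Truncating to the shorter list via zip is only a rough stand-in for ignoring surplus residues.

```python
import math


def distances(ref, pred):
    """Per-residue distances between already-superimposed coordinates."""
    # zip stops at the shorter list, so surplus residues are ignored.
    return [math.dist(a, b) for a, b in zip(ref, pred)]


def rmsd(ref, pred):
    """Root-mean-square deviation over the paired residues."""
    d = distances(ref, pred)
    return math.sqrt(sum(x * x for x in d) / len(d))


def gdt_scores(ref, pred, cutoffs=(1, 2, 4, 8)):
    """Fraction of residues within each cutoff, plus their average."""
    d = distances(ref, pred)
    per_cutoff = {c: sum(x <= c for x in d) / len(d) for c in cutoffs}
    average = sum(per_cutoff.values()) / len(per_cutoff)
    return per_cutoff, average


# Toy coordinates (one point per residue), chosen only for illustration.
ref = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
pred = [(0, 0, 0.5), (1, 0, 1.5), (2, 0, 3.0), (3, 0, 9.0)]
per_cutoff, average = gdt_scores(ref, pred)
print(per_cutoff, average, rmsd(ref, pred))
```

Note that the widely used GDT_TS measure searches over many superpositions to maximize each fraction, whereas this sketch scores a single fixed superposition.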

Prealign Mode:

  • Use a custom alignment algorithm for rigid coordinate alignment.
  • Calculate scores using the GDT algorithm.
  • Remove all ligands and represent each residue by the coordinates of its central carbon atom.
  • If the two structures have different numbers of central carbon atoms, the best-matching chains and fragments are selected using the longest common subsequence (LCS) method.
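
For readers unfamiliar with LCS, the sketch below shows the standard dynamic-programming version applied to two residue sequences. It illustrates the general technique only. How PDB-score represents residues and chooses among chains is not described here, so the use of one-letter residue codes and the function name lcs_pairs are assumptions.

```python
def lcs_pairs(a, b):
    """Return index pairs (i, j) of one longest common subsequence."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk forward through the table to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Toy sequences: the second lacks the leading M and the internal I.
experimental = "MKTAYIAK"
predicted = "KTAYAK"
print(lcs_pairs(experimental, predicted))
```

The returned index pairs identify which residues in each structure correspond, so only those residues need to be passed on to alignment and scoring.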

Performance

  • Test Environment:
    • Default parameters: -T 4 -B 5000
    • Test machine: Windows 11 PC with an Intel Core i5-12600K CPU
    • Single sample size: 146 KB with 154 residues
  • Comparing 50,000 samples took 387,061 ms (about 6.5 minutes, or roughly 7.7 ms per sample).
  • Memory usage stayed below 6 GB.

Others

  • Setting -s below 1.0 allows a corresponding proportion of atoms to be discarded during alignment. This often improves the alignment but may affect the accuracy of the results.
  • If the protein data is noisy (for example, the centroid is not at the origin, irrelevant chains are present, or the numbers of central carbon atoms differ significantly), the prealign method typically performs better.
  • For more accurate results with the default alignment method, make sure the protein data is as clean as possible beforehand.
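
The trade-off between the distance threshold (-d) and the retention limit (-s) can be illustrated with the sketch below. It shows one possible reading of how the two parameters might interact in a single iteration and is not taken from the package's source. The function name select_retained and the fallback of keeping the closest atoms are assumptions.

```python
import math


def select_retained(distances, threshold=1.0, save_limit=1.0):
    """Pick atom indices to keep for the next alignment iteration.

    Atoms farther than `threshold` are dropped, unless that would leave
    fewer than `save_limit` of the atoms, in which case the closest
    atoms are kept instead (assumed behavior).
    """
    n = len(distances)
    # Round first so float noise such as 0.7 * 10 does not bump the ceiling.
    min_keep = math.ceil(round(save_limit * n, 9))
    order = sorted(range(n), key=lambda i: distances[i])
    within = [i for i in order if distances[i] <= threshold]
    kept = within if len(within) >= min_keep else order[:min_keep]
    return sorted(kept)


# Toy per-atom distances (in Å) after one alignment step.
d = [0.2, 0.5, 3.0, 0.9]
print(select_retained(d, threshold=1.0, save_limit=0.75))
print(select_retained(d, threshold=1.0, save_limit=1.0))
```

Under this reading, the default of 1.0 for -s means no atom is ever discarded, which matches the note that lowering it trades completeness for a better fit.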

Acknowledgments

@SiriNatsume
Wishing you happiness :)
