A tool for batch calculation of protein structure scores from PDB files.
PDB-score
Features
Implements GDT scoring for large numbers of predicted and experimental protein models, and calculates RMSD after coordinate alignment. Both metrics can be used to evaluate prediction models.
Installation
pip install PDB-score
Usage
psc [-h] -c CONTROL -t TREATMENT -o OUTPUT [-T THREAT] [-B BATCH] [-m {default,prealign}] [-d THRESHOLD] [-i MAX_ITERATIONS] [-s SAVE_LIMIT]
- `-c` Directory where the experimental PDB files are stored.
- `-t` Directory where the predicted PDB files are stored.
- `-o` Directory for saving the output scores.
- `-T` Specifies the number of cores, default is 4.
- `-B` Specifies the batch size, default is 5000.
- `-m` Specifies the atomic alignment method, defaulting to the Biopython implementation.
The following parameters apply only in -m prealign mode:
- `-d` Specifies the distance threshold (in Å) for excluding atoms during each iteration of optimization, defaulting to 1.0.
- `-i` Specifies the maximum number of iterations, defaulting to 10.
- `-s` Specifies the minimum proportion of atomic points allowed to be retained, defaulting to 1.0.
Notes
- `-s` should be precise to two decimal places and cannot exceed 1.0. It applies to the smaller molecule among two of different sizes.
- `-o` should only specify the output directory and not the file name.
- The output is a `.csv` file with a fixed file name, so take care to avoid overwriting it.
- Only `.pdb` and `.ent` files with the same name (excluding extensions) in the two input directories will be analyzed.
- The `prealign` mode has not been performance-optimized and may exhibit suboptimal performance.
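The file-matching rule in the notes can be sketched as follows. This is only an illustration of the stated behavior, not the tool's actual code: the function names are hypothetical, and treating extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions the notes say are considered; everything else is skipped.
STRUCTURE_EXTENSIONS = {".pdb", ".ent"}


def index_structures(directory):
    """Map file stem -> path for every .pdb/.ent file in a directory."""
    return {
        path.stem: path
        for path in sorted(Path(directory).iterdir())
        # Assumption: extensions are compared case-insensitively.
        if path.is_file() and path.suffix.lower() in STRUCTURE_EXTENSIONS
    }


def paired_structures(control_dir, treatment_dir):
    """Return (name, control_path, treatment_path) for names present in both directories."""
    control = index_structures(control_dir)
    treatment = index_structures(treatment_dir)
    shared = sorted(control.keys() & treatment.keys())
    return [(name, control[name], treatment[name]) for name in shared]
```

Running a check like this on the `-c` and `-t` directories beforehand shows which structures would be skipped because they have no counterpart.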
Output
/path/to/output/protein_scores.csv
| name | RMSD | 1A | 2A | ... | 128A | Average |
|---|---|---|---|---|---|---|
| Protein1 | rmsd (float) | Score | Score | ... | Score | Score |
| Protein2 | rmsd (float) | Score | Score | ... | Score | Score |
| ... | ... | ... | ... | ... | ... | ... |
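A minimal sketch of reading the score table back into Python with the standard `csv` module. The sample data is hypothetical and truncated (real files have more cutoff columns between `2A` and `128A`), and a comma-delimited file with this header row is an assumption.

```python
import csv
import io

# Hypothetical, truncated sample; the values are invented for illustration.
SAMPLE = """name,RMSD,1A,2A,128A,Average
Protein1,1.25,0.40,0.70,1.00,0.70
Protein2,3.50,0.10,0.30,1.00,0.47
"""


def load_scores(handle):
    """Parse a protein_scores.csv-like file into dicts with float values."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"name": record["name"]}
        for column, value in record.items():
            if column != "name":
                row[column] = float(value)
        rows.append(row)
    return rows


rows = load_scores(io.StringIO(SAMPLE))
# Pick the entry with the lowest RMSD.
best = min(rows, key=lambda row: row["RMSD"])
print(best["name"], best["RMSD"])
```

For a real run, pass `open("/path/to/output/protein_scores.csv", newline="")` instead of the in-memory sample.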
Calculation Method
Default Mode:
- Use Biopython for coordinate alignment.
- Calculate scores using the GDT algorithm.
- Remove all ligands; represent residue coordinates using the central carbon atom coordinate.
- If the two structures have different numbers of central carbon atoms, the surplus residues are simply ignored, regardless of the effect on precision.
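The scoring steps can be illustrated with a simplified sketch. It assumes the two coordinate lists are already superimposed (the alignment itself is not shown), uses one point per residue, and computes a GDT-style score as the fraction of residues within a distance cutoff. The function names, the cutoff set `(1, 2, 4, 8)`, and normalizing by the number of paired residues are assumptions rather than the tool's confirmed behavior.

```python
import math


def paired_distances(reference, model):
    """Per-residue distances between two already-superimposed coordinate lists.

    zip() stops at the shorter list, mirroring the rule that surplus
    residues are ignored.
    """
    return [math.dist(a, b) for a, b in zip(reference, model)]


def rmsd(distances):
    """Root-mean-square deviation of the paired distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt_fraction(distances, cutoff):
    """Fraction of residues within `cutoff` angstroms of their partner."""
    return sum(d <= cutoff for d in distances) / len(distances)


# Toy coordinates; the model has one extra residue that gets ignored.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0),
         (11.4, 9.0, 0.0), (15.2, 0.0, 0.0)]

distances = paired_distances(reference, model)
scores = {cutoff: gdt_fraction(distances, cutoff) for cutoff in (1, 2, 4, 8)}
print(rmsd(distances), scores)
```

Here the per-residue distances are 0.5, 1.5, 3.0 and 9.0 Å, so one residue in four falls within 1 Å and three in four fall within 4 Å.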
Prealign Mode:
- Use a custom `align` algorithm for rigid coordinate alignment.
- Calculate scores using the GDT algorithm.
- Remove all ligands; represent residue coordinates using the central carbon atom coordinate.
- If the number of central carbon atoms is unequal, the most matched chains and fragments are selected using the LCS method.
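To show what LCS-based matching means in practice, here is a generic longest-common-subsequence sketch applied to residue names. It is a textbook dynamic-programming version, not the tool's implementation; what the tool actually compares (residue names, chains, fragments) is not specified here, so the inputs below are assumptions.

```python
def lcs_pairs(seq_a, seq_b):
    """Return index pairs (i, j) of one longest common subsequence."""
    n, m = len(seq_a), len(seq_b)
    # table[i][j] = LCS length of seq_a[i:] and seq_b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if seq_a[i] == seq_b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if seq_a[i] == seq_b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Hypothetical residue sequences of unequal length.
experimental = ["MET", "ALA", "GLY", "SER", "LEU"]
predicted = ["ALA", "GLY", "LYS", "SER", "LEU", "VAL"]
matches = lcs_pairs(experimental, predicted)
print(matches)
```

Only the matched index pairs would then be used for alignment and scoring, so the unmatched `MET`, `LYS` and `VAL` residues drop out.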
Performance
- Test Environment:
  - Default parameters: `-T 4 -B 5000`
  - Test Machine: Windows 11 PC, CPU Intel 12600k
  - Single sample size: 146KB with 154 residues
- Default parameters:
  - Comparing 50,000 samples took 387061ms.
  - Memory usage is less than 6GB.
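For a rough sense of throughput, the reported figures can be converted into per-sample numbers. This is plain arithmetic on the values above; results on other hardware or inputs will differ.

```python
total_ms = 387_061  # reported wall time for the whole run
samples = 50_000    # reported number of compared samples

per_sample_ms = total_ms / samples
samples_per_second = samples / (total_ms / 1000)
total_minutes = total_ms / 60_000

print(f"{per_sample_ms:.2f} ms per comparison")
print(f"{samples_per_second:.0f} comparisons per second")
print(f"{total_minutes:.1f} minutes in total")
```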
Others
- When `-s` is less than 1.0, it indicates that an equivalent proportion of information (excluding certain atoms) is allowed to be lost during the alignment process. This often results in better alignment performance but may affect the accuracy of the results.
- If the protein data contains significant noise (e.g., the centroid is not located at the origin, includes irrelevant chains, or has a significantly unequal number of central carbon atoms), the `prealign` alignment method typically performs better.
- It is recommended to ensure minimal contamination of protein data before using the default alignment method to achieve more accurate results.
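How a distance threshold, an iteration cap, and a minimum retained proportion can interact is sketched below. This is a toy model under loud assumptions: the alignment step is a translation-only centroid match standing in for a real rigid alignment, the worst-first dropping policy is invented for illustration, and the function and parameter names are hypothetical (they only mirror the roles of `-d`, `-i` and `-s`).

```python
import math


def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def prune_outliers(reference, model, threshold=1.0, max_iterations=10, save_limit=1.0):
    """Iteratively drop poorly matching atom pairs; return the indices kept."""
    kept = list(range(len(reference)))
    # round() guards against float noise such as 0.7 * 10 = 7.000000000000001.
    min_kept = math.ceil(round(save_limit * len(reference), 6))
    for _ in range(max_iterations):
        # Stand-in alignment: shift the model so the kept centroids coincide.
        ref_c = centroid([reference[i] for i in kept])
        mod_c = centroid([model[i] for i in kept])
        shift = tuple(r - m for r, m in zip(ref_c, mod_c))
        distance = {
            i: math.dist(reference[i], tuple(c + s for c, s in zip(model[i], shift)))
            for i in kept
        }
        # Atoms beyond the threshold are candidates for exclusion, worst first.
        outliers = sorted((i for i in kept if distance[i] > threshold),
                          key=distance.get, reverse=True)
        removable = len(kept) - min_kept
        if not outliers or removable <= 0:
            break
        dropped = set(outliers[:removable])
        kept = [i for i in kept if i not in dropped]
    return kept


reference = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]
# Same shape shifted by 2 along x, except the last atom, which is far off.
model = [(2, 0, 0), (3, 0, 0), (4, 0, 0), (5, 0, 0), (6, 20, 0)]

strict = prune_outliers(reference, model, save_limit=1.0)
relaxed = prune_outliers(reference, model, save_limit=0.8)
print(strict, relaxed)
```

With a limit of 1.0 nothing may be dropped, so the outlier keeps distorting the fit. With 0.8 the single worst atom is excluded and the remaining four superimpose exactly, which illustrates the trade-off between alignment quality and retained information.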
Acknowledgments
@SiriNatsume
Wishing you happiness :)