Deduce the protein from an EM density
protein-detective
Python package to detect proteins in EM density maps.
It uses
- UniProt SPARQL endpoint to search for proteins and their measured or predicted 3D structures.
- powerfit to fit a protein structure in an Electron Microscopy (EM) density map.
An example workflow:
graph LR;
search{Search UniprotKB} --> |uniprot_accessions|fetchpdbe{Retrieve PDBe}
search{Search UniprotKB} --> |uniprot_accessions|fetchad{Retrieve AlphaFold}
fetchpdbe -->|mmcif_files| residuefilter{Filter on nr residues + write chain A}
fetchad -->|pdb_files| densityfilter{Filter out low confidence}
residuefilter -->|pdb_files| powerfit
densityfilter -->|pdb_files| powerfit
powerfit -->|*/solutions.out| solutions{Best scoring solutions}
solutions -->|dataframe| fitmodels{Fit models}
Install
pip install protein-detective
Or to use the latest development version:
pip install git+https://github.com/haddocking/protein-detective.git
Usage
The main entry point is the protein-detective command line tool, which has multiple subcommands to perform actions.
To use it programmatically, see the notebooks and API documentation.
Search Uniprot for structures
protein-detective search \
--taxon-id 9606 \
--reviewed \
--subcellular-location-uniprot nucleus \
--subcellular-location-go GO:0005634 \
--molecular-function-go GO:0003677 \
--limit 100 \
./mysession
(GO:0005634 is "Nucleus" and GO:0003677 is "DNA binding")
In the ./mysession directory, you will find a session.db file, which is a DuckDB database with the search results.
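If you want to script the search step from Python without relying on the package API, one option is to assemble the command line yourself. The sketch below uses only the flags shown in the example above; the function name `build_search_command` and its parameters are hypothetical and not part of protein-detective.

```python
import shlex


def build_search_command(session_dir, taxon_id, location_uniprot,
                         location_go, function_go, limit):
    """Build the argument list for `protein-detective search`.

    Hypothetical helper: it only mirrors the flags of the example command.
    """
    return [
        "protein-detective", "search",
        "--taxon-id", str(taxon_id),
        "--reviewed",
        "--subcellular-location-uniprot", location_uniprot,
        "--subcellular-location-go", location_go,
        "--molecular-function-go", function_go,
        "--limit", str(limit),
        session_dir,
    ]


command = build_search_command(
    "./mysession", 9606, "nucleus", "GO:0005634", "GO:0003677", 100
)
# Show the command as it would be typed in a shell.
print(shlex.join(command))
# To actually execute it: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a value contains spaces.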
To retrieve a bunch of structures
protein-detective retrieve ./mysession
In the ./mysession directory, you will find mmCIF files from PDBe and PDB files from AlphaFold DB.
To filter AlphaFold structures on confidence
Filter AlphaFold DB structures based on density confidence. Keeps entries where the number of residues with a confidence score above the threshold falls within the requested range. Also writes PDB files containing only those residues.
protein-detective density-filter \
--confidence-threshold 50 \
--min-residues 100 \
--max-residues 1000 \
./mysession
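To illustrate the idea behind this filter, here is a minimal sketch that is not the package's implementation. It relies on the fact that AlphaFold PDB files store the per-residue confidence (pLDDT) in the B-factor column of each ATOM record. The function names and the `atom_line` demo helper are hypothetical.

```python
def confident_atom_lines(pdb_text, threshold):
    """Keep ATOM records whose B-factor field (pLDDT) is above the threshold."""
    return [
        line for line in pdb_text.splitlines()
        if line.startswith("ATOM") and float(line[60:66]) > threshold
    ]


def count_residues(atom_lines):
    """Count distinct residues by chain ID plus residue number and insertion code."""
    return len({(line[21], line[22:27]) for line in atom_lines})


def passes_density_filter(pdb_text, threshold, min_residues, max_residues):
    """True if the number of confident residues lies within the requested range."""
    kept = confident_atom_lines(pdb_text, threshold)
    return min_residues <= count_residues(kept) <= max_residues


def atom_line(serial, res_seq, plddt):
    """Demo helper: fixed-width PDB ATOM record with pLDDT in the B-factor field."""
    return (
        f"ATOM  {serial:>5}  CA  ALA A{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


# Five single-atom residues with made-up confidence scores.
sample = "\n".join(
    atom_line(i, i, p) for i, p in enumerate([90.0, 80.0, 40.0, 30.0, 70.0], start=1)
)
kept = confident_atom_lines(sample, 50)
print(count_residues(kept), passes_density_filter(sample, 50, 2, 10))
```

The kept lines are what would be written to the filtered PDB file in this sketch.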
To prune PDBe files
Make PDBe files smaller by keeping only the first chain of the found UniProt entry and renaming it to chain A.
protein-detective prune-pdbs \
--min-residues 100 \
--max-residues 1000 \
./mysession
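The chain selection and renaming can be sketched as follows. This is an illustration only: it operates on PDB-format records for simplicity, whereas the command itself reads mmCIF files, and the names `keep_chain_as_a` and `make_atom` are hypothetical.

```python
def keep_chain_as_a(pdb_lines, chain_id):
    """Keep ATOM/HETATM records of one chain and rename that chain to 'A'."""
    pruned = []
    for line in pdb_lines:
        # The chain identifier is column 22 of a PDB record (index 21).
        if line.startswith(("ATOM", "HETATM")) and line[21] == chain_id:
            pruned.append(line[:21] + "A" + line[22:])
    return pruned


def make_atom(serial, chain, res_seq):
    """Demo helper: a fixed-width PDB ATOM record for a CA atom of alanine."""
    return (
        f"ATOM  {serial:>5}  CA  ALA {chain}{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"
    )


lines = [make_atom(1, "B", 1), make_atom(2, "B", 2), make_atom(3, "C", 1)]
pruned = keep_chain_as_a(lines, "B")
for line in pruned:
    print(line)
```

A residue count on the pruned records could then be compared against the `--min-residues` and `--max-residues` bounds.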
Powerfit
Generate the powerfit commands for the filtered and pruned structures.
protein-detective powerfit commands ../powerfit-tutorial/ribosome-KsgA.map 13 docs/session1
This will print commands to the terminal, which you can then run in whatever way you prefer, for example sequentially, with GNU parallel, or as a Slurm array job.
Alternatively, you can use the protein-detective powerfit run ... command to run powerfit commands sequentially, which is useful for small datasets with rough options.
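If neither GNU parallel nor Slurm is available, the printed commands could also be run a few at a time from Python. The following is a minimal sketch using only the standard library; `run_commands` is a hypothetical helper, and the placeholder commands stand in for the real powerfit command lines.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_commands(commands, max_workers=2):
    """Run shell command lines concurrently; return exit codes in input order."""
    def run_one(command):
        # shell=True because each entry is a full command line as printed.
        return subprocess.run(command, shell=True, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


# Placeholder commands; in practice these would be the printed powerfit lines.
exit_codes = run_commands(["exit 0", "exit 3"])
print(exit_codes)
```

A non-zero exit code marks a command that failed and may need to be rerun. Keep `max_workers` low if each powerfit run already uses several cores or a GPU.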
To print the top 10 solutions to the terminal, you can use:
```shell
protein-detective powerfit report docs/session1
```
Outputs something like:
powerfit_run_id,structure,rank,cc,fishz,relz,translation,rotation,pdb_id,pdb_file,uniprot_acc
10,A8MT69_pdb4e45.ent_B2A,1,0.432,0.463,10.091,227.18:242.53:211.83,0.0:1.0:1.0:0.0:0.0:1.0:1.0:0.0:0.0,4E45,docs/session1/single_chain/A8MT69_pdb4e45.ent_B2A.pdb,A8MT69
10,A8MT69_pdb4ne5.ent_B2A,1,0.423,0.452,10.053,227.18:242.53:214.9,0.0:-0.0:-0.0:-0.604:0.797:0.0:0.797:0.604:0.0,4NE5,docs/session1/single_chain/A8MT69_pdb4ne5.ent_B2A.pdb,A8MT69
...
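Because the report is comma-separated, it can be post-processed with standard tools. The sketch below parses the two example rows with Python's `csv` module. It assumes that `translation` holds three colon-separated coordinates and that `rotation` holds the nine entries of a 3x3 matrix in row-major order; that layout is an assumption, not something stated here. `parse_report` is a hypothetical helper.

```python
import csv
import io

REPORT = """\
powerfit_run_id,structure,rank,cc,fishz,relz,translation,rotation,pdb_id,pdb_file,uniprot_acc
10,A8MT69_pdb4e45.ent_B2A,1,0.432,0.463,10.091,227.18:242.53:211.83,0.0:1.0:1.0:0.0:0.0:1.0:1.0:0.0:0.0,4E45,docs/session1/single_chain/A8MT69_pdb4e45.ent_B2A.pdb,A8MT69
10,A8MT69_pdb4ne5.ent_B2A,1,0.423,0.452,10.053,227.18:242.53:214.9,0.0:-0.0:-0.0:-0.604:0.797:0.0:0.797:0.604:0.0,4NE5,docs/session1/single_chain/A8MT69_pdb4ne5.ent_B2A.pdb,A8MT69
"""


def parse_report(text):
    """Parse report rows, converting scores and transforms to numbers."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        record["cc"] = float(record["cc"])
        record["translation"] = [float(v) for v in record["translation"].split(":")]
        values = [float(v) for v in record["rotation"].split(":")]
        # Assumed layout: nine values forming a row-major 3x3 matrix.
        record["rotation"] = [values[i:i + 3] for i in range(0, 9, 3)]
        rows.append(record)
    return rows


rows = parse_report(REPORT)
# Pick the row with the highest cross-correlation score.
best = max(rows, key=lambda row: row["cc"])
print(best["structure"], best["cc"], best["translation"])
```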
To generate model PDB files rotated/translated to PowerFit solutions, you can use:
protein-detective powerfit fit-models docs/session1
Contributing
For development information and contribution guidelines, please see CONTRIBUTING.md.