Deduce the protein from an EM density

protein-detective

Python package to detect proteins in EM density maps.

It uses

  • UniProt SPARQL endpoint to search for proteins and their measured or predicted 3D structures.
  • powerfit to fit protein structures into an Electron Microscopy (EM) density map.

An example workflow:

graph LR;
    search{Search UniprotKB} --> |uniprot_accessions|fetchpdbe{Retrieve PDBe}
    search{Search UniprotKB} --> |uniprot_accessions|fetchad{Retrieve AlphaFold}
    fetchpdbe -->|mmcif_files| residuefilter{Filter on nr residues + write chain A}
    fetchad -->|pdb_files| densityfilter{Filter out low confidence}
    residuefilter -->|pdb_files| powerfit
    densityfilter -->|pdb_files| powerfit
    powerfit -->|*/solutions.out| solutions{Best scoring solutions}
    solutions -->|dataframe| fitmodels{Fit models}

Install

pip install protein-detective

Or to use the latest development version:

pip install git+https://github.com/haddocking/protein-detective.git

Usage

The main entry point is the protein-detective command line tool which has multiple subcommands to perform actions.

To use it programmatically, see the notebooks and API documentation.

Search UniProt for structures

protein-detective search \
    --taxon-id 9606 \
    --reviewed \
    --subcellular-location-uniprot nucleus \
    --subcellular-location-go GO:0005634 \
    --molecular-function-go GO:0003677 \
    --limit 100 \
    ./mysession

(GO:0005634 is "Nucleus" and GO:0003677 is "DNA binding")

In the ./mysession directory, you will find a session.db file, which is a DuckDB database containing the search results.
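
The search is backed by the UniProt SPARQL endpoint. As a rough illustration of what a query for a few of the filters above could look like, here is a minimal sketch. It is not the query protein-detective actually sends: the function name `build_query` is hypothetical, only the taxon, reviewed, GO term and limit filters are covered, and the GO term is matched exactly, without descendant terms.

```python
def build_query(taxon_id: int, go_term: str, limit: int, reviewed: bool = True) -> str:
    """Build a SPARQL query for UniProt proteins annotated with one GO term.

    Hypothetical helper for illustration only; not part of protein-detective.
    """
    # go_term is expected as "GO:0003677"; the GO prefix below expands it to the OBO IRI.
    if not go_term.startswith("GO:") or not go_term[3:].isdigit():
        raise ValueError(f"not a GO identifier: {go_term!r}")
    lines = [
        "PREFIX up: <http://purl.uniprot.org/core/>",
        "PREFIX taxon: <http://purl.uniprot.org/taxonomy/>",
        "PREFIX GO: <http://purl.obolibrary.org/obo/GO_>",
        "SELECT DISTINCT ?protein",
        "WHERE {",
        "  ?protein a up:Protein ;",
        f"           up:organism taxon:{taxon_id} ;",
    ]
    if reviewed:
        # Restrict to reviewed (Swiss-Prot) entries.
        lines.append("           up:reviewed true ;")
    lines += [
        f"           up:classifiedWith {go_term} .",
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)


# Human (9606), reviewed, "DNA binding" (GO:0003677), at most 100 hits.
print(build_query(9606, "GO:0003677", 100))
```

The printed text could be pasted into the query form at https://sparql.uniprot.org/sparql to try it out.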

To retrieve a bunch of structures

protein-detective retrieve ./mysession

In the ./mysession directory, you will find mmCIF files from PDBe and PDB files from AlphaFold DB.

To filter AlphaFold structures on confidence

Filter AlphaFold DB structures based on density confidence. An entry is kept when the number of residues with a confidence score above the threshold lies within the requested range. Also writes PDB files containing only those residues.

protein-detective density-filter \
    --confidence-threshold 50 \
    --min-residues 100 \
    --max-residues 1000 \
    ./mysession
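
To make the filtering rule concrete, here is a minimal sketch of the idea in plain Python. It is not the package's implementation. It assumes PDB-format input in which the B-factor column (columns 61-66) holds the per-residue confidence score, as is the case for AlphaFold DB PDB files. The names `atom_line`, `residue_key` and `density_filter` are made up for this example.

```python
def atom_line(serial, resseq, plddt, chain="A"):
    """Build a minimal, column-exact PDB ATOM record for a CA atom (demo helper)."""
    return (
        f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


def residue_key(line):
    """Identify a residue by chain ID, sequence number and insertion code."""
    return (line[21], int(line[22:26]), line[26])


def density_filter(pdb_lines, threshold, min_residues, max_residues):
    """Return ATOM lines of confident residues, or None if the entry is rejected."""
    atoms = [line for line in pdb_lines if line.startswith("ATOM")]
    # Residues whose confidence (B-factor column) is strictly above the threshold.
    confident = {residue_key(line) for line in atoms if float(line[60:66]) > threshold}
    if not (min_residues <= len(confident) <= max_residues):
        return None
    return [line for line in atoms if residue_key(line) in confident]


demo = [atom_line(1, 1, 92.5), atom_line(2, 2, 41.0), atom_line(3, 3, 77.3)]
kept = density_filter(demo, threshold=50, min_residues=2, max_residues=1000)
print([residue_key(line)[1] for line in kept])  # residue numbers that survive
```

With a threshold of 50, the middle residue (score 41.0) is dropped and the entry is kept because two residues remain.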

To prune PDBe files

Make PDBe files smaller by keeping only the first chain of the found UniProt entry and renaming it to chain A.

protein-detective prune-pdbs \
    --min-residues 100 \
    --max-residues 1000 \
    ./mysession
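
The pruning step can be pictured with the following sketch. It is only an illustration under simplifying assumptions: the input is PDB-format text rather than the mmCIF files retrieved from PDBe, the chain to keep is already known, and HETATM records are dropped. The names `atom_line` and `prune_to_chain_a` are hypothetical.

```python
def atom_line(serial, chain, resseq):
    """Build a minimal, column-exact PDB ATOM record for a CA atom (demo helper)."""
    return (
        f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{20.0:6.2f}"
    )


def prune_to_chain_a(pdb_lines, chain, min_residues, max_residues):
    """Keep ATOM records of `chain`, relabel them as chain A, enforce a residue range."""
    kept = [line for line in pdb_lines if line.startswith("ATOM") and line[21] == chain]
    # Count distinct residues by sequence number plus insertion code.
    n_residues = len({(line[22:26], line[26]) for line in kept})
    if not (min_residues <= n_residues <= max_residues):
        return None
    # The chain ID lives in column 22 (index 21) of a PDB ATOM record.
    return [line[:21] + "A" + line[22:] for line in kept]


demo = [atom_line(1, "B", 1), atom_line(2, "B", 2), atom_line(3, "C", 1)]
pruned = prune_to_chain_a(demo, chain="B", min_residues=1, max_residues=1000)
print([line[21] for line in pruned])  # ['A', 'A']
```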

Powerfit

Generate the powerfit commands for the filtered and pruned structures.

protein-detective powerfit commands ../powerfit-tutorial/ribosome-KsgA.map 13 docs/session1

This prints the commands to the terminal, which you can then run however you prefer: sequentially, with GNU parallel, or as a Slurm array job.
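
If neither GNU parallel nor Slurm is at hand, the printed commands could also be run a few at a time from Python. The sketch below assumes each printed line is a standalone command without pipes or redirection, which this README does not confirm. `run_command` and `run_all` are names invented for the example.

```python
import shlex
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(command: str) -> int:
    """Run one command line without a shell and return its exit code."""
    return subprocess.run(shlex.split(command), check=False).returncode


def run_all(commands, max_workers=2):
    """Run commands with at most `max_workers` at once; return exit codes in input order."""
    # Skip blank lines and comment lines.
    commands = [c.strip() for c in commands if c.strip() and not c.lstrip().startswith("#")]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_command, commands))


# Demo with two trivial stand-in commands instead of real powerfit calls.
demo = [
    shlex.join([sys.executable, "-c", "raise SystemExit(0)"]),
    shlex.join([sys.executable, "-c", "raise SystemExit(3)"]),
]
print(run_all(demo))  # exit codes, in input order
```

In practice the list would come from the saved output of the commands subcommand, for example `run_all(open("commands.txt"))`, where `commands.txt` is a file name chosen here for illustration. Keep `max_workers` low if each powerfit run already uses several cores or a GPU.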

Alternatively, you can use the protein-detective powerfit run ... command to run powerfit commands sequentially, which is useful for small datasets with rough options.

To print the top 10 solutions to the terminal, you can use:

protein-detective powerfit report docs/session1

Outputs something like:

powerfit_run_id,structure,rank,cc,fishz,relz,translation,rotation,pdb_id,pdb_file,uniprot_acc
10,A8MT69_pdb4e45.ent_B2A,1,0.432,0.463,10.091,227.18:242.53:211.83,0.0:1.0:1.0:0.0:0.0:1.0:1.0:0.0:0.0,4E45,docs/session1/single_chain/A8MT69_pdb4e45.ent_B2A.pdb,A8MT69
10,A8MT69_pdb4ne5.ent_B2A,1,0.423,0.452,10.053,227.18:242.53:214.9,0.0:-0.0:-0.0:-0.604:0.797:0.0:0.797:0.604:0.0,4NE5,docs/session1/single_chain/A8MT69_pdb4ne5.ent_B2A.pdb,A8MT69
...
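
The report is comma-separated, with the translation and rotation columns packed as colon-separated numbers. A minimal sketch of loading it with the standard library, using the two sample rows above as input (`parse_report` is a name made up for this example, not part of the package):

```python
import csv
import io

REPORT = """\
powerfit_run_id,structure,rank,cc,fishz,relz,translation,rotation,pdb_id,pdb_file,uniprot_acc
10,A8MT69_pdb4e45.ent_B2A,1,0.432,0.463,10.091,227.18:242.53:211.83,0.0:1.0:1.0:0.0:0.0:1.0:1.0:0.0:0.0,4E45,docs/session1/single_chain/A8MT69_pdb4e45.ent_B2A.pdb,A8MT69
10,A8MT69_pdb4ne5.ent_B2A,1,0.423,0.452,10.053,227.18:242.53:214.9,0.0:-0.0:-0.0:-0.604:0.797:0.0:0.797:0.604:0.0,4NE5,docs/session1/single_chain/A8MT69_pdb4ne5.ent_B2A.pdb,A8MT69
"""


def parse_report(text):
    """Parse the CSV report into dicts with numeric scores and vectors."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["rank"] = int(row["rank"])
        for key in ("cc", "fishz", "relz"):
            row[key] = float(row[key])
        # 3 numbers for the translation, 9 for the rotation.
        row["translation"] = [float(v) for v in row["translation"].split(":")]
        row["rotation"] = [float(v) for v in row["rotation"].split(":")]
        rows.append(row)
    return rows


solutions = parse_report(REPORT)
best = max(solutions, key=lambda row: row["cc"])
print(best["structure"], best["cc"])  # A8MT69_pdb4e45.ent_B2A 0.432
```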

To generate model PDB files rotated/translated to PowerFit solutions, you can use:

protein-detective powerfit fit-models docs/session1
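
Conceptually, placing a model at a solution means applying that solution's rotation and translation to the atom coordinates. The sketch below shows the arithmetic under assumptions that this README does not confirm: the nine rotation values form a row-major 3x3 matrix, the structure is first centered on its centroid, then rotated, then moved by the translation. `apply_solution` is a hypothetical name and the demo values are invented.

```python
def apply_solution(coords, rotation, translation):
    """Center coords on their centroid, rotate (row-major 3x3), then translate."""
    n = len(coords)
    centroid = [sum(c[i] for c in coords) / n for i in range(3)]
    rot = [rotation[0:3], rotation[3:6], rotation[6:9]]
    moved = []
    for c in coords:
        centered = [c[i] - centroid[i] for i in range(3)]
        # Matrix-vector product, one output axis per matrix row, plus the shift.
        moved.append([
            sum(rot[r][k] * centered[k] for k in range(3)) + translation[r]
            for r in range(3)
        ])
    return moved


# Two atoms on the x axis, rotated 90 degrees about z, then shifted.
coords = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
rotation = [0.0, -1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
translation = [10.0, 20.0, 30.0]
print(apply_solution(coords, rotation, translation))
```

After the transformation the two atoms lie along the y axis around the point (10, 20, 30), which is where the centroid ends up.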

Contributing

For development information and contribution guidelines, please see CONTRIBUTING.md.
