
Search/retrieve/filter proteins and protein structures


protein-quest


Python package to search/retrieve/filter proteins and protein structures.

It uses

The package is used by

An example workflow:

graph TB;
    taxonomy[/Search taxon/] -. taxon_ids .-> searchuniprot[/Search UniProtKB/]
    goterm[/Search GO term/] -. go_ids .-> searchuniprot[/Search UniProtKB/]
    searchuniprot --> |uniprot_accessions|searchpdbe[/Search PDBe/]
    searchuniprot --> |uniprot_accessions|searchaf[/Search AlphaFold/]
    searchuniprot -. uniprot_accessions .-> searchemdb[/Search EMDB/]
    searchuniprot -. uniprot_accessions .-> searchuniprotdetails[/Search UniProt details/]
    searchintactionpartners[/Search interaction partners/] -.-x |uniprot_accessions|searchuniprot
    searchcomplexes[/Search complexes/]
    searchpdbe -->|pdb_ids|fetchpdbe[Retrieve PDBe]
    searchaf --> |uniprot_accessions|fetchad(Retrieve AlphaFold)
    searchemdb -. emdb_ids .->fetchemdb[Retrieve EMDB]
    fetchpdbe -->|mmcif_files| chainfilter{{Filter on chain of UniProt accession}}
    chainfilter --> |mmcif_files| residuefilter{{Filter on chain length}}
    fetchad -->|mmcif_files| confidencefilter{{Filter out low confidence}}
    confidencefilter --> |mmcif_files| ssfilter{{Filter on secondary structure}}
    residuefilter --> |mmcif_files| ssfilter
    ssfilter -. mmcif_files .-> convert2cif([Convert to cif])
    ssfilter -. mmcif_files .-> convert2uniprot_accessions([Convert to UniProt accessions])
    classDef dashedBorder stroke-dasharray: 5 5;
    goterm:::dashedBorder
    taxonomy:::dashedBorder
    searchemdb:::dashedBorder
    fetchemdb:::dashedBorder
    searchintactionpartners:::dashedBorder
    searchcomplexes:::dashedBorder
    searchuniprotdetails:::dashedBorder
    convert2cif:::dashedBorder
    convert2uniprot_accessions:::dashedBorder

(Dotted nodes and edges are side-quests.)

Install

pip install protein-quest

Or to use the latest development version:

pip install git+https://github.com/haddocking/protein-quest.git

Usage

The main entry point is the protein-quest command-line tool, which has multiple subcommands to perform actions.

To use it programmatically, see the Jupyter notebooks and API documentation.

When downloading or copying files, protein-quest uses a global cache (located at ~/.cache/protein-quest) and hard links to save disk space and improve speed. This behavior can be customized with the --no-cache, --cache-dir, and --copy-method command-line arguments.
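The cache-and-hard-link idea can be sketched with the Python standard library. This is a minimal illustration, not protein-quest's implementation: the helper `place_from_cache` is hypothetical, and keying the cache on a SHA-256 content hash is an assumption. A hard link makes the destination a second name for the cached file, so no extra disk space is used; when linking fails (for example, across filesystems) the sketch falls back to a copy.

```python
import hashlib
import os
import shutil
from pathlib import Path


def place_from_cache(source: Path, cache_dir: Path, destination: Path) -> Path:
    """Store `source` in a cache and hard-link the cached copy to `destination`.

    Hypothetical helper: the cache key (SHA-256 of the file content) is an
    assumption made for this sketch.
    """
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():
        shutil.copy2(source, cached)  # first time: copy the bytes into the cache
    destination.parent.mkdir(parents=True, exist_ok=True)
    if destination.exists():
        destination.unlink()  # os.link refuses to overwrite an existing file
    try:
        os.link(cached, destination)  # same inode as the cache entry, no extra space
    except OSError:
        shutil.copy2(cached, destination)  # e.g. cache on a different filesystem
    return destination
```

Calling it twice for the same file reuses the cached copy instead of storing the bytes again.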

Search UniProt accessions

protein-quest search uniprot \
    --taxon-id 9606 \
    --reviewed \
    --subcellular-location-uniprot "nucleus" \
    --subcellular-location-go GO:0005634 \
    --molecular-function-go GO:0003677 \
    --limit 100 \
    uniprot_accs.txt

(GO:0005634 is "Nucleus" and GO:0003677 is "DNA binding")

Search for PDBe structures of UniProt accessions

protein-quest search pdbe uniprot_accs.txt pdbe.csv

A pdbe.csv file is written containing the PDB ID and chain of each UniProt accession.

Search for AlphaFold structures of UniProt accessions

protein-quest search alphafold uniprot_accs.txt alphafold.csv

Search for EMDB structures of UniProt accessions

protein-quest search emdb uniprot_accs.txt emdbs.csv

To retrieve PDB structure files

protein-quest retrieve pdbe pdbe.csv downloads-pdbe/

To retrieve AlphaFold structure files

protein-quest retrieve alphafold alphafold.csv downloads-af/

For each entry, the cif file is downloaded.

To retrieve EMDB volume files

protein-quest retrieve emdb emdbs.csv downloads-emdb/

To filter AlphaFold structures on confidence

Filter AlphaFoldDB structures based on confidence (pLDDT). Keeps entries whose number of residues with a confidence score above the threshold lies within the requested range, and also writes pdb files containing only those residues.

protein-quest filter confidence \
    --confidence-threshold 50 \
    --min-residues 100 \
    --max-residues 1000 \
    ./downloads-af ./filtered
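The selection rule described above can be sketched on a plain list of per-residue pLDDT scores (AlphaFold stores pLDDT in the B-factor column of its structure files). `filter_on_confidence` is a hypothetical helper, not the package's API; treating "above the threshold" as strictly greater and the residue bounds as inclusive are assumptions. The defaults mirror the command-line example.

```python
from dataclasses import dataclass


@dataclass
class ConfidenceResult:
    kept: bool
    confident_residues: list[int]  # indices of residues scoring above the threshold


def filter_on_confidence(
    plddt: list[float],
    confidence_threshold: float = 50,
    min_residues: int = 100,
    max_residues: int = 1000,
) -> ConfidenceResult:
    """Hypothetical sketch of the confidence filter rule."""
    # Assumption: "above" means strictly greater than the threshold.
    confident = [i for i, score in enumerate(plddt) if score > confidence_threshold]
    # Assumption: both residue bounds are inclusive.
    kept = min_residues <= len(confident) <= max_residues
    return ConfidenceResult(kept=kept, confident_residues=confident)


result = filter_on_confidence([90.0] * 150 + [30.0] * 50)
print(result.kept, len(result.confident_residues))  # True 150
```

Only the indices in `confident_residues` would then be written to the filtered output file.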

To filter PDBe files on the chain of a UniProt accession

Make PDBe files smaller by keeping only the first chain of the found UniProt entry and renaming it to chain A.

protein-quest filter chain \
    pdbe.csv \
    ./downloads-pdbe ./filtered-chains

To filter PDBe files on number of residues

protein-quest filter residue  \
    --min-residues 100 \
    --max-residues 1000 \
    ./filtered-chains ./filtered

To filter on secondary structure

Keep only structures that are mostly alpha helices and have no beta sheets. See the following notebook to determine the ratio of secondary structure elements.

protein-quest filter secondary-structure \
    --ratio-min-helix-residues 0.5 \
    --ratio-max-sheet-residues 0.0 \
    --write-stats filtered-ss/stats.csv \
    ./filtered-chains ./filtered-ss
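The ratio thresholds can be illustrated on a per-residue secondary structure string, for example one DSSP-style code per residue. The helpers below are hypothetical: counting only `H` as helix and `E` as sheet, and the inclusive comparisons, are assumptions and may differ from how protein-quest assigns and counts secondary structure.

```python
def secondary_structure_ratios(ss: str) -> tuple[float, float]:
    """Return (helix ratio, sheet ratio) for a per-residue secondary structure string."""
    if not ss:
        return 0.0, 0.0
    helix = ss.count("H")  # assumption: 'H' marks a helix residue
    sheet = ss.count("E")  # assumption: 'E' marks a beta-strand (sheet) residue
    return helix / len(ss), sheet / len(ss)


def passes_ss_filter(
    ss: str,
    ratio_min_helix_residues: float = 0.5,
    ratio_max_sheet_residues: float = 0.0,
) -> bool:
    """Mirror of the example thresholds: mostly helix, no sheet residues."""
    helix_ratio, sheet_ratio = secondary_structure_ratios(ss)
    return helix_ratio >= ratio_min_helix_residues and sheet_ratio <= ratio_max_sheet_residues


print(passes_ss_filter("HHHHHHCCCC"))  # True: 60% helix, no sheet
print(passes_ss_filter("HHHHHHEECC"))  # False: 20% sheet
```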

Search Taxonomy

protein-quest search taxonomy "Homo sapiens" -

Search Gene Ontology (GO)

You might not know the identifier of the Gene Ontology term to pass to protein-quest search uniprot. Use the following command to search for a Gene Ontology (GO) term.

protein-quest search go --limit 5 --aspect cellular_component apoptosome -

Search for interaction partners

Uses https://www.ebi.ac.uk/complexportal to find interaction partners of a given UniProt accession.

protein-quest search interaction-partners Q05471 interaction-partners-of-Q05471.txt

The interaction-partners-of-Q05471.txt file contains UniProt accessions (one per line).

Search for complexes

Given UniProt accessions, search for macromolecular complexes at https://www.ebi.ac.uk/complexportal and return the complex entries and their members.

echo Q05471 | protein-quest search complexes - complexes.csv

The complexes.csv file looks like:

query_protein,complex_id,complex_url,complex_title,members
Q05471,CPX-2122,https://www.ebi.ac.uk/complexportal/complex/CPX-2122,Swr1 chromatin remodelling complex,P31376;P35817;P38326;P53201;P53930;P60010;P80428;Q03388;Q03433;Q03940;Q05471;Q06707;Q12464;Q12509
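Because the members column packs several accessions into one cell separated by `;`, downstream scripts need to split it. A minimal sketch using Python's `csv` module, based only on the columns shown above; `read_complex_members` is a hypothetical helper, and for a real file you would pass `open("complexes.csv", newline="")` instead of the in-memory sample.

```python
import csv
import io
from typing import TextIO


def read_complex_members(handle: TextIO) -> dict[str, list[str]]:
    """Map each complex_id in complexes.csv to its member UniProt accessions."""
    members: dict[str, list[str]] = {}
    for row in csv.DictReader(handle):
        # Members are ';'-separated; drop empty strings produced by an empty cell.
        members[row["complex_id"]] = [acc for acc in row["members"].split(";") if acc]
    return members


sample = io.StringIO(
    "query_protein,complex_id,complex_url,complex_title,members\n"
    "Q05471,CPX-2122,https://www.ebi.ac.uk/complexportal/complex/CPX-2122,"
    "Swr1 chromatin remodelling complex,"
    "P31376;P35817;P38326;P53201;P53930;P60010;P80428;Q03388;Q03433;Q03940;Q05471;Q06707;Q12464;Q12509\n"
)
complexes = read_complex_members(sample)
print(len(complexes["CPX-2122"]))  # 14
```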

Search for UniProt details

To get details (such as protein name, sequence length, and organism) for a list of UniProt accessions:

protein-quest search uniprot-details uniprot_accs.txt uniprot_details.csv

The uniprot_details.csv file looks like:

uniprot_accession,uniprot_id,sequence_length,reviewed,protein_name,taxon_id,taxon_name
A0A087WUV0,ZN892_HUMAN,522,True,Zinc finger protein 892,9606,Homo sapiens
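Every value in the CSV is read back as text, so numeric and boolean columns need converting before filtering. A minimal sketch using the columns shown above; `UniprotDetails` and `read_uniprot_details` are hypothetical names, and the assumption that the reviewed column is written as `True`/`False` is based only on the single example row.

```python
import csv
import io
from dataclasses import dataclass
from typing import TextIO


@dataclass
class UniprotDetails:
    uniprot_accession: str
    uniprot_id: str
    sequence_length: int
    reviewed: bool
    protein_name: str
    taxon_id: int
    taxon_name: str


def read_uniprot_details(handle: TextIO) -> list[UniprotDetails]:
    """Parse uniprot_details.csv rows into typed records."""
    records = []
    for row in csv.DictReader(handle):
        records.append(
            UniprotDetails(
                uniprot_accession=row["uniprot_accession"],
                uniprot_id=row["uniprot_id"],
                sequence_length=int(row["sequence_length"]),
                reviewed=row["reviewed"] == "True",  # assumption: written as True/False
                protein_name=row["protein_name"],
                taxon_id=int(row["taxon_id"]),
                taxon_name=row["taxon_name"],
            )
        )
    return records


sample = io.StringIO(
    "uniprot_accession,uniprot_id,sequence_length,reviewed,protein_name,taxon_id,taxon_name\n"
    "A0A087WUV0,ZN892_HUMAN,522,True,Zinc finger protein 892,9606,Homo sapiens\n"
)
details = read_uniprot_details(sample)
# Example: reviewed human proteins shorter than 1000 residues.
selected = [d.uniprot_accession for d in details if d.reviewed and d.taxon_id == 9606 and d.sequence_length < 1000]
print(selected)  # ['A0A087WUV0']
```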

Convert structure files to .cif format

Some tools (for example, powerfit) only work with .cif files, not *.cif.gz or *.bcif files.

protein-quest convert structures --format cif --output-dir ./filtered-cif ./filtered-ss
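For the gzip case alone, the conversion amounts to decompressing the file. The hypothetical helper below sketches only that part with Python's `gzip` module; BinaryCIF (*.bcif) needs a dedicated decoder and is not handled here, so the convert command above remains the way to cover both formats.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_cif(source: Path, output_dir: Path) -> Path:
    """Write an uncompressed copy of a *.cif.gz file into output_dir (gzip only)."""
    if not source.name.endswith(".cif.gz"):
        raise ValueError(f"expected a .cif.gz file, got {source.name}")
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / source.name[: -len(".gz")]  # 1abc.cif.gz -> 1abc.cif
    # Stream the decompressed bytes so large structures are not loaded into memory.
    with gzip.open(source, "rb") as compressed, target.open("wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return target
```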

Convert structure files to UniProt accessions

After running some filters you might want to know which UniProt accessions are still present in the filtered structures.

protein-quest convert uniprot ./filtered-ss uniprot_accs.filtered.txt
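The PDBe branch of the example workflow can be scripted by chaining the subcommands above. This is a minimal sketch, assuming protein-quest is on your PATH and using only options shown in this README; `PIPELINE` and `run_pipeline` are hypothetical names. By default it performs a dry run that only prints each command.

```python
import shlex
import subprocess

# PDBe branch of the workflow diagram, using the commands documented above.
PIPELINE = [
    ["protein-quest", "search", "uniprot", "--taxon-id", "9606", "--reviewed", "--limit", "100", "uniprot_accs.txt"],
    ["protein-quest", "search", "pdbe", "uniprot_accs.txt", "pdbe.csv"],
    ["protein-quest", "retrieve", "pdbe", "pdbe.csv", "downloads-pdbe/"],
    ["protein-quest", "filter", "chain", "pdbe.csv", "./downloads-pdbe", "./filtered-chains"],
    ["protein-quest", "filter", "residue", "--min-residues", "100", "--max-residues", "1000", "./filtered-chains", "./filtered"],
    ["protein-quest", "filter", "secondary-structure", "--ratio-min-helix-residues", "0.5",
     "--ratio-max-sheet-residues", "0.0", "./filtered", "./filtered-ss"],
    ["protein-quest", "convert", "uniprot", "./filtered-ss", "uniprot_accs.filtered.txt"],
]


def run_pipeline(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, also execute it."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


run_pipeline(PIPELINE)  # dry run: prints the commands only
```

Passing a list of arguments to `subprocess.run` avoids shell quoting issues, and `check=True` keeps a failed step from silently feeding empty inputs to the next one.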

Model Context Protocol (MCP) server

Protein Quest can also help LLMs such as Claude Sonnet 4 by providing a set of tools for working with protein structures.

(Figure: Protein Quest MCP workflow)

To run the MCP server, install the mcp extra with:

pip install 'protein-quest[mcp]'

The server can be started with:

protein-quest mcp

The MCP server contains a prompt template to search, retrieve, and filter candidate structures.

Shell autocompletion

The protein-quest command line tool supports shell autocompletion using shtab.

Initialize for bash shell with:

mkdir -p ~/.local/share/bash-completion/completions
protein-quest --print-completion bash > ~/.local/share/bash-completion/completions/protein-quest

Initialize for zsh shell with:

mkdir -p ~/.local/share/zsh/site-functions
protein-quest --print-completion zsh > ~/.local/share/zsh/site-functions/_protein-quest
fpath=("$HOME/.local/share/zsh/site-functions" $fpath)
autoload -Uz compinit && compinit

Contributing

For development information and contribution guidelines, please see CONTRIBUTING.md.



Download files


Source Distribution

protein_quest-1.0.0.tar.gz (4.5 MB)

Uploaded Source

Built Distribution


protein_quest-1.0.0-py3-none-any.whl (67.0 kB)

Uploaded Python 3

File details

Details for the file protein_quest-1.0.0.tar.gz.

File metadata

  • Download URL: protein_quest-1.0.0.tar.gz
  • Upload date:
  • Size: 4.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for protein_quest-1.0.0.tar.gz
Algorithm Hash digest
SHA256 c9a4736c9c9ff6567e97d5aa40a9efec843b6fe09fd7a645622d1119919453f6
MD5 ad740849bcd4384eec65397a894a94ac
BLAKE2b-256 5f8f7890649e37173433683ba0738820ce3f0ab0eb51dbde04ffbefee3fe534f


Provenance

The following attestation bundles were made for protein_quest-1.0.0.tar.gz:

Publisher: pypi-publish.yml on haddocking/protein-quest


File details

Details for the file protein_quest-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: protein_quest-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 67.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for protein_quest-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e03617a0c3ab61cf5e888eb9fabc844bf02a5460cb6f772092f23ed50a9a1a6e
MD5 6181bc1594c1317cc01439d193a032c6
BLAKE2b-256 d8e133f5fbf9f18e6b2d84e89572b1825fac0c88121ba41040b970c597e47899


Provenance

The following attestation bundles were made for protein_quest-1.0.0-py3-none-any.whl:

Publisher: pypi-publish.yml on haddocking/protein-quest

