Find gene clusters via protein structure similarity

cfoldseeker

Description

cfoldseeker finds homologous gene clusters via protein structural similarity. It uses foldseek to search for structural homologs of your query protein structures (both local and remote target databases are supported), then identifies the genomically colocalised hits among them. To do so, it looks up the genomic location of each protein's coding sequence, either through various remote cross-referencing APIs or in a locally prepared database.
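
To make the idea of "genomically colocalised hits" concrete, here is a minimal sketch of one way such grouping could work. It is not cfoldseeker's actual implementation: the `Hit` record, the `colocalised_groups` function and the `max_gap` and `min_queries` parameters are all hypothetical names chosen for illustration. The sketch assumes each hit already carries the scaffold and coordinates of its coding sequence.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record: a structural hit plus the location of its CDS."""
    protein_id: str
    query: str      # query structure that produced this hit
    scaffold: str   # contig or chromosome identifier
    start: int
    end: int


def colocalised_groups(hits, max_gap=20_000, min_queries=2):
    """Group hits on the same scaffold that lie within max_gap bp of each other.

    Only groups containing hits for at least min_queries distinct queries
    are returned. Both thresholds are illustrative assumptions.
    """
    by_scaffold = defaultdict(list)
    for hit in hits:
        by_scaffold[hit.scaffold].append(hit)

    groups = []
    for scaffold_hits in by_scaffold.values():
        # Walk along the scaffold in coordinate order, chaining nearby hits.
        scaffold_hits.sort(key=lambda h: h.start)
        current = [scaffold_hits[0]]
        current_end = scaffold_hits[0].end
        for hit in scaffold_hits[1:]:
            if hit.start - current_end <= max_gap:
                current.append(hit)
                current_end = max(current_end, hit.end)
            else:
                groups.append(current)
                current = [hit]
                current_end = hit.end
        groups.append(current)

    # Keep only groups supported by enough distinct query structures.
    return [g for g in groups if len({h.query for h in g}) >= min_queries]
```

Chaining on the running maximum end coordinate means overlapping or nested genes do not split a group prematurely.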

cfoldseeker has been designed as the structural similarity-driven sister tool of cblaster, with which it integrates tightly to generate its outputs. As such, cfoldseeker can naturally produce cblaster-style output and clinker visualisations.

[!TIP] Although cfoldseeker can be used as a stand-alone tool, it is the structural similarity-based discovery engine of the ✨ csuite ✨, our new integrated toolbox featuring streamlined workflows for both sequence- and protein structure-based gene cluster mining. Try it out!

Features

  • A remote search mode for searches against the AlphaFoldDB, leveraging the Foldseek webserver and various cross-referencing APIs for fetching genomic locations (kegg_pull, UniProt ID mapping, ENA Browser API).
  • A local search mode for searches against a local protein structure DB prepared with foldseek.
  • A local-clustered search mode for searches against a local foldseek DB of representative proteins derived from a sequence set preclustered with MMseqs2. If the representative protein of a sequence cluster is identified as a homolog, all other members are added to the hit set.
  • A helper tool to construct local genomic context databases: cfoldseeker-cds
  • Tight integration with cblaster, facilitating similar output and interactive clinker visualisations
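
The expansion step of the local-clustered search mode can be sketched as follows. This is an illustrative sketch under stated assumptions, not cfoldseeker's own code: it assumes the cluster membership is available as a two-column, tab-separated file (representative, member), which is the layout MMseqs2 writes for its cluster TSV output, and the function names `load_clusters` and `expand_hits` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_clusters(tsv_handle):
    """Read a two-column TSV (representative, member) into a dict of sets."""
    clusters = defaultdict(set)
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        representative, member = row[0], row[1]
        clusters[representative].add(member)
    return clusters


def expand_hits(representative_hits, clusters):
    """Add every member of a cluster whose representative was a hit."""
    expanded = set()
    for representative in representative_hits:
        expanded.add(representative)
        expanded.update(clusters.get(representative, ()))
    return expanded


# Toy data: two clusters, only the representative of the first one is a hit.
example_tsv = io.StringIO(
    "repA\trepA\n"
    "repA\tmem1\n"
    "repA\tmem2\n"
    "repB\trepB\n"
    "repB\tmem3\n"
)
clusters = load_clusters(example_tsv)
expanded = expand_hits({"repA"}, clusters)
print(sorted(expanded))
```

Searching only the representatives keeps the structure database small, while the expansion restores the full set of candidate proteins before genomic locations are looked up.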

Installation, documentation and more

For installation instructions, usage, a tutorial and more, head over to the cfoldseeker docs!

Citations

If you found cfoldseeker useful, please cite our manuscript:

De Vrieze, L., Masschelein, J. (2026) In preparation

cfoldseeker relies heavily on the following tools, so please give these proper credit as well.

Gilchrist, C.L.M., Booth, T.J., van Wersch, B., van Grieken, L., Medema, M.H., Chooi, Y-H. (2021). cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters. Bioinformatics Advances, https://doi.org/10.1093/bioadv/vbab016
van Kempen, M., Kim, S.S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C.L.M., Söding, J., Steinegger, M. (2024). Fast and accurate protein structure search with Foldseek. Nature Biotechnology, 42, https://doi.org/10.1038/s41587-023-01773-0
Huckvale, E., Moseley, H.N.B. (2023). kegg_pull: a software package for the RESTful access and pulling from the Kyoto Encyclopedia of Gene and Genomes. BMC Bioinformatics, 24(78), https://doi.org/10.1186/s12859-023-05208-0

License

cfoldseeker is freely available under an MIT license.

Use of the third-party software, libraries or code referred to in the Citations section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms, and you should check that you can comply with any applicable restrictions or terms and conditions before use.
