Find gene clusters via protein structure similarity
Project description
cfoldseeker
Description
cfoldseeker finds homologous gene clusters via protein structural similarity. It uses foldseek to search for structural homologs of your query protein structures (both local and remote target databases are supported), then identifies the genomically colocalised hits among them by looking up the genomic location of each protein's coding sequence, either through various remote cross-referencing APIs or in a locally prepared database.
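The colocalisation step described above can be illustrated with a minimal sketch. This is not cfoldseeker's actual implementation: the `Hit` record, the `colocalised_clusters` function, the 20 kb `max_gap` default and the `min_queries` threshold are all hypothetical names and values chosen for illustration. It assumes each structural hit has already been mapped to a scaffold and to the coordinates of its coding sequence.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one structural hit with a known CDS location."""
    query: str      # query structure the hit matched
    protein: str    # identifier of the hit protein
    scaffold: str   # genomic record carrying the coding sequence
    start: int      # CDS start coordinate
    end: int        # CDS end coordinate


def colocalised_clusters(hits, max_gap=20_000, min_queries=2):
    """Group hits that lie within max_gap bp of each other on one scaffold.

    max_gap and min_queries are illustrative defaults, not cfoldseeker's.
    """
    by_scaffold = defaultdict(list)
    for hit in hits:
        by_scaffold[hit.scaffold].append(hit)

    clusters = []
    for scaffold_hits in by_scaffold.values():
        scaffold_hits.sort(key=lambda h: h.start)
        current = [scaffold_hits[0]]
        for hit in scaffold_hits[1:]:
            # Extend the cluster while the next CDS starts close enough
            # to the furthest end seen so far.
            if hit.start - max(h.end for h in current) <= max_gap:
                current.append(hit)
            else:
                clusters.append(current)
                current = [hit]
        clusters.append(current)

    # Keep only clusters with hits for enough distinct query structures.
    return [c for c in clusters if len({h.query for h in c}) >= min_queries]


example_hits = [
    Hit("q1", "p1", "s1", 100, 900),
    Hit("q2", "p2", "s1", 1_500, 2_500),
    Hit("q1", "p3", "s1", 500_000, 501_000),  # too far from p1/p2
    Hit("q2", "p4", "s2", 100, 900),          # alone on its scaffold
]
found = colocalised_clusters(example_hits)
```

In this toy input only `p1` and `p2` sit close together on the same scaffold and cover two distinct queries, so they form the single reported cluster.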
cfoldseeker has been designed as the structural similarity-driven sister tool of cblaster, with which it tightly integrates to generate outputs. As such, cfoldseeker can naturally produce cblaster-style output and clinker visualisations.
[!TIP] Although cfoldseeker can be used as a stand-alone tool, it is the structural similarity-based discovery engine of the ✨csuite✨, our new integrated toolbox featuring streamlined workflows for both sequence- and protein structure-based gene cluster mining. Try it out!
Features
- A remote search mode for searches against the AlphaFoldDB, leveraging the Foldseek webserver and various cross-referencing APIs for fetching genomic locations (kegg_pull, UniProt ID mapping, ENA Browser API).
- A local search mode for searches against a local protein structure DB prepared with foldseek.
- A local-clustered search mode for searches against a local foldseek DB of representative proteins derived from a sequence set preclustered with MMseqs2. If the representative protein of a sequence cluster is identified as a homolog, all other members are added to the hit set.
- A helper tool to construct local genomic context databases: cfoldseeker-cds
- Tight integration with cblaster, facilitating similar output and interactive clinker visualisations
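The hit expansion in the local-clustered mode can be sketched as follows. This is an illustration under stated assumptions rather than cfoldseeker's own code: it assumes the MMseqs2 clustering is available as a two-column TSV (representative, member), and `load_clusters` and `expand_hits` are hypothetical helper names.

```python
import csv
import io
from collections import defaultdict


def load_clusters(tsv_handle):
    """Map each representative ID to the set of its cluster members.

    Assumes a two-column, tab-separated file: representative, member.
    """
    members = defaultdict(set)
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        representative, member = row
        members[representative].add(member)
    return members


def expand_hits(representative_hits, members):
    """Add every cluster member of each representative hit to the hit set."""
    expanded = set()
    for representative in representative_hits:
        expanded.add(representative)
        expanded |= members.get(representative, set())
    return expanded


# Toy cluster table: repA has three members, repB has only itself.
cluster_tsv = io.StringIO(
    "repA\trepA\n"
    "repA\tprot1\n"
    "repA\tprot2\n"
    "repB\trepB\n"
)
clusters = load_clusters(cluster_tsv)
all_hits = expand_hits({"repA"}, clusters)
```

Here a structural hit on `repA` pulls `prot1` and `prot2` into the hit set, while the unmatched cluster `repB` is left out.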
Installation, documentation and more
For installation instructions, usage, a tutorial and more, head over to the cfoldseeker docs!
Citations
If you found cfoldseeker useful, please cite our manuscript:
De Vrieze, L., Masschelein, J. (2026) In preparation
cfoldseeker relies heavily on the following tools, so please give these proper credit as well.
Gilchrist, C.L.M., Booth, T.J., van Wersch, B., van Grieken, L., Medema, M.H., & Chooi, Y-H. (2021). cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters. Bioinformatics Advances, https://doi.org/10.1093/bioadv/vbab016
van Kempen, M., Kim, S.S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C.L.M., Söding, J., Steinegger, M. (2024). Fast and accurate protein structure search with Foldseek. Nature Biotechnology, 42, https://doi.org/10.1038/s41587-023-01773-0
Huckvale, E., Moseley, H.N.B. (2023). kegg_pull: a software package for the RESTful access and pulling from the Kyoto Encyclopedia of Gene and Genomes. BMC Bioinformatics, 24(78), https://doi.org/10.1186/s12859-023-05208-0
License
cfoldseeker is freely available under an MIT license.
Use of the third-party software, libraries or code referred to in the Citations section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.
File details
Details for the file cfoldseeker-0.1.0.tar.gz.
File metadata
- Download URL: cfoldseeker-0.1.0.tar.gz
- Upload date:
- Size: 42.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d5a395ca1615573919af920011a1e6bbd877103296e8739fdc494f7f8229bf37 |
| MD5 | 6a4ebd63e3b001248a5bbffd00d7feb3 |
| BLAKE2b-256 | 462d98b290a7e516acc2e5052c4da861792ba680c1c031f86327aa2bea530015 |
File details
Details for the file cfoldseeker-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cfoldseeker-0.1.0-py3-none-any.whl
- Upload date:
- Size: 45.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c32b2106aca0203c3d1ed5b5e1dbd050a6a1c14ac04dcd11a9a6bd92983344d7 |
| MD5 | 6ad2ed61117e123184ca0ae4e3a7a984 |
| BLAKE2b-256 | 5f8942f78a7e3223560fa202d081065db8b80ab0a07c07795dd2255307cd957c |
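A downloaded file can be checked against the SHA256 digests listed above with Python's standard library. The sketch below is one possible way to do this; `sha256_of` and `matches_published_digest` are hypothetical helper names, and the local file path is whatever location you downloaded the file to.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches_published_digest("cfoldseeker-0.1.0.tar.gz", "d5a395ca1615573919af920011a1e6bbd877103296e8739fdc494f7f8229bf37")` should return `True` for an intact copy of the source distribution saved under that name in the current directory.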