
Streamlined workflows for sequence and protein structure similarity-based gene cluster mining


csuite


Description

The csuite is an orchestrator tool that integrates several query-based gene cluster mining tools into streamlined end-to-end workflows, removing the considerable overhead of passing files and transferring settings between them. It supports searches based on either sequence or protein structure similarity, dereplicates hit sets while preserving both gene cluster and host diversity, and produces attractive alignments and visualisations.

Tip: The csuite bundles several stand-alone gene cluster mining and processing tools. Its workflow commands follow a design philosophy similar to that of the easy-* commands of MMseqs2 and Foldseek: they are end-to-end workflows with a reduced number of options, while the stand-alone tools offer more fine-grained control over the settings. By installing csuite, you install all of these tools at once!


Features

  • Query-based gene cluster mining using sequence similarity (driven by cblaster).
  • Query-based gene cluster mining using protein structure similarity (driven by cfoldseeker).
  • Dereplicating hit sets while respecting both gene cluster and host taxonomic diversity (driven by CAGEcleaner).
  • Attractive interactive gene cluster alignment visualisations (driven by clinker).
  • Dedicated workflows covering each combination of search mode and data source.
  • Support for both local and remote search modes (e.g. remote searches against NCBI nr for sequence similarity or AlphaFoldDB for structure similarity).
  • Automatic genomic context database construction from sets of protein sequences or structures (driven by cblaster makedb, or cfoldseeker-cds).
  • Support for extracting gene cluster GenBank files (driven by cblaster extract_clusters, or cfoldseeker-seqs).
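To illustrate what dereplicating a hit set "while respecting both gene cluster and host taxonomic diversity" can mean in practice, here is a minimal Python sketch under stated assumptions. It is not CAGEcleaner's algorithm. It assumes that each hit already carries a hypothetical `cluster_group` label (hits grouped by gene cluster similarity) and a hypothetical `host_genus` label, and it keeps the best-scoring hit per (group, genus) pair, so that neither distinct cluster variants nor distinct hosts are collapsed into one representative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    cluster_id: str     # identifier of the hit gene cluster (hypothetical field)
    score: float        # search score, higher is better (assumption)
    cluster_group: str  # label for a group of similar clusters (assumed precomputed)
    host_genus: str     # taxonomic label of the host genome (assumption)


def dereplicate(hits: list[Hit]) -> list[Hit]:
    """Keep the best-scoring hit for each (cluster group, host genus) pair."""
    best: dict[tuple[str, str], Hit] = {}
    for hit in hits:
        key = (hit.cluster_group, hit.host_genus)
        # Replace the current representative only if this hit scores higher
        if key not in best or hit.score > best[key].score:
            best[key] = hit
    # Sort by group, then genus, so the output order is deterministic
    return sorted(best.values(), key=lambda h: (h.cluster_group, h.host_genus))


if __name__ == "__main__":
    hits = [
        Hit("c1", 90.0, "A", "Streptomyces"),
        Hit("c2", 85.0, "A", "Streptomyces"),   # redundant with c1
        Hit("c3", 80.0, "A", "Amycolatopsis"),  # same cluster group, other host
        Hit("c4", 70.0, "B", "Streptomyces"),   # other cluster group, same host
    ]
    print([h.cluster_id for h in dereplicate(hits)])  # ['c3', 'c1', 'c4']
```

Keying on the pair, rather than on the cluster group alone, is what keeps c3 and c4 in the example: a purely cluster-based dereplication would drop c3, and a purely host-based one would drop c4.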

Installation, documentation and more

For installation instructions, usage, explanations and more, head over to the csuite docs!

Citations

If you found csuite useful, please cite our manuscript:

De Vrieze, L., & Masschelein, J. (2026). In preparation.

The csuite member tools rely heavily on the following tools, so please give these proper credit as well:

Gilchrist, C.L.M., Booth, T.J., van Wersch, B., van Grieken, L., Medema, M.H., & Chooi, Y-H. (2021). cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters. Bioinformatics Advances, https://doi.org/10.1093/bioadv/vbab016
van Kempen, M., Kim, S.S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C.L.M., Söding, J., & Steinegger, M. (2024). Fast and accurate protein structure search with Foldseek. Nature Biotechnology, 42, https://doi.org/10.1038/s41587-023-01773-0
Steinegger, M., & Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35, https://doi.org/10.1038/nbt.3988
Huckvale, E., & Moseley, H.N.B. (2023). kegg_pull: a software package for the RESTful access and pulling from the Kyoto Encyclopedia of Gene and Genomes. BMC Bioinformatics, 24(78), https://doi.org/10.1186/s12859-023-05208-0
Salamzade, R., & Kalan, L. R. (2025). skDER and CiDDER: two scalable approaches for microbial genome dereplication. Microbial Genomics, 11(7), https://doi.org/10.1099/mgen.0.001438
Shaw, J., & Yu, Y. W. (2023). Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nature Methods, 20(11), 1661–1665. https://doi.org/10.1038/s41592-023-02018-3

License

csuite is freely available under an MIT license.

Use of the third-party software, libraries or code referred to in the Citations section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms, and you should check that you can comply with any applicable restrictions or terms and conditions before use.


Download files


Source Distribution

csuite-0.1.0.tar.gz (20.4 kB)

Uploaded Source

Built Distribution


csuite-0.1.0-py3-none-any.whl (20.1 kB)

Uploaded Python 3

File details

Details for the file csuite-0.1.0.tar.gz.

File metadata

  • Download URL: csuite-0.1.0.tar.gz
  • Upload date:
  • Size: 20.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.0

File hashes

Hashes for csuite-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3f80fcffacb57b55c4f6c228790eabb77d27078444a5c06bbfc8c7407d8c3e33
MD5 121b5b1c1a959c6aa81ae922161d0125
BLAKE2b-256 fbb970340a60c2d36f41441815d625495a5f3556c2110310aebedf0703b5b76e


File details

Details for the file csuite-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: csuite-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 20.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.0

File hashes

Hashes for csuite-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4ab56fade60cb09025ecbbbcee4da9539ad789a5b6747a0c6bfd9042ab536517
MD5 6019a43959e11b5d56297231b668381b
BLAKE2b-256 3fefd28a76aadf6c39fc44d99b3e190dd20dd0d9f7b297bdb03f32d20b7a59f5
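To check a downloaded distribution file against the SHA256 digests published for csuite 0.1.0, a minimal Python sketch using only the standard-library `hashlib` module could look like the following. The helper names `sha256_of` and `verify` are hypothetical and not part of csuite; the two digests are copied from the hash tables for csuite-0.1.0.tar.gz and csuite-0.1.0-py3-none-any.whl.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for the csuite 0.1.0 distribution files
EXPECTED_SHA256 = {
    "csuite-0.1.0.tar.gz": "3f80fcffacb57b55c4f6c228790eabb77d27078444a5c06bbfc8c7407d8c3e33",
    "csuite-0.1.0-py3-none-any.whl": "4ab56fade60cb09025ecbbbcee4da9539ad789a5b6747a0c6bfd9042ab536517",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a downloaded file with its published digest (hypothetical helper)."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify(Path("csuite-0.1.0.tar.gz"))` returns `True` only if the file in the current directory matches the published digest. Reading in chunks keeps memory use constant regardless of file size. pip can enforce the same kind of check at install time through its hash-checking mode.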
