Genomic redundancy removal tool for cblaster hit sets

Project description

CAGEcleaner

[!NOTE] CAGEcleaner currently only fully supports sessions from cblaster's remote mode. Local mode support is coming soon!

[!TIP] CAGEcleaner will be integrated into cblaster! You can already check out the development version at this fork.

Outline

CAGEcleaner removes genomic redundancy from gene cluster hit sets identified by cblaster. The redundancy in target databases used by cblaster often propagates into the result set, requiring extensive manual curation before downstream analyses and visualisation can be carried out.

Given a session file from a cblaster run (or from a CAGECAT run), CAGEcleaner retrieves all hit-associated genome assemblies, groups these into assembly clusters by ANI and identifies a representative assembly for each assembly cluster using skDER. In addition, CAGEcleaner can retain hits that are divergent at the gene cluster level but are associated with non-representative genomes. Finally, CAGEcleaner returns a filtered cblaster session file as well as a list of retained gene cluster IDs for more straightforward downstream analysis.
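The dereplication idea described above can be illustrated with a toy sketch. It is not skDER's algorithm and not CAGEcleaner's code: the function name `greedy_dereplicate`, the 99.0 threshold, the assembly and hit names, and the ANI values are all hypothetical and chosen only for illustration. The sketch assigns each assembly to the first representative it matches at or above the ANI threshold, then keeps only the hits that sit on representative assemblies.

```python
from typing import Dict, FrozenSet, List


def greedy_dereplicate(
    genomes: List[str],
    ani: Dict[FrozenSet[str], float],
    threshold: float = 99.0,  # hypothetical cutoff, not a CAGEcleaner default
) -> Dict[str, List[str]]:
    """Group genomes greedily: each joins the first representative it matches."""
    clusters: Dict[str, List[str]] = {}
    for genome in genomes:
        for rep in clusters:
            # Missing pairs are treated as unrelated (ANI 0.0).
            if ani.get(frozenset((genome, rep)), 0.0) >= threshold:
                clusters[rep].append(genome)
                break
        else:
            # No representative was close enough: start a new cluster.
            clusters[genome] = [genome]
    return clusters


# Made-up pairwise ANI values (percent) for three hypothetical assemblies.
toy_ani = {
    frozenset(("asm_A", "asm_B")): 99.6,
    frozenset(("asm_A", "asm_C")): 92.1,
    frozenset(("asm_B", "asm_C")): 91.8,
}
toy_clusters = greedy_dereplicate(["asm_A", "asm_B", "asm_C"], toy_ani)

# Hypothetical mapping of gene cluster hits to the assembly they were found in.
hits = {"hit_1": "asm_A", "hit_2": "asm_B", "hit_3": "asm_C"}
retained = [hit for hit, asm in hits.items() if asm in toy_clusters]
print(toy_clusters)
print(retained)
```

In this toy input, `asm_B` is folded into the cluster represented by `asm_A`, so its hit is dropped. The sketch omits the step in which hits on non-representative genomes are retained when they are divergent at the gene cluster level.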

[Figure: CAGEcleaner workflow]

Installation and more

For installation instructions, usage, explanations and more, head over to the CAGEcleaner wiki!

[!NOTE] CAGEcleaner has no direct Windows support. If it appears to have installed successfully on a Windows system, you have most likely installed v1.1.0, an older version with known bugs. There are alternative options for running CAGEcleaner on Windows.

Citations

If you found CAGEcleaner useful, please cite our manuscript:

De Vrieze, L., Biltjes, M., Lukashevich, S., Tsurumi, K., Masschelein, J. (2025) CAGEcleaner: reducing genomic redundancy in gene cluster mining. Bioinformatics https://doi.org/10.1093/bioinformatics/btaf373

CAGEcleaner relies heavily on the skDER genome dereplication tool and its main dependency skani, so please give these proper credit as well.

Salamzade, R., & Kalan, L. R. (2023). skDER: microbial genome dereplication approaches for comparative and metagenomic applications. bioRxiv https://doi.org/10.1101/2023.09.27.559801
Shaw, J., & Yu, Y. W. (2023). Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nature Methods, 20(11), 1661–1665. https://doi.org/10.1038/s41592-023-02018-3

License

CAGEcleaner is freely available under an MIT license.

Use of the third-party software, libraries or code referred to in the Citations section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms, and you should check that you can comply with any applicable restrictions or terms and conditions before use.

Download files

Download the file for your platform.

Source Distribution

cagecleaner-1.2.3.tar.gz (17.9 kB)

Built Distribution

cagecleaner-1.2.3-py3-none-any.whl (17.4 kB)

File details

Details for the file cagecleaner-1.2.3.tar.gz.

File metadata

  • Download URL: cagecleaner-1.2.3.tar.gz
  • Size: 17.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.0

File hashes

Hashes for cagecleaner-1.2.3.tar.gz
Algorithm Hash digest
SHA256 e547bb1b261cb6b777cac5c2e21dba04f599e5916dacf00b0193cd4641cbba95
MD5 d55337269427d6634541c0a398b2e955
BLAKE2b-256 8fca4a229d1f78936ee27934b6cd1d6cc455a28643294fa81954cf97edba59ce

File details

Details for the file cagecleaner-1.2.3-py3-none-any.whl.

File metadata

  • Download URL: cagecleaner-1.2.3-py3-none-any.whl
  • Size: 17.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.0

File hashes

Hashes for cagecleaner-1.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 a9ecc423234e626fa0f6c587c5ce4c0da33c309b827bcd96a5c71df1dc685aef
MD5 51d3dc387b3ae6f4989aaebb04af4810
BLAKE2b-256 65fe4361926289efa8334fdf62c6242b1a9de747a54fc99f97fcd268a9ec7119
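A downloaded file can be checked against the SHA256 digests listed above before installing it. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of` and `matches_published_digest` are hypothetical, and the commented-out path assumes the file was saved in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example usage (assumes the sdist was downloaded to the working directory):
# ok = matches_published_digest(
#     Path("cagecleaner-1.2.3.tar.gz"),
#     "e547bb1b261cb6b777cac5c2e21dba04f599e5916dacf00b0193cd4641cbba95",
# )
```

If the comparison returns `False`, the file differs from the one that was published and should not be installed.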
