Genomic redundancy removal tool for cblaster hit sets

CAGEcleaner

>>> CAGEcleaner will be integrated into cblaster! <<<

Outline

CAGEcleaner removes genomic redundancy from gene cluster hit sets identified by cblaster. The redundancy in target databases used by cblaster often propagates into the result set, requiring extensive manual curation before downstream analyses and visualisation can be carried out.

Given a session file from a cblaster run (or from a CAGECAT run), CAGEcleaner retrieves all hit-associated genome assemblies, groups them into assembly clusters by average nucleotide identity (ANI) and identifies a representative assembly for each assembly cluster using skDER. In addition, CAGEcleaner can re-include hits that differ at the gene cluster level despite the genomic redundancy, either because their gene cluster content differs or because their cblaster scores are outliers, or both. Finally, CAGEcleaner returns a filtered cblaster session file as well as a list of retained gene cluster IDs for easier downstream analysis.
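
The selection logic described above can be illustrated with a small, self-contained Python sketch. This is not CAGEcleaner's implementation: the `Hit` record, the `filter_hits` function, the `representative_of` mapping (standing in for the result of a dereplication step) and the z-score outlier rule with its threshold are all hypothetical and chosen purely for illustration. The actual criteria are documented in the CAGEcleaner wiki.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one gene cluster hit (not a cblaster type)."""
    cluster_id: str    # gene cluster ID
    assembly: str      # genome assembly the hit was found in
    genes: frozenset   # query genes present in the hit
    score: float       # cluster score


def filter_hits(hits, representative_of, z_threshold=2.0):
    """Keep hits in representative assemblies, then re-include distinct hits.

    representative_of maps each assembly to the representative of its
    assembly cluster. The z-score outlier rule is an assumption made for
    this sketch only.
    """
    # Group hits by the representative of their assembly cluster.
    groups = {}
    for hit in hits:
        groups.setdefault(representative_of[hit.assembly], []).append(hit)

    kept = []
    for rep, group in groups.items():
        rep_hits = [h for h in group if h.assembly == rep]
        others = [h for h in group if h.assembly != rep]
        kept.extend(rep_hits)

        rep_contents = {h.genes for h in rep_hits}
        scores = [h.score for h in group]
        mu, sigma = mean(scores), pstdev(scores)

        for h in others:
            # Re-include on different gene content and/or an outlier score.
            different_content = h.genes not in rep_contents
            outlier = sigma > 0 and abs(h.score - mu) / sigma > z_threshold
            if different_content or outlier:
                kept.append(h)
    return kept


full = frozenset({"g1", "g2", "g3"})
hits = [
    Hit("c1", "A", full, 10.0),
    Hit("c2", "B", full, 10.1),                      # redundant with c1
    Hit("c3", "C", frozenset({"g1", "g2"}), 9.9),    # different gene content
    Hit("c4", "D", full, 10.0),                      # its own representative
]
representative_of = {"A": "A", "B": "A", "C": "A", "D": "D"}

kept = filter_hits(hits, representative_of)
print([h.cluster_id for h in kept])  # ['c1', 'c3', 'c4']
```

In this toy input, the hit in assembly B is dropped as redundant with the representative A, while the hit in assembly C is retained because its gene content differs.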

For installation instructions, usage, explanations and more, head over to the CAGEcleaner wiki!

Figure: CAGEcleaner workflow.

Citations

If you found CAGEcleaner useful, please cite our manuscript:

De Vrieze, L., Biltjes, M., Lukashevich, S., Tsurumi, K., Masschelein, J. (2025) CAGEcleaner: reducing genomic redundancy in gene cluster mining. bioRxiv https://doi.org/10.1101/2025.02.19.639057

CAGEcleaner relies heavily on the skDER genome dereplication tool and its main dependency skani, so please give these proper credit as well.

Salamzade, R., & Kalan, L. R. (2023). skDER: microbial genome dereplication approaches for comparative and metagenomic applications. bioRxiv https://doi.org/10.1101/2023.09.27.559801
Shaw, J., & Yu, Y. W. (2023). Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nature Methods, 20(11), 1661–1665. https://doi.org/10.1038/s41592-023-02018-3

License

CAGEcleaner is freely available under an MIT license.

Use of the third-party software, libraries or code referred to in the Citations section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms, and you should check that you can comply with any applicable restrictions or terms and conditions before use.

Download files

Source Distribution

cagecleaner-1.2.0.tar.gz (16.2 kB)

Uploaded Source

Built Distribution

cagecleaner-1.2.0-py3-none-any.whl (16.2 kB)

Uploaded Python 3

File details

Details for the file cagecleaner-1.2.0.tar.gz.

File metadata

  • Download URL: cagecleaner-1.2.0.tar.gz
  • Upload date:
  • Size: 16.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.0

File hashes

Hashes for cagecleaner-1.2.0.tar.gz
Algorithm Hash digest
SHA256 a4e018ec71cd9cb67417bb8000fd40849e2a50823594e05e57dc4d900b556121
MD5 f1bd250b9d2b9df22a29f1afd1f8701d
BLAKE2b-256 cfa79fa682ca930393b82024d8417213a6ccc24cf4a1c596d9fd1fb4a6ee00c9
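
A downloaded archive can be checked against the published digest before installation. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of` and `matches_published_digest` are hypothetical, and the script assumes the source distribution has been saved to the current working directory under its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for cagecleaner-1.2.0.tar.gz
EXPECTED_SDIST_SHA256 = (
    "a4e018ec71cd9cb67417bb8000fd40849e2a50823594e05e57dc4d900b556121"
)


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("cagecleaner-1.2.0.tar.gz")  # assumed download location
    if archive.exists():
        ok = matches_published_digest(archive, EXPECTED_SDIST_SHA256)
        print("digest matches" if ok else "digest MISMATCH")
    else:
        print(f"{archive} not found")
```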

File details

Details for the file cagecleaner-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: cagecleaner-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.0

File hashes

Hashes for cagecleaner-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a7812c67396b5e761f9ba79804360307faae70028a8005a553ffefafc4808d25
MD5 ed6ee5dd86c48db74154b8ef1d13b1ab
BLAKE2b-256 07124bc2cc82164b3a38efa8615b4ef25d09e3f8ef13c213df10e574e0ab7949
