Chemical HierArchy for secondary Metabolism clusters Obtained In Silico

🐐 CHAMOIS

Chemical Hierarchy Approximation for secondary Metabolism clusters Obtained In Silico.

🗺️ Overview

CHAMOIS is a fast method for predicting chemical features of natural products produced by Biosynthetic Gene Clusters (BGCs) using only their genomic sequence. It can be used to get chemical features from BGCs predicted in silico with tools such as GECCO or antiSMASH.

💡 Usage

This section shows only the basic commands for installing and running CHAMOIS. The online documentation contains a more detailed installation guide, examples, an API reference, and a CLI reference.

🔧 Installing CHAMOIS

CHAMOIS is implemented in Python, and supports all versions from Python 3.7 onwards. It requires additional libraries that can be installed directly from PyPI, the Python Package Index.

$ pip install chamois-tool

Installing the package itself is quick, but it requires downloading an extra 44 MiB of data (profile HMMs) from GitHub, so the total install time depends on the speed of your Internet connection.

Since release v0.2.1, CHAMOIS can run on Windows! This relies on the experimental MinGW-w64 build of PyHMMER v0.12.0, which supports Windows 10 and later. See the PyHMMER documentation for more information about Windows support.

🧬 Running CHAMOIS

Once CHAMOIS is installed, you can run it from the terminal by providing it with one or more GenBank files containing the genomic records of the BGCs to analyze, and an output path where the results will be written in HDF5 format. For instance, to predict the classes for BGC0000703, a kanamycin-producing BGC from MIBiG:

$ chamois predict -i tests/data/BGC0000703.4.gbk -o tests/data/BGC0000703.4.hdf5

This takes about 3 seconds and 600 MiB of RAM on a higher-end laptop (Linux 6.13.8, i7-1255U @ 4.70 GHz). Runtime and memory usage scale linearly with the number of BGCs to process.
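If you prefer to drive the prediction from a Python script, the command shown above can be wrapped with the standard `subprocess` module. This is only a minimal sketch: the helper names `predict_command` and `run_predict` are made up for illustration, and only the `-i` and `-o` flags shown above are used.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def predict_command(genbank: Path, output: Path) -> List[str]:
    """Build the argument list for a single `chamois predict` call.

    Hypothetical helper: it only reuses the `-i` and `-o` flags
    from the command shown in this README.
    """
    return ["chamois", "predict", "-i", str(genbank), "-o", str(output)]


def run_predict(genbank: Path, output: Path) -> None:
    """Run the prediction and raise CalledProcessError on a non-zero exit."""
    if shutil.which("chamois") is None:
        raise RuntimeError("the `chamois` executable was not found on PATH")
    subprocess.run(predict_command(genbank, output), check=True)
```

Calling `run_predict(Path("tests/data/BGC0000703.4.gbk"), Path("tests/data/BGC0000703.4.hdf5"))` should then be equivalent to the shell command above.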

Additional examples for running CHAMOIS can be found in the online documentation.

🔎 Viewing results

The output file can be loaded with the anndata package, and corresponds to a probability matrix where rows are the input BGCs, and columns are the ChemOnt classes.
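As a rough sketch of what loading the file could look like, the snippet below reads it with `anndata.read_h5ad` and lists the classes above a probability cutoff for each BGC. It assumes that the output follows the usual AnnData layout, with the probabilities stored as a dense matrix in `X`, BGC identifiers in `obs_names` and class identifiers in `var_names`. The helper names and the 0.5 cutoff are illustrative choices, not part of CHAMOIS.

```python
from typing import Dict, Iterable, List, Sequence


def load_predictions(path: str):
    """Load a CHAMOIS output file (assumed to be readable as h5ad)."""
    import anndata  # imported lazily so the helper below works without it

    return anndata.read_h5ad(path)


def confident_classes(
    probas: Iterable[Sequence[float]],
    bgc_ids: Iterable[str],
    class_ids: Sequence[str],
    threshold: float = 0.5,  # illustrative cutoff, not a CHAMOIS default
) -> Dict[str, List[str]]:
    """Map each BGC to the classes whose probability reaches the threshold."""
    result = {}
    for bgc, row in zip(bgc_ids, probas):
        result[bgc] = [c for c, p in zip(class_ids, row) if p >= threshold]
    return result
```

With a loaded object `data = load_predictions("tests/data/BGC0000703.4.hdf5")`, the call would be `confident_classes(data.X, data.obs_names, data.var_names)`.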

To get a summary for each predicted BGC, use the render command:

$ chamois render -i tests/data/BGC0000703.4.hdf5

Predictions for each BGC will be shown as a tree with their computed probabilities:

CHEMONTID:0000002 (Organoheterocyclic compounds): 0.996
├── CHEMONTID:0002012 (Oxanes): 0.996
└── CHEMONTID:0004140 (Oxacyclic compounds): 0.976
CHEMONTID:0004150 (Hydrocarbon derivatives): 0.999
CHEMONTID:0004557 (Organopnictogen compounds): 0.948
CHEMONTID:0004603 (Organic oxygen compounds): 1.000
└── CHEMONTID:0000323 (Organooxygen compounds): 1.000
    ├── CHEMONTID:0000011 (Carbohydrates and carbohydrate conjugates): 0.996
    │   ├── CHEMONTID:0001540 (Monosaccharides): 0.996
    │   ├── CHEMONTID:0002105 (Glycosyl compounds): 0.977
    │   │   └── CHEMONTID:0002207 (O-glycosyl compounds): 0.977
    │   └── CHEMONTID:0003305 (Aminosaccharides): 0.995
    │       └── CHEMONTID:0000282 (Aminoglycosides): 0.995
    │           └── CHEMONTID:0001675 (Aminocyclitol glycosides): 0.995
    │               └── CHEMONTID:0003575 (2-deoxystreptamine aminoglycosides): 0.961
    ├── CHEMONTID:0000129 (Alcohols and polyols): 1.000
    │   ├── CHEMONTID:0000286 (Primary alcohols): 0.891
    │   ├── CHEMONTID:0001292 (Cyclic alcohols and derivatives): 0.998
    │   │   └── CHEMONTID:0002509 (Cyclitols and derivatives): 0.996
    │   │       └── CHEMONTID:0002510 (Aminocyclitols and derivatives): 0.987
    │   ├── CHEMONTID:0001661 (Secondary alcohols): 0.999
    │   │   └── CHEMONTID:0002647 (Cyclohexanols): 0.995
    │   └── CHEMONTID:0002286 (Polyols): 0.972
    └── CHEMONTID:0000254 (Ethers): 0.959
        └── CHEMONTID:0001656 (Acetals): 0.959
CHEMONTID:0004707 (Organic nitrogen compounds): 0.999
└── CHEMONTID:0000278 (Organonitrogen compounds): 0.999
    ├── CHEMONTID:0002449 (Amines): 0.999
    │   ├── CHEMONTID:0002450 (Primary amines): 0.989
    │   │   └── CHEMONTID:0000469 (Monoalkylamines): 0.989
    │   └── CHEMONTID:0002460 (Alkanolamines): 0.999
    │       └── CHEMONTID:0001897 (1,2-aminoalcohols): 0.992
    └── CHEMONTID:0002674 (Cyclohexylamines): 0.987
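If the rendered tree needs to be post-processed, for example after copying it from the terminal, a small parser can recover the depth, identifier, name and probability of each line. This is a sketch that assumes every line has the form `CHEMONTID:<digits> (<name>): <probability>` and that each nesting level is drawn with a four-character prefix, as in the output above. The `Prediction` type and `parse_tree` function are hypothetical names, not part of CHAMOIS.

```python
import re
from typing import List, NamedTuple


class Prediction(NamedTuple):
    depth: int          # 0 for top-level classes
    class_id: str       # e.g. "CHEMONTID:0000254"
    name: str           # e.g. "Ethers"
    probability: float


# The prefix holds the tree-drawing characters in front of the identifier.
_LINE = re.compile(
    r"^(?P<prefix>.*?)(?P<id>CHEMONTID:\d+) \((?P<name>.*)\): (?P<p>[01]\.\d+)"
)


def parse_tree(text: str) -> List[Prediction]:
    """Parse rendered tree lines, skipping anything that does not match."""
    records = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue
        # Assumption: one nesting level is four characters wide.
        depth = len(match.group("prefix")) // 4
        records.append(
            Prediction(
                depth,
                match.group("id"),
                match.group("name"),
                float(match.group("p")),
            )
        )
    return records
```

Keeping the depth alongside each record makes it possible to rebuild parent-child links by walking the list in order.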

🎛️ Training CHAMOIS

Training CHAMOIS is also done with the CLI, provided you have training data available. You can use the CHAMOIS datasets released on Zenodo to reproduce our results.

For instance, to train on the MIBiG 3.1 BGCs, the dataset used to train the CHAMOIS classifier distributed with the code, run the following command:

$ chamois train -f data/datasets/mibig3.1/features.hdf5 -c data/datasets/mibig3.1/classes.hdf5 -o model.json

This takes about 12 seconds and 600 MiB of RAM on a higher-end laptop (Linux 6.13.8, i7-1255U @ 4.70 GHz).

📝 Requirements

🖥️ System requirements

CHAMOIS is a pure-Python package but requires HMMER, which only runs on PowerPC, x86-64 and Aarch64 systems, and only on POSIX operating systems (Linux, macOS, BSD) or on Windows through MinGW-w64.

CHAMOIS is tested on Linux (Ubuntu 22.04) using the GitHub Actions continuous integration platform.

🐍 Software requirements

CHAMOIS supports (and is tested on) all Python versions from Python 3.7 onwards. It requires the following Python packages:

Package         Minimum     Tested
anndata         >=0.8       0.9.2
gb-io           >=0.3.1     0.3.4
lz4             >=4.0       4.3.3
numpy           >=1.0       2.2.4
pandas          >=1.3       2.2.3
platformdirs    >=3.0       4.3.6
pyhmmer         >=0.11.0    0.11.0
pyrodigal       >=3.0       3.6.3
rich            >=12.4      13.9.4
rich-argparse   >=1.1       1.6.0
scipy           >=1.4       1.15.2
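To check an existing environment against these minimum versions, something along the following lines could be used. It is a sketch rather than an official tool: the minimum versions are copied from the requirements listed above, the helper names are made up, the version comparison only looks at the leading numeric components, and `importlib.metadata` is only part of the standard library from Python 3.8 onwards.

```python
import re
from typing import List, Tuple

# Minimum versions copied from the requirements listed above.
MINIMUM_VERSIONS = {
    "anndata": "0.8",
    "gb-io": "0.3.1",
    "lz4": "4.0",
    "numpy": "1.0",
    "pandas": "1.3",
    "platformdirs": "3.0",
    "pyhmmer": "0.11.0",
    "pyrodigal": "3.0",
    "rich": "12.4",
    "rich-argparse": "1.1",
    "scipy": "1.4",
}


def release_tuple(version: str) -> Tuple[int, ...]:
    """Keep only the leading numeric components of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group().split("."))


def satisfies(installed: str, minimum: str) -> bool:
    """Compare two versions after padding them to the same length."""
    have, need = release_tuple(installed), release_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def find_problems() -> List[str]:
    """List missing or outdated packages in the current environment."""
    from importlib import metadata  # standard library since Python 3.8

    problems = []
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not satisfies(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems
```

An empty list returned by `find_problems()` would suggest that every listed dependency is present at a sufficient version. For anything beyond a quick check, letting `pip` resolve the dependencies remains the more reliable route.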

🔖 Reference

CHAMOIS can be cited using the following preprint:

Machine learning inference of natural product chemistry across biosynthetic gene cluster types. Martin Larralde, Georg Zeller. bioRxiv 2025.03.13.642868; doi:10.1101/2025.03.13.642868

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⚖️ License

This software is provided under the GNU General Public License v3.0 or later. CHAMOIS is developed by the Zeller Lab at the European Molecular Biology Laboratory in Heidelberg and the Leiden University Medical Center in Leiden.

Download files

Source Distribution

chamois_tool-0.2.2.tar.gz (482.8 kB view details)

Uploaded Source

Built Distribution

chamois_tool-0.2.2-py3-none-any.whl (12.4 MB view details)

Uploaded Python 3

File details

Details for the file chamois_tool-0.2.2.tar.gz.

File metadata

  • Download URL: chamois_tool-0.2.2.tar.gz
  • Size: 482.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for chamois_tool-0.2.2.tar.gz
Algorithm Hash digest
SHA256 20e8162577ebfa155acab6d5f98de59163505602094e595346728b43b9a2927b
MD5 978fec212c2ec3eab3ffab7af1c35921
BLAKE2b-256 dfce1540f053014fece8c7991ee4f710f66386488dd9cc09896e8a09715c4a8f
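A downloaded archive can be checked against the SHA256 digest listed above with the standard `hashlib` module. The sketch below assumes the file was saved locally under its published name. The function name `sha256_of` is an illustrative choice.

```python
import hashlib

# SHA256 digest listed above for chamois_tool-0.2.2.tar.gz.
EXPECTED_SHA256 = "20e8162577ebfa155acab6d5f98de59163505602094e595346728b43b9a2927b"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals the end of the file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `sha256_of("chamois_tool-0.2.2.tar.gz")` with `EXPECTED_SHA256` then tells whether the download matches the published file.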

Provenance

The following attestation bundles were made for chamois_tool-0.2.2.tar.gz:

Publisher: test.yml on zellerlab/CHAMOIS

File details

Details for the file chamois_tool-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: chamois_tool-0.2.2-py3-none-any.whl
  • Size: 12.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for chamois_tool-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c78ad1e3c853b558df27ed69f2c7f3a66a5b28b2e029d81cf145f93f26370965
MD5 9d0432f534acf81b3b13c576fe581998
BLAKE2b-256 dade90569b715cd8ecdea8a8f52d2e45cfaed996cb1e906a84b25c8bcdd88cbb

Provenance

The following attestation bundles were made for chamois_tool-0.2.2-py3-none-any.whl:

Publisher: test.yml on zellerlab/CHAMOIS
