The ESKAPE Model is a machine learning-based online resource to facilitate discovery of novel antibiotics against the ESKAPE pathogens.
The ESKAPE Model Standalone
This repository provides a standalone alternative to the web version of the ESKAPE Model at eskape.mcmaster.ca. The ESKAPE Model is a machine learning-based online resource to facilitate discovery of novel antibiotics against the ESKAPE pathogens, a group of multidrug-resistant bacteria that are responsible for the majority of hospital-acquired infections.
The ESKAPE Model predicts the antibacterial activity of inputted molecules against each of the following ESKAPE pathogens:
- EF - Enterococcus faecium
- SA - Staphylococcus aureus
- KP - Klebsiella pneumoniae
- AB - Acinetobacter baumannii
- PA - Pseudomonas aeruginosa
- BW - Escherichia coli (wildtype)
- DKO - Escherichia coli (hyperpermeable and efflux deficient)
Models were trained on in-house growth inhibition screening datasets against common laboratory strains of each pathogen. A total of 21 models were trained: three model architectures for each of the seven strains listed above:
- Random forest using Morgan fingerprints
- Chemprop graph neural network
- Chemprop with RDKit features
How to Use
Input:
Molecules are provided as a CSV file containing SMILES (one per row) under the column heading "smiles". An example CSV file with two SMILES (eskape_test_input.csv) is available in this repository.
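As a minimal sketch, an input file in this format could be written with Python's standard csv module. The file name my_input.csv, the helper write_smiles_csv, and the two SMILES strings (ethanol and aspirin) are illustrative placeholders, not files or compounds taken from this repository.

```python
import csv

# Illustrative SMILES only (ethanol and aspirin); replace with your own molecules.
smiles_list = ["CCO", "CC(=O)OC1=CC=CC=C1C(=O)O"]


def write_smiles_csv(path, smiles):
    """Write one SMILES per row under the required "smiles" column heading."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles"])  # header expected by the tool
        for entry in smiles:
            writer.writerow([entry])


write_smiles_csv("my_input.csv", smiles_list)
```

Using csv.writer rather than plain string concatenation keeps the file valid even if a SMILES string contains a character that needs quoting.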
Output:
Results are outputted as a TSV file containing the following:
- Prediction scores from each of the 21 models are computed for each molecule. A prediction score is a value between 0 and 1 that denotes how confident the model is that a molecule is antibacterial. Predicted antibacterial molecules will have prediction scores closer to 1, while predicted non-antibacterial molecules will have prediction scores closer to 0.
- For any input compounds that were tested against the ESKAPE pathogens during training data acquisition, this tool will additionally output the experimental optical density (OD) values in the "validated" row. OD is a measure of bacterial cell growth, where a high OD means the bacteria grew in the presence of the compound, and a low OD means the compound was able to inhibit the growth of the bacteria. For reference, an OD less than 0.06 denotes full growth inhibition. All OD values were normalized by plate based on the interquartile mean.
- Several metrics are also calculated for each compound:
  - Sum of PS: Sum of prediction scores from all pathogen models for one compound. This metric can be used to prioritize broad-spectrum antibacterial compounds.
  - PPF: The ratio of the highest prediction score for a compound (PS1) to the second highest (PS2). This metric can be used to prioritize compounds predicted to act selectively against a single pathogen.
  - Molecular weight: Size of the molecule in g/mol.
  - clogP: Calculated octanol-water partition coefficient, where a higher clogP means the compound is more lipophilic. clogP is an important metric for solubility and bioavailability.
  - TNN: The TNN similarity measures the structural similarity (a value between 0 and 1) of an input molecule to the most similar molecule (nearest neighbour) in the training set. A TNN similarity closer to 1 indicates the molecules are more similar (TNN similarity = 1 means the molecules are identical). Predictions on compounds that are more similar to the training set are likely to be more accurate. Nearest-neighbour SMILES from the training set are included in the TSV.
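The Sum of PS and PPF definitions above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the tool's own code: the function names and the example scores are hypothetical, and it assumes one prediction score per strain for a single model type.

```python
def sum_of_ps(scores):
    """Sum of prediction scores across all pathogen models for one compound."""
    return sum(scores.values())


def ppf(scores):
    """Ratio of the highest prediction score (PS1) to the second highest (PS2)."""
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2 or ranked[1] == 0:
        return None  # ratio is undefined without a non-zero PS2
    return ranked[0] / ranked[1]


# Made-up scores for one compound, keyed by the strain abbreviations used above.
example_scores = {
    "EF": 0.8, "SA": 0.4, "KP": 0.1, "AB": 0.1,
    "PA": 0.05, "BW": 0.1, "DKO": 0.2,
}

print("Sum of PS:", sum_of_ps(example_scores))
print("PPF:", ppf(example_scores))
```

A compound with uniformly high scores gets a large Sum of PS and a PPF near 1, while a compound with one dominant score gets a large PPF.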
Interpretation:
While all models were trained on the same datasets using the same training scheme, the three model types differ in terms of architecture and molecular representation. Prediction scores for the same molecule and pathogen will therefore vary based on the model type. Note that prediction scores do not correlate directly with likelihood of activity or potency, but rather represent model confidence.
Runtime:
Note: Predicting on 1 molecule takes ~2 minutes. Predicting on 100 molecules takes ~3.5 minutes.
Installation
The tool requires Python 3.10. Python versions more recent than 3.10 have been tested and do not work. Installation takes ~5 minutes.
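Because of this version constraint, it may be worth confirming the interpreter before installing. The following is a small sketch with a hypothetical helper name; it treats only Python 3.10.x as supported, which is an interpretation of the requirement stated above.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True only for Python 3.10.x, the version this README requires."""
    return tuple(version_info[:2]) == (3, 10)


if not is_supported_python():
    print(
        f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}; "
        "this README states that Python 3.10 is required."
    )
```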
Create a virtual environment
python3 -m venv eskape_env
source eskape_env/bin/activate
Install eskape_model using pip
The latest release can be installed from PyPI using pip or from this repository. Either route also installs the dependencies chemprop and chemfunc.
pip install eskape_model
Or
Install eskape_model using tarball
Install the eskape_model application within the eskape_env Python environment created above, using a tarball.
python3 -m pip install /path/to/eskape_model-1.0.0.tar.gz
Dependencies
The following dependencies are required:
- chemprop version 1.6.1 - https://github.com/chemprop/chemprop.git
- chemfunc version 1.0.10 - https://github.com/swansonk14/chemfunc.git
Install dependencies
Install chemprop v1.6.1
wget https://github.com/chemprop/chemprop/archive/refs/tags/v1.6.1.tar.gz
python3 -m pip install v1.6.1.tar.gz
Install chemfunc v_1.0.10
wget https://github.com/swansonk14/chemfunc/archive/refs/tags/v_1.0.10.tar.gz
python3 -m pip install v_1.0.10.tar.gz
Install specific versions of scikit-learn and numpy
pip install scikit-learn==1.3.2
pip install numpy==1.26.4
Test that the installed commands are available
chemprop_predict -h
sklearn_predict -h
chemfunc -h
eskape_model -h
Download the ESKAPE Model models from eskape.mcmaster.ca or GitHub
Please download the models and training data from GitHub.
Create a directory db with two sub-directories, canonical_data and models. From the downloaded data, add training_data_canonical.csv to the db/canonical_data/ directory and add all models to the db/models/all/ directory.
The tree structure of db should look like this:
tree -L 3
.
├── canonical_data
│   └── training_data_canonical.csv
└── models
    └── all
        ├── AB_chemprop
        ├── AB_rdkit
        ├── AB_rf
        ├── BW_chemprop
        ├── BW_rdkit
        ├── BW_rf
        ├── DKO_chemprop
        ├── DKO_rdkit
        ├── DKO_rf
        ├── EF_chemprop
        ├── EF_rdkit
        ├── EF_rf
        ├── KP_chemprop
        ├── KP_rdkit
        ├── KP_rf
        ├── PA_chemprop
        ├── PA_rdkit
        ├── PA_rf
        ├── SA_chemprop
        ├── SA_rdkit
        └── SA_rf
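Before running predictions, the layout above can be checked with a short script. This is a sketch, not part of eskape_model: the function name missing_db_entries is hypothetical, and it only tests that each expected path exists, since this README does not say whether the model entries are files or directories.

```python
from pathlib import Path

# Strain abbreviations and model suffixes as they appear in the tree above.
PATHOGENS = ["AB", "BW", "DKO", "EF", "KP", "PA", "SA"]
MODEL_TYPES = ["chemprop", "rdkit", "rf"]


def missing_db_entries(db_root):
    """Return the expected paths that are absent under db_root."""
    root = Path(db_root)
    expected = [root / "canonical_data" / "training_data_canonical.csv"]
    expected += [
        root / "models" / "all" / f"{pathogen}_{model}"
        for pathogen in PATHOGENS
        for model in MODEL_TYPES
    ]
    return [path for path in expected if not path.exists()]


missing = missing_db_entries("db")
if missing:
    print(f"{len(missing)} expected entries are missing, for example: {missing[0]}")
else:
    print("db layout looks complete")
```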
Run eskape_model
eskape_model \
  --input_file input.txt \
  --output_directory output \
  --models_directory db \
  --debug > run.log 2>&1 &
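When the tool is driven from a script, the same command can be assembled programmatically. The sketch below only builds and prints the argument list, using the flags shown in the command above; the helper name build_eskape_command is hypothetical, and nothing is executed.

```python
import shlex


def build_eskape_command(input_file, output_directory, models_directory, debug=False):
    """Assemble the eskape_model argument list from the flags used in this README."""
    command = [
        "eskape_model",
        "--input_file", str(input_file),
        "--output_directory", str(output_directory),
        "--models_directory", str(models_directory),
    ]
    if debug:
        command.append("--debug")
    return command


command = build_eskape_command("input.txt", "output", "db", debug=True)
print(shlex.join(command))
# To launch the run, the list could be passed to subprocess.run(command, check=True).
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.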