AutoML system for building trustworthy peptide bioactivity predictors
- Documentation: https://ibm.github.io/AutoPeptideML
- Source Code: https://github.com/IBM/AutoPeptideML
- Webserver: http://peptide.ucd.ie/autopeptideml
- Google Colaboratory Notebook: AutoPeptideML_Collab.ipynb
- Blog post: Portal - AutoPeptideML v. 1.0 Tutorial
- Papers:
AutoPeptideML allows researchers without prior knowledge of machine learning to build models that are:
- Trustworthy: Robust evaluation following DOME, the community guidelines for reporting ML evaluation in the life sciences.
- Interpretable: Output contains a PDF summary of the model evaluation explaining how to interpret the results to understand how reliable the model is.
- Reproducible: Output contains all necessary information for other researchers to reproduce the training and verify the results.
- State-of-the-art: Models generated with this system are competitive with state-of-the-art handcrafted approaches.
To use version 1.0, which may be necessary for backward compatibility with previously built models, please refer to the branch: AutoPeptideML v.1
Model builder
To build a new model, AutoPeptideML (v.2.0) guides you through the process with a series of prompts.
autopeptideml build-model
This launches an interactive CLI that walks you through:
- Choosing a modeling task (classification or regression)
- Loading and parsing datasets (csv, tsv, or fasta)
- Picking models and representations
- Automatically sampling negatives
You’ll be prompted to answer various questions like:
- What is the modelling problem you're facing? (Classification or Regression)
- How do you want to define your peptides? (Macromolecules or Sequences)
- What models would you like to consider? (knn, adaboost, rf, etc.)
And so on. The final config is written to:
<outputdir>/setup-config.yml
This config file makes the results easy to reproduce: anyone can repeat the training process from it. You can inspect the configuration file and make any changes you deem necessary. Finally, you can build the model by running:
autopeptideml build-model --outdir <outdir> --config-path <outputdir>/setup-config.yml
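For scripted or repeated runs, the command above can be assembled programmatically. The following is a minimal sketch that only reuses the flags shown above (`--outdir` and `--config-path`); the helper name `build_model_command` and the example directory `results/run1` are hypothetical, and it assumes the config file lives in the same directory that is passed as `--outdir`.

```python
import shlex
from pathlib import Path


def build_model_command(outdir: str) -> list[str]:
    """Compose the documented `build-model` call for one output directory.

    Assumption: setup-config.yml sits inside the same directory that is
    passed as --outdir. Adjust the path if your config lives elsewhere.
    """
    config_path = Path(outdir) / "setup-config.yml"
    return [
        "autopeptideml", "build-model",
        "--outdir", Path(outdir).as_posix(),
        "--config-path", config_path.as_posix(),
    ]


cmd = build_model_command("results/run1")  # hypothetical directory
print(shlex.join(cmd))

# To actually launch the run (requires autopeptideml on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping the command as a list of arguments avoids shell-quoting problems if the directory name contains spaces.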
Prediction
To use a model that has already been built, run:
autopeptideml predict <result_dir> <features_path> --feature-field <feature_field> --output-path <my_predictions_path.csv>
Here, <features_path> is the path to a CSV file with a column <feature_field> that contains the peptide sequences/SMILES. The output file <my_predictions_path.csv> will contain the original data plus two additional columns: score, which holds the predictions, and std, which is the standard deviation across the predictions of the models in the ensemble and can be used as a measure of the uncertainty of each prediction.
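To illustrate how the score and std columns can be read together, here is a minimal sketch using only the Python standard library. The per-model numbers and peptide strings are made up, the column name `sequence` is a placeholder for your own <feature_field>, and the sketch assumes that score is the ensemble mean and that std is a population standard deviation; the source does not specify either detail.

```python
import csv
import io
import statistics

# Hypothetical per-model predictions for two peptides (made-up numbers).
ensemble_predictions = {
    "GLFDIVKKV": [0.90, 0.80, 0.85],
    "AAAAAA": [0.20, 0.70, 0.45],
}


def summarize(preds):
    """Return (mean, population standard deviation) of ensemble predictions.

    Assumption: the real tool may aggregate differently.
    """
    return statistics.fmean(preds), statistics.pstdev(preds)


rows = []
for peptide, preds in ensemble_predictions.items():
    score, std = summarize(preds)
    rows.append({"sequence": peptide, "score": round(score, 3), "std": round(std, 3)})

# Write an in-memory CSV that mimics the described output layout.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["sequence", "score", "std"])
writer.writeheader()
writer.writerows(rows)

# Read it back and flag predictions where the ensemble members disagree.
buffer.seek(0)
STD_THRESHOLD = 0.1  # arbitrary cut-off chosen for illustration
uncertain = [r["sequence"] for r in csv.DictReader(buffer) if float(r["std"]) > STD_THRESHOLD]
print(uncertain)
```

With a real predictions file, replace the in-memory buffer with `open("<my_predictions_path.csv>", newline="")` and pick a threshold that suits the score range of your task.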
Benchmark data
The data used to benchmark our approach was selected from the benchmarks collected by Du et al., 2023. A new set of benchmarks was constructed from the original set following the new data acquisition and dataset partitioning methods within AutoPeptideML. To download the datasets:
- Original UniDL4BioPep Benchmarks: Please check the project GitHub repository.
- ⚠️ New AutoPeptideML Benchmarks (Amended version): Can be downloaded from this link. Please note that these are not exactly the same benchmarks as used in the paper (see Issue #24 for more details).
- PeptideGeneralizationBenchmarks: Benchmarks evaluating how peptide representation methods generalize from canonical peptides (composed of the 20 standard amino acids) to non-canonical peptides (with non-standard amino acids or other chemical modifications). Check out the paper pre-print. These benchmarks have their own dedicated repository: PeptideGeneralizationBenchmarks GitHub repository.
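The canonical/non-canonical distinction above can be made concrete for plain one-letter sequences. This is a minimal sketch, not the partitioning logic used by the benchmarks: the function name `is_canonical` is hypothetical, and it only handles one-letter sequence strings, not SMILES or other notations for modified peptides.

```python
# One-letter codes of the 20 standard amino acids.
CANONICAL_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_canonical(sequence: str) -> bool:
    """Return True if every residue is one of the 20 standard amino acids.

    The check is case-sensitive on purpose: some notations use lowercase
    letters for D-amino acids, so lowercase input is treated as
    non-canonical here (a conservative, illustrative choice).
    """
    seq = sequence.strip()
    return bool(seq) and set(seq) <= CANONICAL_AMINO_ACIDS


print(is_canonical("GLFDIVKKV"))  # only standard residues
print(is_canonical("GLFXIVKKV"))  # 'X' is not a standard residue
```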
Installation
Installing in a conda environment is recommended. To create the environment, run:
conda create -n autopeptideml python
conda activate autopeptideml
conda install quarto -c conda-forge
1. Python Package
1.1. From PyPI
pip install autopeptideml
1.2. Directly from source
pip install git+https://github.com/IBM/AutoPeptideML
2. Third-party dependencies
To use MMseqs2 (https://github.com/steineggerlab/mmseqs2):
# static build with AVX2 (fastest) (check using: cat /proc/cpuinfo | grep avx2)
wget https://mmseqs.com/latest/mmseqs-linux-avx2.tar.gz; tar xvfz mmseqs-linux-avx2.tar.gz; export PATH=$(pwd)/mmseqs/bin/:$PATH
# static build with SSE4.1 (check using: cat /proc/cpuinfo | grep sse4)
wget https://mmseqs.com/latest/mmseqs-linux-sse41.tar.gz; tar xvfz mmseqs-linux-sse41.tar.gz; export PATH=$(pwd)/mmseqs/bin/:$PATH
# static build with SSE2 (slowest, for very old systems) (check using: cat /proc/cpuinfo | grep sse2)
wget https://mmseqs.com/latest/mmseqs-linux-sse2.tar.gz; tar xvfz mmseqs-linux-sse2.tar.gz; export PATH=$(pwd)/mmseqs/bin/:$PATH
# MacOS
brew install mmseqs2
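The manual `grep` checks in the comments above can also be automated. The following is a minimal sketch that maps the CPU flags reported in `/proc/cpuinfo` to the archive names listed above; the function name `pick_mmseqs_build` is hypothetical, the sample text is made up, and the parsing assumes the x86 Linux layout in which supported features appear on a line starting with `flags`.

```python
def pick_mmseqs_build(cpuinfo_text: str) -> str:
    """Choose the fastest MMseqs2 static build supported by the CPU flags."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux lists features as "flags : fpu vme ... avx2 ..."
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    if "avx2" in flags:
        return "mmseqs-linux-avx2.tar.gz"
    if "sse4_1" in flags:
        return "mmseqs-linux-sse41.tar.gz"
    if "sse2" in flags:
        return "mmseqs-linux-sse2.tar.gz"
    raise ValueError("no supported SIMD flag (avx2, sse4_1, sse2) found")


# Made-up sample resembling one entry of /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 avx avx2\n"
print(pick_mmseqs_build(sample))

# On a real x86 Linux machine:
# from pathlib import Path
# print(pick_mmseqs_build(Path("/proc/cpuinfo").read_text()))
```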
To use Needleman-Wunsch, either:
conda install -c bioconda emboss
or
sudo apt install emboss
To use ECFP fingerprints:
pip install rdkit
To use MAPc fingerprints:
pip install mapchiral
To use PepFuNN fingerprints:
pip install git+https://github.com/novonordisk-research/pepfunn
To use PeptideCLM:
pip install smilesPE
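Because all of the third-party dependencies above are optional, it can help to check which ones are present before configuring a run. This is a minimal sketch using only the standard library. The import names (`rdkit`, `mapchiral`, `pepfunn`, `SmilesPE`) and executable names (`mmseqs`, and `needle` for the EMBOSS Needleman-Wunsch program) are assumptions based on how these packages are usually installed, and the helper `check_optional_dependencies` is hypothetical, not part of AutoPeptideML.

```python
import importlib.util
import shutil

# Feature -> Python import name (assumed; verify against each package).
OPTIONAL_MODULES = {
    "ECFP fingerprints": "rdkit",
    "MAPc fingerprints": "mapchiral",
    "PepFuNN fingerprints": "pepfunn",
    "PeptideCLM": "SmilesPE",
}

# Feature -> command-line executable expected on PATH (assumed names).
OPTIONAL_TOOLS = {
    "MMseqs2": "mmseqs",
    "Needleman-Wunsch (EMBOSS)": "needle",
}


def check_optional_dependencies() -> dict[str, bool]:
    """Report which optional features appear to be available."""
    status = {}
    for feature, module in OPTIONAL_MODULES.items():
        # find_spec returns None when a top-level module cannot be found.
        status[feature] = importlib.util.find_spec(module) is not None
    for feature, executable in OPTIONAL_TOOLS.items():
        status[feature] = shutil.which(executable) is not None
    return status


for feature, available in check_optional_dependencies().items():
    print(f"{feature}: {'found' if available else 'missing'}")
```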
More details about the API
Please check the Code reference documentation.
License
AutoPeptideML is open-source software licensed under the MIT License. Check the details in the LICENSE file.
Credits
Special thanks to Silvia González López for designing the AutoPeptideML logo and to Marcos Martínez Galindo for his aid in setting up the AutoPeptideML webserver.