AutoPeptideML

AutoML system for building trustworthy peptide bioactivity predictors

Documentation: https://ibm.github.io/AutoPeptideML
Source Code: https://github.com/IBM/AutoPeptideML
Webserver: http://peptide.ucd.ie/AutoPeptideML
Google Colaboratory Notebook: AutoPeptideML_Collab.ipynb
Blog post: Portal - AutoPeptideML v. 1.0 Tutorial
Papers:
- AutoPeptideML (v. 1.0)
- ML Generalization from canonical to non-canonical peptides

AutoPeptideML allows researchers without prior knowledge of machine learning to build models that are:

Trustworthy: Robust evaluation following DOME, the community guidelines for reporting ML evaluation in the life sciences.
Interpretable: Output contains a PDF summary of the model evaluation explaining how to interpret the results to understand how reliable the model is.
Reproducible: Output contains all necessary information for other researchers to reproduce the training and verify the results.
State-of-the-art: Models generated with this system are competitive with state-of-the-art handcrafted approaches.

To use version 1.0, which may be necessary for backward compatibility with previously built models, please refer to the branch: AutoPeptideML v.1.0.6

Table of Contents

Model builder
Prediction
Benchmark Data
Installation Guide
Documentation
License
Acknowledgements

Model builder

To build a new model, AutoPeptideML (v.2.0) introduces a utility that automatically prepares an experiment configuration file. The goal is to i) improve the reproducibility of the pipeline and ii) keep the interface user-friendly despite the much greater flexibility.

autopeptideml prepare-config --config-path <config-path>

This launches an interactive CLI that walks you through:

Choosing a modeling task (classification or regression)
Loading and parsing datasets (csv, tsv, or fasta)
Picking models and representations
Automatically sampling negatives

You’ll be prompted to answer various questions like:

- What is the modelling problem you're facing? (Classification or Regression)

- How do you want to define your peptides? (Macromolecules or Sequences)

- What models would you like to consider? (knn, adaboost, rf, etc.)

And so on. The final config is written to:

<config-path>.yml

This config file allows for easy reproducibility of the results, so that anyone can repeat the training processes. You can check the configuration file and make any changes you deem necessary. Finally, you can build the model by simply running:

autopeptideml build-model --outdir <outdir> --config-path <outputdir>/config.yml
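
For scripted or batch runs, the two commands above can be assembled from Python. The following is a minimal sketch that only builds the documented command lines and, optionally, executes them. The helper names (`prepare_config_cmd`, `build_model_cmd`, `run`) and the example paths are illustrative assumptions, not part of AutoPeptideML. Because `prepare-config` is interactive, it still needs a terminal when it is actually run.

```python
import shlex
import subprocess


def prepare_config_cmd(config_path: str) -> list[str]:
    """Argument list for the interactive configuration helper."""
    return ["autopeptideml", "prepare-config", "--config-path", config_path]


def build_model_cmd(outdir: str, config_file: str) -> list[str]:
    """Argument list for training a model from an existing config file."""
    return ["autopeptideml", "build-model",
            "--outdir", outdir, "--config-path", config_file]


def run(cmd: list[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command; execute it only when dry_run is False."""
    printable = shlex.join(cmd)
    if not dry_run:
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(cmd, check=True)
    return printable


# Hypothetical paths, shown as a dry run so nothing is executed.
print(run(prepare_config_cmd("my_experiment")))
print(run(build_model_cmd("results", "my_experiment.yml")))
```

Keeping `dry_run=True` as the default makes it easy to log the exact command next to the results before committing to a long training run.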

Prediction

To use a model that has already been built, run:

autopeptideml predict <result_dir> <features_path> --feature-field <feature_field> --output-path <my_predictions_path.csv>

Here, <features_path> is the path to a CSV file with a column <feature_field> that contains the peptide sequences or SMILES. The output file <my_predictions_path.csv> contains the original data plus two additional columns: score, which holds the predictions, and std, which holds the standard deviation across the predictions of the models in the ensemble and can be used as a measure of prediction uncertainty.
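
As an illustration of how the score and std columns might be used together, the sketch below keeps only predictions whose ensemble disagreement is below a threshold and ranks them by score. The function name `rank_predictions`, the `sequence` column name, the example values, and the 0.10 threshold are all assumptions made for this example; only the `score` and `std` column names come from the description above.

```python
import csv
import io


def rank_predictions(csv_text: str, max_std: float) -> list[dict]:
    """Parse a predictions CSV and return low-uncertainty rows, highest score first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Keep rows where the ensemble members agree closely enough.
    confident = [r for r in rows if float(r["std"]) <= max_std]
    return sorted(confident, key=lambda r: float(r["score"]), reverse=True)


# Hypothetical file contents; the "sequence" column and all values are made up.
example = """sequence,score,std
ACDEFGHIK,0.91,0.03
KLMNPQRST,0.55,0.30
VWYACDEFG,0.78,0.05
"""

for row in rank_predictions(example, max_std=0.10):
    print(row["sequence"], row["score"], row["std"])
```

For a real run, the text of the predictions file can be passed in place of `example`. A suitable threshold depends on the task and is best chosen by inspecting the distribution of std values.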

Benchmark data

The data used to benchmark our approach was selected from the benchmarks collected by Du et al., 2023. A new set of benchmarks was constructed from the original set following the new data acquisition and dataset partitioning methods within AutoPeptideML. To download the datasets:

Original UniDL4BioPep Benchmarks: Please check the project GitHub repository.
⚠️ New AutoPeptideML Benchmarks (Amended version): Can be downloaded from this link. Please note that these are not exactly the same benchmarks as used in the paper (see Issue #24 for more details).
PeptideGeneralizationBenchmarks: Benchmarks evaluating how peptide representation methods generalize from canonical peptides (composed of the 20 standard amino acids) to non-canonical peptides (containing non-standard amino acids or other chemical modifications). Check out the paper pre-print. They have their own dedicated repository: PeptideGeneralizationBenchmarks GitHub repository.

Installation

Installing in a conda environment is recommended. To create the environment, run:

conda create -n autopeptideml python
conda activate autopeptideml

1. Python Package

1.1. From PyPI

pip install autopeptideml

1.2. Directly from source

pip install git+https://github.com/IBM/AutoPeptideML

2. Third-party dependencies

To use MMseqs2 (https://github.com/steineggerlab/mmseqs2), pick the build that matches your CPU:

# static build with AVX2 (fastest) (check using: cat /proc/cpuinfo | grep avx2)
wget https://mmseqs.com/latest/mmseqs-linux-avx2.tar.gz; tar xvfz mmseqs-linux-avx2.tar.gz; export PATH=$(pwd)/mmseqs/bin/:$PATH

# static build with SSE4.1  (check using: cat /proc/cpuinfo | grep sse4)
wget https://mmseqs.com/latest/mmseqs-linux-sse41.tar.gz; tar xvfz mmseqs-linux-sse41.tar.gz; export PATH=$(pwd)/mmseqs/bin/:$PATH

# static build with SSE2 (slowest, for very old systems)  (check using: cat /proc/cpuinfo | grep sse2)
wget https://mmseqs.com/latest/mmseqs-linux-sse2.tar.gz; tar xvfz mmseqs-linux-sse2.tar.gz; export PATH=$(pwd)/mmseqs/bin/:$PATH

# MacOS
brew install mmseqs2
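
The choice between the three Linux builds above can also be made programmatically. The sketch below maps CPU flags to the corresponding download URL; the function names `pick_mmseqs_build` and `read_cpu_flags` are hypothetical helpers written for this example, and the only assumption beyond the commands above is that the Linux kernel reports SSE4.1 as the flag `sse4_1`.

```python
def pick_mmseqs_build(cpu_flags: str) -> str:
    """Return the download URL of the fastest static build the CPU flags allow."""
    flags = set(cpu_flags.split())
    if "avx2" in flags:
        variant = "avx2"      # fastest
    elif "sse4_1" in flags:
        variant = "sse41"
    elif "sse2" in flags:
        variant = "sse2"      # slowest, for very old systems
    else:
        raise ValueError("no supported SIMD extension found in CPU flags")
    return f"https://mmseqs.com/latest/mmseqs-linux-{variant}.tar.gz"


def read_cpu_flags(path: str = "/proc/cpuinfo") -> str:
    """Read the first 'flags' line from /proc/cpuinfo (Linux only)."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""


# Example with a hand-written flag string instead of the real /proc/cpuinfo.
print(pick_mmseqs_build("fpu sse sse2 sse4_1 sse4_2"))
```

On a Linux machine, `pick_mmseqs_build(read_cpu_flags())` returns the URL to pass to `wget` in the commands above.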

To use Needleman-Wunsch, install EMBOSS with either of the following:

# With conda
conda install -c bioconda emboss

# With apt (Debian/Ubuntu)
sudo apt install emboss

To use ECFP fingerprints:

pip install rdkit

To use MAPc fingerprints:

pip install mapchiral

To use PepFuNN fingerprints:

pip install git+https://github.com/novonordisk-research/pepfunn

To use PeptideCLM:

pip install smilesPE

Documentation

Configuration file

datasets:
  main:
    feat-fields: # Column with peptide sequence/SMILES
    label-field: # Column with labels/ "Assume all entries are positives"
    path: # Path to dataset
  neg-db:
    activities-to-exclude: # List of activities to exclude
      - activity-1
      - activity-2
      ...
    feat-fields: null # Column with peptide sequence/SMILES (only if using custom database)
    path: # Path to custom database or choose: canonical, non-canonical, both
device: # Device for computing representations. Choose: cpu, mps, cuda
direction: # Direction of optimization. Choose: maximize or minimize
metric: # Metric for optimization. mse, mae require direction minimize
models: # List of machine learning algorithms to explore. List:
        # knn, svm, rf, gradboost, xgboost, lightgbm
  - model-1
  - model-2
  ...
n-trials: # Number of optimization steps. Recommended 100-200
pipeline: to-smiles # Pipeline for preprocessing. Choose: to-smiles, to-sequences
reps: # List of peptide representations to explore. List:
      # ecfp, chemberta-2, molformer-xl, peptide-clm, esm2-8m, ...
  - rep-1
  - rep-2
  ...

split-strategy: min # Strategy for splitting train/test. Choose: min, random. 
task: class # Machine learning type of problem. Choose: class or reg.
n-jobs: # Number of processes to launch. -1 uses all possible CPU cores.
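
The constraints listed in the template comments can be checked before launching a long run. The sketch below is not part of AutoPeptideML: `check_config` and `ALLOWED` are hypothetical names, the configuration is passed as a plain dictionary, and treating every listed key as required is an assumption made for simplicity.

```python
# Allowed values, copied from the comments in the configuration template.
ALLOWED = {
    "device": {"cpu", "mps", "cuda"},
    "direction": {"maximize", "minimize"},
    "pipeline": {"to-smiles", "to-sequences"},
    "split-strategy": {"min", "random"},
    "task": {"class", "reg"},
}


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key, choices in ALLOWED.items():
        value = config.get(key)
        if value not in choices:
            problems.append(f"{key}: expected one of {sorted(choices)}, got {value!r}")
    # Error metrics must be minimized, as stated in the template.
    if config.get("metric") in {"mse", "mae"} and config.get("direction") != "minimize":
        problems.append("metric: mse and mae require direction 'minimize'")
    for key in ("models", "reps"):
        if not config.get(key):
            problems.append(f"{key}: list at least one entry")
    return problems


# Hypothetical configuration; values are illustrative only.
config = {
    "device": "cpu",
    "direction": "maximize",
    "metric": "mse",
    "models": ["knn", "rf"],
    "n-trials": 100,
    "pipeline": "to-smiles",
    "reps": ["ecfp"],
    "split-strategy": "min",
    "task": "reg",
}
print(check_config(config))
```

Here the check reports the mismatch between `metric: mse` and `direction: maximize`. A configuration file written by `prepare-config` could be loaded into a dictionary with PyYAML's `yaml.safe_load` before being passed to such a function.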

More details about API

Please check the Code reference documentation

License

AutoPeptideML is open-source software licensed under the MIT License. Check the details in the LICENSE file.

Credits

Special thanks to Silvia González López for designing the AutoPeptideML logo and to Marcos Martínez Galindo for his aid in setting up the AutoPeptideML webserver.
