Multi-objective optimization for protein design
multiObjectiveDesign
MultiObjective Design for Protein Engineering
Overview
multiObjectiveDesign is a Python toolkit for running iterative, multi-objective
optimisation loops on protein sequences. At its core it combines a genetic
algorithm with a pluggable catalogue of metrics (e.g. ProteinMPNN, PyRosetta,
FrustraR), allowing you to trade off stability, designability, frustration and
custom objectives while keeping full visibility into each iteration.
Installation
The core package is a standard Python project defined by pyproject.toml and
can be installed with either pip or mamba/conda.
With pip
# from PyPI (once published)
pip install protein_mood
# from a clone of the repository
pip install .
# editable / development install
pip install -e ".[dev]"
# straight from GitHub
pip install "git+https://github.com/AlbertCS/multiObjectiveOptimizationDesign.git"
A plain install pulls in only the lightweight scientific stack (numpy, pandas, scipy, scikit-learn, matplotlib, seaborn, biopython, overrides, tqdm, icecream). The heavier predictors are grouped into optional extras:
| Extra | Enables |
|---|---|
| esm | Deep-learning metrics: torch, transformers (ESM2, ESMC, ESMFold2, LigandMPNN) |
| rosetta | pyrosetta-installer for the PyRosetta-based metrics |
| dev | pytest, ipykernel, jupyterlab |
| all | esm + rosetta + dev |
pip install ".[esm]" # deep-learning metrics
pip install ".[all]" # everything
PyRosetta is not distributed on PyPI. After installing the rosetta
extra, download the wheel with:
python -m pyrosetta_installer # or: pyrosetta-installer
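To see which optional extras are usable in the current environment, a small check along these lines can help. It is only a sketch: the mapping from extras to import names (`torch`, `transformers`, `pyrosetta`, `pytest`) is an assumption based on the table above, and `available_extras` is a hypothetical helper, not part of the package.

```python
from importlib.util import find_spec

# Assumed mapping from each extra to the import names it should provide.
EXTRA_MODULES = {
    "esm": ["torch", "transformers"],
    "rosetta": ["pyrosetta"],
    "dev": ["pytest"],
}


def available_extras(extra_modules=EXTRA_MODULES):
    """Return {extra: bool}; True when every listed module can be located."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in extra_modules.items()
    }


# Report the status of each extra without importing the heavy modules.
for extra, ok in available_extras().items():
    print(f"{extra:8s} {'available' if ok else 'missing'}")
```

`find_spec` only locates a module without importing it, so the check stays fast even when torch is installed.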
With mamba / conda
An environment.yml is provided that builds an env named mood with the core
stack (plus notebook tooling and PyTorch) and pip-installs the package in
editable mode:
mamba env create -f environment.yml # or: conda env create -f environment.yml
mamba activate mood
Once the conda-forge package is published you can also install it directly:
mamba install -c conda-forge protein_mood
For fully-pinned, reproducible environments used on HPC, see the lock files in
configs/ (mood-dev.yml, mood-esmc.yml, mood-esmfold2.yml).
ProteinMPNN model weights
The ProteinMPNN weights (~70 MB) are not bundled with the package. The
first time a ProteinMPNN metric runs, the required .pt file is downloaded,
checksum-verified and cached under ~/.cache/mood/ (override with
$MOOD_CACHE_DIR). You can control this:
| Variable | Effect |
|---|---|
| MOOD_PROTEINMPNN_WEIGHTS | Directory of pre-staged weights to use as-is, with no download. Use this on offline/air-gapped clusters (e.g. MareNostrum). Layout: <type>_model_weights/<name>.pt. |
| MOOD_CACHE_DIR | Where downloaded weights are cached (default ~/.cache/mood). |
| MOOD_PROTEINMPNN_BASE_URL | Base URL to download from (default: upstream dauparas/ProteinMPNN). |
Passing path_to_model_weights to the ProteinMPNN metric bypasses all of the
above and uses that directory directly.
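The precedence described above can be sketched as follows. This is an illustration of the documented behaviour, not the package's actual code: `resolve_weights_dir` is a hypothetical helper, and treating an explicit `path_to_model_weights` as "never download" is an assumption.

```python
import os
from pathlib import Path


def resolve_weights_dir(path_to_model_weights=None, env=None):
    """Return (directory, may_download) following the documented precedence."""
    env = os.environ if env is None else env
    # 1. An explicit argument bypasses every environment variable.
    if path_to_model_weights:
        return Path(path_to_model_weights), False
    # 2. Pre-staged weights are used as-is and never downloaded.
    staged = env.get("MOOD_PROTEINMPNN_WEIGHTS")
    if staged:
        return Path(staged), False
    # 3. Otherwise use the cache directory, which a download may populate.
    cache = env.get("MOOD_CACHE_DIR") or str(Path.home() / ".cache" / "mood")
    return Path(cache), True


directory, may_download = resolve_weights_dir()
print(directory, "(download allowed)" if may_download else "(offline)")
```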
Verify the install
mood --help # console entry point
python -c "import mood; print('ok')"
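The installed version can also be read from Python using the standard library. The sketch below assumes the distribution is registered under the name `protein_mood` (the name used in the pip commands above); `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="protein_mood"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string when installed, otherwise None.
print(installed_version())
```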
Versioning & releases
The version is derived from git tags by
setuptools-scm — there is no version
number to edit by hand. Releases are cut by tagging vX.Y.Z, which is
automated by .github/workflows/release.yml
on every push to main:
- default for any merge / commit → patch (1.0.0 → 1.0.1)
- PR labelled minor → minor (1.0.0 → 1.1.0)
- PR labelled major → major (1.9.0 → 2.0.0)
- manual runs (workflow_dispatch) let you pick the bump
The workflow creates the tag and a GitHub Release; the version is also embedded in the built sdist so PyPI/conda-forge builds resolve it without git metadata.
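The bump rules listed above amount to standard semantic-version arithmetic. A minimal sketch, where `bump` is a hypothetical helper and not the code used by the release workflow:

```python
def bump(version, kind="patch"):
    """Return the next version string for a 'major', 'minor' or 'patch' bump."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if kind == "major":
        return f"{major + 1}.0.0"  # lower components reset to zero
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    if kind == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump kind: {kind!r}")


print(bump("1.0.0"))           # 1.0.1
print(bump("1.0.0", "minor"))  # 1.1.0
print(bump("v1.9.0", "major")) # 2.0.0
```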
Project layout
- mood/multiObjectiveOptimization.py – high-level orchestration that prepares folders, restores previous runs, evaluates metrics and persists artefacts.
- mood/optimizers/ – optimisation strategies. Currently the genetic algorithm is implemented with modular crossover/mutation helpers and a rich mutation biasing subsystem.
- mood/metrics/ – collection of metric classes with a shared interface. Each metric computes a dataframe of scores for the candidate sequences and exposes selection orientation (min/max) metadata used during ranking.
- mood/base/ – lightweight data structures (sequences, logging, state) shared across the codebase.
- mood/utils/ – utilities for structure handling, plotting, ProteinMPNN wrappers and misc helpers required by the optimiser/metrics.
- configs/ – ready-to-run JSON configurations demonstrating typical setups.
- tests/ – pytest/unittest suites covering the core pieces (GA, selection strategies, CLI integration, metrics) plus example notebooks for exploratory runs.
For a detailed catalogue of available metrics, their objectives, and example configurations, see Metrics overview.
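To illustrate why each metric carries a min/max orientation, here is a generic Pareto-dominance sketch in which the orientation decides which direction counts as "better". The metric names (`energy`, `plddt`) and both functions are hypothetical; the package's own ranking code may be organised differently.

```python
def dominates(a, b, orientation):
    """True if candidate a Pareto-dominates b under the given orientations.

    orientation maps each metric name to "min" or "max".
    """
    strictly_better = False
    for metric, sense in orientation.items():
        # Flip the sign of "max" metrics so that smaller is always better.
        sign = 1.0 if sense == "min" else -1.0
        va, vb = sign * a[metric], sign * b[metric]
        if va > vb:
            return False  # a is worse on this metric, so it cannot dominate
        if va < vb:
            strictly_better = True
    return strictly_better


def pareto_front(candidates, orientation):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(o, c, orientation) for o in candidates if o is not c)
    ]


orientation = {"energy": "min", "plddt": "max"}
candidates = [
    {"name": "A", "energy": -10.0, "plddt": 0.9},
    {"name": "B", "energy": -8.0, "plddt": 0.5},   # worse than A on both metrics
    {"name": "C", "energy": -12.0, "plddt": 0.4},
]
print([c["name"] for c in pareto_front(candidates, orientation)])
```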
CLI Generator
The CLI now generates ready-to-run replica scripts instead of executing an optimisation in-place. Given a JSON/YAML config it produces:
- setUp_<folder_name>_<replica>.py scripts mirroring our manual setup style.
- A SLURM array runner that dispatches the correct setup per SLURM_ARRAY_TASK_ID.
Quick start
python3 -m mood.cli \
--config configs/toy_example.json \
--replicas 2 \
--seed-start 1234 \
--seed-step 1
Outputs are written to folder_name/ (or --output-prefix). Each replica inherits your config, with {seed} placeholders replaced by seed-start + index * seed-step.
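The seed rule above is a simple arithmetic progression. The sketch below assumes the replica index starts at 0; `replica_seeds` and `fill_seed` are hypothetical helpers, and the `run_{seed}` template is only an example string.

```python
def replica_seeds(seed_start, seed_step, replicas):
    """Return one seed per replica: seed_start + index * seed_step."""
    return [seed_start + index * seed_step for index in range(replicas)]


def fill_seed(template, seed):
    """Replace every {seed} placeholder in a config string."""
    return template.replace("{seed}", str(seed))


# Matches the quick-start call: --replicas 2 --seed-start 1234 --seed-step 1
seeds = replica_seeds(1234, 1, 2)
print(seeds)
print([fill_seed("run_{seed}", s) for s in seeds])
```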
HPC-friendly generation
Provide a preamble snippet and Python interpreter to match your cluster:
python3 -m mood.cli \
--config configs/toy_example.json \
--replicas 4 \
--seed-start 1235 \
--python-exec /path/to/conda/env/bin/python \
--preamble-file configs/runner_preamble.sh \
--ntasks 80 --cpus-per-task 1 --time 02-00:00:00
Submit the generated folder_name/runner_array.sh via sbatch. Re-run the CLI with --overwrite to refresh existing scripts. Check python3 -m mood.cli --help for the full option list.
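Conceptually, the array runner maps each `SLURM_ARRAY_TASK_ID` to one setup script. The sketch below shows that mapping only; the generated runner is a shell script, and both the helper name and the assumption that replica numbers equal task IDs (offset by `first_replica`) are illustrative rather than taken from the generator.

```python
import os


def setup_script_for_task(folder_name, env=None, first_replica=0):
    """Return the setup script name for the current SLURM array task.

    Assumes replica number = first_replica + SLURM_ARRAY_TASK_ID.
    """
    env = os.environ if env is None else env
    task_id = int(env["SLURM_ARRAY_TASK_ID"])
    replica = first_replica + task_id
    return f"setUp_{folder_name}_{replica}.py"


# Simulate array task 3 outside of SLURM by passing the environment explicitly.
print(setup_script_for_task("folder_name", env={"SLURM_ARRAY_TASK_ID": "3"}))
```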
Typical workflow
- Prepare inputs – provide a native PDB (or scaffold), choose metrics, and declare mutable/fixed positions in your config. The metrics module exposes helpers to pre-compute ProteinMPNN priors or frustration files if needed.
- Tune optimisation knobs – set population size, mutation/crossover cycle, parent-selection strategy (rank, crowding, objective bias) and iteration count in the configuration file.
- Generate runners or integrate directly – use the CLI generator to emit replica setup scripts for HPC runs, or instantiate MultiObjectiveOptimization directly inside your own pipeline.
- Inspect outputs – each iteration folder contains pickled sequences, per-chain dataframes, optional Pareto plots, and the metric-specific raw artefacts. The most recent iteration can be resumed without losing progress.
Development notes
- All metrics inherit from mood.metrics.metric.Metric; to add a new metric, implement compute, setup_iterations_inputs, and clean, populating the state (orientation) and objectives lists.
- Optimisers rely on AlgorithmDataSingleton to store sequences; when implementing alternative algorithms, follow the contract exposed by mood/optimizers/optimizer.py.
- The repository includes high-level regression tests (tests/test_top7.py, tests/test_mood.py) and targeted unit tests for selection strategies, mutation handlers, and metrics. Run python -m pytest before submitting changes.
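The shape of a new metric might look roughly like the sketch below. It uses a stand-in base class so that it runs on its own: the method signatures, the "min"/"max" encoding of `state`, and the list-of-dicts return value are all assumptions (the real metrics return a dataframe), so check mood/metrics/metric.py for the actual contract. `HydrophobicFraction` is a hypothetical toy metric.

```python
from abc import ABC, abstractmethod


class Metric(ABC):
    """Stand-in for mood.metrics.metric.Metric; real signatures may differ."""

    def __init__(self):
        self.state = []       # assumed: one orientation ("min"/"max") per objective
        self.objectives = []  # assumed: one name per objective

    @abstractmethod
    def setup_iterations_inputs(self, sequences): ...

    @abstractmethod
    def compute(self, sequences): ...

    @abstractmethod
    def clean(self): ...


class HydrophobicFraction(Metric):
    """Toy metric: fraction of hydrophobic residues, to be minimised."""

    HYDROPHOBIC = frozenset("AVILMFWY")

    def __init__(self):
        super().__init__()
        self.objectives = ["hydrophobic_fraction"]
        self.state = ["min"]

    def setup_iterations_inputs(self, sequences):
        return None  # nothing to stage on disk for this toy metric

    def compute(self, sequences):
        # One row per sequence; a real metric would build a dataframe instead.
        return [
            {"sequence": s,
             "hydrophobic_fraction": sum(c in self.HYDROPHOBIC for c in s) / len(s)}
            for s in sequences
        ]

    def clean(self):
        return None  # no temporary files to remove


metric = HydrophobicFraction()
print(metric.compute(["AAGG", "GGKK"]))
```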
Further reading
- Zitzler, Deb, Thiele (2000) – origin of the ZDT benchmarks used in mood/optimizers/benchmarks.py.
- ProteinMPNN and PyRosetta official documentation for understanding the external predictors invoked by this project.
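For orientation, ZDT1 (the first problem of the ZDT suite) can be written in a few lines. This is the textbook definition with both objectives minimised; whether mood/optimizers/benchmarks.py implements it in exactly this form is not confirmed here.

```python
import math


def zdt1(x):
    """Evaluate ZDT1 for decision variables x, each in [0, 1]; returns (f1, f2)."""
    if len(x) < 2:
        raise ValueError("ZDT1 needs at least two decision variables")
    f1 = x[0]
    # g equals 1 on the Pareto-optimal front, where all tail variables are 0.
    g = 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2


print(zdt1([0.25, 0.0, 0.0]))  # a point on the Pareto front
print(zdt1([0.0, 1.0, 1.0]))   # a point far from the front
```

On the front, g = 1 and the trade-off reduces to f2 = 1 - sqrt(f1), which makes it a convenient check that an optimiser converges.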