
Multi-objective optimization for protein design


multiObjectiveDesign

Multi-Objective Design for Protein Engineering

Overview

multiObjectiveDesign is a Python toolkit for running iterative, multi-objective optimisation loops on protein sequences. At its core it combines a genetic algorithm with a pluggable catalogue of metrics (e.g. ProteinMPNN, PyRosetta, FrustraR), allowing you to trade off stability, designability, frustration and custom objectives while keeping full visibility into each iteration.

Installation

The core package is a standard Python project defined by pyproject.toml and can be installed with either pip or mamba/conda.

With pip

# from PyPI (once published)
pip install protein_mood

# from a clone of the repository
pip install .

# editable / development install
pip install -e ".[dev]"

# straight from GitHub
pip install "git+https://github.com/AlbertCS/multiObjectiveOptimizationDesign.git"

A plain install pulls in only the lightweight scientific stack (numpy, pandas, scipy, scikit-learn, matplotlib, seaborn, biopython, overrides, tqdm, icecream). The heavier predictors are grouped into optional extras:

Extra    Enables
esm      Deep-learning metrics (ESM2, ESMC, ESMFold2, LigandMPNN) via torch and transformers
rosetta  pyrosetta-installer for the PyRosetta-based metrics
dev      pytest, ipykernel, jupyterlab
all      esm + rosetta + dev

pip install ".[esm]"      # deep-learning metrics
pip install ".[all]"      # everything

PyRosetta is not distributed on PyPI. After installing the rosetta extra, download the wheel with:

python -m pyrosetta_installer  # or: pyrosetta-installer

With mamba / conda

An environment.yml is provided that builds an env named mood with the core stack (plus notebook tooling and PyTorch) and pip-installs the package in editable mode:

mamba env create -f environment.yml   # or: conda env create -f environment.yml
mamba activate mood

Once the conda-forge package is published you can also install it directly:

mamba install -c conda-forge protein_mood

For fully pinned, reproducible environments used on HPC, see the lock files in configs/ (mood-dev.yml, mood-esmc.yml, mood-esmfold2.yml).

ProteinMPNN model weights

The ProteinMPNN weights (~70 MB) are not bundled with the package. The first time a ProteinMPNN metric runs, the required .pt file is downloaded, checksum-verified and cached under ~/.cache/mood/ (override with $MOOD_CACHE_DIR). The following environment variables control this behaviour:

Variable                   Effect
MOOD_PROTEINMPNN_WEIGHTS   Directory of pre-staged weights, used as-is with no download. Use this on offline or air-gapped clusters (e.g. MareNostrum). Expected layout: <type>_model_weights/<name>.pt.
MOOD_CACHE_DIR             Where downloaded weights are cached (default: ~/.cache/mood).
MOOD_PROTEINMPNN_BASE_URL  Base URL to download from (default: the upstream dauparas/ProteinMPNN repository).

Passing path_to_model_weights to the ProteinMPNN metric bypasses all of the above and uses that directory directly.
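The lookup order described above can be sketched as follows. This is an illustration, not mood's own code: the helper names resolve_weights_dir and staged_weight_file are hypothetical, and only the environment variable names, the default cache location and the staged-weights layout come from this README. Downloading and checksum verification are deliberately left out.

```python
from __future__ import annotations

import os
from pathlib import Path


def resolve_weights_dir(path_to_model_weights: str | None = None,
                        env: dict[str, str] | None = None) -> tuple[Path, bool]:
    """Return (directory, may_download) following the documented precedence.

    Hypothetical helper: mood's real lookup may differ in detail.
    """
    env = dict(os.environ) if env is None else env
    # 1. An explicit path passed to the metric wins and is used as-is.
    if path_to_model_weights:
        return Path(path_to_model_weights), False
    # 2. Pre-staged weights (offline clusters): used as-is, never downloaded.
    staged = env.get("MOOD_PROTEINMPNN_WEIGHTS")
    if staged:
        return Path(staged), False
    # 3. Otherwise fall back to the cache, downloading missing files on first use.
    cache = env.get("MOOD_CACHE_DIR") or str(Path.home() / ".cache" / "mood")
    return Path(cache), True


def staged_weight_file(root: Path, model_type: str, name: str) -> Path:
    """Path of one weight file in the documented <type>_model_weights/<name>.pt layout."""
    return root / f"{model_type}_model_weights" / f"{name}.pt"
```

For example, staged_weight_file(Path("/data/mpnn"), "vanilla", "v_48_020") points at /data/mpnn/vanilla_model_weights/v_48_020.pt, the directory naming used by the upstream ProteinMPNN repository.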

Verify the install

mood --help                 # console entry point
python -c "import mood; print('ok')"
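To see which optional extras are usable in the current environment, a small check along these lines may help. It is a sketch, not part of mood: the extra-to-module mapping is an assumption based on the extras table above (for rosetta it looks for the pyrosetta module, which only exists after the installer has run), and the distribution name protein_mood is taken from the pip command.

```python
from __future__ import annotations

import importlib.util
from importlib.metadata import PackageNotFoundError, version

# Assumed mapping from extra name to the top-level modules it should provide.
EXTRA_MODULES = {
    "esm": ["torch", "transformers"],
    "rosetta": ["pyrosetta"],
    "dev": ["pytest", "ipykernel", "jupyterlab"],
}


def installed_version(dist: str = "protein_mood") -> str | None:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def available_extras(mapping: dict[str, list[str]] = EXTRA_MODULES) -> dict[str, bool]:
    """Report, per extra, whether every module it needs can be found (without importing it)."""
    return {
        extra: all(importlib.util.find_spec(mod) is not None for mod in modules)
        for extra, modules in mapping.items()
    }


if __name__ == "__main__":
    print("protein_mood version:", installed_version())
    for extra, ok in available_extras().items():
        print(f"{extra:8s} {'available' if ok else 'missing'}")
```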

Versioning & releases

The version is derived from git tags by setuptools-scm, so there is no version number to edit by hand. Releases are cut by tagging vX.Y.Z; .github/workflows/release.yml creates the tag automatically on every push to main, choosing the bump as follows:

  • default for any merge / commit → patch (1.0.0 → 1.0.1)
  • PR labelled minor → minor bump (1.0.0 → 1.1.0)
  • PR labelled major → major bump (1.9.0 → 2.0.0)
  • manual runs (workflow_dispatch) let you pick the bump
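The bump rules above can be summarised as a small function. It illustrates the rules as listed and is not code taken from release.yml; in particular, giving major precedence when a PR carries both labels is an assumption.

```python
from __future__ import annotations

from collections.abc import Iterable


def bump(version: str, labels: Iterable[str] = ()) -> str:
    """Return the next version for a merge into main, following the listed rules."""
    labels = set(labels)
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if "major" in labels:          # assumption: major wins if both labels are present
        return f"{major + 1}.0.0"
    if "minor" in labels:
        return f"{major}.{minor + 1}.0"
    # Default for any merge or commit: patch bump.
    return f"{major}.{minor}.{patch + 1}"
```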

The workflow creates the tag and a GitHub Release; the version is also embedded in the built sdist so PyPI/conda-forge builds resolve it without git metadata.

Project layout

  • mood/multiObjectiveOptimization.py – high-level orchestration that prepares folders, restores previous runs, evaluates metrics and persists artefacts.
  • mood/optimizers/ – optimisation strategies. Currently the genetic algorithm is implemented with modular crossover/mutation helpers and a rich mutation biasing subsystem.
  • mood/metrics/ – collection of metric classes with a shared interface. Each metric computes a dataframe of scores for the candidate sequences and exposes selection orientation (min/max) metadata used during ranking.
  • mood/base/ – lightweight data structures (sequences, logging, state) shared across the codebase.
  • mood/utils/ – utilities for structure handling, plotting, ProteinMPNN wrappers and misc helpers required by the optimiser/metrics.
  • configs/ – ready-to-run JSON configurations demonstrating typical setups.
  • tests/ – pytest/unittest suites covering the core pieces (GA, selection strategies, CLI integration, metrics) plus example notebooks for exploratory runs.
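The min/max orientation metadata is what lets objectives with opposite directions share one ranking. A minimal sketch of orientation-aware Pareto dominance follows, assuming each candidate's scores are a dict keyed by objective name and orientations are given as "min" or "max"; the function names and the example objectives are illustrative, not mood's API.

```python
from __future__ import annotations


def dominates(a: dict[str, float], b: dict[str, float],
              orientation: dict[str, str]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    strictly_better = False
    for metric, direction in orientation.items():
        # Flip signs so that "larger is better" holds for every objective.
        sign = 1.0 if direction == "max" else -1.0
        x, y = sign * a[metric], sign * b[metric]
        if x < y:
            return False
        if x > y:
            strictly_better = True
    return strictly_better


def pareto_front(scores: dict[str, dict[str, float]],
                 orientation: dict[str, str]) -> list[str]:
    """Return the ids of candidates not dominated by any other candidate."""
    return [
        cid for cid, s in scores.items()
        if not any(dominates(other, s, orientation)
                   for oid, other in scores.items() if oid != cid)
    ]
```

With orientation {"energy": "min", "plddt": "max"}, a candidate scoring (energy -9, plddt 75) is dominated by one scoring (-10, 80), while (-12, 70) and (-10, 80) trade off and both stay on the front.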

For a detailed catalogue of available metrics, their objectives, and example configurations, see Metrics overview.

CLI Generator

The CLI now generates ready-to-run replica scripts instead of executing an optimisation in-place. Given a JSON/YAML config it produces:

  • setUp_<folder_name>_<replica>.py scripts, one per replica, written in the same style as hand-written setup scripts.
  • A SLURM array runner that dispatches the correct setup per SLURM_ARRAY_TASK_ID.

Quick start

python3 -m mood.cli \
  --config configs/toy_example.json \
  --replicas 2 \
  --seed-start 1234 \
  --seed-step 1

Outputs are written to folder_name/ (or --output-prefix). Each replica inherits your config, with {seed} placeholders replaced by seed-start + index * seed-step.
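The seed rule can be expressed directly. The sketch below assumes placeholders appear as the literal text {seed} inside string values anywhere in a JSON-like config and performs plain string substitution; expand_replicas is a hypothetical name, not part of mood.cli, and the real generator may treat placeholders differently (for example, converting a bare "{seed}" value to an integer).

```python
from __future__ import annotations

from typing import Any


def substitute(value: Any, seed: int) -> Any:
    """Recursively replace '{seed}' in every string of a JSON-like structure."""
    if isinstance(value, str):
        return value.replace("{seed}", str(seed))
    if isinstance(value, dict):
        return {k: substitute(v, seed) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, seed) for v in value]
    return value  # numbers, booleans, None are left untouched


def expand_replicas(config: dict, replicas: int, seed_start: int,
                    seed_step: int = 1) -> list[dict]:
    """Return one new config per replica, with seed = seed_start + index * seed_step."""
    return [
        substitute(config, seed_start + index * seed_step)
        for index in range(replicas)
    ]
```

With the quick-start flags (--replicas 2 --seed-start 1234 --seed-step 1), the two replicas receive seeds 1234 and 1235, and the original config is not modified.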

HPC-friendly generation

Provide a preamble snippet and Python interpreter to match your cluster:

python3 -m mood.cli \
  --config configs/toy_example.json \
  --replicas 4 \
  --seed-start 1235 \
  --python-exec /path/to/conda/env/bin/python \
  --preamble-file configs/runner_preamble.sh \
  --ntasks 80 --cpus-per-task 1 --time 02-00:00:00

Submit the generated folder_name/runner_array.sh via sbatch. Re-run the CLI with --overwrite to refresh existing scripts. Check python3 -m mood.cli --help for the full option list.
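The generated runner picks one setup script per array task. In Python terms the dispatch amounts to mapping SLURM_ARRAY_TASK_ID (set by SLURM in every task of a job array) to a setUp_<folder_name>_<replica>.py file. This is a sketch only: the real runner_array.sh is a shell script, the helper name is hypothetical, and whether replica numbers start at 0 or are offset from the task ID is an assumption exposed here as replica_offset.

```python
from __future__ import annotations

import os
from pathlib import Path


def setup_script_for_task(folder_name: str, task_id: int | None = None,
                          replica_offset: int = 0) -> Path:
    """Map a SLURM array task to its replica setup script (hypothetical helper)."""
    if task_id is None:
        # SLURM exports this variable to every task of a job array.
        task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    replica = task_id + replica_offset
    return Path(folder_name) / f"setUp_{folder_name}_{replica}.py"
```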

Typical workflow

  1. Prepare inputs – provide a native PDB (or scaffold), choose metrics, and declare mutable/fixed positions in your config. The metrics module exposes helpers to pre-compute ProteinMPNN priors or frustration files if needed.
  2. Tune optimisation knobs – set population size, mutation/crossover cycle, parent-selection strategy (rank, crowding, objective bias) and iteration count in the configuration file.
  3. Generate runners or integrate directly – use the CLI generator to emit replica setup scripts for HPC runs, or instantiate MultiObjectiveOptimization directly inside your own pipeline.
  4. Inspect outputs – each iteration folder contains pickled sequences, per-chain dataframes, optional Pareto plots, and the metric-specific raw artefacts. The most recent iteration can be resumed without losing progress.
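To make the mutable/fixed distinction and the mutation/crossover cycle concrete, here is a minimal, self-contained sketch of the two operators on plain sequence strings. It is not mood's implementation, which adds mutation biasing and several parent-selection strategies; uniform resampling over the 20 canonical amino acids and single-point crossover are assumptions chosen for brevity, and positions are 0-based.

```python
from __future__ import annotations

import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, mutable: set[int], rate: float, rng: random.Random) -> str:
    """Resample each mutable position with probability `rate`; fixed positions never change."""
    residues = list(seq)
    for i in sorted(mutable):  # sorted for reproducible use of the RNG
        if rng.random() < rate:
            # Pick a different residue so that a mutation always changes the position.
            residues[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != residues[i]])
    return "".join(residues)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover of two equal-length parents (length >= 2)."""
    if len(a) != len(b):
        raise ValueError("parents must have equal length")
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]
```

Because both parents already carry the native residue at every fixed position, crossover cannot alter those positions either; only mutation needs to consult the mutable set.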

Development notes

  • All metrics inherit from mood.metrics.metric.Metric; to add a new metric, implement compute, setup_iterations_inputs, and clean, populating the state (orientation) and objectives lists.
  • Optimisers rely on AlgorithmDataSingleton to store sequences; when implementing alternative algorithms, follow the contract exposed by mood/optimizers/optimizer.py.
  • The repository includes high-level regression tests (tests/test_top7.py, tests/test_mood.py) and targeted unit tests for selection strategies, mutation handlers, and metrics. Run python -m pytest before submitting changes.
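A skeleton of a custom metric might look like the following. Because the real Metric base class is not reproduced in this README, the sketch defines a local stand-in that carries only the three method names listed above. Every signature, the dict-of-lists return value (the real compute produces a dataframe), and the toy net-charge objective with its arbitrarily chosen "min" orientation are assumptions; check mood/metrics/metric.py for the actual contract before subclassing.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class MetricStandIn(ABC):
    """Local stand-in for mood.metrics.metric.Metric (hypothetical signatures)."""

    def __init__(self) -> None:
        self.objectives: list[str] = []   # names of the scores this metric produces
        self.state: dict[str, str] = {}   # objective name -> "min" or "max"

    @abstractmethod
    def setup_iterations_inputs(self, iteration_folder: str) -> None: ...

    @abstractmethod
    def compute(self, sequences: dict[str, str]) -> dict[str, list]: ...

    @abstractmethod
    def clean(self, iteration_folder: str) -> None: ...


class NetChargeMetric(MetricStandIn):
    """Toy objective: crude net charge, counting K/R as +1 and D/E as -1."""

    def __init__(self) -> None:
        super().__init__()
        self.objectives = ["net_charge"]
        self.state = {"net_charge": "min"}  # orientation chosen arbitrarily here

    def setup_iterations_inputs(self, iteration_folder: str) -> None:
        pass  # nothing to pre-compute for a sequence-only metric

    def compute(self, sequences: dict[str, str]) -> dict[str, list]:
        ids = list(sequences)
        charges = [
            sum(s.count(aa) for aa in "KR") - sum(s.count(aa) for aa in "DE")
            for s in sequences.values()
        ]
        return {"seq_id": ids, "net_charge": charges}

    def clean(self, iteration_folder: str) -> None:
        pass  # no temporary files to remove
```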

Further reading

  • Zitzler, Deb, Thiele (2000) – origin of the ZDT benchmarks used in mood/optimizers/benchmarks.py.
  • ProteinMPNN and PyRosetta official documentation for understanding the external predictors invoked by this project.
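For orientation, ZDT1, the first problem of that suite, is shown below as a standalone function following the published definition (n decision variables in [0, 1], both objectives minimised). Whether mood/optimizers/benchmarks.py uses exactly this form or signature is not stated here.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def zdt1(x: Sequence[float]) -> tuple[float, float]:
    """ZDT1 (Zitzler, Deb & Thiele, 2000): two objectives, both minimised, x_i in [0, 1]."""
    n = len(x)
    if n < 2:
        raise ValueError("ZDT1 needs at least two decision variables")
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (n - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2
```

Points with x_2 = ... = x_n = 0 give g = 1 and lie on the true Pareto front f2 = 1 - sqrt(f1), which makes ZDT1 a convenient sanity check for a multi-objective optimiser.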



Download files


Source Distribution

protein_mood-1.0.2.tar.gz (62.5 MB)

Uploaded Source

Built Distribution


protein_mood-1.0.2-py3-none-any.whl (211.0 kB)

Uploaded Python 3

File details

Details for the file protein_mood-1.0.2.tar.gz.

File metadata

  • Download URL: protein_mood-1.0.2.tar.gz
  • Upload date:
  • Size: 62.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for protein_mood-1.0.2.tar.gz

Algorithm    Hash digest
SHA256       1ff9132b4bcbd2b0c5cc9bc8760bbc7d782f566494027b8d9cabdbae02c4156c
MD5          e05997683c2734ab9ab56b96106d089c
BLAKE2b-256  634d04533e25ce6eea615e26ca52476b2b131d3d150b0c5d5b255b951d90a79a
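To check a downloaded file against the SHA256 digest above, hashing it in chunks with Python's standard hashlib is enough. The helper names are illustrative and not part of the package.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str | Path, expected: str) -> bool:
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, verify_sha256("protein_mood-1.0.2.tar.gz", "1ff9132b4bcbd2b0c5cc9bc8760bbc7d782f566494027b8d9cabdbae02c4156c") should return True for an intact download. pip can also enforce this itself in hash-checking mode, by pinning protein_mood==1.0.2 --hash=sha256:<digest> in a requirements file and installing with --require-hashes.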


Provenance

The following attestation bundles were made for protein_mood-1.0.2.tar.gz:

Publisher: publish.yml on AlbertCS/multiObjectiveOptimizationDesign

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file protein_mood-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: protein_mood-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 211.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for protein_mood-1.0.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       173ab0573ba629d2bbd078c4d298ddd15e870ccccc7eafdf7148f139b2ee1f9f
MD5          6e7a8506fedd1988de2af92d1e0de830
BLAKE2b-256  e67c7797609230c4db85df8bd6602750b3596c9d7103794322ef73cffdedd21e


Provenance

The following attestation bundles were made for protein_mood-1.0.2-py3-none-any.whl:

Publisher: publish.yml on AlbertCS/multiObjectiveOptimizationDesign

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
