Structure-conditioned protein sequence design using message passing neural networks

proteinmpnn - A CLI adaptation of the Kuhlman lab's fork

[!WARNING] This is a work-in-progress.

This repo contains a cleaned-up version of the Kuhlman lab's fork of ProteinMPNN, converted into an easy-to-use CLI.

This modernization includes

  • using uv for dependency and package management.
  • using typer to construct a CLI with plenty of flavor.

Clone this repo and run

uv run proteinmpnn --help

Current features

  • Running inference for a single PDB file using proteinmpnn run-single. Use --help to see the optional arguments. This is meant to replace the single-protein analyses.
  • Computing conditional/unconditional amino-acid probabilities per position. See proteinmpnn compute-probs --help for more context.

Other improvements on Kuhlman's fork

  • The usual two-step sequence (running generate_json.py, then running the model on its output) is no longer necessary.
  • Unit testing using pytest, as well as a backwards-compatibility test (making sure that we don't deviate from the original behavior).
  • Linting using ruff to make the code more developer-friendly.

Original readme

This repo includes the Kuhlman Lab fork of ProteinMPNN. It includes all the functionality of the original ProteinMPNN repo (linked here), with the following additions:

  • Improved input parsing for custom design runs
  • Multi-state design support
  • Additional utilities to provide integration with EvoPro

Read the ProteinMPNN paper.

Installation:

git clone git@github.com:Kuhlman-Lab/proteinmpnn.git
cd proteinmpnn
mamba env create -f setup/proteinmpnn.yml

NOTE (July 2025):

The default ProteinMPNN environment uses CUDA 11.3, which is too old for the new H100 GPUs (these need CUDA 11.8 or newer). As a result, ProteinMPNN may hang if run from the default mpnn environment on such hardware.

To fix this, we can generate a CUDA 12.4 environment as follows:

# Install original env without torch/cuda dependencies
mamba env create -f setup/proteinmpnn_cu12.4.yml -n mpnn_cu12.4

# Install torch/cuda 12.4 dependencies
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124

To use this, simply replace conda activate mpnn with conda activate mpnn_cu12.4 wherever present.
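
After activating the new environment, it can help to confirm which CUDA build of torch was actually installed and whether a GPU is visible. The snippet below is a minimal sketch, not part of this repo; the function name is hypothetical, and it relies only on standard torch attributes, falling back gracefully when torch is missing.

```python
def describe_torch_cuda() -> str:
    """Summarize the installed torch build and GPU visibility as one line."""
    try:
        import torch  # imported lazily so the check also works without torch
    except ImportError:
        return "torch is not installed in this environment"

    build = f"torch {torch.__version__} (CUDA build: {torch.version.cuda})"
    if not torch.cuda.is_available():
        return f"{build}, no GPU visible"
    return f"{build}, GPU 0: {torch.cuda.get_device_name(0)}"


print(describe_torch_cuda())
```

In the mpnn_cu12.4 environment, the reported CUDA build would be expected to start with 12.4 if the pip step above succeeded.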

Usage Guidelines:

General Usage

The different input arguments available for each script can be viewed by adding -h to your python call (e.g., python generate_json.py -h).

ProteinMPNN accepts PDB files as input and produces FASTA files as output.

Unlike the original repo, our ProteinMPNN organizes the different input options (aka arguments) into .flags files:

  • json.flags is used to specify design constraints, like fixed residues and symmetry
  • proteinmpnn.flags is used to specify prediction flags, like which sampling temperature and model variant to use.
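
To illustrate how a .flags file can carry arguments, here is a minimal sketch using argparse's fromfile_prefix_chars option from the Python standard library. Whether the scripts in this repo read their .flags files this way is an assumption, as are the one-argument-per-line layout and the placeholder default for --sampling_temp; only the flag names are taken from this README.

```python
import argparse
import tempfile
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # With fromfile_prefix_chars="@", an argument like "@file" is replaced
    # by the lines of that file, one argument per line.
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    parser.add_argument("--model_name", default="v_48_002")
    parser.add_argument("--sampling_temp", type=float, default=0.1)  # placeholder default
    parser.add_argument("--dump_probs", action="store_true")
    return parser


def parse_flags_file(path) -> argparse.Namespace:
    """Parse a flags file by passing it to argparse as '@<path>'."""
    return build_parser().parse_args([f"@{path}"])


# Write a throwaway flags file and parse it.
with tempfile.TemporaryDirectory() as tmp:
    flags_path = Path(tmp) / "proteinmpnn.flags"
    flags_path.write_text("--model_name=s_48_010\n--sampling_temp=0.2\n--dump_probs\n")
    args = parse_flags_file(flags_path)

print(args.model_name, args.sampling_temp, args.dump_probs)  # s_48_010 0.2 True
```

Keeping options in a file like this makes a design run easy to rerun and to version alongside its inputs.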

In general, there are two steps to running ProteinMPNN:

  1. Run the generate_json.py script and pass it the json.flags file.
    • This makes a new file called proteinmpnn_res_specs.json containing parsed design information.
  2. Run the run_protein_mpnn.py script and pass it proteinmpnn.flags and proteinmpnn_res_specs.json to obtain the actual ProteinMPNN prediction.

Useful Flags

Used in json.flags:

--default_design_setting: this is an optional filter to allow/disallow certain residue types during design. By default, it is set to all, which allows all 20 amino acids. Possible settings include:

  • all-hydphob: exclude hydrophobic residues (CDEHKNPQRSTX)
  • all-hydphil: exclude hydrophilic residues (ACFGILMPVWYX)
  • all-CLD: exclude specific amino acids (in this case, Cys, Leu, and Asp)
  • L+polar: mix-and-match amino acids and categories (in this case, allow all polar amino acids and also Leu)
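
As a rough illustration of the all-CLD style of setting, the sketch below resolves a setting string into the amino acids it leaves designable. It is a hypothetical helper, not the parser used by generate_json.py. It covers only all and all-<one-letter codes>; named categories such as hydphob or polar are deliberately left unimplemented because their exact residue sets are defined by the repo.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def allowed_residues(setting: str) -> str:
    """Return the residues left designable by a simple design setting.

    Handles "all" and "all-<LETTERS>" only (hypothetical helper).
    """
    if setting == "all":
        return AMINO_ACIDS
    prefix, sep, excluded = setting.partition("-")
    is_letter_list = bool(excluded) and all(c in AMINO_ACIDS for c in excluded)
    if prefix == "all" and sep and is_letter_list:
        return "".join(aa for aa in AMINO_ACIDS if aa not in excluded)
    # Category names (e.g. "hydphob", "polar") are not resolved in this sketch.
    raise NotImplementedError(f"setting not covered by this sketch: {setting!r}")


print(allowed_residues("all-CLD"))  # AEFGHIKMNPQRSTVWY
```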

Used in proteinmpnn.flags:

--model_name: specifies which ProteinMPNN model checkpoint to use. Possible options include:

  • v_48_002: vanilla (default) model with k=48 neighbors and 0.02 Å noise
  • s_48_010: soluble protein model with k=48 neighbors and 0.1 Å noise

--sampling_temp: specifies the sampling temperature, which controls how diverse the generated sequences will be. Ranges from 0 to 1, inclusive. A temperature of 0 returns the "best" prediction every time (zero diversity), while a temperature of 1 samples from the model's unscaled output distribution and gives the most diverse, least conservative sequences in this range. The recommended range is roughly 0.0 to 0.3.
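
The effect of the temperature can be seen in a small standalone sketch of temperature-scaled softmax sampling. This is the generic technique, not code from this repo; the function names are hypothetical, and treating a temperature of 0 as a plain argmax is an assumption made here to avoid dividing by zero.

```python
import math
import random


def temperature_probs(logits, temperature):
    """Softmax of logits / temperature; temperature <= 0 is treated as argmax."""
    if temperature <= 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng):
    """Draw one index according to the temperature-scaled probabilities."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # toy scores for three candidate residues
for t in (0.0, 0.1, 1.0):
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])
print(sample_index(logits, 0.0, random.Random(0)))  # always 0 at temperature 0
```

Lower temperatures concentrate the probability mass on the top-scoring residue, which is why small values give nearly deterministic designs.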

--dump_probs: if included, ProteinMPNN will save the predicted sequence probability table for each scaffold. This will be a numpy array of shape [L, 21], for a protein of length L. If multiple sequences are generated per scaffold, probabilities will be averaged before saving. A helper script for visualizing these tables is included at run/helper_scripts/other_tools/view_probs.py.
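
As an example of working with such a table, the sketch below picks the most probable residue at each position of an [L, 21] table. The column order ACDEFGHIKLMNPQRSTVWYX is an assumption and should be checked against the repo before use. The helper names and toy data are hypothetical, and plain nested lists stand in for the saved array.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"  # assumed column order of the 21 columns


def consensus_sequence(prob_table):
    """Return the most probable residue per position of an [L, 21] table."""
    residues = []
    for position, row in enumerate(prob_table):
        if len(row) != len(ALPHABET):
            raise ValueError(f"row {position} has {len(row)} columns, expected 21")
        best = max(range(len(row)), key=row.__getitem__)
        residues.append(ALPHABET[best])
    return "".join(residues)


def toy_row(letter, peak=0.8):
    """Build one toy probability row peaked at the given residue."""
    rest = (1.0 - peak) / (len(ALPHABET) - 1)
    return [peak if aa == letter else rest for aa in ALPHABET]


table = [toy_row(aa) for aa in "MKV"]  # toy table for a 3-residue protein
print(consensus_sequence(table))  # MKV
```

For a table loaded with numpy, calling .tolist() on the array gives nested lists that this helper accepts.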

Example Cases

Example input and expected output files, as well as jobscripts and flag files, for many different design tasks are included in examples/. For a summary and explanation of each example, see examples/EXAMPLES.md. Currently supported protocols include:

  1. Monomer Design (with user-friendly parsing of designable residues)
  2. Binder Design
  3. Oligomer Design (with support for arbitrary symmetries in homooligomers)
  4. Multi-state Design (with support for multiple complex design constraints)

Unit Testing

TODO

Code organization:

  • run/run_protein_mpnn.py - the main script to initialize and run the model.
  • run/generate_json.py - script to automatically generate a JSON file of design constraints.
  • run/helper_scripts/ - helper functions to parse PDBs, assign which chains to design, which residues to fix, adding AA bias, tying residues etc.
  • examples/ - simple example inputs/outputs and runscripts for different tasks.
  • model_weights/ - trained proteinmpnn model weights.
    • v_48_... - vanilla proteinmpnn models trained at different noise levels.
    • s_48_... - solublempnn models trained at different noise levels.
    • ca_48_... - Ca-only models trained at different noise levels.

License

ProteinMPNN is distributed under an MIT license. See the license file at proteinmpnn/LICENSE for details.
