
peptidegym

Gymnasium-Compatible RL Environments for Therapeutic Peptide Design

PeptideGym provides the first Gymnasium-compatible reinforcement learning environments for therapeutic peptide design. It models peptide construction as a sequential decision process — an RL agent builds peptide sequences residue-by-residue, receiving rewards from pluggable biophysical property predictors. PeptideGym enables researchers to benchmark any Gymnasium-compatible RL algorithm (PPO, DQN, SAC via Stable Baselines3, CleanRL, or RLlib) on peptide design without writing custom training loops.

Three environment families cover distinct therapeutic peptide classes:

  • Antimicrobial peptides (AMPs) — cationic, amphipathic sequences that disrupt microbial membranes
  • Cyclic peptides — macrocyclic binders with enhanced stability and oral bioavailability
  • Vaccine epitopes — short peptides optimized for MHC-I binding and T-cell recognition
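
The residue-by-residue construction loop described above can be sketched as a minimal Gymnasium-style environment. This is a toy illustration, not PeptideGym's actual implementation: the 21-action layout (20 amino acids plus STOP) matches the AMP environment, but the observation, reward heuristic, and class are placeholders.

```python
# Toy sketch of sequential peptide design as an MDP, Gymnasium-style.
# NOT PeptideGym's implementation; reward and observation are placeholders.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues, actions 0-19
STOP = 20                             # action 20 ends the episode

class ToyPeptideEnv:
    def __init__(self, max_len=30):
        self.max_len = max_len
        self.seq = ""

    def reset(self, seed=None):
        self.seq = ""
        return self.seq, {}  # (observation, info), as in Gymnasium

    def step(self, action):
        if action == STOP or len(self.seq) >= self.max_len:
            # Terminal reward: toy "cationic" score = fraction of K/R residues
            reward = sum(aa in "KR" for aa in self.seq) / max(len(self.seq), 1)
            return self.seq, reward, True, False, {"sequence": self.seq}
        self.seq += AMINO_ACIDS[action]
        return self.seq, 0.0, False, False, {}  # no intermediate reward

env = ToyPeptideEnv()
obs, info = env.reset()
for a in [8, 8, 0, 9, 8, STOP]:  # K, K, A, L, K, then STOP
    obs, reward, terminated, truncated, info = env.step(a)
print(info["sequence"], round(reward, 2))  # KKALK 0.6
```

The real environments return richer observations (sequence encodings plus biophysical features) and compute rewards through the pluggable backends described under Architecture, but the episode shape is the same: one residue per step until a STOP action or length cap terminates the episode.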

Installation

pip install peptidegym              # Core (numpy, gymnasium)
pip install peptidegym[train]       # + SB3, PyTorch for RL training
pip install peptidegym[all]         # Everything

Development install:

git clone https://github.com/HassDhia/peptidegym.git
cd peptidegym
pip install -e ".[all]"

Quick Start

import gymnasium as gym
import peptidegym

env = gym.make("PeptideGym/AMP-v0")
obs, info = env.reset(seed=42)
for _ in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        print(f"Designed peptide: {info['sequence']} (reward: {reward:.3f})")
        obs, info = env.reset()
env.close()

Train a PPO Agent

from stable_baselines3 import PPO
import gymnasium as gym
import peptidegym

env = gym.make("PeptideGym/AMP-v0")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=90_000)

# Evaluate
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
print(f"Designed AMP: {info['sequence']}, Activity: {info.get('activity_score', 'N/A')}")

Environments

Environment                 | Task                         | Action Space                                        | Observation                                   | Difficulty Tiers
----------------------------|------------------------------|------------------------------------------------------|-----------------------------------------------|------------------
PeptideGym/AMP-v0           | Design antimicrobial peptide | Discrete(21) — 20 AAs + STOP                         | Sequence + biophysical properties             | Easy, Medium, Hard
PeptideGym/CyclicPeptide-v0 | Design cyclic peptide binder | Discrete(24) — 20 AAs + 3 cyclization + linear STOP  | Sequence + properties + cyclization validity  | Easy, Medium, Hard
PeptideGym/Epitope-v0       | Optimize vaccine epitope     | Discrete(21) — 20 AAs + STOP                         | Sequence + HLA encoding + binding estimate    | Easy, Medium, Hard

Each environment is available in three difficulty tiers (e.g., PeptideGym/AMP-Easy-v0, PeptideGym/AMP-Hard-v0) for a total of 9 benchmark configurations.
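
Assuming the tier naming pattern shown above (e.g. PeptideGym/AMP-Easy-v0) extends uniformly to all three families — an assumption for the CyclicPeptide and Epitope tiers, whose exact IDs are not listed here — the nine benchmark IDs can be enumerated programmatically:

```python
# Enumerate the 9 benchmark IDs, assuming the "PeptideGym/<Family>-<Tier>-v0"
# pattern from the AMP examples applies to all three families (hypothetical
# for CyclicPeptide and Epitope).
families = ["AMP", "CyclicPeptide", "Epitope"]
tiers = ["Easy", "Medium", "Hard"]
env_ids = [f"PeptideGym/{fam}-{tier}-v0" for fam in families for tier in tiers]
print(len(env_ids))   # 9
print(env_ids[0])     # PeptideGym/AMP-Easy-v0
```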

Architecture

┌─────────────────────────────────────────────────────┐
│                   RL Agent (PPO)                     │
│              via Stable Baselines3                   │
└────────────────────┬────────────────────────────────┘
                     │ action (amino acid or special)
                     ▼
┌─────────────────────────────────────────────────────┐
│              PeptideGym Environment                  │
│  ┌───────────┐  ┌──────────────┐  ┌───────────┐    │
│  │  AMP-v0   │  │CyclicPep-v0  │  │ Epitope-v0│    │
│  └─────┬─────┘  └──────┬───────┘  └─────┬─────┘    │
│        └───────────────┼────────────────┘          │
│                        ▼                             │
│            Pluggable RewardBackend                   │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────┐ │
│  │  Heuristic   │  │   AMPlify    │  │ NetMHCpan │ │
│  │  (default)   │  │  (optional)  │  │ (optional)│ │
│  └──────────────┘  └──────────────┘  └───────────┘ │
└─────────────────────────────────────────────────────┘

All environments share the Gymnasium API (reset(), step(), observation_space, action_space). Default heuristic reward backends require no external dependencies. Optional backends (AMPlify, NetMHCpan, MHCflurry) can be swapped in for research-grade reward signals.
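
The pluggable-backend idea can be illustrated with a hypothetical interface sketch. The names here (RewardBackend, score, HeuristicAMPBackend) and the scoring heuristic are illustrative assumptions, not PeptideGym's actual API: the point is that any object mapping a finished sequence to a scalar reward can be swapped in, whether it is a dependency-free heuristic or a learned predictor like AMPlify.

```python
# Hypothetical sketch of a pluggable reward-backend interface.
# Names and the heuristic are illustrative assumptions, NOT PeptideGym's API.
from typing import Protocol

class RewardBackend(Protocol):
    def score(self, sequence: str) -> float: ...

class HeuristicAMPBackend:
    """Toy dependency-free heuristic: favor cationic, moderately
    hydrophobic sequences, two hallmarks of antimicrobial peptides."""
    POSITIVE = set("KR")
    HYDROPHOBIC = set("AILMFWVY")

    def score(self, sequence: str) -> float:
        if not sequence:
            return 0.0
        n = len(sequence)
        charge_frac = sum(aa in self.POSITIVE for aa in sequence) / n
        hydro_frac = sum(aa in self.HYDROPHOBIC for aa in sequence) / n
        # Toy targets: ~1/3 cationic residues and ~1/2 hydrophobic residues.
        return 1.0 - abs(charge_frac - 1/3) - abs(hydro_frac - 0.5)

backend: RewardBackend = HeuristicAMPBackend()
print(round(backend.score("KLAKLAKKLAKLAK"), 3))  # 0.833
```

Because the environment only needs a sequence-to-scalar callable, switching from the heuristic to a model-based scorer changes the reward signal without touching the MDP, action space, or training code.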

Paper

The accompanying paper is available at:

Citation

If you use peptidegym in your research, please cite:

@software{dhia2026peptidegym,
  author = {Dhia, Hass},
  title = {PeptideGym: Gymnasium-Compatible Reinforcement Learning Environments for Therapeutic Peptide Design},
  year = {2026},
  publisher = {Smart Technology Investments Research Institute},
  url = {https://github.com/HassDhia/peptidegym}
}

License

MIT License. See LICENSE for details.

Contact

Hass Dhia -- Smart Technology Investments Research Institute
