
PARL: Parallel-Agent Reinforcement Learning

Official open-source implementation of PARL (Parallel-Agent Reinforcement Learning), a novel training paradigm that enables AI models to decompose complex tasks into parallel subtasks and coordinate multiple agents simultaneously.

Overview

PARL is a training methodology that addresses the critical challenge of serial collapse in multi-agent systems, where models default to sequential execution despite having parallel computational capacity. By implementing staged reward shaping and a latency-oriented evaluation metric, PARL trains models to efficiently orchestrate up to 100 sub-agents across 1,500+ coordinated steps.

Key Features

  • Staged Reward Shaping: Dynamic reward annealing that encourages parallelism early in training and gradually shifts focus toward task success
  • Instantiation Reward: Incentivizes subagent creation and concurrent execution
  • Critical Steps Metric: Latency-oriented evaluation inspired by parallel computation's critical path concept
  • Differentiable Components: Fully compatible with gradient-based optimization
  • Orchestrator-Subagent Architecture: Trainable coordinator with frozen execution agents

Architecture

┌─────────────────────────────────────────────┐
│         Orchestrator Agent                  │
│  (Trainable Central Coordinator)            │
│  - Decomposes tasks into subtasks           │
│  - Manages parallel execution               │
│  - Coordinates subagent workflows           │
└──────────────┬──────────────────────────────┘
               │
               ├──────────┬──────────┬─────────┐
               │          │          │         │
          ┌────▼───┐ ┌───▼────┐ ┌──▼────┐  ┌─▼──────┐
          │Subagent│ │Subagent│ │Subagent│  │Subagent│
          │   1    │ │   2    │ │   3    │  │  ...N  │
          └────────┘ └────────┘ └────────┘  └────────┘
           (Frozen)   (Frozen)   (Frozen)    (Frozen)
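
The fan-out/gather pattern above can be sketched in plain Python. This is an illustrative skeleton, not the trained model: `subagent` stands in for a frozen execution agent, and the orchestrator here uses a fixed decomposition rather than a learned one.

```python
from concurrent.futures import ThreadPoolExecutor

def subagent(subtask):
    # Stand-in for a frozen execution agent: completes one subtask.
    return f"done:{subtask}"

def orchestrate(task, num_subagents=4):
    # The orchestrator decomposes the task into subtasks, dispatches
    # them concurrently, and gathers the results in order.
    subtasks = [f"{task}/part{i}" for i in range(num_subagents)]
    with ThreadPoolExecutor(max_workers=num_subagents) as pool:
        return list(pool.map(subagent, subtasks))

print(orchestrate("summarize-report", num_subagents=3))
# ['done:summarize-report/part0', 'done:summarize-report/part1', 'done:summarize-report/part2']
```

In PARL, it is this coordinator's decomposition and dispatch policy that is trained, while the subagents remain frozen.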

Reward Function

PARL implements a two-component reward structure:

R_t = λ_aux(e) · r_parallel + (1 - λ_aux(e)) · (𝟙[success] · Q(τ))

Where:

  • λ_aux(e): Auxiliary reward weight at epoch e; anneals from 0.1 → 0.0 over training
  • r_parallel: Instantiation reward encouraging parallelism
  • 𝟙[success]: Binary success indicator
  • Q(τ): End-to-end task quality metric
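
As a worked example, the reward above can be computed with a linear annealing schedule. The linear schedule and the form of r_parallel (subagent count normalized by the maximum) are assumptions for illustration; `lambda_aux` and `parl_reward` here are hypothetical helpers, not the library API.

```python
def lambda_aux(step, total_steps, lam_init=0.1, lam_final=0.0):
    # Linear annealing from lam_init to lam_final (one plausible schedule).
    frac = min(step / total_steps, 1.0)
    return lam_init + (lam_final - lam_init) * frac

def parl_reward(step, total_steps, num_subagents, max_subagents, success, quality):
    lam = lambda_aux(step, total_steps)
    r_parallel = num_subagents / max_subagents  # assumed instantiation reward
    # R_t = λ_aux · r_parallel + (1 − λ_aux) · (1[success] · Q(τ))
    return lam * r_parallel + (1.0 - lam) * (float(success) * quality)

# Early in training, parallelism contributes to the reward:
print(parl_reward(0, 10000, 50, 100, True, 0.8))      # 0.1*0.5 + 0.9*0.8 = 0.77
# At the end, only task success and quality matter:
print(parl_reward(10000, 10000, 50, 100, True, 0.8))  # 0.0 + 1.0*0.8 = 0.8
```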

Critical Steps Metric

Instead of counting total steps, PARL uses a latency-oriented metric:

CriticalSteps = Σ_t ( S_main^(t) + max_i S_sub,i^(t) )

Because only the slowest subagent in each round contributes, this metric reflects wall-clock latency under parallel execution rather than total work performed across all agents.
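
A minimal numeric sketch of the metric, contrasted with a naive total-step count (plain Python, independent of the library's `CriticalStepsMetric`):

```python
def critical_steps(main_steps, sub_steps):
    # main_steps[t]: orchestrator cost at round t
    # sub_steps[t][i]: steps taken by subagent i in round t
    # Per round, only the slowest subagent contributes, mirroring the
    # critical path of a parallel computation.
    return sum(m + max(subs) for m, subs in zip(main_steps, sub_steps))

main = [1, 1, 1]
subs = [[4, 7, 2], [5, 5, 5], [1, 9, 3]]
print(critical_steps(main, subs))        # (1+7) + (1+5) + (1+9) = 24
print(sum(main) + sum(map(sum, subs)))   # naive serial count: 3 + 41 = 44
```

The gap between the two numbers (24 vs. 44 here) is exactly the latency saved by running subagents concurrently.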

Installation

Using Poetry (Recommended)

# Clone the repository
git clone https://github.com/The-Swarm-Corporation/PARL.git
cd PARL

# Install dependencies with Poetry
poetry install

# Activate the virtual environment
poetry shell

Using pip

# Clone the repository
git clone https://github.com/The-Swarm-Corporation/PARL.git
cd PARL

# Install dependencies
pip install -r requirements.txt

From PyPI

pip install parl-rl

Quick Start

import torch
from parl import PARLReward, CriticalStepsMetric

# Initialize the reward function
reward_fn = PARLReward(
    lambda_init=0.1,
    lambda_final=0.0,
    total_training_steps=10000,
    device='cuda' if torch.cuda.is_available() else 'cpu'
)

# Prepare episode data
num_subagents = torch.tensor([25, 30, 40])  # Number of subagents per episode
trajectory_features = torch.randn(3, 64)     # Trajectory features
success = torch.tensor([1.0, 1.0, 0.0])      # Success indicators
training_step = 5000                          # Current training step

# Compute rewards
rewards = reward_fn.compute_full_reward(
    num_subagents=num_subagents,
    trajectory_features=trajectory_features,
    success=success,
    training_step=training_step,
    max_subagents=100
)

print(f"Total Reward: {rewards['total_reward']}")
print(f"Lambda (λ_aux): {rewards['lambda_aux']:.4f}")
print(f"Parallelism Component: {rewards['instantiation_component']}")
print(f"Task Success Component: {rewards['task_component']}")

# Evaluate using Critical Steps metric
critical_steps_metric = CriticalStepsMetric()

main_steps = torch.ones(3, 5) * 0.1  # Orchestration overhead
sub_steps = torch.rand(3, 5, 10)      # Subagent steps

critical_steps = critical_steps_metric(main_steps, sub_steps)
print(f"Critical Steps: {critical_steps}")

API Reference

PARLReward

Main reward function implementing staged reward shaping.

Parameters:

  • lambda_init (float): Initial auxiliary reward weight (default: 0.1)
  • lambda_final (float): Final auxiliary reward weight (default: 0.0)
  • total_training_steps (int): Total training steps for annealing (default: 10000)
  • device (str): Device for computation ('cpu' or 'cuda')

Methods:

  • compute_full_reward(): Compute all reward components
  • compute_instantiation_reward(): Calculate parallelism incentive
  • compute_task_quality(): Calculate task success quality
  • anneal_lambda(): Get current λ_aux value

CriticalStepsMetric

Latency-oriented evaluation metric for parallel execution.

Parameters:

  • orchestration_overhead (float): Overhead for orchestrator coordination (default: 0.1)

Methods:

  • forward(): Compute critical steps for parallel workflows

Experiments

Run the example training simulation:

python -m parl.main

This will demonstrate reward evolution across training stages and critical steps computation.

Testing

Run the comprehensive test suite:

# Using pytest
pytest tests/ -v

# With coverage report
pytest tests/ --cov=parl --cov-report=html

# Run specific test file
pytest tests/test_parl.py -v

Research Paper

This implementation is based on the technical report:

"PARL: Parallel-Agent Reinforcement Learning for Large Language Models," Kimi AI Research Team, 2026

For technical details and experimental results, see: Kimi K2.5 Technical Report

Citation

If you use PARL in your research, please cite:

@article{parl2026,
  title={PARL: Parallel-Agent Reinforcement Learning for Large Language Models},
  author={Kimi AI Research Team},
  journal={Technical Report},
  year={2026},
  url={https://www.kimi.com/blog/kimi-k2-5.html}
}

Project Structure

PARL/
├── parl/
│   ├── __init__.py         # Package initialization
│   └── main.py             # Core PARL implementation
├── tests/
│   └── test_parl.py        # Comprehensive test suite
├── pyproject.toml          # Poetry configuration
├── README.md               # This file
├── LICENSE                 # Apache 2.0 License
└── .gitignore              # Git ignore rules

Requirements

  • Python >= 3.8
  • PyTorch >= 2.0.0
  • NumPy >= 1.24.0

Contributing

We welcome contributions! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please ensure your code passes all tests and follows PEP 8 style guidelines.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Acknowledgments

  • Inspired by the Kimi K2.5 technical report
  • Built on PyTorch's efficient tensor operations
  • Thanks to the open-source ML community

Made with ⚡ by The Swarm Corporation
