PARL: Parallel-Agent Reinforcement Learning
Official open-source implementation of PARL (Parallel-Agent Reinforcement Learning), a novel training paradigm that enables AI models to decompose complex tasks into parallel subtasks and coordinate multiple agents simultaneously.
Overview
PARL is a training methodology that addresses the critical challenge of serial collapse in multi-agent systems, where models default to sequential execution despite having parallel computational capacity. By implementing staged reward shaping and a latency-oriented evaluation metric, PARL trains models to efficiently orchestrate up to 100 sub-agents across 1,500+ coordinated steps.
Key Features
- Staged Reward Shaping: Dynamic reward annealing that encourages parallelism early in training and gradually shifts focus toward task success
- Instantiation Reward: Incentivizes subagent creation and concurrent execution
- Critical Steps Metric: Latency-oriented evaluation inspired by parallel computation's critical path concept
- Differentiable Components: Fully compatible with gradient-based optimization
- Orchestrator-Subagent Architecture: Trainable coordinator with frozen execution agents
Architecture
```
┌─────────────────────────────────────────────┐
│             Orchestrator Agent              │
│       (Trainable Central Coordinator)       │
│  - Decomposes tasks into subtasks           │
│  - Manages parallel execution               │
│  - Coordinates subagent workflows           │
└──────────────┬──────────────────────────────┘
               │
     ┌─────────┼──────────┬──────────┐
     │         │          │          │
┌────▼───┐ ┌───▼────┐ ┌───▼────┐ ┌──▼─────┐
│Subagent│ │Subagent│ │Subagent│ │Subagent│
│   1    │ │   2    │ │   3    │ │  ...N  │
└────────┘ └────────┘ └────────┘ └────────┘
 (Frozen)   (Frozen)   (Frozen)   (Frozen)
```
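To make the architecture concrete, here is a minimal, framework-free sketch of the fan-out pattern: a hypothetical `orchestrate` coordinator decomposes a task and dispatches stub subagents concurrently. `run_subagent` and the subtask naming are illustrative stand-ins, not the library's actual internals.

```python
# Illustrative only: how an orchestrator might fan work out to frozen
# subagents concurrently. run_subagent is a hypothetical stand-in.
from concurrent.futures import ThreadPoolExecutor


def run_subagent(subtask):
    """Frozen execution agent: solves one subtask (stubbed here)."""
    return f"done:{subtask}"


def orchestrate(task, num_subagents=4):
    """Decompose a task and run the pieces in parallel."""
    # 1) Decompose the task into independent subtasks
    subtasks = [f"{task}/part-{i}" for i in range(num_subagents)]
    # 2) Dispatch them concurrently and gather results in order
    with ThreadPoolExecutor(max_workers=num_subagents) as pool:
        return list(pool.map(run_subagent, subtasks))


results = orchestrate("summarize-report")
```

Only the orchestrator is trained; the subagents stay frozen, so the learning problem reduces to decomposition and scheduling.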
Reward Function
PARL implements a two-component reward structure:
R_t = λ_aux(e) · r_parallel + (1 - λ_aux(e)) · (𝟙[success] · Q(τ))
Where:
- `λ_aux(e)`: Anneals from 0.1 → 0.0 over training
- `r_parallel`: Instantiation reward encouraging parallelism
- `𝟙[success]`: Binary success indicator
- `Q(τ)`: End-to-end task quality metric
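The staged reward above can be sketched in a few lines of plain Python. The linear annealing schedule and the budget-fraction form of `r_parallel` are assumptions for illustration; the library's actual implementation may differ.

```python
def linear_anneal(step, total_steps, lam_init=0.1, lam_final=0.0):
    """Linearly anneal lambda_aux from lam_init to lam_final over training."""
    frac = min(step / total_steps, 1.0)
    return lam_init + (lam_final - lam_init) * frac


def staged_reward(num_subagents, success, quality,
                  step, total_steps, max_subagents=100):
    """R_t = lambda_aux * r_parallel + (1 - lambda_aux) * 1[success] * Q(tau)."""
    lam = linear_anneal(step, total_steps)
    # Assumed instantiation reward: fraction of the subagent budget used
    r_parallel = num_subagents / max_subagents
    return lam * r_parallel + (1.0 - lam) * success * quality


# Early in training the parallelism term dominates; by the end only
# task success matters, because lambda_aux has annealed to 0.
early = staged_reward(num_subagents=50, success=0, quality=0.9,
                      step=0, total_steps=10_000)      # 0.1 * 0.5 = 0.05
late = staged_reward(num_subagents=50, success=1, quality=0.9,
                     step=10_000, total_steps=10_000)  # 1.0 * 0.9 = 0.9
```

This is what prevents serial collapse: an unsuccessful but highly parallel episode still earns reward early on, so the policy explores decomposition before being judged purely on outcomes.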
Critical Steps Metric
Instead of counting total steps, PARL uses a latency-oriented metric:
CriticalSteps = Σ_t (S_main^(t) + max_i S_sub,i^(t))
This metric captures the true execution time considering parallel operations.
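A minimal, framework-free version of this metric, assuming per-round orchestrator costs of length T and a T×N grid of per-subagent costs (the shapes are assumptions for illustration):

```python
def critical_steps(main_steps, sub_steps):
    """CriticalSteps = sum_t (S_main^(t) + max_i S_sub,i^(t)).

    main_steps: per-round orchestration cost, length T.
    sub_steps:  per-round lists of per-subagent costs, shape (T, N).
    Only the slowest subagent in each round adds latency, because the
    other subagents run concurrently with it.
    """
    return sum(m + max(subs) for m, subs in zip(main_steps, sub_steps))


# Two rounds, three subagents: round 1 waits on agent 0 (3 steps),
# round 2 waits on agent 1 (4 steps): (0.1 + 3) + (0.1 + 4) = 7.2
total = critical_steps([0.1, 0.1], [[3, 1, 2], [1, 4, 2]])
```

A naive total-steps count for the same episode would be 13.2, so the metric rewards policies that spread work evenly across subagents rather than merely spawning many of them.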
Installation
Using Poetry (Recommended)
```bash
# Clone the repository
git clone https://github.com/The-Swarm-Corporation/PARL.git
cd PARL

# Install dependencies with Poetry
poetry install

# Activate the virtual environment
poetry shell
```
Using pip
```bash
# Clone the repository
git clone https://github.com/The-Swarm-Corporation/PARL.git
cd PARL

# Install dependencies
pip install -r requirements.txt
```
From PyPI
```bash
pip install parl-rl
```
Quick Start
```python
import torch
from parl import PARLReward, CriticalStepsMetric

# Initialize the reward function
reward_fn = PARLReward(
    lambda_init=0.1,
    lambda_final=0.0,
    total_training_steps=10000,
    device='cuda' if torch.cuda.is_available() else 'cpu'
)

# Prepare episode data
num_subagents = torch.tensor([25, 30, 40])  # Number of subagents per episode
trajectory_features = torch.randn(3, 64)    # Trajectory features
success = torch.tensor([1.0, 1.0, 0.0])     # Success indicators
training_step = 5000                        # Current training step

# Compute rewards
rewards = reward_fn.compute_full_reward(
    num_subagents=num_subagents,
    trajectory_features=trajectory_features,
    success=success,
    training_step=training_step,
    max_subagents=100
)

print(f"Total Reward: {rewards['total_reward']}")
print(f"Lambda (λ_aux): {rewards['lambda_aux']:.4f}")
print(f"Parallelism Component: {rewards['instantiation_component']}")
print(f"Task Success Component: {rewards['task_component']}")

# Evaluate using the Critical Steps metric
critical_steps_metric = CriticalStepsMetric()
main_steps = torch.ones(3, 5) * 0.1  # Orchestration overhead
sub_steps = torch.rand(3, 5, 10)     # Subagent steps
critical_steps = critical_steps_metric(main_steps, sub_steps)
print(f"Critical Steps: {critical_steps}")
```
API Reference
PARLReward
Main reward function implementing staged reward shaping.
Parameters:
- `lambda_init` (float): Initial auxiliary reward weight (default: 0.1)
- `lambda_final` (float): Final auxiliary reward weight (default: 0.0)
- `total_training_steps` (int): Total training steps for annealing (default: 10000)
- `device` (str): Device for computation ('cpu' or 'cuda')
Methods:
- `compute_full_reward()`: Compute all reward components
- `compute_instantiation_reward()`: Calculate the parallelism incentive
- `compute_task_quality()`: Calculate task success quality
- `anneal_lambda()`: Get the current λ_aux value
CriticalStepsMetric
Latency-oriented evaluation metric for parallel execution.
Parameters:
- `orchestration_overhead` (float): Overhead for orchestrator coordination (default: 0.1)
Methods:
- `forward()`: Compute critical steps for parallel workflows
Experiments
Run the example training simulation:
```bash
python -m parl.main
```
This will demonstrate reward evolution across training stages and critical steps computation.
Testing
Run the comprehensive test suite:
```bash
# Using pytest
pytest tests/ -v

# With coverage report
pytest tests/ --cov=parl --cov-report=html

# Run a specific test file
pytest tests/test_parl.py -v
```
Research Paper
This implementation is based on the technical report:
"PARL: Parallel-Agent Reinforcement Learning for Large Language Models" Kimi AI Research Team, 2026
For technical details and experimental results, see: Kimi K2.5 Technical Report
Citation
If you use PARL in your research, please cite:
```bibtex
@article{parl2026,
  title={PARL: Parallel-Agent Reinforcement Learning for Large Language Models},
  author={Kimi AI Research Team},
  journal={Technical Report},
  year={2026},
  url={https://www.kimi.com/blog/kimi-k2-5.html}
}
```
Project Structure
```
PARL/
├── parl/
│   ├── __init__.py      # Package initialization
│   └── main.py          # Core PARL implementation
├── tests/
│   └── test_parl.py     # Comprehensive test suite
├── pyproject.toml       # Poetry configuration
├── README.md            # This file
├── LICENSE              # Apache 2.0 License
└── .gitignore           # Git ignore rules
```
Requirements
- Python >= 3.8
- PyTorch >= 2.0.0
- NumPy >= 1.24.0
Contributing
We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
Please ensure your code passes all tests and follows PEP 8 style guidelines.
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Acknowledgments
- Inspired by the Kimi K2.5 technical report
- Built on PyTorch's efficient tensor operations
- Thanks to the open-source ML community
Contact
- Repository: github.com/The-Swarm-Corporation/PARL
- Issues: github.com/The-Swarm-Corporation/PARL/issues
Made with ⚡ by The Swarm Corporation