PARL: Parallel-Agent Reinforcement Learning
Official open-source implementation of PARL (Parallel-Agent Reinforcement Learning), a novel training paradigm that enables AI models to decompose complex tasks into parallel subtasks and coordinate multiple agents simultaneously.
Overview
PARL is a training methodology that addresses the critical challenge of serial collapse in multi-agent systems, where models default to sequential execution despite having parallel computational capacity. By implementing staged reward shaping and a latency-oriented evaluation metric, PARL trains models to efficiently orchestrate up to 100 sub-agents across 1,500+ coordinated steps.
Key Features
- Staged Reward Shaping: Dynamic reward annealing that encourages parallelism early in training and gradually shifts focus toward task success
- Instantiation Reward: Incentivizes subagent creation and concurrent execution
- Critical Steps Metric: Latency-oriented evaluation inspired by parallel computation's critical path concept
- Differentiable Components: Fully compatible with gradient-based optimization
- Orchestrator-Subagent Architecture: Trainable coordinator with frozen execution agents
Architecture
```
┌─────────────────────────────────────────────┐
│             Orchestrator Agent              │
│       (Trainable Central Coordinator)       │
│  - Decomposes tasks into subtasks           │
│  - Manages parallel execution               │
│  - Coordinates subagent workflows           │
└──────────────┬──────────────────────────────┘
               │
     ┌─────────┼──────────┬──────────┐
     │         │          │          │
┌────▼───┐ ┌───▼────┐ ┌───▼────┐ ┌──▼─────┐
│Subagent│ │Subagent│ │Subagent│ │Subagent│
│   1    │ │   2    │ │   3    │ │  ...N  │
└────────┘ └────────┘ └────────┘ └────────┘
 (Frozen)   (Frozen)   (Frozen)   (Frozen)
```
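To make the architecture concrete, here is a minimal, framework-free sketch of the fan-out pattern: a hypothetical `orchestrate` coordinator decomposes a task and dispatches stub subagents concurrently. `run_subagent` and the subtask naming are illustrative stand-ins, not the library's actual internals.

```python
# Illustrative only: how an orchestrator might fan work out to frozen
# subagents concurrently. run_subagent is a hypothetical stand-in.
from concurrent.futures import ThreadPoolExecutor


def run_subagent(subtask):
    """Frozen execution agent: solves one subtask (stubbed here)."""
    return f"done:{subtask}"


def orchestrate(task, num_subagents=4):
    """Decompose a task and run the pieces in parallel."""
    # 1) Decompose the task into independent subtasks
    subtasks = [f"{task}/part-{i}" for i in range(num_subagents)]
    # 2) Dispatch them concurrently and gather results in order
    with ThreadPoolExecutor(max_workers=num_subagents) as pool:
        return list(pool.map(run_subagent, subtasks))


results = orchestrate("summarize-report")
```

Only the orchestrator is trained; the subagents stay frozen, so the learning problem reduces to decomposition and scheduling.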
Reward Function
PARL implements a two-component reward structure:
R_t = λ_aux(e) · r_parallel + (1 - λ_aux(e)) · (𝟙[success] · Q(τ))
Where:
- `λ_aux(e)`: Anneals from 0.1 → 0.0 over training
- `r_parallel`: Instantiation reward encouraging parallelism
- `𝟙[success]`: Binary success indicator
- `Q(τ)`: End-to-end task quality metric
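The staged reward above can be sketched in a few lines of plain Python. The linear annealing schedule and the budget-fraction form of `r_parallel` are assumptions for illustration; the library's actual implementation may differ.

```python
def linear_anneal(step, total_steps, lam_init=0.1, lam_final=0.0):
    """Linearly anneal lambda_aux from lam_init to lam_final over training."""
    frac = min(step / total_steps, 1.0)
    return lam_init + (lam_final - lam_init) * frac


def staged_reward(num_subagents, success, quality,
                  step, total_steps, max_subagents=100):
    """R_t = lambda_aux * r_parallel + (1 - lambda_aux) * 1[success] * Q(tau)."""
    lam = linear_anneal(step, total_steps)
    # Assumed instantiation reward: fraction of the subagent budget used
    r_parallel = num_subagents / max_subagents
    return lam * r_parallel + (1.0 - lam) * success * quality


# Early in training the parallelism term dominates; by the end only
# task success matters, because lambda_aux has annealed to 0.
early = staged_reward(num_subagents=50, success=0, quality=0.9,
                      step=0, total_steps=10_000)      # 0.1 * 0.5 = 0.05
late = staged_reward(num_subagents=50, success=1, quality=0.9,
                     step=10_000, total_steps=10_000)  # 1.0 * 0.9 = 0.9
```

This is what prevents serial collapse: an unsuccessful but highly parallel episode still earns reward early on, so the policy explores decomposition before being judged purely on outcomes.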
Critical Steps Metric
Instead of counting total steps, PARL uses a latency-oriented metric:
CriticalSteps = Σ_t (S_main^(t) + max_i S_sub,i^(t))
This metric captures the true execution time considering parallel operations.
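A minimal, framework-free version of this metric, assuming per-round orchestrator costs of length T and a T×N grid of per-subagent costs (the shapes are assumptions for illustration):

```python
def critical_steps(main_steps, sub_steps):
    """CriticalSteps = sum_t (S_main^(t) + max_i S_sub,i^(t)).

    main_steps: per-round orchestration cost, length T.
    sub_steps:  per-round lists of per-subagent costs, shape (T, N).
    Only the slowest subagent in each round adds latency, because the
    other subagents run concurrently with it.
    """
    return sum(m + max(subs) for m, subs in zip(main_steps, sub_steps))


# Two rounds, three subagents: round 1 waits on agent 0 (3 steps),
# round 2 waits on agent 1 (4 steps): (0.1 + 3) + (0.1 + 4) = 7.2
total = critical_steps([0.1, 0.1], [[3, 1, 2], [1, 4, 2]])
```

A naive total-steps count for the same episode would be 13.2, so the metric rewards policies that spread work evenly across subagents rather than merely spawning many of them.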
Installation
Using Poetry (Recommended)
```bash
# Clone the repository
git clone https://github.com/The-Swarm-Corporation/PARL.git
cd PARL

# Install dependencies with Poetry
poetry install

# Activate the virtual environment
poetry shell
```
Using pip
```bash
# Clone the repository
git clone https://github.com/The-Swarm-Corporation/PARL.git
cd PARL

# Install dependencies
pip install -r requirements.txt
```
From PyPI
```bash
pip install parl-rl
```
Quick Start
```python
import torch
from parl import PARLReward, CriticalStepsMetric

# Initialize the reward function
reward_fn = PARLReward(
    lambda_init=0.1,
    lambda_final=0.0,
    total_training_steps=10000,
    device='cuda' if torch.cuda.is_available() else 'cpu'
)

# Prepare episode data
num_subagents = torch.tensor([25, 30, 40])  # Number of subagents per episode
trajectory_features = torch.randn(3, 64)    # Trajectory features
success = torch.tensor([1.0, 1.0, 0.0])     # Success indicators
training_step = 5000                        # Current training step

# Compute rewards
rewards = reward_fn.compute_full_reward(
    num_subagents=num_subagents,
    trajectory_features=trajectory_features,
    success=success,
    training_step=training_step,
    max_subagents=100
)

print(f"Total Reward: {rewards['total_reward']}")
print(f"Lambda (λ_aux): {rewards['lambda_aux']:.4f}")
print(f"Parallelism Component: {rewards['instantiation_component']}")
print(f"Task Success Component: {rewards['task_component']}")

# Evaluate using the Critical Steps metric
critical_steps_metric = CriticalStepsMetric()
main_steps = torch.ones(3, 5) * 0.1  # Orchestration overhead
sub_steps = torch.rand(3, 5, 10)     # Subagent steps
critical_steps = critical_steps_metric(main_steps, sub_steps)
print(f"Critical Steps: {critical_steps}")
```
API Reference
PARLReward
Main reward function implementing staged reward shaping.
Parameters:
- `lambda_init` (float): Initial auxiliary reward weight (default: 0.1)
- `lambda_final` (float): Final auxiliary reward weight (default: 0.0)
- `total_training_steps` (int): Total training steps for annealing (default: 10000)
- `device` (str): Device for computation ('cpu' or 'cuda')
Methods:
- `compute_full_reward()`: Compute all reward components
- `compute_instantiation_reward()`: Calculate the parallelism incentive
- `compute_task_quality()`: Calculate task success quality
- `anneal_lambda()`: Get the current λ_aux value
CriticalStepsMetric
Latency-oriented evaluation metric for parallel execution.
Parameters:
- `orchestration_overhead` (float): Overhead for orchestrator coordination (default: 0.1)
Methods:
- `forward()`: Compute critical steps for parallel workflows
Experiments
Run the example training simulation:
```bash
python -m parl.main
```
This will demonstrate reward evolution across training stages and critical steps computation.
Testing
Run the comprehensive test suite:
```bash
# Using pytest
pytest tests/ -v

# With coverage report
pytest tests/ --cov=parl --cov-report=html

# Run a specific test file
pytest tests/test_parl.py -v
```
Research Paper
This implementation is based on the technical report:
"PARL: Parallel-Agent Reinforcement Learning for Large Language Models" Kimi AI Research Team, 2026
For technical details and experimental results, see: Kimi K2.5 Technical Report
Citation
If you use PARL in your research, please cite:
```bibtex
@article{parl2026,
  title={PARL: Parallel-Agent Reinforcement Learning for Large Language Models},
  author={Kimi AI Research Team},
  journal={Technical Report},
  year={2026},
  url={https://www.kimi.com/blog/kimi-k2-5.html}
}
```
Project Structure
```
PARL/
├── parl/
│   ├── __init__.py      # Package initialization
│   └── main.py          # Core PARL implementation
├── tests/
│   └── test_parl.py     # Comprehensive test suite
├── pyproject.toml       # Poetry configuration
├── README.md            # This file
├── LICENSE              # Apache 2.0 License
└── .gitignore           # Git ignore rules
```
Requirements
- Python >= 3.8
- PyTorch >= 2.0.0
- NumPy >= 1.24.0
Contributing
We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
Please ensure your code passes all tests and follows PEP 8 style guidelines.
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Acknowledgments
- Inspired by the Kimi K2.5 technical report
- Built on PyTorch's efficient tensor operations
- Thanks to the open-source ML community
Contact
- Repository: github.com/The-Swarm-Corporation/PARL
- Issues: github.com/The-Swarm-Corporation/PARL/issues
Made with ⚡ by The Swarm Corporation