A general RL framework in a swarm environment

Project description

GenRL: Building Flexible, Decentralized Multi-Agent RL Environments

GenRL is a framework that provides native support for horizontally scalable, multi-agent, multi-stage RL with decentralized coordination and communication.

Customizable Components

  • DataManager: Specifies and manages the particular data your RL environment will use. This could be a text dataset, an image dataset, a chessboard, or something else entirely.
  • RewardManager: This is where you implement your custom reward functions, directly shaping the RL objective for your agents.
  • Trainer: Performs two functions:
    • Train: Manages the core learning process. Whether you're working with policy gradient optimization, value function approximation, or other RL paradigms, the algorithmic policy updates take place here.
    • Generation: Handles the generation of rollouts and agent interactions within the environment.
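
To make the division of responsibilities concrete, here is a minimal sketch of how the three customizable components could fit together. The class and method names (`get_round_data`, `score`, `generate`, `train`) are hypothetical and chosen for illustration only; they are not GenRL's actual API, and the "policy" is a stand-in that always returns the same answer.

```python
from abc import ABC, abstractmethod
from typing import List


class DataManager(ABC):
    """Hypothetical interface: supplies the prompts for one round."""

    @abstractmethod
    def get_round_data(self, round_idx: int) -> List[str]:
        ...


class RewardManager(ABC):
    """Hypothetical interface: scores one rollout against its prompt."""

    @abstractmethod
    def score(self, prompt: str, rollout: str) -> float:
        ...


class Trainer(ABC):
    """Hypothetical interface: generates rollouts and updates the policy."""

    @abstractmethod
    def generate(self, prompts: List[str]) -> List[str]:
        ...

    @abstractmethod
    def train(self, rollouts: List[str], rewards: List[float]) -> None:
        ...


class AdditionData(DataManager):
    def get_round_data(self, round_idx: int) -> List[str]:
        # One toy prompt per round, e.g. "2+2" for round 2.
        return [f"{round_idx}+{round_idx}"]


class ExactMatchReward(RewardManager):
    def score(self, prompt: str, rollout: str) -> float:
        # Reward 1.0 only when the rollout equals the correct sum.
        left, right = prompt.split("+")
        return 1.0 if rollout == str(int(left) + int(right)) else 0.0


class CountingTrainer(Trainer):
    def __init__(self) -> None:
        self.updates = 0

    def generate(self, prompts: List[str]) -> List[str]:
        # Stand-in "policy" that always answers "4".
        return ["4" for _ in prompts]

    def train(self, rollouts: List[str], rewards: List[float]) -> None:
        # A real trainer would apply a policy update here; we only count calls.
        self.updates += 1


data, reward, trainer = AdditionData(), ExactMatchReward(), CountingTrainer()
prompts = data.get_round_data(2)
rollouts = trainer.generate(prompts)
rewards = [reward.score(p, r) for p, r in zip(prompts, rollouts)]
trainer.train(rollouts, rewards)
```

Keeping data, reward, and training behind separate interfaces means any one of them can be swapped (for example, a different reward function) without touching the other two.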

Core Components

  • GameManager: Seamlessly coordinates the data flow between the core components you define and the other agents in the multi-agent swarm.
  • CommunicationManager: Handles the communication between the agents in the swarm. Current backends include:
    • HiveMind: A decentralized communication protocol that lets agents communicate with one another.
    • Torch Distributed: A distributed communication backend (PyTorch's torch.distributed) that lets agents exchange data while training together.
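
One way to picture a pluggable communication backend is a small interface that each backend implements. The sketch below is an assumption for illustration: the `Communicator` and `InMemoryCommunicator` names and the `all_gather` signature are hypothetical, not GenRL's actual classes, and the toy backend simply shares a dictionary inside one process rather than talking over a network.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class Communicator(ABC):
    """Hypothetical backend interface (not GenRL's actual class)."""

    @abstractmethod
    def all_gather(self, agent_id: str, payload: Any) -> Dict[str, Any]:
        """Publish this agent's payload and return what is known from all agents."""


class InMemoryCommunicator(Communicator):
    """Toy backend: all agents share one dict in the same process.

    Unlike a real collective operation, this does not block until every
    agent has contributed; it returns whatever has been published so far.
    """

    def __init__(self) -> None:
        self._board: Dict[str, Any] = {}

    def all_gather(self, agent_id: str, payload: Any) -> Dict[str, Any]:
        self._board[agent_id] = payload
        # Return a copy so callers cannot mutate the shared board.
        return dict(self._board)


comm = InMemoryCommunicator()
comm.all_gather("agent-0", ["rollout-a"])
swarm_view = comm.all_gather("agent-1", ["rollout-b"])
```

With an interface like this, the rest of the framework would depend only on `all_gather`, so a peer-to-peer backend and a torch.distributed backend could be exchanged through configuration.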

Optional Components

  • Coordination: Handles coordination and orchestration between agents in a decentralized swarm. It is implemented with smart contracts on a blockchain and is required only when running in a decentralized swarm.

Framework-Defined Progression

We track the progression of the game on a per-round basis. At the start of each round, the data manager initializes the round data, which kicks off the game’s stages. For each stage, rollouts are generated, appended to the game state, and communicated to the swarm. After the agent has progressed through the game’s predefined stages, rewards are evaluated and policies are updated. The user has full control over the update, which occurs in the Trainer.train method, and can therefore update the policy on a per-stage or per-round basis.

Figure: Orchestrated data flow through the framework.
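
The round progression described above can be sketched as a plain loop. This is an illustrative sketch, not GenRL's implementation: `GameState`, `run_round`, and the callables passed to it are hypothetical names, and the sketch updates the policy once per round (a per-stage update would move the `train` call inside the loop).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GameState:
    round_idx: int
    prompts: List[str]
    # Maps stage index -> rollouts produced at that stage.
    stage_rollouts: Dict[int, List[str]] = field(default_factory=dict)


def run_round(
    round_idx: int,
    num_stages: int,
    init_round_data: Callable[[int], List[str]],
    generate: Callable[[GameState, int], List[str]],
    communicate: Callable[[int, List[str]], None],
    evaluate: Callable[[GameState], List[float]],
    train: Callable[[GameState, List[float]], None],
) -> GameState:
    # 1. The data manager initializes the round data.
    state = GameState(round_idx, init_round_data(round_idx))
    for stage in range(num_stages):
        # 2. Generate rollouts, append them to the game state, share with the swarm.
        rollouts = generate(state, stage)
        state.stage_rollouts[stage] = rollouts
        communicate(stage, rollouts)
    # 3. After all stages, evaluate rewards and update the policy.
    rewards = evaluate(state)
    train(state, rewards)
    return state


sent, updates = [], []
state = run_round(
    round_idx=0,
    num_stages=2,
    init_round_data=lambda r: [f"question-{r}"],
    generate=lambda s, stage: [f"{p}/stage-{stage}" for p in s.prompts],
    communicate=lambda stage, rollouts: sent.append((stage, rollouts)),
    evaluate=lambda s: [float(len(v)) for v in s.stage_rollouts.values()],
    train=lambda s, rewards: updates.append(rewards),
)
```

Passing the steps in as callables keeps the loop itself fixed while the data, generation, reward, and update logic stay user-defined, which mirrors the split between framework-defined progression and customizable components.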

Example Usage

pip install ".[examples]"
export NUM_PROC_PER_NODE=1
export NUM_NODES=1
export MASTER_ADDR="localhost"
export MASTER_PORT=29500
./scripts/train.sh $NUM_NODES $NUM_PROC_PER_NODE multistage_math msm_dapodata_grpo.yaml

Download files

Source Distribution

gensyn_genrl-0.1.6.tar.gz (321.8 kB view details)

Uploaded Source

Built Distribution

gensyn_genrl-0.1.6-py3-none-any.whl (99.3 kB view details)

Uploaded Python 3

File details

Details for the file gensyn_genrl-0.1.6.tar.gz.

File metadata

  • Download URL: gensyn_genrl-0.1.6.tar.gz
  • Size: 321.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for gensyn_genrl-0.1.6.tar.gz
Algorithm      Hash digest
SHA256         4d4ede5a7d527591b9670aff94643baa4877d876f88025f1c926bb4f8997ff78
MD5            29d84548d6ba7ce98a1631355ec22ef5
BLAKE2b-256    0d35c71dcbd45a82dcdb27b56e0a6e59bd45b5959d20831cca6e628a32119ef3

File details

Details for the file gensyn_genrl-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: gensyn_genrl-0.1.6-py3-none-any.whl
  • Size: 99.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for gensyn_genrl-0.1.6-py3-none-any.whl
Algorithm      Hash digest
SHA256         a89ae95c276cba3028c2b1bddff1d4c6ef845072ec5629b3b277f73875b77f35
MD5            14e01d2f41f71d24e5564a82c2b8ff95
BLAKE2b-256    663db1ed2733bedc6682db88b6b8fcd59ba9c662741924970af7db652094236c
