Agent Gym - PyTorch
Agent Gym
Convert any model into an R1-style reasoning agent. Agent Gym builds on TRL, Hugging Face, and various other libraries. This is a work in progress; the goal is to make it easy to train any model into a reasoning agent.
Installation
pip3 install -U agentgym
Usage
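from agentgym.r1_pipeline import R1Pipeline, SFTConfig
# Configure the SFT stage: base model, training dataset, and SFT training arguments
r1_pipeline = R1Pipeline(sft_model="gpt2", sft_dataset="stanfordnlp/imdb", sft_args=SFTConfig(output_dir="/tmp"))
r1_pipeline.run()  # run the full pipeline (SFT, then GRPO)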
Architecture
The architecture is as follows:
- SFT: Supervised Fine-Tuning
- GRPO: Group Relative Policy Optimization
model -> SFT -> GRPO -> reasoning model
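In GRPO, several completions are sampled for the same prompt, each is scored, and each reward is normalized against the rest of its group, so no separate value (critic) model is needed. A minimal sketch of that group-relative advantage, assuming one scalar reward per completion; the helper name `group_relative_advantages` is hypothetical and not part of agentgym:

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Hypothetical helper: GRPO-style advantages for one group of completions.

    A_i = (r_i - mean(r)) / (std(r) + eps), where the group is every
    completion sampled for the same prompt.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation of the group
    # eps guards against division by zero when every reward in the group is equal
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, scored by an assumed 0/1 correctness reward
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Here the correct completions get an advantage of roughly +1 and the others roughly -1. If every completion receives the same reward, all advantages are 0 and that group contributes no learning signal. The sketch uses the population standard deviation; other implementations may use the sample standard deviation instead.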
License
MIT
Download files
Source Distribution
agentgym-0.0.2.tar.gz (8.9 kB)
Built Distribution
agentgym-0.0.2-py3-none-any.whl (9.0 kB)
File details
Details for the file agentgym-0.0.2.tar.gz.
File metadata
- Download URL: agentgym-0.0.2.tar.gz
- Size: 8.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.8 Darwin/23.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a62ef68d0173f749dbc1eaf6c7f1e35d38d6ad2c7bb80baa3be0d4df1dcd181 |
| MD5 | 7e2704723c0f73cb577d66afe40841de |
| BLAKE2b-256 | b92b279cbbe392b6608dbf251032593b7cac7a40f8c8661d1e4206ac3182abe2 |
File details
Details for the file agentgym-0.0.2-py3-none-any.whl.
File metadata
- Download URL: agentgym-0.0.2-py3-none-any.whl
- Size: 9.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.8 Darwin/23.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f987431f6429283e5345bcddf8d9b66afc5940c3c54f3b06639c65e9c7cb022 |
| MD5 | a3d52504fa03c6710d2bbe1024a2e5f9 |
| BLAKE2b-256 | 670ae4b5639b379da18f947018aef7e31bd3ca0cd81735b1f73d4a77ce229f95 |
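The SHA256 digests listed for each file can be used to check a download before installing it. A minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of` and `verify` are hypothetical and not part of agentgym:

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for agentgym 0.0.2
EXPECTED = {
    "agentgym-0.0.2.tar.gz": "1a62ef68d0173f749dbc1eaf6c7f1e35d38d6ad2c7bb80baa3be0d4df1dcd181",
    "agentgym-0.0.2-py3-none-any.whl": "8f987431f6429283e5345bcddf8d9b66afc5940c3c54f3b06639c65e9c7cb022",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hypothetical helper: hex SHA256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Hypothetical helper: True if the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    # Usage: python verify_agentgym.py agentgym-0.0.2.tar.gz agentgym-0.0.2-py3-none-any.whl
    for name in sys.argv[1:]:
        path = Path(name)
        expected = EXPECTED.get(path.name)
        if expected is None:
            print(name, "UNKNOWN FILE")
        else:
            print(name, "OK" if verify(path, expected) else "MISMATCH")
```

pip can also enforce this at install time through its hash-checking mode, for example with a requirements line of the form `agentgym==0.0.2 --hash=sha256:<digest>` installed via `pip install --require-hashes -r requirements.txt`.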