Reinforcement learning environments for fine-tuning language models for reasoning tasks.
🤖 AI Gym
Reinforcement learning environments for AI fine-tuning
aigym is a library that provides a suite of reinforcement learning (RL)
environments, primarily for fine-tuning pre-trained models (namely
language models) on reasoning tasks.
Built on top of the gymnasium API, this project aims to expose lightweight and extensible environments for fine-tuning language models with techniques like PPO and GRPO.
It is designed to complement training frameworks like trl, transformers, pytorch, and pytorch lightning.
See the project roadmap here
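To give a feel for the gymnasium-style interface mentioned above, here is a minimal sketch of a toy link-following environment. It is not aigym's API: `ToyLinkMazeEnv`, `LINKS`, and `run_episode` are hypothetical names invented for illustration, and the class deliberately does not subclass `gymnasium.Env` so that it runs without extra dependencies. It only mirrors gymnasium's conventions, where `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`.

```python
from typing import Any, Callable, Optional

# Hypothetical toy link graph: each "page" maps to its outgoing links.
LINKS: dict[str, list[str]] = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["D"],
    "D": [],
}


class ToyLinkMazeEnv:
    """Toy environment following gymnasium's reset/step conventions (illustrative only)."""

    def __init__(self, start: str, target: str, max_steps: int = 10) -> None:
        self.start = start
        self.target = target
        self.max_steps = max_steps
        self.page = start
        self.steps = 0

    def _obs(self) -> dict[str, Any]:
        # The observation is the current page plus the links the agent may follow.
        return {"page": self.page, "links": list(LINKS[self.page])}

    def reset(self, seed: Optional[int] = None) -> tuple[dict[str, Any], dict[str, Any]]:
        # gymnasium convention: reset returns (observation, info).
        # The seed is accepted for interface parity but unused here.
        self.page = self.start
        self.steps = 0
        return self._obs(), {}

    def step(self, action: str) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
        # gymnasium convention: step returns
        # (observation, reward, terminated, truncated, info).
        self.steps += 1
        if action in LINKS[self.page]:
            self.page = action  # invalid actions leave the agent where it is
        terminated = self.page == self.target
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self._obs(), reward, terminated, truncated, {}


def run_episode(env: ToyLinkMazeEnv, policy: Callable[[dict[str, Any]], str]) -> tuple[float, str]:
    """Roll out one episode and return (total reward, final page)."""
    obs, _info = env.reset()
    total = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total, obs["page"]
```

For example, `run_episode(ToyLinkMazeEnv("A", "D"), lambda obs: obs["links"][-1])` follows the last link on each page. In an RL fine-tuning setup, the policy would instead be a language model choosing a link from the observation, and the returned reward would feed an algorithm such as PPO or GRPO.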
Installation
pip install aigym
Development Installation
Install uv:
pip install uv
Create a virtual environment:
uv venv
Activate the virtual environment:
source .venv/bin/activate
Install the package:
uv sync --extra ollama --group dev
Install ollama to run a local model: https://ollama.com/download
Usage
The examples directory contains examples of how to use the aigym environments.
Run an ollama-based agent on the Wikipedia maze environment:
python examples/ollama_agent.py
File details
Details for the file aigym-0.0.2-py3-none-any.whl.
File metadata
- Download URL: aigym-0.0.2-py3-none-any.whl
- Upload date:
- Size: 17.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5fdd0e957f706daf7926b34bf759fc6f3a21193b223a0b55a514238cb8d12506 |
| MD5 | 6b37875045479833c69f8e8f2cfc26cc |
| BLAKE2b-256 | 548a90fef647a7fd73903e9dd9ee563309ceba50788a9bbede0a9a3af80e5cc5 |
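If you download the wheel manually, you can check it against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `matches` are illustrative, and the script assumes the wheel sits in the current directory under its published file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for aigym-0.0.2-py3-none-any.whl (copied from the table above).
EXPECTED_SHA256 = "5fdd0e957f706daf7926b34bf759fc6f3a21193b223a0b55a514238cb8d12506"


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed location: the wheel in the current working directory.
    wheel = Path("aigym-0.0.2-py3-none-any.whl")
    if wheel.exists():
        print("digest OK" if matches(wheel, EXPECTED_SHA256) else "digest MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```

pip can perform a similar check itself through hash-checking mode with pinned requirements, which may be preferable for automated installs.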