
Reinforcement learning environments for fine-tuning language models for reasoning tasks.


🤖 AI Gym

Reinforcement learning environments for AI fine-tuning

aigym is a library that provides a suite of reinforcement learning (RL) environments for fine-tuning pre-trained models, primarily language models, on reasoning tasks.

aigym is built on top of the gymnasium API. Its objective is to expose lightweight, extensible environments for fine-tuning language models with techniques like PPO and GRPO.
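
As a rough illustration of the gymnasium calling convention that such environments follow, here is a minimal sketch of a reset/step episode loop. It uses only the standard library. ToyLinkMaze, GRAPH, and run_episode are hypothetical names invented for this example and are not aigym classes; the actual aigym environments and their observation format may differ. The only convention taken from gymnasium is that reset() returns (observation, info) and step() returns (observation, reward, terminated, truncated, info).

```python
class ToyLinkMaze:
    """Hypothetical stand-in environment (NOT part of aigym).

    Mimics the gymnasium convention: reset() returns (observation, info),
    step() returns (observation, reward, terminated, truncated, info).
    """

    def __init__(self, graph, start, target, max_steps=10):
        self.graph = graph
        self.start = start
        self.target = target
        self.max_steps = max_steps

    def reset(self, seed=None, options=None):
        # seed/options are accepted for signature compatibility; unused here.
        self.page = self.start
        self.steps = 0
        return self._observation(), {}

    def step(self, action):
        self.steps += 1
        # Follow the link only if it exists on the current page.
        if action in self.graph[self.page]:
            self.page = action
        terminated = self.page == self.target
        truncated = not terminated and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self._observation(), reward, terminated, truncated, {}

    def _observation(self):
        return {"page": self.page, "links": list(self.graph[self.page])}


def run_episode(env, policy, seed=None):
    """Roll out one episode; return (total reward, final page)."""
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    terminated = truncated = False
    while not (terminated or truncated):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
    return total_reward, observation["page"]


# A tiny made-up link graph standing in for real pages.
GRAPH = {
    "Python": ["Guido van Rossum", "Programming language"],
    "Guido van Rossum": ["Python", "Netherlands"],
    "Programming language": ["Python", "Compiler"],
    "Netherlands": ["Guido van Rossum"],
    "Compiler": ["Programming language"],
}
```

In a fine-tuning setup, the policy callable would wrap a language model that reads the observation and chooses a link. Here any function from observation to action works, for example `lambda obs: obs["links"][-1]`.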

It is designed to complement training frameworks like trl, transformers, pytorch, and pytorch lightning.
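
To illustrate the division of labor: the environment supplies rewards, and the training framework turns them into an update signal. For GRPO, that signal is commonly a group-relative advantage, where each sampled completion's reward is normalized against the other completions for the same prompt. The sketch below assumes that formulation. The function name, the eps parameter, and the use of the population standard deviation are choices made for this example; real trainers may differ (for instance by using the sample standard deviation).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group: (r - mean) / (std + eps).

    Assumption: population standard deviation; eps avoids division by zero
    when every completion in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every reward in a group is identical, all advantages are zero, so that group contributes no learning signal.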

See the project roadmap here

Installation

pip install aigym

Development Installation

Install uv:

pip install uv

Create a virtual environment:

uv venv

Activate the virtual environment:

source .venv/bin/activate

Install the package:

uv sync --extra ollama --group dev

Install ollama to run a local model: https://ollama.com/download

Usage

The examples directory contains examples of how to use the aigym environments. For instance, to run an ollama-based agent on the Wikipedia maze environment:

python examples/ollama_agent.py
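
The script itself is not reproduced here. As a rough sketch of what querying a local ollama server can look like, the following builds a request for ollama's /api/generate REST endpoint with the standard library. It assumes the server is listening on its default address, http://localhost:11434. The helper names build_generate_request and generate are hypothetical, and this is not necessarily how examples/ollama_agent.py talks to the model.

```python
import json
import urllib.request


def build_generate_request(prompt, model, host="http://localhost:11434"):
    """Build (but do not send) a non-streaming /api/generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model, host="http://localhost:11434", timeout=60):
    """Send the request and return the generated text.

    Requires a running ollama server with `model` already pulled.
    """
    request = build_generate_request(prompt, model, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling generate with a prompt and the name of a model you have pulled locally returns the model's text reply, which an agent could then parse into an environment action.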
