A reinforcement learning environment for the Search Race CG puzzle based on Gymnasium
Gymnasium Search Race
Gymnasium environment for the Search Race CodinGame optimization puzzle.
https://github.com/user-attachments/assets/1862b04b-9e33-4f55-a309-ad665a1db2f1
Action Space | Box([-1, 0], [1, 1], float64) |
Observation Space | Box([0, 0, 0, 0, 0, 0, 0, -1, -1, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], float64) |
import | gymnasium.make("gymnasium_search_race:gymnasium_search_race/SearchRace-v1") |
Installation
To install gymnasium-search-race with pip, execute:
pip install gymnasium_search_race
From source:
git clone https://github.com/Quentin18/gymnasium-search-race
cd gymnasium-search-race/
pip install -e .
Environment
Action Space
The action is an ndarray with 2 continuous variables:
- The rotation angle between -18 and 18 degrees, normalized between -1 and 1.
- The thrust between 0 and 200, normalized between 0 and 1.
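The mapping between normalized actions and game units can be sketched as follows. This is a minimal illustration that assumes a plain linear scaling of the ranges listed above; the helper name `denormalize_action` is hypothetical and not part of the package, and any rounding the environment may apply internally is not modelled.

```python
MAX_ROTATION_DEG = 18.0  # rotation range from the action-space description
MAX_THRUST = 200.0       # thrust range from the action-space description


def denormalize_action(action):
    """Map a normalized (rotation, thrust) pair to degrees and thrust units.

    Hypothetical helper: assumes linear scaling, which the README implies
    but does not spell out.
    """
    rotation, thrust = action
    # Clip defensively so out-of-range policy outputs stay inside the Box bounds.
    rotation = max(-1.0, min(1.0, float(rotation)))
    thrust = max(0.0, min(1.0, float(thrust)))
    return rotation * MAX_ROTATION_DEG, thrust * MAX_THRUST


print(denormalize_action([0.5, 1.0]))  # (9.0, 200.0)
```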
Observation Space
The observation is an ndarray of 10 continuous variables:
- 1 if the next checkpoint is the last one, 0 otherwise.
- The x and y coordinates of the next checkpoint.
- The x and y coordinates of the checkpoint after the next one.
- The x and y coordinates of the car.
- The horizontal speed vx and vertical speed vy of the car.
- The facing angle of the car.
The values are normalized between 0 and 1, or -1 and 1 if negative values are allowed.
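To make the layout easier to work with, the 10 values can be unpacked into named fields. The sketch below assumes the index order matches the list above, which is consistent with the observation-space bounds (only indices 7 and 8, the speeds, allow negative values). The class and function names are illustrative and not part of the package.

```python
from typing import NamedTuple, Sequence


class SearchRaceObservation(NamedTuple):
    """Named view of the 10-value observation (field names are illustrative)."""
    is_last_checkpoint: float
    next_checkpoint_x: float
    next_checkpoint_y: float
    following_checkpoint_x: float
    following_checkpoint_y: float
    car_x: float
    car_y: float
    car_vx: float
    car_vy: float
    car_angle: float


def unpack_observation(obs: Sequence[float]) -> SearchRaceObservation:
    """Convert a flat observation (list or 1-D ndarray) into named fields."""
    values = [float(v) for v in obs]
    if len(values) != 10:
        raise ValueError(f"expected 10 values, got {len(values)}")
    return SearchRaceObservation(*values)


example = unpack_observation([1, 0.5, 0.5, 0.2, 0.8, 0.1, 0.9, -0.3, 0.4, 0.75])
print(example.is_last_checkpoint, example.car_vx)  # 1.0 -0.3
```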
Reward
The goal is to visit all checkpoints as quickly as possible, so the agent is penalised with a reward of -0.1 for each timestep.
When a checkpoint is visited, the agent receives a reward of 1000/total_checkpoints.
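As a rough worked example, the return of an episode can be estimated from the two terms described above. The sketch assumes the per-timestep penalty and the checkpoint bonus simply add up, including on timesteps where a checkpoint is reached; the README does not state this explicitly, and `episode_return` is a hypothetical helper.

```python
STEP_PENALTY = -0.1             # reward per timestep
TOTAL_CHECKPOINT_BONUS = 1000.0  # split evenly across all checkpoints


def episode_return(timesteps, checkpoints_visited, total_checkpoints):
    """Estimate the undiscounted return of an episode.

    Assumption: the step penalty applies on every timestep and the
    checkpoint bonus is added on top of it.
    """
    bonus = TOTAL_CHECKPOINT_BONUS * checkpoints_visited / total_checkpoints
    return STEP_PENALTY * timesteps + bonus


# A run that clears all 10 checkpoints in 200 timesteps.
print(episode_return(timesteps=200, checkpoints_visited=10, total_checkpoints=10))
# A truncated run that only reached 3 of 10 checkpoints in 600 timesteps.
print(episode_return(timesteps=600, checkpoints_visited=3, total_checkpoints=10))
```

Under these assumptions a complete run is worth close to 1000 minus a small time penalty, so the checkpoint bonus dominates and the step penalty mainly separates fast runs from slow ones.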
Starting State
The starting state is generated by choosing a random CodinGame test case.
Episode End
The episode ends if either of the following happens:
- Termination: The car visits all checkpoints before time runs out.
- Truncation: Episode length is greater than 600.
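The two end conditions map onto the `terminated` and `truncated` flags returned by the standard Gymnasium `step` method. A minimal rollout loop that distinguishes them might look like the sketch below; `run_episode` is a hypothetical helper, and the commented usage assumes the package is installed.

```python
def run_episode(env, policy, seed=None):
    """Run one episode and report how it ended.

    `env` is any object following the Gymnasium API: reset() returns
    (observation, info) and step() returns a 5-tuple.
    `policy` is a callable mapping an observation to an action.
    """
    obs, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
    return total_reward, steps, terminated, truncated


# Example usage with a random policy (requires gymnasium_search_race):
# import gymnasium as gym
# env = gym.make("gymnasium_search_race:gymnasium_search_race/SearchRace-v1")
# print(run_episode(env, lambda obs: env.action_space.sample(), seed=0))
```

With a random policy the episode will most likely end by truncation, since visiting every checkpoint by chance is unlikely.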
Arguments
test_id: test case id used to generate the checkpoints (see choices here). The default value is None, which selects a test case randomly when the reset method is called.
import gymnasium as gym
gym.make("gymnasium_search_race:gymnasium_search_race/SearchRace-v1", test_id=1)
Version History
- v1: Add a boolean to the observation indicating whether the next checkpoint is the last one
- v0: Initial version
Usage
You can use RL Baselines3 Zoo to train and evaluate agents:
pip install rl_zoo3
Train an Agent
The hyperparameters are defined in hyperparams/ppo.yml.
To train a PPO agent for the Search Race game, execute:
python -m rl_zoo3.train \
--algo ppo \
--env gymnasium_search_race/SearchRace-v1 \
--tensorboard-log logs \
--eval-freq 10000 \
--eval-episodes 10 \
--gym-packages gymnasium_search_race \
--conf-file hyperparams/ppo.yml \
--progress
Enjoy a Trained Agent
To see a trained agent in action on random test cases, execute:
python -m rl_zoo3.enjoy \
--algo ppo \
--env gymnasium_search_race/SearchRace-v1 \
--n-timesteps 10000 \
--deterministic \
--gym-packages gymnasium_search_race \
--load-best \
--progress
Run Test Cases
To run test cases with a trained agent, execute:
python -m scripts.run_test_cases \
--path rl-trained-agents/ppo/best_model.zip \
--record-video \
--record-metrics
Tests
To run tests, execute:
pytest
Citing
To cite the repository in publications:
@misc{gymnasium-search-race,
author = {Quentin Deschamps},
title = {Gymnasium Search Race},
year = {2024},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/Quentin18/gymnasium-search-race}},
}