A reinforcement learning environment for the 2048 game based on Gymnasium
Gymnasium 2048
Gymnasium environment for the Game 2048 and game-playing agents using temporal difference learning of n-tuple networks.
https://github.com/Quentin18/gymnasium-2048/assets/58831477/c630a605-d1da-412a-a284-75f5c28bab46
Action Space | spaces.Discrete(4) |
Observation Space | spaces.Box(low=0, high=1, shape=(4, 4, 16), dtype=np.uint8) |
Import | gymnasium.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0") |
Installation
To install gymnasium-2048 with pip, execute:
pip install gymnasium_2048
From source:
git clone https://github.com/Quentin18/gymnasium-2048
cd gymnasium-2048/
pip install -e .
Environment
Action Space
The action is an integer representing the direction to slide the tiles:
Action | Direction |
---|---|
0 | UP |
1 | RIGHT |
2 | DOWN |
3 | LEFT |
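As a small convenience, the mapping in the table above can be written as an enumeration. This is only a sketch: the `Action` class is a hypothetical helper for illustration and is not claimed to be part of the package.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the four discrete actions (values taken from the table)."""

    UP = 0
    RIGHT = 1
    DOWN = 2
    LEFT = 3


# An IntEnum member compares equal to its integer value, so it can be
# passed wherever a plain action index is expected.
assert Action.RIGHT == 1
print([action.name for action in Action])
```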
Observation Space
The observation is a 3D ndarray encoding the board state. It has 16 channels, each of which is a 4x4 binary image: the i-th channel is 1 in every cell that contains the i-th tile and 0 elsewhere.
The channels represent, in order, the positions of empty cells, 2-tiles, 4-tiles, ..., and 32768-tiles.
This representation is mostly used for deep convolutional neural networks (DCNN).
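The encoding described above can be sketched in a few lines. This is an illustration, not the package's implementation: the function name `encode_board` is hypothetical, the board is assumed to hold raw tile values (0 for an empty cell), and plain nested lists stand in for the `np.uint8` ndarray to keep the sketch dependency-free.

```python
def encode_board(board, channels=16):
    """One-hot encode a square board of tile values (0 = empty).

    Returns a nested list of shape (size, size, channels), assuming every
    non-zero value is a power of two below 2**channels.
    """
    size = len(board)
    obs = [[[0] * channels for _ in range(size)] for _ in range(size)]
    for r, row in enumerate(board):
        for c, value in enumerate(row):
            # Channel 0 marks empty cells; a tile 2**k goes to channel k.
            channel = 0 if value == 0 else value.bit_length() - 1
            obs[r][c][channel] = 1
    return obs


board = [
    [0, 2, 4, 0],
    [0, 0, 0, 8],
    [2, 0, 0, 0],
    [0, 0, 0, 32768],
]
obs = encode_board(board)
print(obs[0][1].index(1))  # channel of the 2-tile
print(obs[3][3].index(1))  # channel of the 32768-tile
```

Each cell sets exactly one channel, so summing over the last axis gives 1 everywhere.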
Rewards
At each step, for each tile merge, the player gains a reward equal to the value of the new tile. The total reward, corresponding to the game score, is the sum of rewards obtained throughout the game.
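To make the reward rule concrete, here is a minimal sketch of sliding a single row to the left under the usual 2048 merge rules (each tile merges at most once per move). The function name `slide_row_left` and the list-of-tile-values representation are assumptions for illustration; the environment's internals may differ.

```python
def slide_row_left(row):
    """Slide one row to the left, merging equal neighbours at most once.

    Returns (new_row, reward), where reward is the sum of the values of
    the tiles created by merges in this row.
    """
    tiles = [value for value in row if value != 0]  # drop empty cells
    merged, reward, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            new_tile = tiles[i] * 2
            merged.append(new_tile)
            reward += new_tile  # reward equals the value of the new tile
            i += 2  # both source tiles are consumed
        else:
            merged.append(tiles[i])
            i += 1
    # Pad with empty cells on the right to restore the row length.
    return merged + [0] * (len(row) - len(merged)), reward


print(slide_row_left([2, 2, 4, 4]))  # ([4, 8, 0, 0], 12)
print(slide_row_left([2, 0, 2, 2]))  # ([4, 2, 0, 0], 4)
```

The reward for a full move would then be the sum of the per-row (or per-column) rewards.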
Starting State
The game starts with two randomly generated tiles. Each generated tile is a 2-tile with probability 0.9 and a 4-tile with probability 0.1.
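The starting state can be sketched as follows. The helper names `spawn_tile` and `new_board` are hypothetical, and choosing the cell uniformly among the empty cells is an assumption; only the 0.9/0.1 split and the two initial tiles come from the description above.

```python
import random


def spawn_tile(board, rng):
    """Place a 2 (probability 0.9) or a 4 (probability 0.1) on a random empty cell."""
    empty = [
        (r, c)
        for r, row in enumerate(board)
        for c, value in enumerate(row)
        if value == 0
    ]
    r, c = rng.choice(empty)  # assumption: uniform over empty cells
    board[r][c] = 2 if rng.random() < 0.9 else 4


def new_board(size=4, seed=None):
    """Return an empty size x size board with two randomly generated tiles."""
    rng = random.Random(seed)
    board = [[0] * size for _ in range(size)]
    for _ in range(2):
        spawn_tile(board, rng)
    return board


for row in new_board(seed=0):
    print(row)
```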
Episode End
The episode ends if there are no legal moves, i.e., all squares are occupied and there are no two adjacent tiles sharing the same value.
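The termination condition above translates directly into a check over the board. This is a sketch with a hypothetical function name, `has_legal_move`, operating on a nested list of tile values where 0 denotes an empty cell.

```python
def has_legal_move(board):
    """Return True if any cell is empty or two adjacent tiles share a value."""
    size = len(board)
    for r in range(size):
        for c in range(size):
            value = board[r][c]
            if value == 0:
                return True  # an empty cell always allows a move
            # Checking only the right and lower neighbours covers every pair once.
            if c + 1 < size and board[r][c + 1] == value:
                return True
            if r + 1 < size and board[r + 1][c] == value:
                return True
    return False


stuck = [
    [2, 4, 2, 4],
    [4, 2, 4, 2],
    [2, 4, 2, 4],
    [4, 2, 4, 2],
]
print(has_legal_move(stuck))  # False: full board, no equal neighbours
```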
Arguments
- size: the size of the game board. The default value is 4.
- max_pow: the maximum power of 2 allowed. The default value is 16.
import gymnasium as gym
gym.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0", size=4, max_pow=16)
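For orientation, a random-agent episode using the standard Gymnasium `reset`/`step` loop might look like the sketch below. The function name `run_random_episode` is hypothetical; the environment ID and the `size` and `max_pow` arguments are the ones shown above. Running it requires `gymnasium_2048` to be installed.

```python
def run_random_episode(seed=42):
    """Play one episode with uniformly random actions and return the total reward."""
    # Imported lazily so this sketch can be loaded without Gymnasium installed.
    import gymnasium as gym

    env = gym.make(
        "gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0", size=4, max_pow=16
    )
    env.action_space.seed(seed)  # make the sampled actions reproducible
    observation, info = env.reset(seed=seed)

    score = 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # integer in {0, 1, 2, 3}
        observation, reward, terminated, truncated, info = env.step(action)
        score += reward  # the game score is the sum of step rewards

    env.close()
    return score
```

Calling `run_random_episode()` returns the game score of a single random game, which gives a rough reference point when judging a trained agent.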
Usage
To use the training and evaluation scripts, install the training dependencies:
pip install .[training]
Play Manually
To play the game manually with the arrow keys, execute:
python -m scripts.play
See the arguments with the help command:
python -m scripts.play -h
Train an Agent
To train an agent using temporal difference learning of n-tuple networks, execute:
python -m scripts.train \
--algo tdl \
-n 100000 \
--eval-freq 5000 \
--eval-episode 1000 \
--save-freq 5000 \
--seed 42 \
-o models/tdl
See the arguments with the help command:
python -m scripts.train -h
Plot Training Metrics
To plot training metrics from logs, execute:
python -m scripts.plot \
-i train.log \
-t "Temporal Difference Learning" \
-o figures/training_tdl.png
See the arguments with the help command:
python -m scripts.plot -h
Training metrics of trained policies over episodes are plotted for two policies, TDL small and TDL (figures not reproduced here).
Enjoy a Trained Agent
To see a trained agent in action, execute:
python -m scripts.enjoy \
--algo tdl \
-i models/tdl/best_n_tuple_network_policy.zip \
-n 1 \
--seed 42
See the arguments with the help command:
python -m scripts.enjoy -h
Evaluate a Trained Agent
To evaluate the performance of a trained agent, execute:
python -m scripts.evaluate \
--algo tdl \
-i models/tdl/best_n_tuple_network_policy.zip \
-n 1000 \
--seed 42 \
-t "Temporal Difference Learning" \
-o figures/stats_tdl.png
See the arguments with the help command:
python -m scripts.evaluate -h
Performance plots are provided for the trained TDL small and TDL policies and for a random policy (figures not reproduced here).
Tests
To run tests, execute:
pytest
Citing
To cite the repository in publications:
@misc{gymnasium-2048,
author = {Quentin Deschamps},
title = {Gymnasium 2048},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/Quentin18/gymnasium-2048}},
}
References
- Szubert and Jaśkowski: Temporal Difference Learning of N-Tuple Networks for the Game 2048
- Guei and Wu: On Reinforcement Learning for the Game of 2048
Author
Quentin Deschamps