A reinforcement learning environment for the 2048 game based on Gymnasium

Gymnasium 2048

A Gymnasium environment for the game 2048, together with game-playing agents trained by temporal difference learning of n-tuple networks.

https://github.com/Quentin18/gymnasium-2048/assets/58831477/c630a605-d1da-412a-a284-75f5c28bab46

Action Space	`spaces.Discrete(4)`
Observation Space	`spaces.Box(low=0, high=1, shape=(4, 4, 16), dtype=np.uint8)`
Import	`gymnasium.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0")`

Installation

To install gymnasium-2048 with pip, execute:

pip install gymnasium_2048

From source:

git clone https://github.com/Quentin18/gymnasium-2048
cd gymnasium-2048/
pip install -e .

Environment

Action Space

The action is an integer representing the direction to slide the tiles:

Action	Direction
0	UP
1	RIGHT
2	DOWN
3	LEFT
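
As a small illustration of this mapping, the four integers can be given readable names with an IntEnum. The class name Action is hypothetical and is not taken from the package; only the integer values come from the table above.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the four action integers listed above."""

    UP = 0
    RIGHT = 1
    DOWN = 2
    LEFT = 3


# IntEnum members compare equal to plain ints, so they can be passed
# anywhere an integer action is expected.
assert Action.DOWN == 2
print([f"{a.value}: {a.name}" for a in Action])
```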

Observation Space

The observation is a 3D ndarray of shape (4, 4, 16) that encodes the board state as 16 channels, each of which is a 4x4 binary image. Channel i holds a 1 in every cell that contains the i-th tile type and a 0 elsewhere: channel 0 marks the empty cells, and channel k (for k = 1, ..., 15) marks the 2^k-tiles, that is, 2-tiles, 4-tiles, ..., up to 32768-tiles.

This representation is mostly used for deep convolutional neural networks (DCNN).
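
The encoding described above can be sketched with NumPy as follows. This is a minimal illustration, not the package's own implementation: the function name encode_board is hypothetical, and the input is assumed to be a grid of tile values with 0 standing for an empty cell.

```python
import numpy as np


def encode_board(tiles, max_pow=16):
    """One-hot encode tile values (0 = empty) into shape (rows, cols, max_pow)."""
    tiles = np.asarray(tiles, dtype=np.int64)
    # Channel index: 0 for an empty cell, k for a 2**k tile.
    channel = np.zeros_like(tiles)
    occupied = tiles > 0
    channel[occupied] = np.rint(np.log2(tiles[occupied])).astype(np.int64)

    obs = np.zeros(tiles.shape + (max_pow,), dtype=np.uint8)
    rows, cols = np.indices(tiles.shape)
    # Set exactly one channel to 1 for every cell.
    obs[rows, cols, channel] = 1
    return obs


board = [
    [0, 2, 4, 8],
    [0, 0, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 0, 32768],
]
obs = encode_board(board)
print(obs.shape)  # (4, 4, 16)
```

Each cell sets exactly one channel, so the array always sums to the number of cells on the board.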

Rewards

At each step, for each tile merge, the player gains a reward equal to the value of the new tile. The total reward, corresponding to the game score, is the sum of rewards obtained throughout the game.
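
To make the reward rule concrete, here is a sketch of sliding a single row to the left. It assumes the standard 2048 rule that a tile merges at most once per move; the function name slide_row_left is hypothetical and the code is not taken from the package.

```python
def slide_row_left(row):
    """Slide one row to the left and return (new_row, reward)."""
    tiles = [t for t in row if t != 0]  # drop empty cells
    merged, reward, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            new_tile = tiles[i] * 2
            merged.append(new_tile)
            reward += new_tile  # reward equals the value of the new tile
            i += 2  # both tiles are consumed, so a tile merges at most once
        else:
            merged.append(tiles[i])
            i += 1
    # Pad with empty cells back to the original row length.
    return merged + [0] * (len(row) - len(merged)), reward


print(slide_row_left([2, 2, 4, 4]))  # ([4, 8, 0, 0], 12)
```

In this example two merges happen in one step, creating a 4-tile and an 8-tile, so the step reward is 4 + 8 = 12.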

Starting State

The game starts with two randomly generated tiles. Each generated tile is a 2-tile with probability 0.9 or a 4-tile with probability 0.1.
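
A possible way to generate such a starting board is sketched below. The helper names are hypothetical, and choosing the empty cell uniformly at random is an assumption; the text above only fixes the 0.9 and 0.1 tile probabilities.

```python
import random


def spawn_tile(board, rng):
    """Place a 2 (p = 0.9) or a 4 (p = 0.1) on a random empty cell, in place."""
    empty = [
        (r, c)
        for r, row in enumerate(board)
        for c, value in enumerate(row)
        if value == 0
    ]
    r, c = rng.choice(empty)  # assumption: uniform over the empty cells
    board[r][c] = 2 if rng.random() < 0.9 else 4


def initial_board(size=4, seed=None):
    """Return a size x size board holding two randomly generated tiles."""
    rng = random.Random(seed)
    board = [[0] * size for _ in range(size)]
    for _ in range(2):  # the game starts with two tiles
        spawn_tile(board, rng)
    return board


for row in initial_board(seed=42):
    print(row)
```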

Episode End

The episode ends if there are no legal moves, i.e., all squares are occupied and there are no two adjacent tiles sharing the same value.
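
The termination condition can be checked directly on a grid of tile values. The sketch below uses a hypothetical function name and assumes 0 denotes an empty square; it is an illustration of the rule, not the package's code.

```python
def has_legal_move(board):
    """Return True if a square is empty or two adjacent tiles are equal."""
    size = len(board)
    for r in range(size):
        for c in range(size):
            value = board[r][c]
            if value == 0:
                return True  # an empty square always allows a move
            # Checking the right and lower neighbours covers every adjacent pair once.
            if c + 1 < size and board[r][c + 1] == value:
                return True
            if r + 1 < size and board[r + 1][c] == value:
                return True
    return False


stuck = [
    [2, 4, 2, 4],
    [4, 2, 4, 2],
    [2, 4, 2, 4],
    [4, 2, 4, 2],
]
print(has_legal_move(stuck))  # False
```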

Arguments

size: the size of the game board. The default value is 4.
max_pow: the maximum power of 2 allowed. The default value is 16.

import gymnasium as gym

# Create the environment with the default board size and maximum power of 2.
env = gym.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0", size=4, max_pow=16)
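
Once created, the environment can be driven like any other Gymnasium environment. The sketch below plays one episode with random actions and sums the rewards, which corresponds to the game score described under Rewards. The function name run_episode and the max_steps safeguard are illustrative choices; only the standard Gymnasium reset and step calls are used.

```python
def run_episode(env, seed=None, max_steps=100_000):
    """Play one episode with random actions and return the total reward."""
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()  # random integer action
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            break
    return total_reward
```

For example, with the package installed, calling run_episode on the environment returned by gym.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0") with seed=42 would play a single random game and return its score.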

Usage

To use the training and evaluation scripts, install the training dependencies from the root of the cloned repository:

pip install ".[training]"

Play Manually

To play the game manually with the four arrows of your keyboard, execute:

python -m scripts.play

See the arguments with the help command:

python -m scripts.play -h

Train an Agent

To train an agent using temporal difference learning of n-tuple networks, execute:

python -m scripts.train \
  --algo tdl \
  -n 100000 \
  --eval-freq 5000 \
  --eval-episode 1000 \
  --save-freq 5000 \
  --seed 42 \
  -o models/tdl

See the arguments with the help command:

python -m scripts.train -h

Plot Training Metrics

To plot training metrics from logs, execute:

python -m scripts.plot \
  -i train.log \
  -t "Temporal Difference Learning" \
  -o figures/training_tdl.png

See the arguments with the help command:

python -m scripts.plot -h

[Figures: training metrics over episodes for the TDL small and TDL policies]

Enjoy a Trained Agent

To see a trained agent in action, execute:

python -m scripts.enjoy \
  --algo tdl \
  -i models/tdl/best_n_tuple_network_policy.zip \
  -n 1 \
  --seed 42

See the arguments with the help command:

python -m scripts.enjoy -h

Evaluate a Trained Agent

To evaluate the performance of a trained agent, execute:

python -m scripts.evaluate \
  --algo tdl \
  -i models/tdl/best_n_tuple_network_policy.zip \
  -n 1000 \
  --seed 42 \
  -t "Temporal Difference Learning" \
  -o figures/stats_tdl.png

See the arguments with the help command:

python -m scripts.evaluate -h

[Figures: performance of the trained TDL small and TDL policies]

[Figure: performance of a random policy]

Tests

To run tests, execute:

pytest

Citing

To cite the repository in publications:

@misc{gymnasium-2048,
  author = {Quentin Deschamps},
  title = {Gymnasium 2048},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/Quentin18/gymnasium-2048}},
}

Author

Quentin Deschamps

Release history

Version	Release date
0.1.0 (this version)	Oct 26, 2025
0.0.2	Dec 1, 2024
0.0.1	Jan 27, 2024

Download files

Source Distribution

gymnasium_2048-0.1.0.tar.gz (2.6 MB), uploaded Oct 26, 2025, Source

Built Distribution

gymnasium_2048-0.1.0-py3-none-any.whl (14.6 kB), uploaded Oct 26, 2025, Python 3

File details

Details for the file gymnasium_2048-0.1.0.tar.gz.

File metadata

Download URL: gymnasium_2048-0.1.0.tar.gz
Upload date: Oct 26, 2025
Size: 2.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gymnasium_2048-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`52ac36bdcdb287570661aad9022aad00e76d183791e6a42482ae703aa2668334`
MD5	`bc49da5e8b68899a8027d014ad927e47`
BLAKE2b-256	`dc8c6a3bf64ef304df0e128e646539cd0d53442d23b4932e169d4e5a09529776`

File details

Details for the file gymnasium_2048-0.1.0-py3-none-any.whl.

File metadata

Download URL: gymnasium_2048-0.1.0-py3-none-any.whl
Upload date: Oct 26, 2025
Size: 14.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gymnasium_2048-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1196d7c20f1e4e6ac64e0846646059021a84e576ab9d0b68808c9443ef136ef3`
MD5	`2631cfe6fab765c99b2848cd50773711`
BLAKE2b-256	`6e5580fac72cc80251d4e605ea435e3e1878ddd177a5461ce7ddb1033213653c`
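
A downloaded file can be checked against the SHA256 digests listed above with the Python standard library. This is a minimal sketch; the function names and the chunk size are illustrative choices.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hexadecimal string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling matches with the path of the downloaded archive and the SHA256 value from the corresponding table should return True for an intact download.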
