Pickomino (Heckmeck) Gymnasium environment
Pickomino-Env
Description
An environment conforming to the Gymnasium API for the dice game Pickomino (Heckmeck am Bratwurmeck). The goal is to train a Reinforcement Learning agent for optimal play, that is, to decide which die face to collect, when to roll, and when to stop.
Differences from the Physical Game
If you know the physical game, note the following simplifications:
- Failed Attempt: the highest tile on the table is removed, not turned face-down.
- Tile selection: the best reachable tile is always taken automatically; unlike in the physical game, you cannot choose a lower-valued tile.
- Stealing: always performed when possible; you cannot decline.
- Win condition: determined correctly when playing manually with the GUI (most worms wins, ties are broken by the highest tile). When training without a renderer, no winner is declared; use total reward as your metric. Be aware that stolen tiles do not reduce your reward, so total reward can exceed your final score.
- Stack height: not included in the observation (visible in the physical game).
Action Space
The action space is MultiDiscrete([6, 2]). The step() method accepts both
the ndarray returned by action_space.sample() and a plain Python tuple.
action = (die_face (0–5), action_type (0=roll, 1=stop))
| Component | Values | Meaning |
|---|---|---|
| die_face | 0–5 | Die face to collect: 0→1 eye, 1→2 eyes, 2→3 eyes, 3→4 eyes, 4→5 eyes, 5→worm |
| action_type | 0–1 | 0 = roll again, 1 = stop and take a tile |
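The encoding above can be made readable with a small helper. This is only an illustrative sketch: `describe_action`, `FACE_NAMES`, and `ACTION_NAMES` are hypothetical names and not part of the package.

```python
# Hypothetical helper, not part of pickomino-env: turns an action pair
# into a readable string, following the encoding described above.
FACE_NAMES = ("1", "2", "3", "4", "5", "worm")  # die_face 0..5
ACTION_NAMES = ("roll", "stop")                 # action_type 0..1


def describe_action(action):
    """Describe a (die_face, action_type) pair given as a tuple or ndarray."""
    die_face, action_type = (int(x) for x in action)
    if not 0 <= die_face <= 5 or action_type not in (0, 1):
        raise ValueError(f"action out of range: {action!r}")
    return f"collect {FACE_NAMES[die_face]}, then {ACTION_NAMES[action_type]}"


print(describe_action((5, 1)))  # collect worm, then stop
```

Because the pair is converted with `int()`, the same helper should accept both a plain tuple and the array returned by `action_space.sample()`.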
Observation Space
The observation is a dict with four keys:
| Key | Min | Max | Shape |
|---|---|---|---|
| dice_collected | 0 | 8 | (6,) |
| dice_rolled | 0 | 8 | (6,) |
| tiles_table | 0 | 1 | (16,) |
| tile_players | 0 | 36 | (number_of_players,) |
Note: There are eight dice to roll and collect. Each die has six faces: one to five eyes, and a worm instead of a six. The value of a face equals its number of eyes; the worm is a sixth distinct face, but it is worth five points (not six), the same as the 5-eye face. The 16 tiles are numbered 21 to 36 and have worm values from one to four, spread over four groups. The game is for two to seven players. Here, your Reinforcement Learning agent is the first player, and the other players are computer bots that play according to a heuristic. When you create the environment, you have to define the number of bots.
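As a worked example of the scoring rule, the sketch below computes the point total of a `dice_collected` vector. `FACE_VALUES`, `dice_total`, and `has_worm` are hypothetical names used for illustration; they are not taken from the package.

```python
# Point value per die_face index 0..5; the worm (index 5) is worth 5, not 6.
FACE_VALUES = (1, 2, 3, 4, 5, 5)


def dice_total(dice_collected):
    """Sum of count * value over the six die faces."""
    return sum(count * value for count, value in zip(dice_collected, FACE_VALUES))


def has_worm(dice_collected):
    """True if at least one worm has been collected."""
    return dice_collected[5] > 0


collected = [0, 0, 0, 2, 1, 2]  # two 4s, one 5, two worms
print(dice_total(collected))    # 2*4 + 1*5 + 2*5 = 23
print(has_worm(collected))      # True
```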
For a more detailed description of the rules, see the file pickomino-rulebook.pdf. You can play the game online here: https://www.maartenpoirot.com/pickomino/. The heuristic used by the bots is described here: https://frozenfractal.com/blog/2015/5/3/how-to-win-at-pickomino/.
Rewards
The goal is to collect tiles in a stack. The winner is the player who has the most worms on their tiles at the end of the game. The Reinforcement Learning agent receives a reward equal to the value (worms) of a tile when the tile is picked. For a failed attempt (see rulebook), a corresponding negative reward is given. When a bot steals your tile, no negative reward is given. Hence, the total reward at the end of the game can be greater than the score.
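The gap between total reward and final score can be illustrated with a toy tally. The event list below is made up, `tally` is a hypothetical helper, and the sketch assumes that the penalty for a failed attempt equals the worm value of the returned tile.

```python
def tally(events):
    """Return (total_reward, score) for a list of (kind, worms) events."""
    total_reward = 0
    score = 0
    for kind, worms in events:
        if kind == "pick":      # tile taken: reward and score both rise
            total_reward += worms
            score += worms
        elif kind == "fail":    # failed attempt: top tile returned, negative reward
            total_reward -= worms
            score -= worms
        elif kind == "stolen":  # a bot steals a tile: score drops, reward does not
            score -= worms
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return total_reward, score


events = [("pick", 2), ("pick", 3), ("stolen", 3), ("fail", 2), ("pick", 1)]
print(tally(events))  # (4, 1)
```

In this made-up trace the stolen 3-worm tile lowers the score but not the reward, which is why the two numbers diverge.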
To try the environment manually, see Play manually.
Info Dictionary
The info dictionary is returned at every step. It is intended for debugging and
logging, not for learning.
| Key | Type | Description |
|---|---|---|
| dice_collected | list[int] | Counts of each die face collected this turn |
| dice_rolled | list[int] | Counts of each die face in the current roll |
| terminated | bool | Whether the episode has terminated |
| truncated | bool | Whether the game was truncated due to the last action |
| tiles_table_vec | numpy.ndarray[int8], shape (16,) | Binary vector of tiles currently available on the table |
| smallest_tile | int | Lowest-numbered tile still on the table |
| explanation | str | Reason for the last termination, truncation, or failed attempt |
| player_stack | list[int] | All tiles currently held by the agent |
| player_score | int | Agent's current score (sum of worm values) |
| current_player_index | int | Index of the player whose turn it is |
| bot_scores | list[int] | Scores of all bots, in order |
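For logging, the binary tile vector can be turned back into tile numbers. The sketch below assumes that index 0 corresponds to tile 21 and index 15 to tile 36, which is not confirmed by the text above, and that worm values follow the physical game (21–24 → 1, 25–28 → 2, 29–32 → 3, 33–36 → 4). Both helper names are hypothetical.

```python
def tiles_on_table(tiles_table_vec):
    """Tile numbers still available, assuming index i stands for tile 21 + i."""
    return [21 + i for i, present in enumerate(tiles_table_vec) if present]


def worms_on_tile(tile):
    """Worm value of a tile: four groups of four tiles, worth 1 to 4 worms."""
    if not 21 <= tile <= 36:
        raise ValueError(f"tile out of range: {tile}")
    return (tile - 21) // 4 + 1


vec = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
available = tiles_on_table(vec)
print(available)                              # [23, 29, 36]
print([worms_on_tile(t) for t in available])  # [1, 3, 4]
```

Under the same assumption, `min(tiles_on_table(vec))` should match the `smallest_tile` entry whenever the table is not empty.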
Starting State
- dice_collected = [0, 0, 0, 0, 0, 0].
- dice_rolled = [3, 0, 1, 2, 0, 2] (random dice, sum = 8).
- tiles_table = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1].
- tile_players = [0, 0, 0] (with number_of_bots = 2).
Episode End
Termination occurs when there are no more tiles to take on the table — Game Over.
Truncation
Truncation occurs when the agent attempts an illegal action during dice selection or rolling (for example, selecting a face that was not rolled, selecting a face already collected this turn, or choosing to roll when no dice remain). The game continues, and a new valid action is required.
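An agent can avoid most truncations by filtering its choices against the observation first. The following is a minimal sketch based only on the conditions listed above; `legal_faces` and `can_roll_after` are hypothetical helpers, and the environment may apply further checks that are not covered here.

```python
def legal_faces(obs):
    """Die faces that were rolled and have not been collected this turn."""
    return [
        face
        for face in range(6)
        if obs["dice_rolled"][face] > 0 and obs["dice_collected"][face] == 0
    ]


def can_roll_after(obs, face):
    """True if at least one die would remain to roll after collecting `face`."""
    remaining = sum(obs["dice_rolled"]) - obs["dice_rolled"][face]
    return remaining > 0


# Observation values taken from the Starting State example.
obs = {
    "dice_collected": [0, 0, 0, 0, 0, 0],
    "dice_rolled": [3, 0, 1, 2, 0, 2],
}
print(legal_faces(obs))        # [0, 2, 3, 5]
print(can_roll_after(obs, 0))  # True
```

An empty result from `legal_faces` means no face can be collected from the current roll.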
Invalid Actions
Out-of-range actions (outside [0–5] or [0–1]) raise a ValueError and do not
affect the episode state.
Failed Attempt
A Failed Attempt occurs when the agent fails to secure a tile. If the agent has a stack of already picked tiles, then the top tile is returned to the table, and a negative reward is applied. If the stack is empty, nothing happens, and the reward is zero. The game continues — the episode does not end.
Arguments
These must be specified.
| Parameter | Type | Default | Description |
|---|---|---|---|
| number_of_bots | int | 1 | Number of bot opponents (1–6) you want to play against |
| render_mode | str or None | None | Visualization mode: None (training), "human" (display), or "rgb_array" (recording) |
Bot Heuristic
The bots use the following heuristic, inspired by Frozen Fractal's strategy:
- Take the highest-contributing face. Select the die face where count × face value is greatest. Worms count as 5.
- Tie-breaking. When two faces contribute equally, prefer worms over 5s. If still tied, prefer the face with the fewest dice, keeping more dice available for future rolls. Hence, for example, three 4s are preferred over four 3s.
- Worm priority on early rolls. If no dice have been collected yet and this is the third roll or later, take worms if available, regardless of contribution.
- Stop as soon as a tile is reachable. Once the running total meets or exceeds the lowest available tile value, and a worm has been collected, the bot stops.
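The face-selection and stopping rules above can be sketched as follows. This is not the package's bot code: the function names are hypothetical, the sketch assumes that only faces not yet collected are candidates, and it leaves out the worm-priority rule.

```python
FACE_VALUES = (1, 2, 3, 4, 5, 5)  # worm (index 5) counts as 5
WORM = 5


def choose_face(dice_rolled, dice_collected):
    """Pick the face with the highest count * value, applying the tie-breaks."""
    candidates = [
        f for f in range(6) if dice_rolled[f] > 0 and dice_collected[f] == 0
    ]
    if not candidates:
        return None  # nothing can be collected from this roll
    return max(
        candidates,
        key=lambda f: (
            dice_rolled[f] * FACE_VALUES[f],  # 1. highest contribution
            f == WORM,                        # 2. prefer worms over 5s
            -dice_rolled[f],                  # 3. prefer fewer dice
        ),
    )


def should_stop(dice_collected, smallest_tile):
    """Stop once the total reaches the lowest tile and a worm is held."""
    total = sum(c * v for c, v in zip(dice_collected, FACE_VALUES))
    return total >= smallest_tile and dice_collected[WORM] > 0


print(choose_face([0, 0, 4, 3, 0, 0], [0] * 6))  # 3 (three 4s beat four 3s)
print(choose_face([0, 0, 0, 0, 2, 2], [0] * 6))  # 5 (worms beat 5s)
print(should_stop([0, 0, 0, 2, 1, 2], 21))       # True (total 23, worm held)
```

Encoding the tie-breaks as a tuple key lets `max` apply them in order, so later entries only matter when the earlier ones are equal.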
Setup
- Python 3.10–3.14
Installation
We recommend installing in a virtual environment:
```shell
python -m venv .venv
# macOS/Linux
source .venv/bin/activate
# Windows PowerShell
.venv\Scripts\Activate.ps1
# Windows cmd.exe
.venv\Scripts\activate.bat
# Windows Git Bash
source .venv/Scripts/activate
pip install pickomino-env
```
Verify your installation:
```shell
pickomino-play
```
Play manually
Playing a few games manually is a great way to understand the rules and game dynamics before training a Reinforcement Learning agent. Launch the game with the pygame GUI:
```shell
pickomino-play
```
To play against more bots:
```shell
pickomino-play --number-of-bots=3
```
Valid range: 1–6 bots.
To change the bot play speed, adjust the RENDER_DELAY constant in constants.py. A higher value slows the bots down; a lower value speeds them up.
```python
RENDER_DELAY: Final[float] = 2
```
Usage example
```python
import gymnasium as gym

# render_mode options:
#   None: no rendering, fastest (default, recommended for training)
#   "human": pygame window, requires a display
#   "rgb_array": returns RGB array, useful for recording
env = gym.make("Pickomino-v0", render_mode="human", number_of_bots=2)

# Reset and get initial observation
obs, info = env.reset(seed=42)

# Run one episode
terminated = False
truncated = False
total_reward = 0
while not terminated and not truncated:
    # Agent selects action: (die_face, action_type)
    action = env.action_space.sample()  # Random action for demo
    # Step environment
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

if truncated:
    # A random action was illegal, which ended the demo loop
    print(f"Invalid action: {info['explanation']}")
print(f"Episode finished. Total reward: {total_reward}")
env.close()
```
Security & Bug Bounty
Found a bug? Valid reports are rewarded with a physical copy of the Pickomino board game. See SECURITY.md for scope, timelines, and how to report.
Resources
- Game Rules: Pickomino Rulebook
- Play Online: Maarten Poirot's Pickomino
- Play on Board Game Arena: Pickomino with Elo system
- Bot Strategy: How to Win at Pickomino
- Repository: smallgig/Pickomino
- Gymnasium: https://gymnasium.farama.org/
License
MIT License. See LICENSE for details.