Hitori Gym 🧩
A Gymnasium environment for the Japanese puzzle game Hitori.
This environment is specifically designed to train Maskable Reinforcement Learning agents (like MaskablePPO), leveraging a dynamic action mask to prevent illegal moves and dramatically simplify the learning process.
🚀 Installation
pip install hitori-gym
🎮 Usage
Here is a simple example of how to use the Hitori environment with a random agent that respects the action mask.
import gymnasium as gym
import hitori_env
import numpy as np

# Create the Hitori environment
env = gym.make("hitori_env/Hitori-v2", size=5, render_mode="human")

# Reset the environment to get the initial observation
# You can also pass a seed for reproducibility and log the solution for debugging
observation, info = env.reset(seed=42, options={"log_solution": True})

# Run the environment for a certain number of steps
for step in range(1000):
    # Render the environment
    env.render()

    # --- CRITICAL: Use the action_mask to find valid actions ---
    action_mask = env.unwrapped.action_masks()
    valid_actions = np.where(action_mask == 1)[0]

    # Check if the agent is stuck (no valid moves left)
    if len(valid_actions) == 0:
        print("Agent is stuck! Game Over (Fail).")
        # No step was taken, so set both flags explicitly before the reset check
        terminated, truncated = True, False
    else:
        # Choose a random valid action
        action = np.random.choice(valid_actions)
        # Take a step in the environment
        observation, reward, terminated, truncated, info = env.step(action)
        if terminated and reward > 0:
            print(f"Game Solved in {step + 1} steps!")

    # If the episode is over, reset the environment
    if terminated or truncated:
        observation, info = env.reset()

env.close()
🤔 The Hitori Puzzle
Hitori is a logic puzzle played on a grid of numbers. The goal is to shade cells according to three rules:
- No Duplicates in Unshaded Cells: In each row and column, every unshaded number must be unique.
- No Adjacent Shaded Cells: Shaded cells cannot be adjacent to each other (horizontally or vertically).
- All Unshaded Cells Must Be Connected: The unshaded cells must form a single, continuous area.
The puzzle is solved when all three conditions are met.
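The three rules can be checked mechanically. The following is a minimal sketch in plain Python, not the package's own implementation; the function name is_solved and the list-of-lists representation (1 for shaded, 0 for unshaded) are assumptions made for illustration.

```python
from collections import deque


def is_solved(grid, shaded):
    """Return True if `shaded` satisfies all three Hitori rules for `grid`.

    Hypothetical helper: `grid` and `shaded` are N x N lists of lists,
    with shaded[r][c] == 1 meaning the cell is shaded.
    """
    n = len(grid)

    # Rule 1: no number repeats among the unshaded cells of a row or column.
    for i in range(n):
        row = [grid[i][j] for j in range(n) if not shaded[i][j]]
        col = [grid[j][i] for j in range(n) if not shaded[j][i]]
        if len(row) != len(set(row)) or len(col) != len(set(col)):
            return False

    # Rule 2: no two shaded cells share an edge (checking right and down
    # from every shaded cell covers each adjacent pair once).
    for r in range(n):
        for c in range(n):
            if shaded[r][c]:
                if r + 1 < n and shaded[r + 1][c]:
                    return False
                if c + 1 < n and shaded[r][c + 1]:
                    return False

    # Rule 3: the unshaded cells form one connected region (BFS flood fill).
    unshaded = [(r, c) for r in range(n) for c in range(n) if not shaded[r][c]]
    if not unshaded:
        return False
    seen = {unshaded[0]}
    queue = deque([unshaded[0]])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < n and 0 <= nc < n
                    and not shaded[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(unshaded)


# A small 3x3 board invented for this example.
grid = [[1, 1, 2],
        [2, 3, 1],
        [3, 2, 2]]
candidate = [[1, 0, 0],
             [0, 0, 0],
             [0, 0, 1]]
print(is_solved(grid, candidate))
```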
💡 Why Maskable Reinforcement Learning?
Hitori is a perfect use case for maskable RL agents. At any given step, the vast majority of actions (shading a cell) are illegal.
- Massive Search Space: A 5x5 grid has 25 possible actions per step and 2^25 (about 33.5 million) possible shading patterns, yet often only a few actions are valid at any given step. A standard RL agent would spend much of its training learning to avoid illegal moves.
- Complex Rules: The rules for what makes a move illegal are complex and depend on the global state of the board.
This environment solves that problem by providing an action mask on every step. The agent can use this mask to "see" only the valid moves, pruning the decision tree and making learning dramatically more efficient.
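To make the idea concrete, here is a rough sketch of how a maskable policy can use such a mask when choosing an action: invalid actions are excluded before the softmax, so they receive zero probability. This is a conceptual illustration in plain Python, not the code used by MaskablePPO or by this package; the function names are hypothetical.

```python
import math
import random


def masked_probabilities(logits, action_mask):
    """Softmax restricted to valid actions; masked actions get probability 0."""
    if not any(action_mask):
        raise ValueError("no valid actions to sample from")
    # Subtract the largest valid logit for numerical stability.
    top = max(logit for logit, ok in zip(logits, action_mask) if ok)
    weights = [math.exp(logit - top) if ok else 0.0
               for logit, ok in zip(logits, action_mask)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_masked_action(logits, action_mask, rng=random):
    """Sample an action index according to the masked distribution."""
    probs = masked_probabilities(logits, action_mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Four actions, of which only actions 1 and 2 are legal.
probs = masked_probabilities([2.0, 0.5, 0.5, -1.0], [0, 1, 1, 0])
print(probs)
```

Note that action 0 has the highest raw score but can never be chosen, so the agent spends no samples on it.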
Action Masking Logic
The action_mask is a binary vector of length N*N where a 1 indicates a valid move. An action (shading a cell) is masked out with a 0 if any of the following conditions applies:
- Cell Already Shaded: The cell is already shaded.
- Creates Adjacent Shading: Shading the cell would place it next to an already shaded cell.
- Disconnects Unshaded Cells: Shading the cell would split the group of unshaded cells into two or more separate regions (i.e., it's an articulation point).
- Cannot Shade an Already-Unique Number: A cell cannot be shaded if its number is already the only one of its kind in its row and the only one of its kind in its column. Such a number can never be a "duplicate," so there is no reason to shade it.
By enforcing these rules, the environment ensures that every step the agent takes is legal. A legal move can still lead to a dead end, which is why the reward table below includes a "Stuck" outcome.
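The four conditions above can be sketched as a single function. This is an illustrative reconstruction in plain Python, not the code in hitori.py; the name compute_action_mask is hypothetical, and the uniqueness test counts occurrences over the whole row and column regardless of shading, which is an assumption about how the last condition is applied.

```python
from collections import deque


def _neighbors(r, c, n):
    """Yield the orthogonal neighbours of (r, c) that lie on the board."""
    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= nr < n and 0 <= nc < n:
            yield nr, nc


def _unshaded_connected(shaded):
    """True if the unshaded cells form a single connected region."""
    n = len(shaded)
    cells = [(r, c) for r in range(n) for c in range(n) if not shaded[r][c]]
    if not cells:
        return False
    seen, queue = {cells[0]}, deque([cells[0]])
    while queue:
        r, c = queue.popleft()
        for nb in _neighbors(r, c, n):
            if not shaded[nb[0]][nb[1]] and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(cells)


def compute_action_mask(grid, shaded):
    """Return a flat row-major list with 1 for each cell that may be shaded."""
    n = len(grid)
    mask = [0] * (n * n)
    for r in range(n):
        for c in range(n):
            if shaded[r][c]:
                continue  # cell already shaded
            if any(shaded[nr][nc] for nr, nc in _neighbors(r, c, n)):
                continue  # would create adjacent shading
            value = grid[r][c]
            in_row = sum(1 for j in range(n) if grid[r][j] == value)
            in_col = sum(1 for i in range(n) if grid[i][c] == value)
            if in_row == 1 and in_col == 1:
                continue  # number is already unique (assumed: whole row/column)
            trial = [row[:] for row in shaded]
            trial[r][c] = 1
            if not _unshaded_connected(trial):
                continue  # would disconnect the unshaded cells
            mask[r * n + c] = 1
    return mask


grid = [[1, 1, 2],
        [2, 3, 1],
        [3, 2, 2]]
empty = [[0] * 3 for _ in range(3)]
print(compute_action_mask(grid, empty))
```

Testing connectivity by trial shading costs one flood fill per candidate cell, which is fine for small boards; a dedicated articulation-point algorithm would be the faster choice for large ones.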
🕹️ Demo
The repository includes a playground.py script that allows you to manually play the Hitori game. This script is not part of the packaged library but is useful for testing and understanding the game mechanics.
To use it, run the following command:
python playground.py
🔍 Environment Details
Observation Space
The observation space is a dictionary containing the puzzle state:
- game_grid: An NxN grid representing the puzzle board, with each cell containing a number from 1 to N.
- shaded: An NxN binary grid indicating which cells are currently shaded (1 for shaded, 0 for unshaded).
spaces.Dict({
    "game_grid": spaces.Box(low=1, high=self.size, shape=(size, size), dtype=np.uint32),
    "shaded": spaces.MultiBinary((size, size)),
})
Action Space
The action space is a Discrete space of size N*N, where each action corresponds to shading a cell in row-major order. The agent should only select actions where the action_mask is 1.
spaces.Discrete(size * size)
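Row-major order means that action index a on an NxN board refers to row a // N and column a % N. A small sketch of the conversion in both directions follows; the helper names are hypothetical and not part of the package.

```python
def action_to_cell(action, size):
    """Convert a flat row-major action index into (row, col)."""
    if not 0 <= action < size * size:
        raise ValueError(f"action {action} is out of range for a {size}x{size} board")
    return divmod(action, size)


def cell_to_action(row, col, size):
    """Convert (row, col) into the flat row-major action index."""
    return row * size + col


print(action_to_cell(7, 5))     # row 1, column 2 on a 5x5 board
print(cell_to_action(1, 2, 5))  # back to action 7
```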
Rewards
The reward structure is designed to be simple and effective, especially since illegal moves are prevented by the mask.
| Outcome | Reward | Description | Termination |
|---|---|---|---|
| Win (Puzzle Solved) | +1.0 | The current state is a complete and valid solution. | True |
| Stuck (No valid moves) | -1.0 | The agent has no valid moves left and has not won. | True |
| Valid Step Taken | -0.01 | A small penalty to encourage finding the shortest solution. | False |
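The table can be read as a simple decision made after each valid step. The sketch below mirrors the table's values; the function and constant names are hypothetical, and the order of the checks (win first, then stuck) is an assumption rather than something taken from the package source.

```python
WIN_REWARD = 1.0
STUCK_REWARD = -1.0
STEP_PENALTY = -0.01


def step_outcome(solved, has_valid_moves):
    """Return (reward, terminated) for the state reached after a valid step."""
    if solved:
        return WIN_REWARD, True
    if not has_valid_moves:
        return STUCK_REWARD, True
    return STEP_PENALTY, False


print(step_outcome(solved=False, has_valid_moves=True))
print(step_outcome(solved=True, has_valid_moves=False))
```

Under this reading, an episode solved on its k-th step would accumulate 1.0 - 0.01 * (k - 1), assuming the winning step earns only the win reward.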
⚙️ How It Works
The environment is built with a few key components:
- hitori.py: The main gym.Env implementation. It handles the game logic, state transitions, rendering, and, most importantly, the dynamic generation of the action_mask on every step.
- hitori_generator.py: A utility that generates valid, solvable Hitori puzzles of a given size.
- hitori_solution.py: A backtracking solver that can find a valid solution for a given Hitori puzzle. This is used internally for debugging and can be enabled via an option in env.reset().
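For readers unfamiliar with the approach, here is a minimal sketch of what a backtracking Hitori solver can look like. It is not the code in hitori_solution.py; the function name solve and the pruning choices are assumptions made for illustration. Cells are decided in row-major order, adjacency and duplicate violations are pruned as soon as they appear, and connectivity is checked once every cell has been decided.

```python
from collections import deque


def _neighbors(r, c, n):
    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= nr < n and 0 <= nc < n:
            yield nr, nc


def _connected(shaded):
    """True if the unshaded cells form a single connected region."""
    n = len(shaded)
    cells = [(r, c) for r in range(n) for c in range(n) if not shaded[r][c]]
    if not cells:
        return False
    seen, queue = {cells[0]}, deque([cells[0]])
    while queue:
        r, c = queue.popleft()
        for nb in _neighbors(r, c, n):
            if not shaded[nb[0]][nb[1]] and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(cells)


def solve(grid):
    """Return an N x N 0/1 shading that solves `grid`, or None if none exists."""
    n = len(grid)
    shaded = [[0] * n for _ in range(n)]

    def clashes(r, c):
        # Would leaving (r, c) unshaded duplicate an already-decided unshaded value?
        v = grid[r][c]
        return (any(grid[r][j] == v and not shaded[r][j] for j in range(c))
                or any(grid[i][c] == v and not shaded[i][c] for i in range(r)))

    def backtrack(idx):
        if idx == n * n:
            return _connected(shaded)  # rules 1 and 2 were enforced on the way
        r, c = divmod(idx, n)
        # Option 1: leave the cell unshaded.
        if not clashes(r, c) and backtrack(idx + 1):
            return True
        # Option 2: shade it, unless that touches an already shaded cell.
        if not any(shaded[nr][nc] for nr, nc in _neighbors(r, c, n)):
            shaded[r][c] = 1
            if backtrack(idx + 1):
                return True
            shaded[r][c] = 0
        return False

    return shaded if backtrack(0) else None


grid = [[1, 1, 2],
        [2, 3, 1],
        [3, 2, 2]]
print(solve(grid))
```

The sketch returns the first solution it finds, so a board with several valid shadings yields whichever one the search order reaches first.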
💻 Development
To set up the project for development, clone the repository and install it in editable mode:
git clone https://github.com/your-username/hitori-gym.git
cd hitori-gym
pip install -e .
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.