
A Gymnasium environment for the Hitori puzzle game.


Hitori Gym 🧩

A Gymnasium environment for the Japanese puzzle game Hitori.

This environment is designed for training reinforcement learning agents that support action masking (such as MaskablePPO from sb3-contrib). It exposes a dynamic action mask that rules out illegal moves, which greatly simplifies the learning problem.

🚀 Installation

```bash
pip install hitori-gym
```

🎮 Usage

Here is a simple example of how to use the Hitori environment with a random agent that respects the action mask.

```python
import gymnasium as gym
import hitori_env  # imported so that the "hitori_env/Hitori-v2" ID is registered
import numpy as np

# Create the Hitori environment
env = gym.make("hitori_env/Hitori-v2", size=5, render_mode="human")

# Reset the environment to get the initial observation.
# The seed is for reproducibility; "log_solution" logs the solution for debugging.
observation, info = env.reset(seed=42, options={"log_solution": True})
rng = np.random.default_rng(42)  # seeded RNG so the random agent is reproducible too
episode_steps = 0  # steps taken in the current episode

# Run the environment for a fixed number of steps
for _ in range(1000):
    # Render the environment
    env.render()

    # --- CRITICAL: Use the action_mask to find valid actions ---
    action_mask = env.unwrapped.action_masks()
    valid_actions = np.flatnonzero(action_mask)  # indices where the mask is nonzero

    # Check if the agent is stuck (no valid moves left)
    if len(valid_actions) == 0:
        print("Agent is stuck! Game Over (Fail).")
        terminated, truncated = True, False
    else:
        # Choose a random valid action
        action = rng.choice(valid_actions)

        # Take a step in the environment
        observation, reward, terminated, truncated, info = env.step(action)
        episode_steps += 1

        if terminated and reward > 0:
            print(f"Game Solved in {episode_steps} steps!")

    # If the episode is over, reset the environment and the per-episode step counter
    if terminated or truncated:
        observation, info = env.reset()
        episode_steps = 0

env.close()
```

🤔 The Hitori Puzzle

Hitori is a logic puzzle played on a grid of numbers. The goal is to shade cells according to three rules:

  1. No Duplicates in Unshaded Cells: In each row and column, every unshaded number must be unique.
  2. No Adjacent Shaded Cells: Shaded cells cannot be adjacent to each other (horizontally or vertically).
  3. All Unshaded Cells Must Be Connected: The unshaded cells must form a single, continuous area.

The puzzle is solved when all three conditions are met.
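The three rules translate directly into a solution checker. The sketch below is an illustrative, standalone version, not part of the hitori-gym API: the function name `is_solved` and the list-of-lists representation are assumptions, with `grid` holding the numbers and `shaded` holding 1 for shaded cells and 0 otherwise.

```python
from collections import deque


def is_solved(grid, shaded):
    """Return True if `shaded` (N x N, 1 = shaded) is a valid Hitori solution for `grid`."""
    n = len(grid)
    # Rule 1: no duplicate numbers among unshaded cells in any row or column.
    for i in range(n):
        row = [grid[i][j] for j in range(n) if not shaded[i][j]]
        col = [grid[j][i] for j in range(n) if not shaded[j][i]]
        if len(row) != len(set(row)) or len(col) != len(set(col)):
            return False
    # Rule 2: no two shaded cells share an edge.
    # Checking only the right and lower neighbours visits every adjacent pair once.
    for r in range(n):
        for c in range(n):
            if shaded[r][c]:
                if c + 1 < n and shaded[r][c + 1]:
                    return False
                if r + 1 < n and shaded[r + 1][c]:
                    return False
    # Rule 3: all unshaded cells form one orthogonally connected region (BFS).
    unshaded = [(r, c) for r in range(n) for c in range(n) if not shaded[r][c]]
    if not unshaded:
        return False
    seen = {unshaded[0]}
    queue = deque([unshaded[0]])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < n and 0 <= nc < n and not shaded[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(unshaded)
```

For example, on the grid `[[1, 1, 2], [2, 3, 1], [1, 2, 3]]`, shading only the top-left cell already satisfies all three rules.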

💡 Why Maskable Reinforcement Learning?

Hitori is a natural fit for maskable RL agents: at a typical step, most actions (each action shades one cell) are illegal.

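  • Massive Search Space: A 5x5 grid has only 25 actions per step, but 2^25 (about 33.5 million) possible shading patterns, and at any given step often only a few actions are valid. Without a mask, a standard RL agent would spend a large share of its training learning to avoid illegal moves.
  • Complex Rules: Whether a move is illegal can depend on the global state of the board. For example, deciding whether shading a cell would disconnect the unshaded region requires looking at the whole grid.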
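To put the search-space point in numbers, the sketch below counts how many of the 2^(N*N) shading patterns satisfy rule 2 alone (no two shaded cells adjacent). It is an illustrative standalone helper, not part of hitori-gym; it uses a row-by-row transfer count over bitmasks and ignores rules 1 and 3, so it is only an upper bound on the number of fully valid shadings.

```python
def count_nonadjacent_patterns(n):
    """Count n x n shading patterns in which no two shaded cells share an edge."""
    # Row patterns with no two horizontally adjacent shaded cells (bit j set = column j shaded).
    rows = [m for m in range(1 << n) if (m & (m >> 1)) == 0]
    # counts[m] = number of valid partial boards whose most recent row is pattern m.
    counts = {m: 1 for m in rows}
    for _ in range(n - 1):
        # A new row m may follow row p only if no column is shaded in both rows.
        counts = {m: sum(k for p, k in counts.items() if (p & m) == 0) for m in rows}
    return sum(counts.values())


for n in range(1, 6):
    print(f"{n}x{n}: {2 ** (n * n)} patterns in total, "
          f"{count_nonadjacent_patterns(n)} without adjacent shaded cells")
```

Rule 2 alone discards most patterns on boards larger than 2x2, yet the remaining count still grows exponentially with N, and rules 1 and 3 must be checked on top of it.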

This environment addresses the illegal-move problem by providing an action mask at every step. The agent can use the mask to consider only valid moves, which prunes the decision tree and makes learning considerably more efficient.
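Policy-gradient agents such as MaskablePPO typically apply the mask to the policy's logits so that invalid actions receive zero probability. The standalone sketch below shows the idea on plain Python lists; `masked_softmax` and `sample_masked_action` are hypothetical helpers for illustration, not functions from hitori-gym or sb3-contrib.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over `logits` where entries with mask == 0 get probability exactly 0."""
    valid = [x for x, m in zip(logits, mask) if m]
    if not valid:
        raise ValueError("no valid actions: the agent is stuck")
    top = max(valid)  # subtract the largest valid logit for numerical stability
    exps = [math.exp(x - top) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_masked_action(logits, mask, rng=random):
    """Sample an action index; masked-out actions have zero weight and are never chosen."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Here action 1 has the largest logit but is masked out, so only actions 0 and 2 can be sampled: `sample_masked_action([2.0, 5.0, 1.0, 0.5], [1, 0, 1, 0])`.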

Action Masking Logic

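The action_mask is a binary vector of length N*N where a 1 indicates a valid move. An action (shading a cell) is masked out as illegal if any of the following conditions holds. Conditions 2 and 3 follow directly from Hitori rules 2 and 3 above; conditions 1 and 4 rule out moves that can never help: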

  1. Cell Already Shaded: The cell is already shaded.
  2. Creates Adjacent Shading: Shading the cell would place it next to an already shaded cell.
  3. Disconnects Unshaded Cells: Shading the cell would split the group of unshaded cells into two or more separate regions (i.e., it's an articulation point).
  4. Cannot Shade an Already-Unique Number: A cell cannot be shaded if its number is already the only one of its kind in its row and the only one of its kind in its column. Because shading only removes numbers from play, such a number can never become a "duplicate," so there is no reason to shade it.

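By enforcing these conditions, the environment guarantees that the agent never takes a move that breaks the adjacency or connectivity rules. It does not guarantee that every allowed move leads to a solution: an agent can still shade its way into a dead end, which is why the reward table includes a Stuck outcome.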
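The four masking conditions can be sketched as follows. This is a standalone illustration, not the code in hitori.py: the helper names are hypothetical, condition 4 is read as "unique among the currently unshaded cells of both its row and its column" (an assumption), and condition 3 is checked by brute force (shade the cell on a copy of the board and run a breadth-first search) instead of a linear-time articulation-point algorithm such as Tarjan's.

```python
from collections import deque


def _connected(shaded):
    """True if all unshaded cells form one orthogonally connected region."""
    n = len(shaded)
    cells = [(r, c) for r in range(n) for c in range(n) if not shaded[r][c]]
    if not cells:
        return True
    seen = {cells[0]}
    queue = deque([cells[0]])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < n and 0 <= nc < n and not shaded[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(cells)


def action_mask(grid, shaded):
    """Return a row-major list of N*N ints, 1 where shading that cell is allowed."""
    n = len(grid)
    mask = [0] * (n * n)
    for r in range(n):
        for c in range(n):
            # Condition 1: the cell is already shaded.
            if shaded[r][c]:
                continue
            # Condition 2: an orthogonal neighbour is already shaded.
            neighbours = ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
            if any(0 <= nr < n and 0 <= nc < n and shaded[nr][nc] for nr, nc in neighbours):
                continue
            # Condition 4 (assumed reading): the number is unique among unshaded cells
            # in both its row and its column.
            v = grid[r][c]
            row_count = sum(1 for j in range(n) if not shaded[r][j] and grid[r][j] == v)
            col_count = sum(1 for i in range(n) if not shaded[i][c] and grid[i][c] == v)
            if row_count == 1 and col_count == 1:
                continue
            # Condition 3: shading would disconnect the unshaded cells (articulation point).
            trial = [row[:] for row in shaded]
            trial[r][c] = 1
            if not _connected(trial):
                continue
            mask[r * n + c] = 1
    return mask
```

On a 5x5 board the brute-force connectivity check costs at most 25 small BFS runs per step, which is negligible; on much larger boards, computing all articulation points in one pass would be the natural optimization.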

🕹️ Demo

Here is a demonstration of a Hitori game in this environment:

[Demo animation: Hitori Gym Demo]

The repository includes a playground.py script that allows you to manually play the Hitori game. This script is not part of the packaged library but is useful for testing and understanding the game mechanics.

To use it, run the following command:

```bash
python playground.py
```

🔍 Environment Details

Observation Space

The observation space is a dictionary containing the puzzle state:

  • game_grid: An NxN grid representing the puzzle board, with each cell containing a number from 1 to N.
  • shaded: An NxN binary grid indicating which cells are currently shaded (1 for shaded, 0 for unshaded).
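
```python
import numpy as np
from gymnasium import spaces

size = 5  # example grid size N

observation_space = spaces.Dict({
    "game_grid": spaces.Box(low=1, high=size, shape=(size, size), dtype=np.uint32),
    "shaded": spaces.MultiBinary((size, size)),
})
```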

Action Space

The action space is a Discrete space of size N*N. Actions are numbered in row-major order, so action a shades the cell at row a // N, column a % N. The agent should only select actions where the action_mask is 1.

```python
spaces.Discrete(size * size)
```
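Because actions are numbered in row-major order, converting between an action index and a grid cell is a single `divmod`. The helpers below are illustrative only; `action_to_cell` and `cell_to_action` are hypothetical names, not part of the package.

```python
def action_to_cell(action, size):
    """Map a Discrete(size * size) action index to (row, col) in row-major order."""
    if not 0 <= action < size * size:
        raise ValueError(f"action {action} is out of range for a {size}x{size} grid")
    return divmod(action, size)


def cell_to_action(row, col, size):
    """Inverse mapping: (row, col) -> action index."""
    return row * size + col
```

Combined with a mask, `[action_to_cell(a, size) for a, ok in enumerate(mask) if ok]` lists the cells the agent is currently allowed to shade.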

Rewards

The reward structure is deliberately simple. Because the action mask already prevents illegal moves, no penalty for illegal moves is needed.

| Outcome | Reward | Description | Termination |
|---|---|---|---|
| Win (Puzzle Solved) | +1.0 | The current state is a complete and valid solution. | True |
| Stuck (No valid moves) | -1.0 | The agent has no valid moves left and has not won. | True |
| Valid Step Taken | -0.01 | A small penalty to encourage finding the shortest solution. | False |
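The table can be read as a small reward function. The sketch below is an illustration under stated assumptions, not the environment's code: it assumes the winning move pays exactly +1.0 (with no additional -0.01), that the win and stuck checks happen after each move, and the constant and function names are hypothetical.

```python
WIN_REWARD = 1.0     # assumed: replaces the step penalty on the winning move
STUCK_REWARD = -1.0
STEP_PENALTY = -0.01


def step_reward(solved, has_valid_moves):
    """Return (reward, terminated) for the board reached after one shading move."""
    if solved:  # check the win first: a solved board may also have an empty mask
        return WIN_REWARD, True
    if not has_valid_moves:
        return STUCK_REWARD, True
    return STEP_PENALTY, False


def episode_return(num_steps, won):
    """Undiscounted return of an episode of num_steps moves that ends in a win or a dead end."""
    final = WIN_REWARD if won else STUCK_REWARD
    return (num_steps - 1) * STEP_PENALTY + final
```

The order of the checks matters: a solved board can also leave no valid moves (every remaining number is already unique), so testing for Stuck first would score a win as a loss. Under these assumptions a win in 5 moves returns 0.96.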

⚙️ How It Works

The environment is built with a few key components:

  • hitori.py: The main gym.Env implementation. It handles the game logic, state transitions, rendering, and, most importantly, the dynamic generation of the action_mask on every step.
  • hitori_generator.py: A utility that generates valid, solvable Hitori puzzles of a given size.
  • hitori_solution.py: A backtracking solver that can find a valid solution for a given Hitori puzzle. It is used internally for debugging; pass options={"log_solution": True} to env.reset() to log the solution.
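For intuition about what a backtracking solver does on small boards, here is a minimal standalone sketch. It is not the code in hitori_solution.py, and the names are hypothetical. It decides each cell in row-major order, prunes as soon as rule 1 or rule 2 is violated among already-decided cells, and checks rule 3 (connectivity) only once the board is complete.

```python
from collections import deque


def _unshaded_connected(shaded):
    """True if the unshaded cells form one non-empty, orthogonally connected region."""
    n = len(shaded)
    cells = [(r, c) for r in range(n) for c in range(n) if not shaded[r][c]]
    if not cells:
        return False
    seen = {cells[0]}
    queue = deque([cells[0]])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < n and 0 <= nc < n and not shaded[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(cells)


def solve(grid):
    """Return an N x N 0/1 shading that solves `grid`, or None if no solution exists."""
    n = len(grid)
    shaded = [[0] * n for _ in range(n)]

    def can_leave_unshaded(r, c):
        # Rule 1: no earlier unshaded cell in the same row or column holds the same number.
        v = grid[r][c]
        return (all(shaded[r][j] or grid[r][j] != v for j in range(c))
                and all(shaded[i][c] or grid[i][c] != v for i in range(r)))

    def can_shade(r, c):
        # Rule 2: the already-decided neighbours (above and to the left) must be unshaded.
        return not (r > 0 and shaded[r - 1][c]) and not (c > 0 and shaded[r][c - 1])

    def backtrack(k):
        if k == n * n:
            return _unshaded_connected(shaded)  # Rule 3, checked on the complete board
        r, c = divmod(k, n)
        if can_leave_unshaded(r, c) and backtrack(k + 1):
            return True
        if can_shade(r, c):
            shaded[r][c] = 1
            if backtrack(k + 1):
                return True
            shaded[r][c] = 0  # undo before reporting failure to the caller
        return False

    return [row[:] for row in shaded] if backtrack(0) else None
```

The worst case is exponential in the number of cells, which is fine for the small boards used here; checking connectivity earlier in the search would prune more aggressively at the cost of extra work per node.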

💻 Development

To set up the project for development, clone the repository and install it in editable mode:

```bash
git clone https://github.com/your-username/hitori-gym.git
cd hitori-gym
pip install -e .
```

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.



Download files

Download the file for your platform.

Source Distribution

hitori_gym-0.0.3.tar.gz (32.7 MB)

Uploaded Source

Built Distribution


hitori_gym-0.0.3-py3-none-any.whl (18.8 kB)

Uploaded Python 3

File details

Details for the file hitori_gym-0.0.3.tar.gz.

File metadata

  • Download URL: hitori_gym-0.0.3.tar.gz
  • Upload date:
  • Size: 32.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for hitori_gym-0.0.3.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 32437d139593b748e5474004ea296d43ad67b7567d513274bc90ed1b610da704 |
| MD5 | 81c4fd2b69055cbf5d9695791ddc4ec9 |
| BLAKE2b-256 | 4f351f85a342f6216c01d3936ba928480c881962920333ef924afa9f0d7cf08e |


File details

Details for the file hitori_gym-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: hitori_gym-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 18.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for hitori_gym-0.0.3-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | eab26e03d7cec8400eb5e09e47da6c2d593d075c8f88200f1b286b22844aba11 |
| MD5 | 30c24bde0642eee96ba6590af3889f7b |
| BLAKE2b-256 | 62c52a3e4a6e9ad58969adb56e7a65df43fe753f84ab341055629d435473fdb3 |

