A Gymnasium environment for the Hitori puzzle game.

Hitori Gym 🧩

A Gymnasium environment for the Japanese puzzle game Hitori.

This environment is designed for training reinforcement learning agents that support action masking (such as MaskablePPO). It exposes a dynamic action mask that rules out illegal moves, so the agent does not have to learn which moves are legal by trial and error.

🚀 Installation

pip install hitori-gym

🎮 Usage

Here is a simple example of how to use the Hitori environment with a random agent that respects the action mask.

import gymnasium as gym
import hitori_env
import numpy as np

# Create the Hitori environment
env = gym.make("hitori_env/Hitori-v2", size=5, render_mode="human")

# Reset the environment to get the initial observation
# You can also pass a seed for reproducibility and log the solution for debugging
observation, info = env.reset(seed=42, options={"log_solution": True})

# Run the environment for a certain number of steps
for step in range(1000):
    # Render the environment
    env.render()
    
    # --- CRITICAL: Use the action_mask to find valid actions ---
    action_mask = env.unwrapped.action_masks()
    valid_actions = np.where(action_mask == 1)[0]
    
    # Check if the agent is stuck (no valid moves left)
    if len(valid_actions) == 0:
        print("Agent is stuck! Game Over (Fail).")
        # Set both flags so the reset check below never reads an undefined name
        terminated, truncated = True, False
    else:
        # Choose a random valid action
        action = np.random.choice(valid_actions)

        # Take a step in the environment
        observation, reward, terminated, truncated, info = env.step(action)
        
        if terminated and reward > 0:
            print(f"Game Solved in {step + 1} steps!")

    # If the episode is over, reset the environment
    if terminated or truncated:
        observation, info = env.reset()

env.close()
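For training rather than random play, a minimal sketch with MaskablePPO from sb3-contrib could look like the following. It assumes that sb3-contrib is installed, that importing hitori_env registers the environment (as the example above suggests), and that the Dict observation works with the MultiInputPolicy; the function name and the timestep budget are placeholders, not values from this project.

```python
def train_masked_agent(total_timesteps: int = 10_000, size: int = 5):
    """Train a MaskablePPO agent on Hitori (sketch; hyperparameters are placeholders)."""
    # Imports are local so the function can be defined even when the optional
    # training dependencies (sb3-contrib, hitori-gym) are not installed.
    import gymnasium as gym
    import hitori_env  # noqa: F401  (assumed to register the environment on import)
    from sb3_contrib import MaskablePPO
    from sb3_contrib.common.wrappers import ActionMasker

    def mask_fn(env):
        # Reuse the environment's own mask, as in the random-agent example.
        return env.unwrapped.action_masks()

    # ActionMasker exposes the mask in the form MaskablePPO looks for.
    env = ActionMasker(gym.make("hitori_env/Hitori-v2", size=size), mask_fn)

    # The observation space is a Dict, so a multi-input policy is used.
    model = MaskablePPO("MultiInputPolicy", env, verbose=1)
    model.learn(total_timesteps=total_timesteps)
    return model, env
```

After training, `model.predict(observation, action_masks=mask)` accepts the current mask so that evaluation respects the same constraints as training.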

🤔 The Hitori Puzzle

Hitori is a logic puzzle played on a grid of numbers. The goal is to shade cells according to three rules:

No Duplicates in Unshaded Cells: In each row and column, every unshaded number must be unique.
No Adjacent Shaded Cells: Shaded cells cannot be adjacent to each other (horizontally or vertically).
All Unshaded Cells Must Be Connected: The unshaded cells must form a single, continuous area.

The puzzle is solved when all three conditions are met.
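The three rules can be checked directly on plain nested lists. The sketch below is an independent illustration, not the package's own checker; the function name `is_solved` and the 3x3 example grid are made up for this purpose.

```python
from collections import deque


def is_solved(grid, shaded):
    """Return True if `shaded` satisfies all three Hitori rules for `grid`."""
    n = len(grid)

    # Rule 1: no repeated unshaded number in any row or column.
    for i in range(n):
        row = [grid[i][j] for j in range(n) if not shaded[i][j]]
        col = [grid[j][i] for j in range(n) if not shaded[j][i]]
        if len(set(row)) != len(row) or len(set(col)) != len(col):
            return False

    # Rule 2: no two shaded cells share an edge (checking right and down suffices).
    for r in range(n):
        for c in range(n):
            if shaded[r][c]:
                if r + 1 < n and shaded[r + 1][c]:
                    return False
                if c + 1 < n and shaded[r][c + 1]:
                    return False

    # Rule 3: the unshaded cells form one orthogonally connected region (BFS).
    unshaded = [(r, c) for r in range(n) for c in range(n) if not shaded[r][c]]
    if not unshaded:
        return False
    seen = {unshaded[0]}
    queue = deque([unshaded[0]])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < n and 0 <= nc < n
                    and not shaded[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(unshaded)


grid = [[2, 2, 3],
        [2, 3, 1],
        [3, 1, 2]]
solution = [[1, 0, 0],
            [0, 0, 0],
            [0, 0, 0]]
print(is_solved(grid, solution))  # → True
```

Shading only the top-left cell removes the duplicate 2 from both the first row and the first column, and the remaining eight cells stay connected.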

💡 Why Maskable Reinforcement Learning?

Hitori is well suited to maskable RL agents: at any given step, most actions (each action shades one cell) are illegal.

Massive Search Space: A 5x5 grid has 25 possible actions per step and 2^25 (about 33.5 million) possible shading patterns, yet often only a few actions are valid at any point. An unmasked RL agent would spend much of its training just learning to avoid illegal moves.
Complex Rules: The rules for what makes a move illegal are complex and depend on the global state of the board.

This environment solves that problem by providing an action mask on every step. The agent can use this mask to "see" only the valid moves, pruning the decision tree and making learning dramatically more efficient.
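Conceptually, a masked policy assigns zero probability to every invalid action and renormalizes over the rest. The sketch below illustrates that idea with a plain softmax; it is not the internals of any particular library, and the function name is hypothetical.

```python
import math


def masked_probabilities(logits, action_mask):
    """Softmax over `logits`, restricted to actions whose mask entry is 1."""
    if not any(action_mask):
        raise ValueError("no valid action: the episode should already be over")
    # Subtract the largest valid logit for numerical stability.
    top = max(l for l, m in zip(logits, action_mask) if m)
    weights = [math.exp(l - top) if m else 0.0
               for l, m in zip(logits, action_mask)]
    total = sum(weights)
    return [w / total for w in weights]


# Action 0 has the highest logit but is masked out, so it can never be sampled.
probs = masked_probabilities([2.0, 0.5, 0.5, -1.0], [0, 1, 1, 0])
print(probs)  # → [0.0, 0.5, 0.5, 0.0]
```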

Action Masking Logic

The action_mask is a binary vector of length N*N in which a 1 marks a valid move. An action (shading a cell) is masked out if any of the following conditions holds:

Cell Already Shaded: The cell is already shaded.
Creates Adjacent Shading: Shading the cell would place it next to an already shaded cell.
Disconnects Unshaded Cells: Shading the cell would split the group of unshaded cells into two or more separate regions (i.e., it's an articulation point).
Cannot Shade an Already-Unique Number: A cell cannot be shaded if its number is already the only one of its kind in its row and the only one of its kind in its column. Such a number can never be a "duplicate," so there is no reason to shade it.

By enforcing these rules, the environment guarantees that every action the agent takes is legal. A legal move can still lead to a dead end, which is why the reward table includes a "Stuck" outcome.
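The four conditions above can be sketched in plain Python as follows. This is an illustration, not the package's implementation: the helper names are hypothetical, and it assumes that "already unique" is judged among the currently unshaded cells of the row and column.

```python
from collections import deque


def _connected_without(shaded, skip):
    """True if the unshaded cells, excluding `skip`, form one region."""
    n = len(shaded)
    cells = [(r, c) for r in range(n) for c in range(n)
             if not shaded[r][c] and (r, c) != skip]
    if not cells:
        return False
    seen, queue = {cells[0]}, deque([cells[0]])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < n and 0 <= nc < n and not shaded[nr][nc]
                    and (nr, nc) != skip and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(cells)


def compute_action_mask(grid, shaded):
    """Return a flat row-major list with 1 for every legal shading action."""
    n = len(grid)
    mask = []
    for r in range(n):
        for c in range(n):
            value = grid[r][c]
            neighbours = ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
            already_shaded = bool(shaded[r][c])
            touches_shaded = any(0 <= nr < n and 0 <= nc < n and shaded[nr][nc]
                                 for nr, nc in neighbours)
            # Assumption: uniqueness is counted over unshaded cells only.
            row_count = sum(1 for j in range(n)
                            if not shaded[r][j] and grid[r][j] == value)
            col_count = sum(1 for i in range(n)
                            if not shaded[i][c] and grid[i][c] == value)
            already_unique = row_count == 1 and col_count == 1
            legal = (not already_shaded and not touches_shaded
                     and not already_unique
                     and _connected_without(shaded, (r, c)))
            mask.append(int(legal))
    return mask


grid = [[2, 2, 3],
        [2, 3, 1],
        [3, 1, 2]]
empty = [[0] * 3 for _ in range(3)]
print(compute_action_mask(grid, empty))  # → [1, 1, 0, 1, 0, 0, 0, 0, 0]
```

On the empty board only the three cells holding a duplicated 2 are candidates; every other number is already unique in its row and column.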

🕹️ Demo

Here is a demonstration of a Hitori game in this environment:

[Image: Hitori Gym Demo]

The repository includes a playground.py script that allows you to manually play the Hitori game. This script is not part of the packaged library but is useful for testing and understanding the game mechanics.

To use it, run the following command:

python playground.py

🔍 Environment Details

Observation Space

The observation space is a dictionary containing the puzzle state:

game_grid: An NxN grid representing the puzzle board, with each cell containing a number from 1 to N.
shaded: An NxN binary grid indicating which cells are currently shaded (1 for shaded, 0 for unshaded).

spaces.Dict({
    "game_grid": spaces.Box(low=1, high=size, shape=(size, size), dtype=np.uint32),
    "shaded": spaces.MultiBinary((size, size)),
})

Action Space

The action space is a Discrete space of size N*N, where each action corresponds to shading a cell in row-major order. The agent should only select actions where the action_mask is 1.

spaces.Discrete(size * size)
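Row-major order means that action indices run left to right along the first row, then along the second, and so on. A small sketch of the conversion, with hypothetical helper names:

```python
def action_to_cell(action, size):
    """Convert a flat action index to (row, col) in row-major order."""
    if not 0 <= action < size * size:
        raise ValueError(f"action {action} is outside 0..{size * size - 1}")
    return divmod(action, size)


def cell_to_action(row, col, size):
    """Convert (row, col) back to the flat action index."""
    return row * size + col


print(action_to_cell(7, 5))     # → (1, 2)
print(cell_to_action(1, 2, 5))  # → 7
```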

Rewards

The reward structure is designed to be simple and effective, especially since illegal moves are prevented by the mask.

| Outcome | Reward | Description | Termination |
| --- | --- | --- | --- |
| Win (Puzzle Solved) | `+1.0` | The current state is a complete and valid solution. | `True` |
| Stuck (No valid moves) | `-1.0` | The agent has no valid moves left and has not won. | `True` |
| Valid Step Taken | `-0.01` | A small penalty to encourage finding the shortest solution. | `False` |
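As a worked example of how these rewards add up, the sketch below computes the undiscounted return of one episode. It assumes the three outcomes are mutually exclusive per step, so the final step earns only the terminal reward and every earlier step earns the step penalty; the function name is hypothetical.

```python
STEP_PENALTY = -0.01
WIN_REWARD = 1.0
STUCK_REWARD = -1.0


def episode_return(num_steps, solved):
    """Undiscounted return of an episode that ends after `num_steps` steps."""
    if num_steps < 1:
        raise ValueError("an episode needs at least one step")
    # Assumption: the terminal step carries only the terminal reward.
    terminal = WIN_REWARD if solved else STUCK_REWARD
    return STEP_PENALTY * (num_steps - 1) + terminal


print(round(episode_return(5, solved=True), 2))   # → 0.96
print(round(episode_return(5, solved=False), 2))  # → -1.04
```

Under this assumption a shorter solution always scores higher, which is the incentive the step penalty is meant to create.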

⚙️ How It Works

The environment is built with a few key components:

hitori.py: The main gym.Env implementation. It handles the game logic, state transitions, rendering, and—most importantly—the dynamic generation of the action_mask on every step.
hitori_generator.py: A utility that generates valid, solvable Hitori puzzles of a given size.
hitori_solution.py: A backtracking solver that can find a valid solution for a given Hitori puzzle. This is used internally for debugging and can be enabled via an option in env.reset().

💻 Development

To set up the project for development, clone the repository and install it in editable mode:

git clone https://github.com/your-username/hitori-gym.git
cd hitori-gym
pip install -e .

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

This version

0.0.3

Apr 22, 2026

Download files

Source Distribution

hitori_gym-0.0.3.tar.gz (32.7 MB)

Uploaded Apr 22, 2026 Source

Built Distribution

hitori_gym-0.0.3-py3-none-any.whl (18.8 kB)

Uploaded Apr 22, 2026 Python 3

File details

Details for the file hitori_gym-0.0.3.tar.gz.

File metadata

Download URL: hitori_gym-0.0.3.tar.gz
Upload date: Apr 22, 2026
Size: 32.7 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for hitori_gym-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`32437d139593b748e5474004ea296d43ad67b7567d513274bc90ed1b610da704`
MD5	`81c4fd2b69055cbf5d9695791ddc4ec9`
BLAKE2b-256	`4f351f85a342f6216c01d3936ba928480c881962920333ef924afa9f0d7cf08e`

File details

Details for the file hitori_gym-0.0.3-py3-none-any.whl.

File metadata

Download URL: hitori_gym-0.0.3-py3-none-any.whl
Upload date: Apr 22, 2026
Size: 18.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for hitori_gym-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`eab26e03d7cec8400eb5e09e47da6c2d593d075c8f88200f1b286b22844aba11`
MD5	`30c24bde0642eee96ba6590af3889f7b`
BLAKE2b-256	`62c52a3e4a6e9ad58969adb56e7a65df43fe753f84ab341055629d435473fdb3`
