A Gymnasium environment for benchmarking spatial reasoning capabilities of AI agents on grid-based puzzles

Maintainers

gipplab

Spatial-Gym: A Gymnasium Environment for Spatial Reasoning Benchmarking

Abstract

Spatial-Gym is a Gymnasium-compatible environment designed for evaluating spatial reasoning capabilities of Large Language Models (LLMs) and other AI agents. Built upon the spatial puzzle dataset introduced in SPaRC (Kaesberg et al.), this environment provides a standardized interface for benchmarking agent performance on grid-based spatial reasoning tasks. The environment supports multiple observation formats, customizable rendering modes for human and LLM interaction, and comprehensive evaluation metrics for systematic analysis of spatial reasoning abilities.

Key Features

Standardized RL Interface: Full Gymnasium API compliance for seamless integration with existing RL frameworks
Dual Observation Modes: Structured tensor representation or JSON-based symbolic encoding
Multi-Modal Rendering: Human-readable visualizations and LLM-optimized text representations
Flexible Dataset Support: Compatible with HuggingFace datasets following the SPaRC format
Comprehensive Metrics: Episode-level tracking of success rate, path efficiency, and reasoning patterns
Backtracking Support: Optional state reversibility for exploring different solution strategies

Installation

From PyPI

pip install Spatial-Gym

From Source

git clone https://github.com/lkaesberg/Spatial-Gym.git
cd Spatial-Gym
pip install -e .

Quick Start

import gymnasium as gym
import Spatial_Gym

# Initialize the environment with an explicit configuration
env = gym.make(
    "Spatial-Gym",
    df_name='lkaesberg/SPaRC',
    df_split='all',
    df_set='test',
    render_mode='human',
    observation='new',
    traceback=True,
    max_steps=1000
)

# Standard RL loop: stop on termination or on truncation (max_steps reached)
observation, info = env.reset()
terminated = truncated = False

while not (terminated or truncated):
    action = env.action_space.sample()  # Replace with your agent
    observation, reward, terminated, truncated, info = env.step(action)
    env.render()

env.close()

Environment Configuration

Parameter	Type	Default	Description
`df_name`	str	`'lkaesberg/SPaRC'`	HuggingFace dataset identifier
`df_split`	str	`'all'`	Dataset split to use
`df_set`	str	`'test'`	Subset of data (train/val/test)
`render_mode`	str	`None`	Visualization mode: `'human'`, `'llm'`, or `None`
`observation`	str	`'new'`	Observation format: `'new'` (tensor) or `'SPaRC'` (JSON)
`traceback`	bool	`False`	Enable state reversibility
`max_steps`	int	`2000`	Maximum steps per episode
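The defaults in the table can be kept in one place and passed to gym.make as keyword arguments. The sketch below shows one way to do that; the SpatialGymConfig dataclass and its to_kwargs helper are illustrative names introduced here, not part of the package.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SpatialGymConfig:
    """Hypothetical container mirroring the documented defaults."""

    df_name: str = "lkaesberg/SPaRC"
    df_split: str = "all"
    df_set: str = "test"
    render_mode: Optional[str] = None
    observation: str = "new"
    traceback: bool = False
    max_steps: int = 2000

    def to_kwargs(self) -> dict:
        """Return the fields as keyword arguments for gym.make."""
        return asdict(self)


# Override only what differs from the defaults.
llm_config = SpatialGymConfig(render_mode="llm", observation="SPaRC")
kwargs = llm_config.to_kwargs()
print(kwargs["max_steps"])  # 2000

# With the package installed, the environment would then be created as:
# env = gym.make("Spatial-Gym", **kwargs)
```

Freezing the dataclass keeps a configuration from being changed by accident between evaluation runs.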

Environment Specification

Action Space

Discrete(4): Four directional moves in the grid environment.

Action	Value	Description
RIGHT	0	Move agent one cell to the right
UP	1	Move agent one cell upward
LEFT	2	Move agent one cell to the left
DOWN	3	Move agent one cell downward
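As a minimal sketch of the action encoding above, the integer values can be wrapped in an enum and mapped to grid offsets. The Action enum, the DELTAS table, and the coordinate convention (positions as (x, y) with y increasing downward) are assumptions made for illustration; the environment's internal convention may differ.

```python
from enum import IntEnum
from typing import Tuple


class Action(IntEnum):
    """Action values as listed in the table above."""

    RIGHT = 0
    UP = 1
    LEFT = 2
    DOWN = 3


# Assumed convention: positions are (x, y) with y increasing downward,
# so UP decreases y. The environment's internal convention may differ.
DELTAS = {
    Action.RIGHT: (1, 0),
    Action.UP: (0, -1),
    Action.LEFT: (-1, 0),
    Action.DOWN: (0, 1),
}


def apply_action(position: Tuple[int, int], action: int) -> Tuple[int, int]:
    """Return the position reached by taking `action` from `position`."""
    dx, dy = DELTAS[Action(action)]
    return (position[0] + dx, position[1] + dy)


print(apply_action((2, 2), Action.RIGHT))  # (3, 2)
print(apply_action((2, 2), 1))             # (2, 1)
```

Because Action is an IntEnum, its members can be passed wherever a plain integer action is expected.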

Observation Space

Tensor Format (`observation='new'`)

A dictionary containing:

base (Dict[str, np.ndarray]): One-hot encoded spatial features
- visited: Binary grid marking visited cells
- gaps: Binary grid indicating traversable/non-traversable cells
- agent_location: One-hot encoding of agent position
- target_location: One-hot encoding of goal position
- Additional puzzle-specific properties (e.g., stars, triangles)
color (np.ndarray): Integer grid (1-8) representing color properties
additional_info (np.ndarray): Puzzle-specific metadata (polyshape IDs, counts)
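To make the one-hot location grids concrete, the sketch below builds and decodes such a grid. It uses plain nested lists instead of NumPy arrays so that it has no dependencies, and it assumes (row, column) indexing; both choices are illustrative and may not match the arrays the environment returns.

```python
from typing import List, Tuple

Grid = List[List[int]]


def one_hot_location(height: int, width: int, cell: Tuple[int, int]) -> Grid:
    """Return a height x width grid of zeros with a single 1 at `cell` (row, col)."""
    row, col = cell
    if not (0 <= row < height and 0 <= col < width):
        raise ValueError(f"cell {cell} lies outside a {height}x{width} grid")
    grid = [[0] * width for _ in range(height)]
    grid[row][col] = 1
    return grid


def decode_location(grid: Grid) -> Tuple[int, int]:
    """Recover the (row, col) of the single 1 in a one-hot grid."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == 1:
                return (r, c)
    raise ValueError("grid contains no 1")


agent_location = one_hot_location(3, 4, (1, 2))
print(agent_location)                   # [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
print(decode_location(agent_location))  # (1, 2)
```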

JSON Format (`observation='SPaRC'`)

String-encoded JSON representing the grid state with symbolic notation, following the original SPaRC specification.

Reward Structure

+1.0: Successfully solving the puzzle
-1.0: Invalid termination or failure state
+0.01: Incremental reward for remaining on valid solution path (encourages exploration while maintaining progress)
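The reward values above can be combined into an episode return. The sketch below assumes that the +0.01 reward is paid once per on-path step and that the terminal reward is added on top; the source does not state whether the final step also receives the incremental reward, so treat the totals as illustrative. The function and constant names are introduced here for the example.

```python
SUCCESS_REWARD = 1.0
FAILURE_REWARD = -1.0
STEP_REWARD = 0.01


def episode_return(on_path_steps: int, outcome: str) -> float:
    """Sum the documented rewards for one episode.

    Assumes STEP_REWARD is paid once per on-path step and the terminal
    reward is added on top; `outcome` is "success", "failure", or "truncated".
    """
    terminal = {
        "success": SUCCESS_REWARD,
        "failure": FAILURE_REWARD,
        "truncated": 0.0,
    }
    return on_path_steps * STEP_REWARD + terminal[outcome]


print(round(episode_return(12, "success"), 2))  # 1.12
print(round(episode_return(5, "failure"), 2))   # -0.95
```

Under these assumptions the terminal reward dominates: an agent would need 100 on-path steps to accumulate as much as a single successful solve.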

Episode Termination

Success: Agent reaches target location satisfying all puzzle constraints
Failure: Agent enters invalid state or violates puzzle rules
Truncation: Maximum step limit reached
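An evaluation loop has to treat the three endings separately. The sketch below uses the Gymnasium step signature (observation, reward, terminated, truncated, info) with a small stand-in environment, CountdownEnv, which is hypothetical and exists only so the example runs without the dataset. Telling success from failure by the sign of the final reward is an assumption based on the reward structure described above.

```python
from typing import Any, Dict, Tuple


class CountdownEnv:
    """Hypothetical stand-in that follows the Gymnasium reset/step signatures.

    It never terminates on its own and truncates after `max_steps` steps.
    """

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.steps = 0
        return 0, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.steps += 1
        truncated = self.steps >= self.max_steps
        return self.steps, 0.0, False, truncated, {}


def run_episode(env, policy) -> str:
    """Run one episode and report how it ended."""
    observation, info = env.reset()
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        if terminated:
            # Assumption: success and failure both set `terminated`,
            # and the sign of the final reward tells them apart.
            return "success" if reward > 0 else "failure"
        if truncated:
            return "truncated"


print(run_episode(CountdownEnv(max_steps=3), policy=lambda obs: 0))  # truncated
```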

API Reference

Core Methods

env.reset(options: Optional[Dict] = None) -> Tuple[Observation, Dict]

Initializes or resets the environment to a new puzzle state.

Parameters:
- options: Optional dictionary with 'puzzle_id' key to load specific puzzle
Returns: Initial observation and info dictionary
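A short sketch of how the 'puzzle_id' option might be used to step through a fixed list of puzzles. The reset_each helper and the EchoEnv stand-in are hypothetical, and the type of a puzzle ID (string or integer) is an assumption; with the real environment, pass the object returned by gym.make instead of EchoEnv.

```python
from typing import Any, Dict, Iterable, List


def reset_each(env, puzzle_ids: Iterable[Any]) -> List[Dict[str, Any]]:
    """Reset `env` once per puzzle ID and collect the returned info dicts."""
    infos = []
    for puzzle_id in puzzle_ids:
        # 'puzzle_id' is the documented options key; the ID type is an assumption.
        _observation, info = env.reset(options={"puzzle_id": puzzle_id})
        infos.append(info)
    return infos


class EchoEnv:
    """Hypothetical stand-in that echoes the requested puzzle ID back in `info`."""

    def reset(self, options=None):
        return None, {"puzzle_id": (options or {}).get("puzzle_id")}


print(reset_each(EchoEnv(), ["a1", "b2"]))  # [{'puzzle_id': 'a1'}, {'puzzle_id': 'b2'}]
```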

env.step(action: int) -> Tuple[Observation, float, bool, bool, Dict]

Executes one environment step given an action.

Parameters:
- action: Integer in range [0, 3] representing directional move
Returns: Observation, reward, terminated flag, truncated flag, info dictionary

env.render() -> Optional[np.ndarray]

Generates visual or textual representation of current state based on render_mode.

env.close()

Releases environment resources and closes rendering windows.

Experimental Setup

Dataset

The environment uses puzzles from the SPaRC dataset, which contains spatial reasoning challenges of varying complexity. Each puzzle is defined by:

Grid dimensions (variable size)
Initial agent position
Target position
Spatial constraints (gaps, regions, colored elements)
Solution paths of varying lengths

Evaluation Metrics

The info dictionary returned by step() and reset() contains:

Success Rate: Binary per-episode indicator of puzzle completion; its mean over episodes is the success rate
Path Length: Number of steps taken
Optimality: Ratio of actual path length to shortest possible path (1.0 means the shortest path was taken)
Invalid Actions: Count of rule violations
Puzzle Metadata: Difficulty rating, constraint types, grid size
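Per-episode values like these are usually aggregated across a benchmark run. The sketch below computes a success rate and a mean optimality ratio from a list of episode records. The record keys ('success', 'steps', 'shortest') are hypothetical and may not match the keys in the info dictionary, and averaging optimality over solved puzzles only is a design choice of this example.

```python
from typing import Any, Dict, List


def summarize(episodes: List[Dict[str, Any]]) -> Dict[str, float]:
    """Aggregate per-episode records into a success rate and mean optimality.

    Each record uses hypothetical keys: 'success' (bool), 'steps' (path length
    taken) and 'shortest' (length of the shortest valid path).
    """
    if not episodes:
        raise ValueError("no episodes to summarize")
    solved = [e for e in episodes if e["success"]]
    success_rate = len(solved) / len(episodes)
    # Optimality as defined above: actual length / shortest length (1.0 is optimal).
    ratios = [e["steps"] / e["shortest"] for e in solved]
    mean_optimality = sum(ratios) / len(ratios) if ratios else float("nan")
    return {"success_rate": success_rate, "mean_optimality": mean_optimality}


records = [
    {"success": True, "steps": 12, "shortest": 12},
    {"success": True, "steps": 18, "shortest": 12},
    {"success": False, "steps": 40, "shortest": 10},
]
summary = summarize(records)
print(summary["mean_optimality"])  # 1.25
```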

Use Cases

Benchmarking LLM Spatial Reasoning

import gymnasium as gym
import Spatial_Gym
from your_llm_wrapper import LLMAgent  # placeholder for your own LLM wrapper

env = gym.make("Spatial-Gym", render_mode='llm', observation='SPaRC')
agent = LLMAgent(model="gpt-4")

for _ in range(100):  # Evaluate on 100 puzzles
    observation, info = env.reset()
    done = False
    while not done:
        action = agent.predict(env.render())  # LLM sees text representation
        observation, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated

    # Log metrics
    print(f"Puzzle {info['puzzle_id']}: Success={info['success']}, Steps={info['steps']}")

env.close()
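The loop above assumes that the agent returns an integer action. An LLM replies with free text, so a wrapper needs a parsing step. The sketch below is one possible parser; the parse_action name, the accepted reply formats, and the choice to take the last action mentioned are assumptions, not part of Spatial-Gym.

```python
import re
from typing import Optional

# Direction names and values follow the action table in this README.
ACTION_IDS = {"right": 0, "up": 1, "left": 2, "down": 3}

_PATTERN = re.compile(r"\b(right|up|left|down|[0-3])\b", re.IGNORECASE)


def parse_action(reply: str) -> Optional[int]:
    """Return the last action mentioned in `reply`, or None if there is none."""
    matches = _PATTERN.findall(reply)
    if not matches:
        return None
    token = matches[-1].lower()
    return int(token) if token.isdigit() else ACTION_IDS[token]


print(parse_action("The gap blocks the left side, so I move UP."))  # 1
print(parse_action("Final answer: 3"))                              # 3
print(parse_action("I am not sure."))                               # None
```

Taking the last match reflects that models often reason about several directions before stating their choice. A None result can be counted as an invalid action or used to trigger a retry.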

Reinforcement Learning Training

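import gymnasium as gym
import Spatial_Gym  # imported as in Quick Start
from stable_baselines3 import PPO

env = gym.make("Spatial-Gym", observation='new', max_steps=500)
model = PPO("MultiInputPolicy", env, verbose=1)  # MultiInputPolicy handles dict observations
model.learn(total_timesteps=1_000_000)
model.save("spatial_reasoning_agent")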

Repository Structure

Spatial-Gym/
├── Spatial_Gym/           # Core environment implementation
│   ├── __init__.py        # Package initialization
│   ├── Spatial_Gym.py     # Main environment class
│   ├── register_env.py    # Gymnasium registration
│   └── render/            # Rendering modules
│       ├── human_renderer.py
│       └── llm_renderer.py
├── llm_testing/           # LLM evaluation utilities
│   ├── llm_host.py        # LLM interaction wrapper
│   └── parse_logs.py      # Result analysis tools
├── Final_Product.py       # Interactive demo script
├── human_play.py          # Human player interface
├── pyproject.toml         # Package configuration
└── README.md              # This file

Testing

Spatial-Gym includes a comprehensive test suite to ensure environment stability and correctness.

Running Tests

# Install with test dependencies
pip install -e ".[test]"

# Run all tests
pytest tests/ -v

# Run specific test categories
pytest tests/test_environment.py -v       # Environment API tests
pytest tests/test_random_agent.py -v      # Random agent tests
pytest tests/test_predefined_paths.py -v  # Path validation tests

# Run with coverage
pytest tests/ --cov=Spatial_Gym --cov-report=html

Test Coverage

The test suite includes 43+ tests covering:

✅ Environment initialization and configuration
✅ Gymnasium API compliance
✅ Random agent behavior (stress tests)
✅ Predefined valid and invalid paths
✅ Multi-episode stability
✅ Different observation formats
✅ Rendering modes

Continuous Integration

Tests automatically run on:

Every push and pull request
Multiple OS (Ubuntu, macOS)
Python versions 3.9, 3.10, 3.11

See tests/README.md for detailed testing documentation.

Citation

If you use Spatial-Gym in your research, please cite:

@software{spatial_gym2024,
  title={Spatial-Gym: A Gymnasium Environment for Spatial Reasoning Benchmarking},
  author={Kaesberg, Lars Benedikt and Mark, Tobias},
  year={2024},
  url={https://github.com/lkaesberg/Spatial-Gym}
}

For the underlying SPaRC dataset and puzzles:

@inproceedings{kaesberg2024sparc,
  title={SPaRC: Spatial Reasoning Challenges for Large Language Models},
  author={Kaesberg, Lars Benedikt and others},
  booktitle={Proceedings of ACL},
  year={2024},
  url={https://sparc.gipplab.org/}
}

Contributing

We welcome contributions! Please follow these guidelines:

Fork the repository
Create a feature branch (git checkout -b feature/new-feature)
Commit changes with descriptive messages
Add tests for new functionality
Submit a pull request

For bug reports and feature requests, please use the GitHub issue tracker.

License

This project is licensed under the MIT License - see the LICENCE file for details.

Acknowledgments

Lars Benedikt Kaesberg (l.kaesberg@uni-goettingen.de) - Project conception and supervision
Jan Philip Wahle - Project supervision
Tobias Mark - Initial implementation and environment design
SPaRC Team - Original puzzle dataset and framework (sparc.gipplab.org)

Contact

For questions, suggestions, or collaboration inquiries, please contact:

Lars Benedikt Kaesberg: l.kaesberg@uni-goettingen.de
GitHub Issues: https://github.com/lkaesberg/Spatial-Gym/issues

This version: 0.1.1 (Mar 31, 2026)

Download files

Source Distribution

spatial_gym-0.1.1.tar.gz (40.0 kB)

Uploaded Mar 31, 2026 Source

Built Distribution

spatial_gym-0.1.1-py3-none-any.whl (27.2 kB)

Uploaded Mar 31, 2026 Python 3

File details

Details for the file spatial_gym-0.1.1.tar.gz.

File metadata

Download URL: spatial_gym-0.1.1.tar.gz
Upload date: Mar 31, 2026
Size: 40.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spatial_gym-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`16a394a561670b652f5e6f59464e4b5d05df8d6f312b201226e0fc675ad9db3b`
MD5	`f02b0526a4e0ae527d15a9cab0a3e9ac`
BLAKE2b-256	`5fa84887783fac45e383382c2c4d986c9e324b93048fce55ddbca7f30926c27f`

Provenance

The following attestation bundles were made for spatial_gym-0.1.1.tar.gz:

Publisher: publish.yml on lkaesberg/Spatial-Gym

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: spatial_gym-0.1.1.tar.gz
- Subject digest: 16a394a561670b652f5e6f59464e4b5d05df8d6f312b201226e0fc675ad9db3b
- Sigstore transparency entry: 1203568718
- Sigstore integration time: Mar 31, 2026
Source repository:
- Permalink: lkaesberg/Spatial-Gym@c7dab51bef78a67e23fed1117c68f71196c2bdfe
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/lkaesberg
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c7dab51bef78a67e23fed1117c68f71196c2bdfe
- Trigger Event: release

File details

Details for the file spatial_gym-0.1.1-py3-none-any.whl.

File metadata

Download URL: spatial_gym-0.1.1-py3-none-any.whl
Upload date: Mar 31, 2026
Size: 27.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spatial_gym-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f90e1911c65bac83ebcf95a1ff09fd75e512ab42630991925517d7d778db2572`
MD5	`043b5a24d6cfb92757fa6f3baf28a9e5`
BLAKE2b-256	`2b36d53c79764aeb370c96fdc0cf37f3caa3aef2958f36da1082238108f0aa66`

Provenance

The following attestation bundles were made for spatial_gym-0.1.1-py3-none-any.whl:

Publisher: publish.yml on lkaesberg/Spatial-Gym

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: spatial_gym-0.1.1-py3-none-any.whl
- Subject digest: f90e1911c65bac83ebcf95a1ff09fd75e512ab42630991925517d7d778db2572
- Sigstore transparency entry: 1203568719
- Sigstore integration time: Mar 31, 2026
Source repository:
- Permalink: lkaesberg/Spatial-Gym@c7dab51bef78a67e23fed1117c68f71196c2bdfe
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/lkaesberg
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c7dab51bef78a67e23fed1117c68f71196c2bdfe
- Trigger Event: release
