SPaRC: Spatial Pathfinding and Reasoning Challenge

A comprehensive toolkit for spatial reasoning puzzle solving and model evaluation

Overview

SPaRC provides a framework for evaluating language models on spatial reasoning tasks inspired by the puzzle game "The Witness". The package includes tools for dataset processing, solution validation, and model evaluation, with rich terminal output.

Installation

Install the package from PyPI:

pip install sparc-puzzle

Or install from source:

git clone https://github.com/lkaesberg/SPaRC.git
cd SPaRC
pip install -e .

Quick Start

1. Testing a Model on the Dataset

Run the complete benchmark on your model:

sparc --api-key "your-openai-api-key" --model "gpt-4" --batch-size 5

Key Features:

  • 🔄 Resume Support: Automatically saves progress and resumes from where you left off
  • Batching: Process multiple puzzles concurrently for faster evaluation
  • 🎨 Rich Output: Beautiful terminal interface with progress tracking
  • 🛑 Graceful Shutdown: Press Ctrl+C to stop after current batch
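
The resume behavior can be pictured with a minimal sketch. It assumes results are appended one JSON object per line (as the default `<model>.jsonl` results file suggests) and that each record carries a puzzle identifier. The field name `id` and both helper names are hypothetical; this is not SPaRC's actual implementation.

```python
import json
from pathlib import Path


def load_done_ids(results_file, id_field="id"):
    """Return the set of puzzle ids already recorded in a JSONL results file."""
    path = Path(results_file)
    if not path.exists():
        return set()
    done = set()
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # Skip a partially written line, e.g. after an interrupted run.
                continue
            if isinstance(record, dict) and id_field in record:
                done.add(record[id_field])
    return done


def pending_puzzles(puzzles, results_file, id_field="id"):
    """Keep only the puzzles whose id is not yet in the results file."""
    done = load_done_ids(results_file, id_field)
    return [p for p in puzzles if p[id_field] not in done]
```

Under these assumptions, a fresh start (the `--overwrite` case) would simply skip the filtering step and process every puzzle.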

Example with different endpoints:

# OpenAI API
sparc --api-key "sk-..." --model "gpt-4"

# Custom endpoint (e.g., local model)
sparc --api-key "your-key" --base-url "http://localhost:8080/v1" --model "llama-3.3-70b"

# Resume interrupted session
sparc --api-key "your-key" --model "gpt-4"  # Automatically resumes

# Fresh start (ignore previous results)
sparc --api-key "your-key" --model "gpt-4" --overwrite

2. Using the Validation API

Use SPaRC's validation functions in your own code:

from sparc.validation import extract_solution_path, validate_solution, analyze_path
from sparc.prompt import generate_prompt
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("lkaesberg/SPaRC", "all", split="test")
puzzle = dataset[0]

# Generate prompt for your model
puzzle_prompt = [
    {
        "role": "system",
        "content": "You are an expert at solving puzzles games.",
    },
    {
        "role": "user",
        "content": generate_prompt(puzzle),
    },
]

# Your model generates a response
model_response = "... model response with path coordinates ..."

# Extract the path from model response
extracted_path = extract_solution_path(model_response, puzzle)
# Returns: [{"x": 0, "y": 2}, {"x": 0, "y": 1}, ...]

# Validate against ground truth
is_correct = validate_solution(extracted_path, puzzle)
# Returns: True/False

# Get detailed analysis
analysis = analyze_path(extracted_path, puzzle)
# Returns: {
#   "starts_at_start_ends_at_exit": True,
#   "connected_line": True,
#   "non_intersecting_line": True,
#   "no_rule_crossing": True,
#   "fully_valid_path": True
# }
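
When many puzzles are evaluated, the per-puzzle analysis dictionaries can be aggregated into pass rates per check. The sketch below assumes each dictionary has the five boolean keys shown in the example above; `summarize_analyses` is a hypothetical helper, not part of the SPaRC package.

```python
from collections import Counter

# Keys taken from the example return value of analyze_path above.
CHECKS = (
    "starts_at_start_ends_at_exit",
    "connected_line",
    "non_intersecting_line",
    "no_rule_crossing",
    "fully_valid_path",
)


def summarize_analyses(analyses):
    """Return, for each check, the fraction of analysis dicts that pass it."""
    analyses = list(analyses)
    if not analyses:
        return {check: 0.0 for check in CHECKS}
    passed = Counter()
    for analysis in analyses:
        for check in CHECKS:
            # A missing key is counted as a failed check.
            if analysis.get(check, False):
                passed[check] += 1
    return {check: passed[check] / len(analyses) for check in CHECKS}
```

Looking at the individual rates, rather than only `fully_valid_path`, shows which kind of constraint a model violates most often.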

CLI Reference

Basic Usage

sparc --api-key "your-key" [OPTIONS]

Options

| Option | Default | Description |
| --- | --- | --- |
| --api-key | Required | OpenAI API key or your model's API key |
| --base-url | https://api.openai.com/v1 | API endpoint URL |
| --model | gpt-4 | Model name to evaluate |
| --temperature | 1.0 | Generation temperature |
| --batch-size | 5 | Number of concurrent requests |
| --results-file | <model>.jsonl | File to save results |
| --overwrite | False | Ignore existing results and start over |
| --verbose | False | Show detailed output for each puzzle |
| --max-new | None | Process at most this many new puzzles |
| --gym | False | Use step-by-step gym mode instead of single-shot |
| --gym-traceback | False | Enable traceback visualization in gym mode |
| --run-name | None | Suffix for output filename (e.g., experiment1) |
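
For scripted runs, the options above can be assembled into an argument list programmatically. The following is a minimal sketch that uses only the flags listed in the table; `build_sparc_command` is a hypothetical helper, and it assumes the boolean options are plain switches, as in the examples in this README.

```python
def build_sparc_command(api_key, model="gpt-4", base_url=None, temperature=None,
                        batch_size=None, results_file=None, max_new=None,
                        run_name=None, overwrite=False, verbose=False,
                        gym=False, gym_traceback=False):
    """Build the argument list for a `sparc` invocation."""
    cmd = ["sparc", "--api-key", api_key, "--model", model]
    # Options that take a value are added only when explicitly set,
    # so the CLI defaults apply otherwise.
    valued = [
        ("--base-url", base_url),
        ("--temperature", temperature),
        ("--batch-size", batch_size),
        ("--results-file", results_file),
        ("--max-new", max_new),
        ("--run-name", run_name),
    ]
    for flag, value in valued:
        if value is not None:
            cmd += [flag, str(value)]
    # Boolean options are passed as bare switches.
    switches = [
        ("--overwrite", overwrite),
        ("--verbose", verbose),
        ("--gym", gym),
        ("--gym-traceback", gym_traceback),
    ]
    for flag, enabled in switches:
        if enabled:
            cmd.append(flag)
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a shell string avoids quoting problems with API keys and URLs.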

Examples

# Basic evaluation
sparc --api-key "sk-..." --model "gpt-4"

# High throughput with larger batches
sparc --api-key "sk-..." --model "gpt-3.5-turbo" --batch-size 20

# Conservative approach with lower temperature
sparc --api-key "sk-..." --model "gpt-4" --temperature 0.1

# Verbose output to see each puzzle result
sparc --api-key "sk-..." --model "gpt-4" --verbose

# Custom results file
sparc --api-key "sk-..." --model "claude-3" --results-file "claude_results.jsonl"

# Process only 10 new puzzles
sparc --api-key "sk-..." --model "gpt-4" --max-new 10

# Step-by-step gym mode (agent receives feedback after each move)
sparc --api-key "sk-..." --model "gpt-4" --gym

# Gym mode with traceback (shows path history in observations)
sparc --api-key "sk-..." --model "gpt-4" --gym --gym-traceback

# Named experiment run
sparc --api-key "sk-..." --model "gpt-4" --run-name "experiment1"

Core Functions

extract_solution_path(solution_text: str, puzzle_data: Dict) -> List[Dict[str, int]]

Extracts the coordinate path from the model's response text.
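
To illustrate the idea of path extraction, here is a minimal sketch that pulls coordinate pairs out of free text. It assumes the response writes points as `(x, y)` pairs, which is an assumption about the answer format; `extract_coordinates` is a hypothetical stand-in, and the real function also takes the puzzle data and may work differently.

```python
import re

# Matches pairs such as "(0, 2)" or "(3,1)" with non-negative integers.
COORD_RE = re.compile(r"\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def extract_coordinates(text):
    """Return all (x, y) pairs found in the text, in order of appearance."""
    return [{"x": int(x), "y": int(y)} for x, y in COORD_RE.findall(text)]
```

The output uses the same list-of-dicts shape as the path shown in the Quick Start example.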

validate_solution(extracted_path: List[Dict[str, int]], puzzle_data: Dict) -> bool

Checks whether the extracted path matches any ground-truth solution.
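
The "matches any ground-truth solution" check can be sketched as an exact comparison against a list of reference paths. In this sketch the reference solutions are passed in directly as a list of paths; how the puzzle record stores them is not assumed, and `matches_any_solution` is a hypothetical helper.

```python
def matches_any_solution(path, solutions):
    """Return True if the path equals one of the reference paths, point for point."""
    def as_points(p):
        return [(step["x"], step["y"]) for step in p]

    candidate = as_points(path)
    return any(candidate == as_points(solution) for solution in solutions)
```

Comparing `(x, y)` tuples rather than the dictionaries themselves ignores any extra keys a path entry might carry.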

analyze_path(solution_path: List[Dict[str, int]], puzzle: Dict) -> Dict

Provides detailed analysis of path validity and rule compliance.
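
Two of the reported checks, `connected_line` and `non_intersecting_line`, can be illustrated with a minimal sketch. It assumes that each move is a single horizontal or vertical grid step and that a path may not visit the same point twice; both helpers are hypothetical and are not the package's own code.

```python
def is_connected(path):
    """True if every consecutive pair of points is one orthogonal step apart."""
    return all(
        abs(a["x"] - b["x"]) + abs(a["y"] - b["y"]) == 1
        for a, b in zip(path, path[1:])
    )


def is_non_intersecting(path):
    """True if no grid point is visited more than once."""
    points = [(p["x"], p["y"]) for p in path]
    return len(points) == len(set(points))
```

A Manhattan distance of exactly 1 between neighbors rules out diagonal moves, jumps, and repeated points in a row.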

generate_prompt(puzzle_data: Dict) -> str

Generates the formatted prompt for a puzzle.

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

Citation

If you use SPaRC in your research, please cite:

@inproceedings{kaesberg-etal-2025-sparc,
    title = "{SP}a{RC}: A Spatial Pathfinding Reasoning Challenge",
    author = "Kaesberg, Lars Benedikt and Wahle, Jan Philip and Ruas, Terry and Gipp, Bela",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.526/",
    doi = "10.18653/v1/2025.emnlp-main.526",
    pages = "10370--10401"
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
