Physical Reasoning Toolkit 🔬

A unified toolkit for researchers and engineers working on AI physical reasoning. PRKit provides a shared foundation for representing physics problems, running inference with multiple model providers, evaluating outputs with physics-aware comparators, and building structured annotation workflows.

PRKit applies a "unified interface" idea to the full physical-reasoning loop (data ↔ annotation ↔ inference ↔ evaluation), rather than focusing on datasets alone.

🎯 Project Overview

PRKit centers on core components that define the physical reasoning ontology, with three integrated subpackages built on that foundation:

  • Core components: PhysicsDomain, AnswerCategory, PhysicsProblem, Answer, PhysicalDataset, PhysicsSolution, BaseModelClient, create_model_client, PRKitLogger. These are the shared abstractions used across the toolkit.
  • prkit_datasets: A Datasets-like hub that downloads/loads benchmarks into the unified schema (PhysicsProblem, PhysicalDataset).
  • prkit_annotation: Workflow-oriented tools for structured, lower-level labels (e.g., domain/subdomain, theorem usage).
  • prkit_evaluation: Evaluate-like components for physics-oriented scoring and comparison (e.g., symbolic/numerical answer matching).

💡 Quick Example

from prkit.prkit_datasets import DatasetHub
from prkit.prkit_core.model_clients import create_model_client

# Load any benchmark into the unified schema (PhysicsProblem, PhysicalDataset)
dataset = DatasetHub.load("physreason", variant="full", split="test")

# Run inference with the unified model client (core component)
client = create_model_client("gpt-4.1-mini")
for problem in dataset[:3]:
    print(client.chat(problem.question)[:200])

The same pattern works across different datasets and model providers: swap the dataset name or model identifier.

📖 Documentation

Quick Links:

  • 🔧 CORE.md - Core components: domain model, model client, logger, and definitions
  • 📚 DATASETS.md - Complete guide to supported datasets and benchmarks
  • 📊 EVALUATION.md - Evaluation metrics and comparison strategies
  • 📝 CHANGELOG.md - Version history and release notes

๐Ÿ—๏ธ Repository Structure

physical_reasoning_toolkit/
โ”œโ”€โ”€ src/prkit/                       # Main package (modern src-layout)
โ”‚   โ”œโ”€โ”€ prkit_core/                  # Core components (domain models, model clients, logging)
โ”‚   โ”œโ”€โ”€ prkit_datasets/              # Dataset loading and management
โ”‚   โ”œโ”€โ”€ prkit_annotation/            # Annotation workflows and tools
โ”‚   โ””โ”€โ”€ prkit_evaluation/            # Evaluation metrics and benchmarks
โ”œโ”€โ”€ tests/                           # Unit tests
โ”œโ”€โ”€ pyproject.toml                   # Package configuration
โ”œโ”€โ”€ LICENSE                          # MIT License
โ””โ”€โ”€ README.md                        # This file

Note: The actual dataset files are stored externally (see Data Directory Setup). This repository contains only the toolkit code, examples, and documentation.

What's Included vs. External

In Repository (Code & Documentation):

  • ✅ src/prkit/: Complete toolkit with core components and 3 subpackages
  • ✅ tests/: Unit tests (for contributors)

External (Data & Runtime):

  • 📁 Data Directory: Dataset files (set via DATASET_CACHE_DIR)
  • 🔑 API Keys: Model provider credentials (if applicable)
  • 📊 Log Files: Runtime logs (default: {cwd}/prkit_logs/prkit.log, can be overridden via PRKIT_LOG_FILE)

🚀 Quick Start

Prerequisites

  • Python 3.10+ (required)

Installation

Option 1: Install from PyPI (Recommended)

# Install the latest stable version
pip install physical-reasoning-toolkit

# Verify installation
python -c "import prkit; print(prkit.__version__)"

Option 2: Install from Source

Step 1: Clone the Repository

git clone https://github.com/sherryzyh/physical_reasoning_toolkit.git
cd physical_reasoning_toolkit

Step 2: Set Up Virtual Environment

# Create virtual environment
python -m venv venv

# Activate (macOS/Linux)
source venv/bin/activate

# Activate (Windows)
venv\Scripts\activate

Step 3: Install

# Install the package (regular install for end users)
pip install .

# Verify installation
python -c "import prkit; print('✅ Toolkit installed successfully!')"

Provider API Key Setup

# For model provider integration (optional)
export OPENAI_API_KEY="your-openai-api-key"
export GEMINI_API_KEY="your-gemini-api-key"
export DEEPSEEK_API_KEY="your-deepseek-api-key"

# For logging configuration (optional)
export PRKIT_LOG_LEVEL=INFO
export PRKIT_LOG_FILE=/var/log/prkit.log  # Optional: defaults to {cwd}/prkit_logs/prkit.log if not set

📖 See CORE.md (Model Client section) for supported providers and usage.
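
As a rough sketch of how the variables above could be inspected before running inference, the snippet below reports which provider keys are set and resolves the log file path using the documented default. The helper names configured_providers and resolve_log_file are hypothetical and not part of PRKit; only the environment variable names and the default path come from this README.

```python
import os
from pathlib import Path

# Environment variable names taken from the setup commands above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def configured_providers(env=None):
    """Return provider names whose key variable is set and non-empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


def resolve_log_file(env=None, cwd=None):
    """Return PRKIT_LOG_FILE if set, else the documented default {cwd}/prkit_logs/prkit.log."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    override = env.get("PRKIT_LOG_FILE")
    return Path(override) if override else cwd / "prkit_logs" / "prkit.log"


if __name__ == "__main__":
    print("Providers with keys:", configured_providers())
    print("Log file:", resolve_log_file())
```

Passing env and cwd explicitly keeps the helpers easy to check without touching the real environment.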

Data Directory Setup

# Set up data directory structure (external to repository)
mkdir -p ~/data
export DATASET_CACHE_DIR=~/data

# Download datasets using DatasetHub with auto_download=True
python -c "from prkit.prkit_datasets import DatasetHub; DatasetHub.load('ugphysics', auto_download=True)"

Note: The data directory is external to the repository and contains the actual dataset files. The default cache directory is ~/PHYSICAL_REASONING_DATASETS/ if DATASET_CACHE_DIR is not set. Use auto_download=True when loading datasets to automatically download them if they don't exist.
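
The cache directory rule in the note above can be sketched as follows. This is an illustration under stated assumptions, not PRKit's own resolution code: resolve_cache_dir and list_cached_datasets are hypothetical helpers, and the one-subdirectory-per-dataset layout is an assumption.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None, home=None):
    """Return DATASET_CACHE_DIR if set, else ~/PHYSICAL_REASONING_DATASETS (hypothetical helper)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    override = env.get("DATASET_CACHE_DIR")
    if override:
        # expanduser handles values such as "~/data" that the shell did not expand.
        return Path(override).expanduser()
    return home / "PHYSICAL_REASONING_DATASETS"


def list_cached_datasets(cache_dir):
    """Return sorted subdirectory names in the cache directory, or [] if it does not exist."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    # Assumption: each dataset lives in its own subdirectory of the cache.
    return sorted(p.name for p in cache_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    cache = resolve_cache_dir()
    print("Cache directory:", cache)
    print("Datasets found:", list_cached_datasets(cache))
```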

Validate Setup

python -c "
import prkit
from prkit.prkit_datasets import DatasetHub
from prkit.prkit_annotation.workflows import WorkflowComposer
print('✅ All packages imported successfully!')
print(f'PRKit version: {prkit.__version__}')
"

📦 Package Overview

The toolkit is organized around core components and three subpackages that use them. Subpackages depend only on prkit_core; there are no direct dependencies between prkit_datasets, prkit_annotation, and prkit_evaluation.

Component          Purpose
prkit_core         Core components (see below)
prkit_datasets     Dataset hub: loaders, downloaders, unified schema
prkit_evaluation   Comparators and accuracy metrics
prkit_annotation   Workflow pipelines for domain/theorem annotation

Core Components 🔧

These are the essential building blocks of the Physical Reasoning Toolkit. All datasets, inference, evaluation, and annotation workflows use these components.

  • PhysicsDomain: Enumeration of physics subfields (mechanics, thermodynamics, quantum mechanics, optics, etc.) for problem classification, aligned with UGPhysics, PHYBench, and TPBench. Use PhysicsDomain.from_string() for flexible parsing.
  • AnswerCategory: Enumeration of answer types for normalization and evaluation: NUMBER, PHYSICAL_QUANTITY, EQUATION, FORMULA, TEXT, OPTION. The category determines how answers are compared (numerical precision, symbolic equivalence, exact match).
  • PhysicsProblem: The canonical representation of a physics problem. Required fields: problem_id, question. Optional fields: answer (Answer), solution, domain, image_path, problem_type (MC/OE), options, correct_option. Supports dictionary-like access and load_images() for visual problems.
  • Answer: Unified answer model. value holds the number (NUMBER), the numeric part (PHYSICAL_QUANTITY), the option string (OPTION), or a plain string (EQUATION, FORMULA, TEXT). unit is optional and used only for PHYSICAL_QUANTITY. Also provides type checks, unit helpers, LaTeX handling, and option indexing.
  • PhysicalDataset: Collection of PhysicsProblem instances. Supports indexing, slicing, get_by_id(), filter_by_domain(), take(), sample(), and save_to_json() / from_json(). get_statistics() reports the domain and problem-type distribution.
  • PhysicsSolution: Bundles a PhysicsProblem, the model's agent_answer, and optional intermediate_steps, capturing the full solution trace for evaluation and analysis.
  • BaseModelClient: Abstract base for model clients. Subclasses implement chat(user_prompt, image_paths=None).
  • PRKitLogger: Centralized logging with colored output, file logging, and environment-based configuration (PRKIT_LOG_LEVEL, PRKIT_LOG_FILE, etc.).
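
The BaseModelClient contract described in the list above can be sketched with a stand-in abstract class. This is a minimal illustration, not the PRKit class: SketchModelClient and EchoClient are hypothetical names, and the constructor argument and the str return type are assumptions (the Quick Example slices the result of chat(), which suggests a string). Only the chat(user_prompt, image_paths=None) signature comes from this README.

```python
from abc import ABC, abstractmethod
from typing import Optional, Sequence


class SketchModelClient(ABC):
    """Stand-in for the BaseModelClient contract; not the PRKit class itself."""

    def __init__(self, model: str) -> None:
        self.model = model

    @abstractmethod
    def chat(self, user_prompt: str, image_paths: Optional[Sequence[str]] = None) -> str:
        """Return the model's text response to a prompt, optionally with images."""


class EchoClient(SketchModelClient):
    """Offline client that echoes the prompt, useful for dry-running a pipeline."""

    def chat(self, user_prompt: str, image_paths: Optional[Sequence[str]] = None) -> str:
        n_images = len(image_paths) if image_paths else 0
        return f"[{self.model}] images={n_images} prompt={user_prompt}"


if __name__ == "__main__":
    client = EchoClient("echo-1")
    print(client.chat("A 2 kg block slides down a frictionless incline."))
```

An offline client like this lets a pipeline be exercised end to end without API keys or network access.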

📖 See CORE.md for the full domain model, entity relationships, subpackage dependency diagram, and import reference.

prkit_evaluation 📈

Answer comparators (symbolic, numerical, textual, option-based), accuracy evaluator, and physics-focused assessment protocols.
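
To illustrate the idea of category-driven comparison, the sketch below picks a matching strategy from the answer category and computes a simple accuracy. It is a hypothetical stand-in, not PRKit's comparators: Category, answers_match, accuracy, and the 1% relative tolerance are all assumptions made for the example. Symbolic equivalence is left out because it would need a computer algebra system.

```python
import math
from enum import Enum


class Category(Enum):
    """Illustrative subset of the answer categories listed under Core Components."""
    NUMBER = "number"
    OPTION = "option"
    TEXT = "text"


def _normalize_text(value):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(str(value).lower().split())


def answers_match(predicted, expected, category, rel_tol=1e-2):
    """Compare two answers using a strategy chosen by category (hypothetical helper)."""
    if category is Category.NUMBER:
        # Numerical comparison within a relative tolerance.
        return math.isclose(float(predicted), float(expected), rel_tol=rel_tol)
    if category is Category.OPTION:
        # Option labels compared case-insensitively, ignoring surrounding whitespace.
        return str(predicted).strip().upper() == str(expected).strip().upper()
    # Text: exact match after normalization.
    return _normalize_text(predicted) == _normalize_text(expected)


def accuracy(triples):
    """Fraction of (predicted, expected, category) triples that match."""
    triples = list(triples)
    if not triples:
        return 0.0
    return sum(answers_match(p, e, c) for p, e, c in triples) / len(triples)
```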

📖 EVALUATION.md

prkit_datasets 📊

Dataset hub with a Datasets-like interface: DatasetHub.load() for PHYBench, PhysReason, UGPhysics, SeePhys, PhyX (plus JEEBench, TPBench loaders). Auto-download, variant selection, and reproducible sampling.
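
Reproducible sampling generally means that the same seed yields the same subset. The sketch below shows one common way to get that behavior with the standard library. reproducible_sample is a hypothetical helper and does not describe how PhysicalDataset.sample() is implemented.

```python
import random


def reproducible_sample(items, n, seed=42):
    """Return n items chosen without replacement, deterministic for a given seed."""
    items = list(items)
    if n > len(items):
        raise ValueError(f"cannot sample {n} items from {len(items)}")
    # A private Random instance avoids touching the global random state.
    rng = random.Random(seed)
    return rng.sample(items, n)


if __name__ == "__main__":
    problem_ids = [f"problem_{i:03d}" for i in range(100)]
    first = reproducible_sample(problem_ids, 5, seed=7)
    second = reproducible_sample(problem_ids, 5, seed=7)
    print(first == second)
```

Recording the seed alongside evaluation results makes it possible to rerun the same subset later.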

📖 DATASETS.md

prkit_annotation 🏷️

Modular workflows (domain classification, theorem extraction) via WorkflowComposer and presets. Model-assisted and human-in-the-loop.

📖 ANNOTATION.md

🆘 Troubleshooting

Common Issues

Python Version Problems

# Check Python version
python --version  # Should be 3.10+

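# If the version is too old, recreate the virtual environment with a 3.10+ interpreter
# (python3.10 is an example command name; use whichever 3.10+ interpreter is installed)
python3.10 -m venv venv
source venv/bin/activate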
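
The same version check can be done from inside Python, which helps when several interpreters are installed and it is unclear which one a virtual environment uses. This is a small sketch; python_is_supported is a hypothetical helper, and the minimum version is the one stated under Prerequisites.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated under Prerequisites


def python_is_supported(version_info=None):
    """Return True if the interpreter meets the documented minimum version."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {current}: {status}")
```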

Import Errors

# Reinstall in development mode
pip install -e .

# Check installation
pip show physical-reasoning-toolkit

Data Directory Issues

# Set data directory (external to repository)
export DATASET_CACHE_DIR=/path/to/your/data

# Check directory structure
ls -la "$DATASET_CACHE_DIR"

# Verify dataset files exist
ls -la "$DATASET_CACHE_DIR/ugphysics/"
ls -la "$DATASET_CACHE_DIR/PhysReason/"

Getting Help

  1. Review logs: Check logging output for detailed error information
  2. Verify setup: Run the commands under Validate Setup above
  3. Check data: Ensure datasets are properly downloaded and accessible
  4. Check documentation: Start with the root docs listed under Documentation

๐Ÿค Contributing

Community & Support

Development Setup

# Clone and install in development mode
git clone https://github.com/sherryzyh/physical_reasoning_toolkit.git
cd physical_reasoning_toolkit
pip install -e ".[dev]"

# Run code quality tools
black src/
isort src/
mypy src/

# Run tests
pytest tests/

Adding New Features

  1. Follow existing patterns: Use consistent logging and error handling
  2. Add tests: Include tests for new functionality
  3. Update documentation: Add examples and update README files
  4. Maintain compatibility: Ensure changes don't break existing functionality

Submitting Pull Requests

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Ensure all tests pass
  5. Submit a pull request with a clear description

📄 Citation

If you use PRKit in your research, please cite it as follows:

BibTeX:

@software{zhang2026physicalreasoningtoolkit,
  author = {Zhang, Yinghuan},
  title = {Physical Reasoning Toolkit},
  year = {2026},
  license = {MIT},
  url = {https://github.com/sherryzyh/physical_reasoning_toolkit},
  abstract = {A unified toolkit for researchers and engineers working on AI physical reasoning. PRKit provides a shared foundation for representing physics problems, running inference with multiple model providers, evaluating outputs with physics-aware comparators, and building structured annotation workflows.}
}

For citation files, see CITATION.cff and CITATION.bib in the repository root.

๐Ÿ™ Acknowledgments

PRKit integrates and builds upon several excellent physics reasoning benchmarks and datasets. We thank the creators of:

  • PhysReason, PHYBench, UGPhysics, SeePhys, PhyX, and other benchmark datasets
  • The open-source community for their valuable contributions and feedback

Note: For detailed citations and references to the original dataset papers, please see the Citations section in DATASETS.md.

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


Ready to advance physics reasoning research! 🚀✨

Quick Links: pip install physical-reasoning-toolkit | GitHub | Documentation | Issues
