Physical Reasoning Toolkit
A unified toolkit for researchers and engineers working on AI physical reasoning. PRKit provides a shared foundation for representing physics problems, running inference with multiple model providers, evaluating outputs with physics-aware comparators, and building structured annotation workflows.
PRKit applies a "unified interface" idea to the full physical-reasoning loop (data → annotation → inference → evaluation), rather than focusing on datasets alone.
Project Overview
PRKit centers on core components that define the physical reasoning ontology. Three integrated subpackages build on this foundation:
- Core components: PhysicsDomain, AnswerCategory, PhysicsProblem, Answer, PhysicalDataset, PhysicsSolution, BaseModelClient, create_model_client, PRKitLogger. These are the shared abstractions used across the toolkit.
- prkit_datasets: A Datasets-like hub that downloads/loads benchmarks into the unified schema (PhysicsProblem, PhysicalDataset).
- prkit_annotation: Workflow-oriented tools for structured, lower-level labels (e.g., domain/subdomain, theorem usage).
- prkit_evaluation: Evaluate-like components for physics-oriented scoring and comparison (e.g., symbolic/numerical answer matching).
Quick Example
from prkit.prkit_datasets import DatasetHub
from prkit.prkit_core.model_clients import create_model_client
# Load any benchmark into the unified schema (PhysicsProblem, PhysicalDataset)
dataset = DatasetHub.load("physreason", variant="full", split="test")
# Run inference with the unified model client (core component)
client = create_model_client("gpt-4.1-mini")
for problem in dataset[:3]:
    print(client.chat(problem.question)[:200])
The same pattern works across different datasets and model providers: swap the dataset name or model identifier.
Documentation
Quick Links:
- CORE.md - Core components: domain model, model client, logger, and definitions
- DATASETS.md - Complete guide to supported datasets and benchmarks
- EVALUATION.md - Evaluation metrics and comparison strategies
- CHANGELOG.md - Version history and release notes
Repository Structure
physical_reasoning_toolkit/
├── src/prkit/                # Main package (modern src-layout)
│   ├── prkit_core/           # Core components (domain models, model clients, logging)
│   ├── prkit_datasets/       # Dataset loading and management
│   ├── prkit_annotation/     # Annotation workflows and tools
│   └── prkit_evaluation/     # Evaluation metrics and benchmarks
├── tests/                    # Unit tests
├── pyproject.toml            # Package configuration
├── LICENSE                   # MIT License
└── README.md                 # This file
Note: The actual dataset files are stored externally (see the Data Directory Setup section). This repository contains only the toolkit code, examples, and documentation.
What's Included vs. External
In Repository (Code & Documentation):
- src/prkit/: Complete toolkit with core components and 3 subpackages
- tests/: Unit tests (for contributors)
External (Data & Runtime):
- Data Directory: Dataset files (set via DATASET_CACHE_DIR)
- API Keys: Model provider credentials (if applicable)
- Log Files: Runtime logs (default: {cwd}/prkit_logs/prkit.log, can be overridden via PRKIT_LOG_FILE)
Quick Start
Prerequisites
- Python 3.10+ (required)
Installation
Option 1: Install from PyPI (Recommended)
# Install the latest stable version
pip install physical-reasoning-toolkit
# Verify installation
python -c "import prkit; print(prkit.__version__)"
Option 2: Install from Source
Step 1: Clone the Repository
git clone https://github.com/sherryzyh/physical_reasoning_toolkit.git
cd physical_reasoning_toolkit
Step 2: Set Up Virtual Environment
# Create virtual environment
python -m venv venv
# Activate (macOS/Linux)
source venv/bin/activate
# Activate (Windows)
venv\Scripts\activate
Step 3: Install
# Install the package (regular install for end users)
pip install .
# Verify installation
python -c "import prkit; print('Toolkit installed successfully!')"
Provider API Key Setup
# For model provider integration (optional)
export OPENAI_API_KEY="your-openai-api-key"
export GEMINI_API_KEY="your-gemini-api-key"
export DEEPSEEK_API_KEY="your-deepseek-api-key"
# For logging configuration (optional)
export PRKIT_LOG_LEVEL=INFO
export PRKIT_LOG_FILE=/var/log/prkit.log # Optional: defaults to {cwd}/prkit_logs/prkit.log if not set
See CORE.md (Model Client section) for supported providers and usage.
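Before running inference, it can help to confirm which provider keys are visible to the current process. The sketch below uses only the standard library and the three variable names from the commands above; find_missing_keys is a hypothetical helper written for this example, not part of PRKit.

```python
import os

# Environment variable names taken from the setup commands above.
PROVIDER_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY", "DEEPSEEK_API_KEY")


def find_missing_keys(env=None):
    """Return the provider key names that are unset or empty.

    Hypothetical helper for illustration; not part of the PRKit API.
    """
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = find_missing_keys()
    if missing:
        print("Missing provider keys:", ", ".join(missing))
    else:
        print("All provider keys are set.")
```

Only the keys for the providers you actually call need to be set, so a non-empty result is not necessarily an error.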
Data Directory Setup
# Set up data directory structure (external to repository)
mkdir -p ~/data
export DATASET_CACHE_DIR=~/data
# Download datasets using DatasetHub with auto_download=True
python -c "from prkit.prkit_datasets import DatasetHub; DatasetHub.load('ugphysics', auto_download=True)"
Note: The data directory is external to the repository and contains the actual dataset files. The default cache directory is ~/PHYSICAL_REASONING_DATASETS/ if DATASET_CACHE_DIR is not set. Use auto_download=True when loading datasets to automatically download them if they don't exist.
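The lookup rule in the note can be sketched with the standard library. resolve_cache_dir is a hypothetical helper that mirrors the documented behavior (use DATASET_CACHE_DIR when set, otherwise fall back to ~/PHYSICAL_REASONING_DATASETS/); PRKit's internal resolution may differ in detail.

```python
import os
from pathlib import Path

# Default location documented in the note above.
DEFAULT_CACHE_DIR = Path.home() / "PHYSICAL_REASONING_DATASETS"


def resolve_cache_dir(env=None):
    """Return the dataset cache directory implied by the environment.

    Hypothetical helper for illustration; not part of the PRKit API.
    """
    env = os.environ if env is None else env
    configured = env.get("DATASET_CACHE_DIR")
    if configured:
        # Expand a leading "~" in case the value was set without shell expansion.
        return Path(configured).expanduser()
    return DEFAULT_CACHE_DIR


if __name__ == "__main__":
    print("Datasets will be read from:", resolve_cache_dir())
```

Printing the resolved path before a long download is a cheap way to catch a variable that was exported in a different shell session.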
Validate Setup
python -c "
import prkit
from prkit.prkit_datasets import DatasetHub
from prkit.prkit_annotation.workflows import WorkflowComposer
print('All packages imported successfully!')
print(f'PRKit version: {prkit.__version__}')
"
Package Overview
The toolkit is organized around core components and three subpackages that use them. Subpackages depend only on prkit_core; there are no direct dependencies between prkit_datasets, prkit_annotation, and prkit_evaluation.
| Component | Purpose |
|---|---|
| prkit_core | Core components, see below |
| prkit_datasets | Dataset hub: loaders, downloaders, unified schema |
| prkit_evaluation | Comparators and accuracy metrics |
| prkit_annotation | Workflow pipelines for domain/theorem annotation |
Core Components
The essential building blocks of the physical-reasoning-toolkit. All datasets, inference, evaluation, and annotation workflows use these components.
- PhysicsDomain: Enumeration of physics subfields (mechanics, thermodynamics, quantum mechanics, optics, etc.) for problem classification. Aligned with UGPhysics, PHYBench, TPBench. Use PhysicsDomain.from_string() for flexible parsing.
- AnswerCategory: Enumeration of answer types for normalization and evaluation: NUMBER, PHYSICAL_QUANTITY, EQUATION, FORMULA, TEXT, OPTION. Drives how answers are compared (numerical precision, symbolic equivalence, exact match).
- PhysicsProblem: The canonical representation of a physics problem. Required: problem_id, question. Optional: answer (Answer), solution, domain, image_path, problem_type (MC/OE), options, correct_option. Supports dictionary-like access and load_images() for visual problems.
- Answer: Unified answer model. value holds the number (NUMBER), numeric part (PHYSICAL_QUANTITY), option string (OPTION), or plain string (EQUATION, FORMULA, TEXT). unit is optional and used only for PHYSICAL_QUANTITY. Type checks, unit helpers, LaTeX handling, option indexing.
- PhysicalDataset: Collection of PhysicsProblem instances. Indexing, slicing, get_by_id(), filter_by_domain(), take(), sample(), save_to_json() / from_json(). Provides get_statistics() for domain and problem-type distribution.
- PhysicsSolution: Bundles a PhysicsProblem, model agent_answer, and optional intermediate_steps. Captures the full solution trace for evaluation and analysis.
- BaseModelClient: Abstract base for model clients. Subclasses implement chat(user_prompt, image_paths=None).
- PRKitLogger: Centralized logging with colored output, file logging, and env config (PRKIT_LOG_LEVEL, PRKIT_LOG_FILE, etc.).
See CORE.md for the full domain model, entity relationships, subpackage dependency diagram, and import reference.
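To make the model-client contract concrete, here is a self-contained sketch of the chat(user_prompt, image_paths=None) interface. SketchModelClient and EchoClient are stand-ins defined only for this example; they are not PRKit's BaseModelClient, which may require additional constructor arguments or methods (see CORE.md).

```python
from abc import ABC, abstractmethod
from typing import Optional, Sequence


class SketchModelClient(ABC):
    """Stand-in for an abstract model client (not PRKit's BaseModelClient)."""

    def __init__(self, model: str) -> None:
        self.model = model

    @abstractmethod
    def chat(self, user_prompt: str, image_paths: Optional[Sequence[str]] = None) -> str:
        """Return the model's text response to a prompt."""


class EchoClient(SketchModelClient):
    """Offline client that echoes the prompt; useful for dry runs and tests."""

    def chat(self, user_prompt: str, image_paths: Optional[Sequence[str]] = None) -> str:
        n_images = len(image_paths) if image_paths else 0
        return f"[{self.model}] {user_prompt} (images: {n_images})"


client = EchoClient("echo-1")
print(client.chat("A 2 kg block slides down a frictionless incline."))
```

An offline client like this lets you exercise a dataset loop or an evaluation script without spending API credits.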
prkit_evaluation
Answer comparators (symbolic, numerical, textual, option-based), accuracy evaluator, and physics-focused assessment protocols.
EVALUATION.md
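As a rough illustration of category-dependent comparison, the sketch below matches numerical answers within a relative tolerance and option answers by normalized exact match. The function names and the 1% default tolerance are assumptions made for this example; PRKit's own comparators are described in EVALUATION.md and may behave differently.

```python
import math


def numbers_match(predicted: float, expected: float, rel_tol: float = 1e-2) -> bool:
    """Compare two numbers within a relative tolerance (1% here, an assumed default)."""
    # abs_tol covers expected == 0, where a purely relative tolerance
    # would only accept exact equality.
    return math.isclose(predicted, expected, rel_tol=rel_tol, abs_tol=1e-12)


def options_match(predicted: str, expected: str) -> bool:
    """Compare multiple-choice options, ignoring case, whitespace, and wrapping punctuation."""

    def normalize(text: str) -> str:
        return text.strip().strip("().").upper()

    return normalize(predicted) == normalize(expected)


print(numbers_match(9.81, 9.8))   # True: the difference is well within 1%
print(options_match("(b)", "B"))  # True: both normalize to "B"
```

Symbolic equivalence (for EQUATION and FORMULA answers) needs a computer algebra system and is deliberately left out of this sketch.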
prkit_datasets
Dataset hub with a Datasets-like interface: DatasetHub.load() for PHYBench, PhysReason, UGPhysics, SeePhys, PhyX (plus JEEBench, TPBench loaders). Auto-download, variant selection, and reproducible sampling.
DATASETS.md
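Reproducible sampling usually comes down to drawing from a seeded random generator. The sketch below shows the idea on a plain list of problem IDs; sample_ids and its seed parameter are illustrative names, and PRKit's own sample() method may expose this differently.

```python
import random
from typing import List, Sequence


def sample_ids(problem_ids: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Draw k distinct IDs; the same seed always yields the same subset."""
    rng = random.Random(seed)  # local generator, leaves the global RNG state untouched
    return rng.sample(list(problem_ids), k)


ids = [f"problem_{i}" for i in range(100)]
first = sample_ids(ids, k=5, seed=42)
second = sample_ids(ids, k=5, seed=42)
print(first == second)  # True: identical seed, identical subset
```

Recording the seed alongside reported results makes it possible to re-evaluate on exactly the same subset later.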
prkit_annotation
Modular workflows (domain classification, theorem extraction) via WorkflowComposer and presets. Model-assisted and human-in-the-loop.
ANNOTATION.md
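The idea of composing annotation steps can be sketched as a chain of functions that each add labels to a record. Everything below (compose, the two toy steps, the keyword rules) is hypothetical and only illustrates the pattern; it is not WorkflowComposer's API, and real steps would typically call a model or a human annotator.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[Record], Record]


def compose(steps: List[Step]) -> Step:
    """Chain annotation steps so each one sees the previous step's labels."""

    def run(record: Record) -> Record:
        for step in steps:
            record = step(record)
        return record

    return run


def classify_domain(record: Record) -> Record:
    # Toy keyword rule standing in for a model-assisted classifier.
    text = str(record["question"]).lower()
    domain = "mechanics" if "force" in text or "mass" in text else "unknown"
    return {**record, "domain": domain}


def extract_theorems(record: Record) -> Record:
    # Toy lookup that depends on the domain label from the previous step.
    theorems = ["Newton's second law"] if record.get("domain") == "mechanics" else []
    return {**record, "theorems": theorems}


workflow = compose([classify_domain, extract_theorems])
print(workflow({"question": "What force accelerates a 2 kg mass at 3 m/s^2?"}))
```

Each step returns a new record instead of mutating its input, which keeps intermediate labels easy to inspect when a human reviewer is in the loop.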
Troubleshooting
Common Issues
Python Version Problems
# Check Python version
python --version # Should be 3.10+
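# If the version is older than 3.10, recreate the environment with a newer interpreter
# (assumes a python3.10 executable is on your PATH)
python3.10 -m venv venv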
source venv/bin/activate
Import Errors
# Reinstall in development mode
pip install -e .
# Check installation
pip show physical-reasoning-toolkit
Data Directory Issues
# Set data directory (external to repository)
export DATASET_CACHE_DIR=/path/to/your/data
# Check directory structure
ls -la $DATASET_CACHE_DIR
# Verify dataset files exist
ls -la $DATASET_CACHE_DIR/ugphysics/
ls -la $DATASET_CACHE_DIR/PhysReason/
Getting Help
- Review logs: Check logging output for detailed error information
- Verify setup: Run the commands in the Validate Setup section above
- Check data: Ensure datasets are properly downloaded and accessible
- Check documentation: Start with the root docs listed in the Documentation section
Contributing
Community & Support
- GitHub Issues: Report bugs or request features
- Discussions: Share ideas and get help
Development Setup
# Clone and install in development mode
git clone https://github.com/sherryzyh/physical_reasoning_toolkit.git
cd physical_reasoning_toolkit
pip install -e ".[dev]"
# Run code quality tools
black src/
isort src/
mypy src/
# Run tests
pytest tests/
Adding New Features
- Follow existing patterns: Use consistent logging and error handling
- Add tests: Include tests for new functionality
- Update documentation: Add examples and update README files
- Maintain compatibility: Ensure changes don't break existing functionality
Submitting Pull Requests
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Ensure all tests pass
- Submit a pull request with a clear description
Citation
If you use PRKit in your research, please cite it as follows:
BibTeX:
@software{zhang2026physicalreasoningtoolkit,
author = {Zhang, Yinghuan},
title = {Physical Reasoning Toolkit},
year = {2026},
license = {MIT},
url = {https://github.com/sherryzyh/physical_reasoning_toolkit},
abstract = {A unified toolkit for researchers and engineers working on AI physical reasoning. PRKit provides a shared foundation for representing physics problems, running inference with multiple model providers, evaluating outputs with physics-aware comparators, and building structured annotation workflows.}
}
For citation files, see CITATION.cff and CITATION.bib in the repository root.
Acknowledgments
PRKit integrates and builds upon several excellent physics reasoning benchmarks and datasets. We thank the creators of:
- PhysReason, PHYBench, UGPhysics, SeePhys, PhyX, and other benchmark datasets
- The open-source community for their valuable contributions and feedback
Note: For detailed citations and references to the original dataset papers, please see the Citations section in DATASETS.md.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Ready to advance physics reasoning research!
Quick Links: pip install physical-reasoning-toolkit | GitHub | Documentation | Issues