PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
PhantomWiki generates on-demand datasets to evaluate reasoning and retrieval capabilities of LLMs.
Using PhantomWiki
PhantomWiki is available with Python 3.12+ through
pip install phantom-wiki
To build from source, clone this repository and run pip install . from the repository root.
Then generate datasets of varying sizes with:
./data/generate-v05.sh /path/to/output/ 1 --use-multithreading
NOTE: We do not support --use-multithreading on macOS yet.
This generation script creates PhantomWiki datasets with random generation seed 1:
- Universe sizes 25, 50, 500, ..., 5K, 500K, 1M (number of documents)
- Question template depth 20 (proportional to difficulty)
For example, it executes the following command to generate a size-5K universe (5000 = --max-tree-size * --num-samples = 50 * 100):
python -m phantom_wiki \
-od /path/to/output/depth_20_size_5000_seed_1 \
-s 1 \
--depth 20 \
--num-samples 100 \
--max-tree-size 50 \
--max-tree-depth 20 \
--article-format json \
--question-format json \
--hard-mode \
--valid-only \
--use-multithreading
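Since the articles and questions are emitted as JSON, the generated universe is easy to inspect programmatically. Below is a minimal Python sketch of loading the output; the file names articles.json and questions.json are assumptions for illustration, so check the actual layout of your output directory.
import json
from pathlib import Path

# Hypothetical file names -- inspect your output directory for the actual layout.
output_dir = Path("/path/to/output/depth_20_size_5000_seed_1")
articles = json.loads((output_dir / "articles.json").read_text())
questions = json.loads((output_dir / "questions.json").read_text())
print(f"{len(articles)} articles, {len(questions)} question-answer pairs")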
Installation
PhantomWiki uses the Prolog logic programming language, available on all operating systems through SWI-Prolog. We recommend installing SWI-Prolog through your distribution's package manager or through conda, for example:
# On macOS: with homebrew
brew install swi-prolog
# On Linux: with apt
sudo add-apt-repository ppa:swi-prolog/stable
sudo apt-get update
sudo apt-get install swi-prolog
# On Linux: with conda
conda install conda-forge::swi-prolog
# On Windows: download and install binary from https://www.swi-prolog.org/download/stable
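Because PhantomWiki shells out to Prolog, it is worth confirming that SWI-Prolog is on your PATH before generating data. A minimal Python check (swipl is SWI-Prolog's executable name):
import shutil
import subprocess

# Fail early if the SWI-Prolog executable is missing from PATH.
if shutil.which("swipl") is None:
    raise RuntimeError("swipl not found -- install SWI-Prolog using the commands above.")

# Print the installed version as a sanity check.
subprocess.run(["swipl", "--version"], check=True)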
Installing PhantomWiki in development mode
There are two options:

- (Recommended) Install the package in editable mode using pip:
  pip install -e .
- If you use VSCode, you can add the package to the Python path without installing it:
  - Create a file in the repo root called .env
  - Add PYTHONPATH=src to it
  - Restart VSCode
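With either option, you can verify that the package resolves correctly; the module name is phantom_wiki, matching the python -m phantom_wiki invocation above.
# Quick sanity check that the editable install (or PYTHONPATH) is picked up.
import phantom_wiki
print(phantom_wiki.__file__)  # should point into your checkout's src/ directory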
Evaluating LLMs on PhantomWiki
First, install dependencies and vLLM to match your hardware (GPU, CPU, etc.):
pip install phantom-wiki[eval]
pip install "vllm>=0.6.6"
If you're installing from source, use pip install -e ".[eval]".
Setting up API keys
Anthropic
- Register an account with your cornell.edu email and join "Kilian's Group"
- Create an API key at https://console.anthropic.com/settings/keys under your name
- Set your Anthropic API key in your conda environment:
conda env config vars set ANTHROPIC_API_KEY=xxxxx
Rate limits: https://docs.anthropic.com/en/api/rate-limits#updated-rate-limits
🚨 The Anthropic API has particularly low rate limits, so it takes longer to get predictions.
Google Gemini
- Create an API key at https://aistudio.google.com/app/apikey (NOTE: for some reason, Google AI Studio is disabled for cornell.edu accounts, so use your personal account)
- Set your Gemini API key:
conda env config vars set GEMINI_API_KEY=xxxxx
OpenAI
- Register an account with your cornell.edu email at https://platform.openai.com/ and join "Kilian's Group"
- Create an API key at https://platform.openai.com/settings/organization/api-keys under your name
- Set your OpenAI API key in your conda environment:
conda env config vars set OPENAI_API_KEY=xxxxx
Rate limits: https://platform.openai.com/docs/guides/rate-limits#usage-tiers
TogetherAI
- Register for an account at https://api.together.ai
- Set your TogetherAI API key:
conda env config vars set TOGETHER_API_KEY=xxxxx
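Note that conda environment variables only take effect after you reactivate the environment. To confirm all keys are then visible to Python, a quick check:
import os

# Each provider's client reads its key from the corresponding environment variable.
for var in ("ANTHROPIC_API_KEY", "GEMINI_API_KEY", "OPENAI_API_KEY", "TOGETHER_API_KEY"):
    print(var, "set" if os.environ.get(var) else "MISSING")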
vLLM
Original setup instructions: https://docs.vllm.ai/en/stable/getting_started/installation.html#install-the-latest-code
Additional notes:
- It's recommended to download the model manually:
huggingface-cli download MODEL_REPO_ID
- The models and their configs are downloaded directly from HuggingFace and almost all models on HF are fair game (see also: https://docs.vllm.ai/en/stable/models/supported_models.html#supported-models)
- Total number of attention heads must be divisible by the tensor parallel size (see the check sketched after this list)
- See minimum GPU requirements for small, medium, and large models at the top of each eval inference script
- Running the same code on the same GPU gives perfectly reproducible outputs, but running the same code on different GPUs (e.g., 3090 vs. A6000) does not necessarily produce the same results (see: https://github.com/albertgong1/phantom-wiki/pull/79#issuecomment-2559001925).
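The head-count constraint can be checked before launching a job. Here is a sketch using transformers' AutoConfig; the model id and tensor parallel size below are placeholders, and num_attention_heads is the standard attribute name on most (though not all) HuggingFace configs.
from transformers import AutoConfig

# Placeholder values -- substitute your model repo id and GPU count.
model_repo_id = "meta-llama/Llama-3.1-8B-Instruct"
tensor_parallel_size = 2

config = AutoConfig.from_pretrained(model_repo_id)
num_heads = config.num_attention_heads  # standard attribute on most HF configs
if num_heads % tensor_parallel_size != 0:
    raise ValueError(f"{num_heads} heads not divisible by tensor_parallel_size={tensor_parallel_size}")
print(f"OK: {num_heads} attention heads across {tensor_parallel_size} GPUs")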
Reproducing LLM evaluation results in the paper
[!NOTE] For vLLM inference, make sure to request access to the Gemma, Llama 3.1, 3.2, and 3.3 models on HuggingFace before proceeding.
🧪 To generate the prediction files, run the following scripts (e.g., using slurm) from the root directory:
python -m phantom_eval --method METHOD --model_name MODEL_NAME --split_list SPLIT_LIST -od OUTPUT_DIRECTORY
[!TIP] To generate a slurm script with the appropriate GPU allocation and inference config, run the create_eval.sh script and follow the prompted steps.
📊 To generate the tables and figures, run the following script from the root directory:
# make sure the dataset conda env is activated!
./eval/icml.sh OUTPUT_DIRECTORY METHOD
where OUTPUT_DIRECTORY and METHOD are the same as when generating the predictions. This script will create the following subdirectories in OUTPUT_DIRECTORY: scores/ and figures/.
Development best practices
Git:
Use pre-commit for automatic code formatting. Installing the git hook makes pre-commit run automatically on every commit:
pip install phantom-wiki[dev] # or pip install -e .[dev]
pre-commit install
To run pre-commit manually:
git add <files that you want to stage>
pre-commit run
# at this point, you might need to fix any issues raised by pre-commit and restage your modified files
git commit -m "your commit message"
git push
Testing:
Use pytest to run the test suite:
pip install phantom-wiki[tests] # or pip install -e .[tests]
pytest
Alternatively, you can run pytest through your editor's testing extension (e.g., in VSCode); be sure to select the matching Python environment and interpreter.
Sharing results:
- Model predictions can be shared at /share/nikola/phantom-wiki/eval/
- Please copy the predictions to your local working directory rather than reading from the shared directory directly
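For example, a minimal sketch of making a local copy (the run name is illustrative):
import shutil
from pathlib import Path

# Illustrative run name -- adjust to the predictions you need.
shared = Path("/share/nikola/phantom-wiki/eval/") / "some_run"
local = Path("./eval_predictions") / "some_run"
shutil.copytree(shared, local, dirs_exist_ok=True)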
Sharing dataset to HuggingFace
Use the Hugging Face CLI (see https://huggingface.co/docs/datasets/en/share#upload-an-entire-folder):
huggingface-cli upload mlcore/phantom-wiki-v<version> OUTPUT_DIRECTORY . --repo-type dataset --commit-message="optional commit message"
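Once uploaded, the dataset can be pulled back down with the datasets library. A sketch, assuming a repo named with the mlcore/phantom-wiki-v<version> pattern above; the exact version string and available configurations are assumptions, so check the Hub page.
from datasets import load_dataset

# Hypothetical repo id following the mlcore/phantom-wiki-v<version> pattern;
# pass a config name if the repo defines multiple configurations.
dataset = load_dataset("mlcore/phantom-wiki-v050")
print(dataset)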
Citation
TODO with arxiv link
@article{2025_phantomwiki,
title={{PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation}},
author={Albert Gong and Kamilė Stankevičiūtė and Chao Wan and Anmol Kabra and Raphael Thesmar and Johann Lee and Julius Klenke and Carla P. Gomes and Kilian Q. Weinberger},
year={2025},
journal={todo},
url={todo},
note={Under Review},
}