On-Demand Datasets for Reasoning and Retrieval Evaluation

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

ag2435 anmolkabra kamilest

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

PhantomWiki

PhantomWiki generates on-demand datasets to evaluate reasoning and retrieval capabilities of LLMs.

Paper
Demo

🚀 Quickstart
- Pre-generated PhantomWiki datasets on Huggingface
🔗 Installing dependencies
- Installing PhantomWiki in development mode
🔢 Evaluating LLMs on PhantomWiki
- Setting up API keys
- Reproducing LLM evaluation results in the paper
📃 Citation

🚀 Quickstart

First install Prolog on your machine, then PhantomWiki with pip:

pip install phantom-wiki

[!NOTE] This package has been tested with Python 3.12. We require Python 3.10+ to support match statements.

To build from source, clone this repository and run pip install . from the repository root.

Generate PhantomWiki datasets with random generation seed 1:

In Python:

import phantom_wiki as pw

pw.generate_dataset(
    output_dir="/path/to/output",
    seed=1,
    use_multithreading=True,
)

In a terminal:

phantom-wiki-generate -od "/path/to/output" --seed 1 --use-multithreading

(You can also use the shorthand alias pw-generate.)

[!NOTE] We do not support --use-multithreading on macOS yet, so you should skip this flag (or set it to False).

The following generation script creates datasets of various sizes with random generation seed 1:

./data/generate-v1.sh /path/to/output/ 1 --use-multithreading

- Universe sizes 25, 50, 500, ..., 5K, 500K, 1M (number of documents)
- Question template depth 20 (proportional to difficulty)

For example, it executes the following command to generate a size 5K universe (5000 = --max-family-tree-size * --num-family-trees):

pw-generate \
	-od /path/to/output/depth_20_size_5000_seed_1 \
	--seed 1 \
	--question-depth 20 \
	--num-family-trees 100 \
	--max-family-tree-size 50 \
	--max-family-tree-depth 20 \
	--article-format json \
	--question-format json \
	--use-multithreading
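The size arithmetic and the output directory name in the command above can be reproduced with a small helper. This is only a sketch and not part of PhantomWiki: the function name universe_args is hypothetical, and the directory naming pattern is inferred from the single example path shown above. The --use-multithreading flag is left out so the result also applies on macOS.

```python
import shlex


def universe_args(output_root, seed, num_family_trees, max_family_tree_size,
                  question_depth=20, max_family_tree_depth=20):
    """Build a pw-generate argument list for one universe (hypothetical helper)."""
    # Universe size = number of family trees * maximum size of each tree.
    size = num_family_trees * max_family_tree_size
    # Naming pattern inferred from the example path depth_20_size_5000_seed_1.
    out_dir = (
        f"{output_root.rstrip('/')}/"
        f"depth_{question_depth}_size_{size}_seed_{seed}"
    )
    return [
        "pw-generate",
        "-od", out_dir,
        "--seed", str(seed),
        "--question-depth", str(question_depth),
        "--num-family-trees", str(num_family_trees),
        "--max-family-tree-size", str(max_family_tree_size),
        "--max-family-tree-depth", str(max_family_tree_depth),
        "--article-format", "json",
        "--question-format", "json",
    ]


# 100 trees of at most 50 people each give a universe of size 5000.
args = universe_args("/path/to/output", seed=1,
                     num_family_trees=100, max_family_tree_size=50)
print(shlex.join(args))
```

The list can be passed to subprocess.run(args, check=True) once pw-generate is installed.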

Pre-generated PhantomWiki datasets on Huggingface

For convenience, we provide pre-generated PhantomWiki datasets on Hugging Face (sizes 50, 500, and 5000, each with seeds 1, 2, and 3).

from datasets import load_dataset

# Download the document corpus
ds_corpus = load_dataset("kilian-group/phantom-wiki-v1", "text-corpus")
# Download the question-answer pairs
ds_qa = load_dataset("kilian-group/phantom-wiki-v1", "question-answer")
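To see which splits were downloaded and how large each one is, a helper along these lines may be useful. It is a sketch that assumes load_dataset returned a DatasetDict, which behaves like a dictionary from split name to dataset. The helper name split_sizes is hypothetical, and the stand-in data below uses made-up split names so that the snippet runs without network access.

```python
def split_sizes(dataset_dict):
    """Map each split name to its number of rows."""
    return {name: len(split) for name, split in dataset_dict.items()}


# With the objects loaded above, one would call: split_sizes(ds_corpus)
# Stand-in data for an offline run; these split names are invented.
example = {"split_a": [0], "split_b": [0, 1]}
print(split_sizes(example))
```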

🔗 Installing dependencies

PhantomWiki uses the Prolog logic programming language, available on all major operating systems through SWI-Prolog. We recommend installing SWI-Prolog through your distribution's package manager or through conda, for example:

# On macOS: with homebrew
brew install swi-prolog

# On Linux: with apt
sudo add-apt-repository ppa:swi-prolog/stable
sudo apt-get update
sudo apt-get install swi-prolog

# On Linux: with conda
conda install conda-forge::swi-prolog

# On Windows: download and install binary from https://www.swi-prolog.org/download/stable
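After installing, it can help to confirm that the SWI-Prolog executable is visible before generating data. The sketch below only checks that swipl (the SWI-Prolog command-line executable) is on the PATH. The helper name find_swipl is hypothetical, and the check says nothing about how PhantomWiki itself locates Prolog.

```python
import shutil


def find_swipl():
    """Return the full path of the swipl executable, or None if it is not on PATH."""
    return shutil.which("swipl")


path = find_swipl()
if path is None:
    print("swipl not found on PATH; install SWI-Prolog first")
else:
    print(f"Found SWI-Prolog at {path}")
```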

Installing PhantomWiki in development mode

There are 2 options:

1. (Recommended) Install the package in editable mode using pip:
```
pip install -e .
```
2. If you use VSCode, you can add the package to the Python path without installing it:
   1. Create a file in the repo root called .env
   2. Add PYTHONPATH=src
   3. Restart VSCode

🔢 Evaluating LLMs on PhantomWiki

First, install dependencies and vLLM to match your hardware (GPU, CPU, etc.):

pip install "phantom-wiki[eval]"

If you're installing from source, use pip install -e ".[eval]".

Setting up API keys

Anthropic

Create an API key at https://console.anthropic.com/settings/keys
Set your Anthropic API key as an environment variable, either in your shell or in your conda environment:

export ANTHROPIC_API_KEY=xxxxx
# or
conda env config vars set ANTHROPIC_API_KEY=xxxxx

Rate limits: https://docs.anthropic.com/en/api/rate-limits#updated-rate-limits

🚨 The Anthropic API has particularly low rate limits, so it takes longer to get predictions.

Google Gemini

Create an API key at https://aistudio.google.com/app/apikey
Set your Gemini API key as an environment variable, either in your shell or in your conda environment:

export GEMINI_API_KEY=xxxxx
# or
conda env config vars set GEMINI_API_KEY=xxxxx

OpenAI

Create an API key at https://platform.openai.com/settings/organization/api-keys
Set your OpenAI API key as an environment variable, either in your shell or in your conda environment:

export OPENAI_API_KEY=xxxxx
# or
conda env config vars set OPENAI_API_KEY=xxxxx

TogetherAI

Register for an account at https://api.together.ai
Set your TogetherAI API key as an environment variable, either in your shell or in your conda environment:

export TOGETHER_API_KEY=xxxxx
# or
conda env config vars set TOGETHER_API_KEY=xxxxx
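Before launching a long evaluation, a quick check that the relevant keys are set can save a failed run. The sketch below uses the environment variable names from the sections above. The helper name missing_api_keys and the mapping from server name to variable are illustrative assumptions, not part of the phantom_eval package.

```python
import os

# Environment variable expected for each API-backed server.
API_KEY_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "together": "TOGETHER_API_KEY",
}


def missing_api_keys(servers, env=None):
    """Return the variable names that are unset or empty for the given servers."""
    env = os.environ if env is None else env
    return [API_KEY_VARS[s] for s in servers if not env.get(API_KEY_VARS[s])]


# Example with an explicit environment so the result does not depend on the machine.
print(missing_api_keys(["openai", "gemini"], env={"OPENAI_API_KEY": "xxxxx"}))
```

Calling missing_api_keys(["anthropic"]) without the env argument checks the current process environment instead.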

vLLM

Original setup instructions: https://docs.vllm.ai/en/stable/getting_started/installation.html#install-the-latest-code

Additional notes:

It's recommended to download the model manually:

huggingface-cli download MODEL_REPO_ID

The models and their configs are downloaded directly from Hugging Face, and almost all models hosted there are fair game (see also: https://docs.vllm.ai/en/stable/models/supported_models.html#supported-models).

Reproducing LLM evaluation results in the paper

[!NOTE] For vLLM inference, make sure to request access for Gemma, Llama 3.1, 3.2, and 3.3 models on HuggingFace before proceeding.

🧪 To generate the predictions from an LLM with a prompting METHOD, run the following command:

python -m phantom_eval --method METHOD --server SERVER --model_name MODEL_NAME_OR_PATH --split_list SPLIT_LIST -od OUTPUT_DIRECTORY

We implement lightweight interfaces to the Anthropic, OpenAI, Gemini, and Together APIs, which you select by setting SERVER to anthropic, openai, gemini, or together, respectively. We also implement an interface to a vLLM server (SERVER set to vllm) for evaluating local LLMs.

Example usages:

- METHOD can be zeroshot, fewshot, cot, react, zeroshot-rag, etc.
- Evaluate GPT-4o with a checkpoint name (--server openai --model_name gpt-4o-2024-11-20) or with a name alias (--server openai --model_name gpt-4o). We pass the model name on to the API, so any LLM name supported by the API is supported by our interface. The same holds for Anthropic, Gemini, and Together.
- Evaluate Hugging Face LLMs by model card name (--server vllm --model_name deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) or by local weights path (--server vllm --model_name /absolute/path/to/weights/).
- Evaluate LoRA weights by model card name plus the path to the LoRA weights (--server vllm --model_name deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --inf_vllm_lora_path /path/to/lora/weights/).
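When running many configurations, the command can be assembled programmatically. The following is a sketch using only the flags shown above. The helper name eval_command is hypothetical, and it assumes that --split_list accepts one or more split names as separate arguments; the split name in the example is a placeholder, not a confirmed split.

```python
import shlex


def eval_command(method, server, model_name, split_list, output_dir, lora_path=None):
    """Assemble a phantom_eval invocation as an argument list (hypothetical helper)."""
    cmd = [
        "python", "-m", "phantom_eval",
        "--method", method,
        "--server", server,
        "--model_name", model_name,
        # Assumption: split names are passed as separate arguments.
        "--split_list", *split_list,
        "-od", output_dir,
    ]
    if lora_path is not None:
        # Only relevant for --server vllm with LoRA weights.
        cmd += ["--inf_vllm_lora_path", lora_path]
    return cmd


cmd = eval_command("zeroshot", "openai", "gpt-4o",
                   ["depth_20_size_50_seed_1"], "out")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True), which avoids shell quoting issues with paths.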

[!TIP] To generate a Slurm script for the clusters at Cornell (g2, empire, aida) with the appropriate GPU allocation, run bash eval/create_eval.sh and follow the prompts.

📊 To generate the tables and figures, run the following command from the root directory, replacing METHODS with a space-separated list of prompting techniques e.g. "zeroshot cot zeroshot-rag cot-rag react".

./eval/evaluate.sh OUTPUT_DIRECTORY MODEL_NAME_OR_PATH METHODS
# For local datasets, specify the dataset path and add the --from_local flag
DATASET="/path/to/dataset/" ./eval/evaluate.sh OUTPUT_DIRECTORY MODEL_NAME_OR_PATH METHODS --from_local

Here, OUTPUT_DIRECTORY is the same as when generating the predictions. This script will create the following subdirectories in OUTPUT_DIRECTORY: scores/ and figures/.
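A quick way to confirm that the script produced its outputs is to check for those two subdirectories. This is a sketch: the helper name missing_output_dirs is hypothetical, and the temporary directory below merely stands in for a real OUTPUT_DIRECTORY.

```python
import tempfile
from pathlib import Path


def missing_output_dirs(output_directory):
    """Return which of scores/ and figures/ are absent from the output directory."""
    root = Path(output_directory)
    return [name for name in ("scores", "figures") if not (root / name).is_dir()]


# Stand-in for OUTPUT_DIRECTORY: only scores/ exists, so figures/ is reported.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "scores").mkdir()
    print(missing_output_dirs(tmp))
```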

📃 Citation

@article{gong2025phantomwiki,
  title={{PhantomWiki}: On-Demand Datasets for Reasoning and Retrieval Evaluation},
  author={Gong, Albert and Stankevi{\v{c}}i{\=u}t{\.e}, Kamil{\.e} and Wan, Chao and Kabra, Anmol and Thesmar, Raphael and Lee, Johann and Klenke, Julius and Gomes, Carla P and Weinberger, Kilian Q},
  journal={arXiv preprint arXiv:2502.20377},
  year={2025}
}

Release history

- 1.0.3: Feb 23, 2026
- 1.0.2: Aug 12, 2025
- 1.0.1: Apr 9, 2025 (this version)
- 0.5.2: Mar 5, 2025
- 0.5.0: Feb 27, 2025

Download files

Source Distribution

phantom_wiki-1.0.1.tar.gz (160.2 kB)

Uploaded Apr 9, 2025 (Source)

Built Distribution

phantom_wiki-1.0.1-py3-none-any.whl (156.1 kB)

Uploaded Apr 9, 2025 (Python 3)

File details

Details for the file phantom_wiki-1.0.1.tar.gz.

File metadata

Download URL: phantom_wiki-1.0.1.tar.gz
Upload date: Apr 9, 2025
Size: 160.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for phantom_wiki-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`214e5aec073a91c47cd0485d3912b9da432e2e0090050a58603cb7329ce272c9`
MD5	`73dd47e756f973a9e3cc55dc19fe9f79`
BLAKE2b-256	`d01f45f9a9d513e4c8aaaf4c41b4100e4d88ebb209cf88df6a04fc0a17f2a74e`
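A downloaded archive can be checked against the SHA256 digest listed above using the Python standard library. This is a minimal sketch: the helper name sha256_of is illustrative, and the file is read in chunks so that large files need not fit in memory.

```python
import hashlib

# SHA256 digest listed above for phantom_wiki-1.0.1.tar.gz.
EXPECTED_SHA256 = "214e5aec073a91c47cd0485d3912b9da432e2e0090050a58603cb7329ce272c9"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage, once the file has been downloaded to the current directory:
#   assert sha256_of("phantom_wiki-1.0.1.tar.gz") == EXPECTED_SHA256
```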

Provenance

The following attestation bundles were made for phantom_wiki-1.0.1.tar.gz:

Publisher: python-publish.yml on kilian-group/phantom-wiki

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: phantom_wiki-1.0.1.tar.gz
- Subject digest: 214e5aec073a91c47cd0485d3912b9da432e2e0090050a58603cb7329ce272c9
- Sigstore transparency entry: 194597232
- Sigstore integration time: Apr 9, 2025
Source repository:
- Permalink: kilian-group/phantom-wiki@1435651bbda887eff9b363210beb129bf5b83c90
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/kilian-group
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@1435651bbda887eff9b363210beb129bf5b83c90
- Trigger Event: release

File details

Details for the file phantom_wiki-1.0.1-py3-none-any.whl.

File metadata

Download URL: phantom_wiki-1.0.1-py3-none-any.whl
Upload date: Apr 9, 2025
Size: 156.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for phantom_wiki-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b23cfedb8aeb26b9c80c16308d9db493466c06ddff03fcf681b3cb39b9e85bd2`
MD5	`0c87ba7590be4dd68cf95f961e2e02cc`
BLAKE2b-256	`035348f9c04bc1d077d586f53dfddcf7c48cd8f6fc635357628a19d1f53aa1a8`

Provenance

The following attestation bundles were made for phantom_wiki-1.0.1-py3-none-any.whl:

Publisher: python-publish.yml on kilian-group/phantom-wiki

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: phantom_wiki-1.0.1-py3-none-any.whl
- Subject digest: b23cfedb8aeb26b9c80c16308d9db493466c06ddff03fcf681b3cb39b9e85bd2
- Sigstore transparency entry: 194597233
- Sigstore integration time: Apr 9, 2025
Source repository:
- Permalink: kilian-group/phantom-wiki@1435651bbda887eff9b363210beb129bf5b83c90
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/kilian-group
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@1435651bbda887eff9b363210beb129bf5b83c90
- Trigger Event: release
