Hero's Journey: a benchmark for rule induction in goal-directed episodic tasks

These details have not been verified by PyPI

Project links

Project description

Hero's Journey

A benchmark for testing whether language models can induce hidden rules from demonstrations and act on them in a goal-directed, text-based adventure game.

🔗 Code | 📄 Paper | 📦 PyPI | 📐 Architecture

🔔 Overview

An agent plays an RPG-style game. It sees a set of rules for the current task, but some rules are deliberately hidden. It must infer the missing requirements by studying demonstration episodes (in-context examples) and then apply the inferred pattern to a novel entity — and execute a multi-step plan, not just state the answer.

See the paper for the full design and experiments. This package is the reusable framework; our paper experiments live separately.

🛠️ Install

pip install herosjourney                 # core: task generation + env + eval
pip install "herosjourney[runner]"       # + a generic OpenAI-compatible model adapter
pip install "herosjourney[yaml]"         # + YAML task-definition files
pip install "herosjourney[analysis]"     # + pandas/numpy/matplotlib for metrics & figures

Python 3.10+.

🧩 The four concepts

Concept	What it is	Where it lives
Rule	The abstract attribute→item/process mapping (no surface names)	`*.json` rule file (`herosjourney/core/rules/`)
Task	A rule + a process structure + a source/gen split	a task spec `*.json` registered via `register_task`
Agent	Your model, wrapped as a `model_fn(prompt) -> text`	any callable you pass to `run_single_episode`
Method	An induction strategy layered on the agent (ReAct/HR/IDEA/ACE)	`episode_mode=` + `herosjourney/runner/strategies.py`

🚀 Quick start

import json
from herosjourney import get_task, compute_ecsr
from herosjourney.core.elements import fill_elements, load_lexicons
from herosjourney.core.demo_generator import generate_mixed_demos
from herosjourney.runner.adventure_episode import run_single_episode, construct_demo_context

# 1. RULE + TASK — pick a built-in task (additive | compositional | conditional |
#    override | proc_add | proc_comp | proc_cond | proc_over)
spec = get_task("additive")

# 2. Surface-realize the rule into a concrete variant (seed controls names)
sem_lex, nonce_lex = load_lexicons()
with open(spec.rules) as f:
    rule = json.load(f)
elements = fill_elements(rule, sem_lex, nonce_lex, seed=0, split_spec=spec.split)

# Source entities (shown in demos) and gen entities (what we evaluate)
source_tasks = spec.gen_fn(elements, split="source", use_nonce=False)
gen_tasks    = spec.gen_fn(elements, split="gen",    use_nonce=False)

# 3. Build the in-context demonstrations
demos        = generate_mixed_demos(source_tasks, distractor_tasks=[])
demo_context = construct_demo_context(demos)

# 4. AGENT — wrap your model as model_fn(prompt, max_tokens) -> (text, thinking, tokens)
def my_model_fn(prompt, max_tokens=512):
    text = my_llm(prompt)            # call your model however you like
    return text, None, None

# 5. Run one episode on a gen task and score it
result = run_single_episode(
    episode_idx=0,
    task=gen_tasks[0],
    demo_context=demo_context,
    max_runs=None,                   # defaults to reference_length * num_tries
    verbose=False,
    truncate_window=None,
    model_fn=my_model_fn,
    source_tasks=source_tasks,
)
print(result.success, result.efficiency)

# ECSR (efficiency-calibrated success rate) over a set of results
ecsr = compute_ecsr([result], n_tries=spec.max_tries)

Using a hosted/local model without writing a `model_fn`

Install the runner extra and point the generic OpenAI-compatible adapter at any endpoint (OpenAI, vLLM, LM Studio, Ollama, …):

export OPENAI_BASE_URL="http://localhost:8000/v1"   # your server
export OPENAI_API_KEY="EMPTY"                        # or your real key

result = run_single_episode(..., model_path="my-model-name")

Or from the command line:

adventure-story --task_type additive \
    --elements herosjourney/core/rules/additive.json \
    --model my-model-name --num_tries 2 --num_workers 4

➕ Adding your own task (no Python required)

Write a rule file (my_rule.json, see herosjourney/core/rules/RULE_FORMAT.md) and a task spec:

{
  "name": "my_task",
  "rules": "my_rule.json",
  "process": "property_flat",
  "split": {"fn": "two_offset", "seed": 0},
  "max_tries": 5,
  "validate_fn": "validate_additive_split",
  "eval": {"correct_rule": "…", "description": "…"}
}

import herosjourney
herosjourney.register_task("path/to/my_task.json")   # .yaml also works with [yaml]
spec = herosjourney.get_task("my_task")

For custom validators or per-episode process variation, register_task(...) also accepts keyword arguments. See docs/ARCHITECTURE.md for the full architecture.

🧠 Applying an induction method

# episode_mode selects a steering strategy applied on top of your agent
result = run_single_episode(..., model_fn=my_model_fn, episode_mode="idea")
# "standard" (default), "react", "hr", "idea"

📊 Evaluation

ECSR (efficiency-calibrated success rate) — herosjourney.compute_ecsr: success_rate × normalized_efficiency, where efficiency = reference_length / num_runs and the floor is 1 / n_tries.
RV (rule verbalization) — an LLM judge scores a model's free-text rule description; prompts are in herosjourney.eval.judge.

📖 Citation

If you use Hero's Journey in your research, please cite:

@misc{zheng2026herosjourneytestingcomplex,
      title={HERO'S JOURNEY: Testing Complex Rule Induction with Text Games},
      author={Anshun Asher Zheng and Kanishka Misra and David I. Beaver and Junyi Jessy Li},
      year={2026},
      eprint={2606.02556},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.02556},
}

📄 License

This project is released under the MIT License.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.1

Jun 15, 2026

0.1.0

May 30, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

herosjourney-0.1.1.tar.gz (92.7 kB view details)

Uploaded Jun 15, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

herosjourney-0.1.1-py3-none-any.whl (110.9 kB view details)

Uploaded Jun 15, 2026 Python 3

File details

Details for the file herosjourney-0.1.1.tar.gz.

File metadata

Download URL: herosjourney-0.1.1.tar.gz
Upload date: Jun 15, 2026
Size: 92.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for herosjourney-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`5f8d726be4749fcfc2d2dd816684af0b8ed4719c40abf49d4be5728b6be28645`
MD5	`31724a7b2729ebda9cff69623e90c7ca`
BLAKE2b-256	`e1e5ecab0440bc481c1441ed038394514e4765a4b940c77aa06e9995780b2ea1`

See more details on using hashes here.

File details

Details for the file herosjourney-0.1.1-py3-none-any.whl.

File metadata

Download URL: herosjourney-0.1.1-py3-none-any.whl
Upload date: Jun 15, 2026
Size: 110.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for herosjourney-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3bb78b7ee68ddfa615f09e6fad357c65f23f3a9a58ecec0debb249ff48f47444`
MD5	`0e8d8ef887126064ee3e27983c0dbde2`
BLAKE2b-256	`40e2577fdf5863fbfda0eb49414f9fe86d6d09aa36e9457ce04031869d6f31fb`

See more details on using hashes here.

herosjourney 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Hero's Journey

🔔 Overview

🛠️ Install

🧩 The four concepts

🚀 Quick start

Using a hosted/local model without writing a `model_fn`

➕ Adding your own task (no Python required)

🧠 Applying an induction method

📊 Evaluation

📖 Citation

📄 License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

herosjourney 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Hero's Journey

🔔 Overview

🛠️ Install

🧩 The four concepts

🚀 Quick start

Using a hosted/local model without writing a model_fn

➕ Adding your own task (no Python required)

🧠 Applying an induction method

📊 Evaluation

📖 Citation

📄 License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Using a hosted/local model without writing a `model_fn`