Skip to main content

Hero's Journey: a benchmark for rule induction in goal-directed episodic tasks

Project description

Hero's Journey

A benchmark for testing whether language models can induce hidden rules from demonstrations and act on them in a goal-directed, text-based adventure game.

An agent plays an RPG-style game. It sees a set of rules for the current task, but some rules are deliberately hidden. It must infer the missing requirements by studying demonstration episodes (in-context examples) and then apply the inferred pattern to a novel entity — and execute a multi-step plan, not just state the answer.

See the paper for the full design and experiments. This package is the reusable framework; our paper experiments live separately.

Source: https://github.com/asherz720/HerosJourney

Install

pip install herosjourney                 # core: task generation + env + eval
pip install "herosjourney[runner]"       # + a generic OpenAI-compatible model adapter
pip install "herosjourney[yaml]"         # + YAML task-definition files
pip install "herosjourney[analysis]"     # + pandas/numpy/matplotlib for metrics & figures

Python 3.10+.

The four concepts

Concept What it is Where it lives
Rule The abstract attribute→item/process mapping (no surface names) *.json rule file (herosjourney/core/rules/)
Task A rule + a process structure + a source/gen split a task spec *.json registered via register_task
Agent Your model, wrapped as a model_fn(prompt) -> text any callable you pass to run_single_episode
Method An induction strategy layered on the agent (ReAct/HR/IDEA/ACE) episode_mode= + herosjourney/runner/strategies.py

Quick start

import json
from herosjourney import get_task, compute_ecsr
from herosjourney.core.elements import fill_elements, load_lexicons
from herosjourney.core.demo_generator import generate_mixed_demos
from herosjourney.runner.adventure_episode import run_single_episode, construct_demo_context

# 1. RULE + TASK — pick a built-in task (additive | compositional | conditional |
#    override | proc_add | proc_comp | proc_cond | proc_over)
spec = get_task("additive")

# 2. Surface-realize the rule into a concrete variant (seed controls names)
sem_lex, nonce_lex = load_lexicons()
with open(spec.rules) as f:
    rule = json.load(f)
elements = fill_elements(rule, sem_lex, nonce_lex, seed=0, split_spec=spec.split)

# Source entities (shown in demos) and gen entities (what we evaluate)
source_tasks = spec.gen_fn(elements, split="source", use_nonce=False)
gen_tasks    = spec.gen_fn(elements, split="gen",    use_nonce=False)

# 3. Build the in-context demonstrations
demos        = generate_mixed_demos(source_tasks, distractor_tasks=[])
demo_context = construct_demo_context(demos)

# 4. AGENT — wrap your model as model_fn(prompt, max_tokens) -> (text, thinking, tokens)
def my_model_fn(prompt, max_tokens=512):
    text = my_llm(prompt)            # call your model however you like
    return text, None, None

# 5. Run one episode on a gen task and score it
result = run_single_episode(
    episode_idx=0,
    task=gen_tasks[0],
    demo_context=demo_context,
    max_runs=None,                   # defaults to reference_length * num_tries
    verbose=False,
    truncate_window=None,
    model_fn=my_model_fn,
    source_tasks=source_tasks,
)
print(result.success, result.efficiency)

# ECSR (efficiency-calibrated success rate) over a set of results
ecsr = compute_ecsr([result], n_tries=spec.max_tries)

Using a hosted/local model without writing a model_fn

Install the runner extra and point the generic OpenAI-compatible adapter at any endpoint (OpenAI, vLLM, LM Studio, Ollama, …):

export OPENAI_BASE_URL="http://localhost:8000/v1"   # your server
export OPENAI_API_KEY="EMPTY"                        # or your real key
result = run_single_episode(..., model_path="my-model-name")

Or from the command line:

adventure-story --task_type additive \
    --elements herosjourney/core/rules/additive.json \
    --model my-model-name --num_tries 2 --num_workers 4

Adding your own task (no Python required)

Write a rule file (my_rule.json, see herosjourney/core/rules/RULE_FORMAT.md) and a task spec:

{
  "name": "my_task",
  "rules": "my_rule.json",
  "process": "property_flat",
  "split": {"fn": "two_offset", "seed": 0},
  "max_tries": 5,
  "validate_fn": "validate_additive_split",
  "eval": {"correct_rule": "…", "description": "…"}
}
import herosjourney
herosjourney.register_task("path/to/my_task.json")   # .yaml also works with [yaml]
spec = herosjourney.get_task("my_task")

For custom validators or per-episode process variation, register_task(...) also accepts keyword arguments. See docs/ARCHITECTURE.md for the full architecture.

Applying an induction method

# episode_mode selects a steering strategy applied on top of your agent
result = run_single_episode(..., model_fn=my_model_fn, episode_mode="idea")
# "standard" (default), "react", "hr", "idea"

Evaluation

  • ECSR (efficiency-calibrated success rate) — herosjourney.compute_ecsr: success_rate × normalized_efficiency, where efficiency = reference_length / num_runs and the floor is 1 / n_tries.
  • RV (rule verbalization) — an LLM judge scores a model's free-text rule description; prompts are in herosjourney.eval.judge.

License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

herosjourney-0.1.0.tar.gz (91.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

herosjourney-0.1.0-py3-none-any.whl (110.4 kB view details)

Uploaded Python 3

File details

Details for the file herosjourney-0.1.0.tar.gz.

File metadata

  • Download URL: herosjourney-0.1.0.tar.gz
  • Upload date:
  • Size: 91.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for herosjourney-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2f0e1b3f1aafa593f41ed20bf9d5d33fcdefed8e66320ff67ff3a4abbd0c3224
MD5 854ccfd4d9f1060984010f83b1774a72
BLAKE2b-256 293b029a80d405cb41da3f5f05744a6522716df97194c4d443ce4ab2a328c9ec

See more details on using hashes here.

File details

Details for the file herosjourney-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: herosjourney-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 110.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for herosjourney-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c3f616cdb48450d8d9ed141c8958cfdda154be2ab5b6ae43588864eb4ce126b3
MD5 b8957ef824222ed39cec3e265e3ae246
BLAKE2b-256 89f4ad222a3e6e3efeea1ab6dd9f793496f0355a0888e52ad56bfb26e5cb7dc8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page