Skip to main content

Python integration layer for testable physical-AI world-model workflows, with capability contracts, planning, evaluation, benchmarks, diagnostics, local state, and CLI tools.

Project description

WorldForge

Testable world-model workflows for physical-AI systems.

WorldForge is a Python integration layer that gives world-model providers, score models, embodied policies, and media generators explicit capability contracts. It adds planning, evaluation, benchmarks, diagnostics, local state, and CLI tools while keeping checkpoints, credentials, robot controllers, and deployment host-owned.

CI Docs Python Version PyPI License: MIT Coverage Typed Ruff uv Status: pre-1.0

Quickstart · CLI · Providers · Capability Model · Architecture · Quality · Docs · Playbooks · Support · Security

Robotics Showcase: LeRobot + LeWorldModel

WorldForge's front-door robotics demo composes a Hugging Face LeRobot policy with a LeWorldModel checkpoint. LeRobot proposes PushT action candidates, WorldForge bridges those policy actions into LeWorldModel-native candidate tensors, LeWorldModel scores the candidates, and WorldForge selects and mock-replays the lowest-cost action chunk.

The LeWorldModel runtime path intentionally follows the official LeWM loading contract: stable_worldmodel.policy.AutoCostModel("pusht/lewm") loads the Lucas Maes LeWM object checkpoint. stable-worldmodel is the runtime/evaluation library used by the official LeWorldModel repo, not a substitute score model.

This is simulation/replay planning. It demonstrates policy inference, score-model inference, typed provider composition, candidate ranking, event capture, and visual replay. Hardware control, safety checks, robot-controller integration, and task-specific preprocessing stay host-owned.

WorldForge robotics showcase TUI with pipeline flow, runtime metrics, and tensor contract
Pipeline: real policy, real score checkpoint, WorldForge planner, local mock replay.
WorldForge robotics showcase TUI with robot-arm illustration, candidate ranking, and tabletop replay
Decision: candidate ranking, robot-arm illustration, and fixed tabletop replay.
scripts/robotics-showcase

The command launches a staged Textual report by default and writes the same run data to /tmp/worldforge-robotics-showcase/real-run.json. Use --tui-stage-delay 0.1 for a faster reveal, --no-tui-animation to skip sleeps and arm motion, --no-tui for the plain terminal report, --json-only for automation, or --health-only for a non-mutating dependency/checkpoint preflight. Use --lewm-revision <tag-or-commit> to pin auto-built LeWorldModel assets.

Read the walkthrough and implementation notes: Robotics Replay Showcase and Robotics Showcase Technical Deep Dive.

TheWorldHarness TUI - checkout-safe visual harness for worlds, providers, evals, benchmarks, and packaged flows

TheWorldHarness is the optional Textual workspace for inspecting WorldForge flows without installing robotics or model runtimes. It runs checkout-safe demos, provider diagnostics, benchmark comparison, world editing, and saved report previews through the harness extra.

uv run --extra harness worldforge-harness
uv run --extra harness worldforge-harness --flow lerobot
uv run --extra harness worldforge-harness --flow diagnostics
TheWorldHarness home screen with keyboard-first launch targets
Home: keyboard-first launchpad for worlds, providers, evals, and help.
TheWorldHarness run inspector showing a score-planning flow
Run inspector: flow trace with scored plans, metrics, and transcript.
TheWorldHarness world editor with saved state preview
World editor: persisted scene state, objects, provider, and preview in one place.
TheWorldHarness provider screen with bindings help overlay
Provider help: discoverable bindings over live provider diagnostics.

More detail: TheWorldHarness docs.


Overview

A score model, a robot policy server, a video simulator, and a remote media API have different inputs, runtimes, and failure modes. WorldForge does not flatten those differences. Each provider adapter declares which of eight capabilities it supports (predict, score, policy, generate, transfer, reason, embed, plan). The contract is strict and fail-closed: calling an unsupported capability raises rather than quietly returning empty results.

Planning, evaluation, benchmarks, diagnostics, and persistence are built on top of that contract, not on any specific runtime. Benchmark budget files can turn success rate, error count, retry count, latency, and throughput thresholds into non-zero CLI gates for release checks or preserved benchmark claims.

WorldForge is not a hosted service, a model API abstraction, or a training framework. Optional runtimes, robot stacks, credentials, checkpoints, and durable storage remain the host application's responsibility.

Highlights

Capability contracts Eight named capabilities. Adapters advertise only what they actually implement and return typed WorldForge results. Unknown names raise instead of behaving like empty filters.
Composable planning Combine predictive, score, and policy providers in a single planning loop. Rank candidates, roll out futures, execute actions, persist state.
Deterministic by default Built-in mock provider, reusable contract assertions (worldforge.testing), and packaged demos that run from a clean checkout without credentials or GPUs.
Host-owned runtimes No torch, CUDA, robot controllers, or checkpoints in base dependencies. LeWorldModel, GR00T, LeRobot, Cosmos, and Runway integrate through their own surfaces.
Diagnostics worldforge doctor, provider events, benchmark and evaluation harnesses, and an optional Textual TUI (TheWorldHarness) for inspecting traces.
Quality gates py.typed, import-isolated pytest, ruff, a 90% coverage floor, strict docs, and wheel + sdist contract tests in CI on Python 3.13.

Install

Library (recommended)

# From PyPI (recommended)
uv add worldforge-ai
# or
pip install worldforge-ai

The Python import path stays the same:

import worldforge

If you want the optional Textual harness UI:

uv add "worldforge-ai[harness]"

From source (bleeding edge)

uv add "worldforge-ai @ git+https://github.com/AbdelStark/worldforge"

Repository development

git clone https://github.com/AbdelStark/worldforge.git
cd worldforge
uv sync --group dev
cp .env.example .env

Optional extras:

uv sync --group dev --extra harness   # TheWorldHarness Textual TUI

Python 3.13 only. Base install depends only on httpx. Optional runtimes are host-owned.

Quickstart

The short path is the mock provider: it runs from a clean checkout and exercises the same typed world, provider, planning, persistence, and diagnostics surfaces used by richer runtimes.

Full references: Python API · CLI reference · Examples index

Python API sample
from worldforge import Action, BBox, Position, SceneObject, StructuredGoal, WorldForge

forge = WorldForge()
world = forge.create_world("kitchen", provider="mock")

world.add_object(
    SceneObject(
        "red_mug",
        Position(0.0, 0.8, 0.0),
        BBox(Position(-0.05, 0.75, -0.05), Position(0.05, 0.85, 0.05)),
    )
)

prediction = world.predict(Action.move_to(0.3, 0.8, 0.0), steps=2)
print(prediction.provider, prediction.physics_score)

plan = world.plan(
    goal_spec=StructuredGoal.object_at(
        object_name="red_mug",
        position=Position(0.3, 0.8, 0.0),
    )
)
print(plan.action_count, plan.success_probability)

doctor = forge.doctor()
print(doctor.healthy_provider_count, doctor.provider_count)
CLI sample
uv run worldforge examples                                              # runnable scripts index
uv run worldforge doctor --registered-only                              # active provider health
uv run worldforge world create lab --provider mock                      # save a local world
uv run worldforge world add-object <world-id> cube --x 0 --y 0.5 --z 0  # edit scene state
uv run worldforge world predict <world-id> --object-id <object-id> --x 0.4 --y 0.5 --z 0
uv run worldforge world list                                            # persisted worlds
uv run worldforge world objects <world-id>                              # scene objects
uv run worldforge world history <world-id>                              # object edits + predictions
uv run worldforge world export <world-id> --output world.json           # portable state JSON
uv run worldforge world delete <world-id>                               # remove local JSON state
uv run worldforge provider list                                         # registered providers
uv run worldforge provider info mock                                    # capability surface
uv run worldforge predict kitchen --provider mock --x 0.3 --y 0.8 --z 0.0 --steps 2
uv run worldforge eval --suite planning --provider mock --format json
uv run worldforge benchmark --provider mock --iterations 5 --format json
uv run worldforge benchmark --provider mock --operation embed --input-file examples/benchmark-inputs.json
uv run worldforge benchmark --provider mock --operation generate --budget-file examples/benchmark-budget.json

Scene mutations append persisted history entries with typed action payloads. Position patches keep the object's bounding box translated with the pose so saved snapshots stay coherent.

Full CLI reference: worldforge/cli.

Capability Model

In WorldForge, a "capability" names an operation an adapter actually supports, not the upstream model's branding.

Capability Signature Example providers
predict state + action → predicted state mock
score observations + goal + candidates → ranked candidates leworldmodel
policy observation + instruction → action chunks gr00t, lerobot
generate prompt + options → media artifact cosmos, runway, mock
transfer artifact + prompt/options → artifact runway, mock
reason structured reasoning over state mock
embed observation → embedding mock
plan facade over composed surfaces WorldForge facade

Adapters can register a full BaseProvider or a narrow capability protocol implementation such as a Cost, Policy, Generator, or Predictor. The protocol path is intentionally small: declare name, optional profile metadata, and the one method behind the advertised capability. Registered protocol implementations are visible through diagnostics, planning, and benchmarks without forcing unrelated provider methods into the adapter.

LeWorldModel is a score provider, not a video generator. GR00T and LeRobot are policy providers, not predictive world models. Cosmos and Runway are media generators, not controllable physical planning.

The canonical loop:

observe state
  → propose candidate actions
  → score or roll out possible futures  (score / predict)
  → select an action sequence            (plan)
  → execute through a provider           (policy / predict)
  → persist, evaluate, observe again

Provider Surfaces

Provider Capability surface Registration Runtime ownership
mock predict, generate, transfer, reason, embed always registered in-repo deterministic local provider
cosmos generate COSMOS_BASE_URL host supplies a reachable Cosmos deployment and optional NVIDIA_API_KEY
runway generate, transfer RUNWAYML_API_SECRET or RUNWAY_API_SECRET host supplies Runway credentials and persists returned artifacts
leworldmodel score LEWORLDMODEL_POLICY or LEWM_POLICY host installs the official LeWM loading path (stable_worldmodel.policy.AutoCostModel), torch, and compatible checkpoints
gr00t policy GROOT_POLICY_HOST host runs or reaches an Isaac GR00T policy server
lerobot policy LEROBOT_POLICY_PATH or LEROBOT_POLICY host installs LeRobot and compatible policy checkpoints
jepa scaffold JEPA_MODEL_PATH capability-fail-closed reservation, not a real JEPA runtime
genie scaffold GENIE_API_KEY capability-fail-closed reservation, not a real Genie runtime

jepa and genie are capability-closed reservations. Executable scaffold candidates stay outside package exports and auto-registration until they have a validated runtime path, typed parser coverage, request limits, and docs. The active candidate is jepa-wms, a direct-construction scaffold targeting future facebookresearch/jepa-wms score-provider work.

Architecture

  ┌──────────────────────────────────────────────┐
  │  Host application / CLI                      │
  └──────────────────────┬───────────────────────┘
                         │
                         ▼
  ┌──────────────────────────────────────────────┐
  │  WorldForge facade                           │
  │  catalog · registry · diagnostics · persist  │
  └──────────────────────┬───────────────────────┘
                         │
                         ▼
  ┌──────────────────────────────────────────────┐
  │  World runtime                               │
  │  state · history · planning · execution      │
  └──────────────────────┬───────────────────────┘
                         │
                         ▼
  ┌──────────────────────────────────────────────┐
  │  Provider adapter                            │
  │  capability contract · validation · events   │
  └──────────────────────┬───────────────────────┘
                         │
                         ▼
  ┌──────────────────────────────────────────────┐
  │  Upstream runtime or API                     │
  │  local model · policy server · media API     │
  └──────────────────────────────────────────────┘
Path Responsibility
src/worldforge/models.py Domain models, serialization, validation errors, provider metadata, result types, request policies
src/worldforge/framework.py WorldForge, World, persistence, planning, prediction, comparison, diagnostics
src/worldforge/providers/catalog.py In-repo provider factories and auto-registration policy
src/worldforge/providers/base.py Provider interfaces, ProviderError, remote-provider behavior, PredictionPayload
src/worldforge/providers/ Concrete adapters: mock, Cosmos, Runway, LeWorldModel, GR00T, LeRobot, JEPA, Genie
src/worldforge/evaluation/ Deterministic evaluation suites and report renderers
src/worldforge/benchmark.py Capability-aware latency, retry, throughput, and event benchmark harness
src/worldforge/observability.py ProviderEvent sinks for logs, recording, and metrics
src/worldforge/testing/ Reusable provider contract assertions

Read architecture · world-model taxonomy · provider authoring guide before adding a new adapter.

Command And Example Index

The README keeps the primary showcase and quickstart visible. Use the docs for the full command surface and runtime-specific entrypoints:

Need Start here
Full CLI command map CLI reference
Runnable example index Examples and CLI commands or uv run worldforge examples
LeRobot + LeWorldModel replay showcase Robotics showcase walkthrough
Checkout-safe visual flows TheWorldHarness
Optional runtime operations Operator playbooks
Support, security, citation Support, Security, Citation

Who It's For

  • Researchers comparing world-model surfaces without rewriting the harness for each one.
  • Robotics and physical-AI engineers wiring policies, scorers, simulators, and media providers around their own stacks.
  • Framework builders shipping adapter packages, CLI workflows, and reproducible demos.
  • Anyone who wants the repo to run from a clean checkout before installing CUDA or downloading checkpoints.

Operating Boundaries

  • Capabilities are contracts. Don't advertise an operation unless the adapter implements it and returns the typed WorldForge result.
  • Optional runtimes remain host-owned. No torch, LeWorldModel, LeRobot, GR00T, CUDA, TensorRT, controllers, checkpoints, or datasets in base dependencies.
  • Embodiment-specific action translation is host-owned. Policy providers preserve raw actions; the caller converts them into executable Action objects.
  • Local JSON persistence is single-writer and available through both Python APIs and worldforge world CLI commands. Services needing locking, transactions, or migrations own that layer.
  • Built-in evaluation suites are deterministic contract harnesses. They are not physical-fidelity, media-quality, or real-world safety claims.
  • Scaffold adapters (jepa, genie, jepa-wms) are placeholders, not real integrations.
  • World IDs are local storage identifiers. Path separators and traversal-shaped IDs are rejected.

Development

Primary local gate (same as CI):

uv sync --group dev
make check

make check runs the lock check, Ruff, strict docs build, full pytest, harness coverage gate, wheel/sdist package contract, and distribution build. Run make release-check before a tag; it adds the dependency audit. The expanded gate and triage steps live in the operator playbooks.

Scaffold a new provider:

uv run python scripts/scaffold_provider.py "Acme WM" \
  --taxonomy "JEPA latent predictive world model" \
  --planned-capability score

Contributor guide: CONTRIBUTING.md. Repository agent context: AGENTS.md.

Project Status

WorldForge is pre-1.0 beta. Minor releases may still include breaking changes when the public API needs to tighten.

Useful today for

  • local provider adapter development
  • deterministic planning and evaluation experiments
  • checkout-safe demos and optional-runtime smoke tests
  • contract testing for third-party provider packages
  • CLI diagnostics around provider registration, health, and capabilities

Known limits

  • jepa and genie are capability-fail-closed scaffold adapters
  • jepa-wms is a direct-construction candidate, not exported or auto-registered
  • local JSON persistence is single-writer only
  • evaluation scores are contract signals, not physical-fidelity or safety claims
  • optional runtimes, checkpoints, trace export, dashboards, and production telemetry stay host-owned

Citing WorldForge

If you use WorldForge in academic work, a BibTeX entry is:

@software{worldforge,
  title   = {WorldForge: An integration layer for physical-AI world models},
  author  = {AbdelStark and {WorldForge contributors}},
  year    = {2026},
  url     = {https://github.com/AbdelStark/worldforge},
  version = {0.5.0}
}

Contributing

Issues, discussions, and pull requests are welcome. Please read CONTRIBUTING.md and open an issue for non-trivial changes before sending a patch. For provider work, start with the provider authoring guide and the playbooks.

License

WorldForge is released under the MIT License.

Links

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

worldforge_ai-0.5.0.tar.gz (1.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

worldforge_ai-0.5.0-py3-none-any.whl (207.8 kB view details)

Uploaded Python 3

File details

Details for the file worldforge_ai-0.5.0.tar.gz.

File metadata

  • Download URL: worldforge_ai-0.5.0.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.18 {"installer":{"name":"uv","version":"0.9.18","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for worldforge_ai-0.5.0.tar.gz
Algorithm Hash digest
SHA256 c9721a7b924816b5bd54af452f4d336780f67f28721e9810addda8ae7e680e74
MD5 d56d5228e61e38f967257e16b92330cd
BLAKE2b-256 0110e619a7d96036d8dbc60ca7ea276d0b001825bf7eb9c85ca6c3c2b7d66a0c

See more details on using hashes here.

File details

Details for the file worldforge_ai-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: worldforge_ai-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 207.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.18 {"installer":{"name":"uv","version":"0.9.18","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for worldforge_ai-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5cdce7708214660888eac228cfff8252a559010df24f6044be05eaa205870a3e
MD5 896e41436c8f7d4a107cd7f66225feea
BLAKE2b-256 857a0f04c9c880a288f684f4db65b9f58b2d513e69e652931b9929c54a7ce6f5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page