
WorldKit

The open-source world model SDK.
Train, predict, plan, and deploy — on a laptop.

License: MIT | Python 3.9+ | CI | Models on HF

Paper | Models | Docs | Examples | Contributing


What is WorldKit?

WorldKit is a Python SDK for training and deploying lightweight world models — neural networks that learn how environments behave and can imagine future states without interacting with the real world.

from worldkit import WorldModel

# Train a world model from your data
model = WorldModel.train(data="my_data.h5", config="base", epochs=100)

# Imagine the future: given a state and actions, predict what happens next
result = model.predict(current_frame, actions)

# Plan: find the actions that reach a goal state
plan = model.plan(current_frame, goal_frame, max_steps=50)

Why world models matter: Instead of trial-and-error in the real world (slow, expensive, dangerous), a world model lets an agent "think ahead" by simulating outcomes in a learned latent space. This is how robots can plan manipulation sequences, how game AI can anticipate physics, and how anomaly detectors can flag impossible events.

Why WorldKit: Existing world model implementations are research code — coupled to specific environments, hard to train, harder to deploy. WorldKit gives you a clean train → predict → plan → deploy pipeline with one hyperparameter.

Key Features

  • Train in minutes — 13M-param model trains in ~60 seconds on an M4 MacBook
  • One hyperparameter — SIGReg regularization replaces 6+ collapse-prevention hyperparameters
  • Plan in latent space — CEM planner "imagines" thousands of futures without rendering pixels
  • Deploy anywhere — Export to ONNX or TorchScript for edge, mobile, or server
  • Hub integration — Push and pull trained models from Hugging Face

Install

pip install worldkit

Optional extras:

pip install worldkit[train]    # WandB logging, Hydra configs
pip install worldkit[envs]     # Gymnasium environment wrappers
pip install worldkit[serve]    # FastAPI inference server
pip install worldkit[export]   # ONNX / TorchScript export
pip install worldkit[all]      # Everything

Quickstart

Train a model

from worldkit import WorldModel

model = WorldModel.train(
    data="my_data.h5",   # HDF5 with pixels + actions
    config="base",        # nano | base | large | xl
    epochs=100,
)
model.save("my_model.wk")
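
The exact HDF5 schema is covered in the docs. As a rough sketch, assuming the file holds a dataset of frames plus a same-length dataset of actions (the names "pixels" and "actions" below mirror the comment above and are assumptions, not confirmed API), a compatible file could be built with h5py:

import h5py
import numpy as np

# Hypothetical episode: 1000 steps of 96x96 RGB frames and 2-D actions.
frames = np.random.randint(0, 256, size=(1000, 96, 96, 3), dtype=np.uint8)
actions = np.random.uniform(-1.0, 1.0, size=(1000, 2)).astype(np.float32)

with h5py.File("my_data.h5", "w") as f:
    # Dataset names here are assumptions; check the docs for the schema
    # WorldModel.train() actually expects.
    f.create_dataset("pixels", data=frames, compression="gzip")
    f.create_dataset("actions", data=actions, compression="gzip")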

Load a pre-trained model

model = WorldModel.from_hub("DilpreetBansi/pusht")

Predict future states

# Given current observation and a sequence of actions,
# roll out the dynamics model in latent space
result = model.predict(current_frame, actions=[action] * 10)
# result.latent_trajectory: (10, 192) predicted latent states
# result.confidence: prediction confidence score

Plan to reach a goal

# Find an action sequence that takes you from current_frame to goal_frame
plan = model.plan(current_frame, goal_frame, max_steps=50)
# plan.actions: optimized action sequence
# plan.cost: final planning cost (lower = closer to goal)

Detect anomalies

# Score whether a video sequence is physically plausible
score = model.plausibility(video_frames)
# 1.0 = expected behavior, 0.0 = physically impossible
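
Continuing from the snippet above, a natural pattern for longer recordings is to score overlapping windows and flag low-scoring spans. The 16-frame window, stride, and 0.5 threshold are illustrative choices, not library defaults:

# Slide a half-overlapping window over the recording and collect
# any spans the model considers implausible.
WINDOW, THRESHOLD = 16, 0.5

anomalies = []
for start in range(0, len(video_frames) - WINDOW + 1, WINDOW // 2):
    score = model.plausibility(video_frames[start : start + WINDOW])
    if score < THRESHOLD:
        anomalies.append((start, start + WINDOW, score))

for start, end, score in anomalies:
    print(f"frames {start}-{end}: plausibility {score:.2f}")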

Pre-trained Models

Model                     Config  Params  Latent Dim  Task                 Download
DilpreetBansi/pusht       base    13M     192         Push-T manipulation  WorldModel.from_hub("DilpreetBansi/pusht")
DilpreetBansi/pusht-nano  nano    3.5M    128         Push-T manipulation  WorldModel.from_hub("DilpreetBansi/pusht-nano")

Train your own and share it: model.save("my_model.wk"), then upload it to the Hub.

Model Configurations

All configs share the same API. Pick the one that fits your compute budget.

Config  Params  Latent Dim  Encoder    Predictor Depth  Train Time*
nano    ~3.5M   128         ViT-Tiny   2 layers         ~30s
base    ~13M    192         ViT-Small  3 layers         ~60s
large   ~54M    384         ViT-Base   4 layers         ~8 min
xl      ~102M   512         ViT-Large  6 layers         ~20 min

*On an Apple M4 Pro with MPS. Times on other hardware will vary.

Architecture

WorldKit implements a world model using the JEPA (Joint-Embedding Predictive Architecture) pattern — an architecture class proposed by Yann LeCun where prediction happens in latent space rather than pixel space.

JEPA alone is an architecture, not a training method. Many architectures are JEPAs (including Siamese networks from 1993). The critical question is how you prevent representation collapse — how you stop the model from learning a trivial mapping where all inputs produce the same output.

WorldKit uses SIGReg (Sketch Isotropic Gaussian Regularizer), introduced in the LeWorldModel paper, which solves collapse with a single hyperparameter:

L = L_prediction + λ · SIGReg(Z)

where:
  L_prediction = MSE between predicted and actual latent states
  SIGReg(Z)    = KL divergence approximation enforcing Gaussian structure on Z
  λ             = the ONE hyperparameter you tune (default: 1.0)

This replaces the 6+ hyperparameters required by prior methods (VICReg, Barlow Twins, BYOL).
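
For intuition, here is an illustrative stand-in for that objective. The paper's regularizer uses a sketched approximation; this sketch instead computes the closed-form KL divergence between the batch's empirical Gaussian fit and N(0, I), which enforces the same isotropic-Gaussian structure (an approximation for illustration, not LeWM's implementation):

import torch
import torch.nn.functional as F

def gaussian_kl_to_isotropic(z: torch.Tensor) -> torch.Tensor:
    """KL( N(mu, Sigma) || N(0, I) ) for the empirical statistics of z.

    Illustrative stand-in for SIGReg, not the paper's sketched estimator.
    z: (batch, dim) latent embeddings.
    """
    mu = z.mean(dim=0)
    cov = torch.cov(z.T) + 1e-5 * torch.eye(z.shape[1])  # regularized (dim, dim)
    d = z.shape[1]
    return 0.5 * (torch.trace(cov) + mu @ mu - d - torch.logdet(cov))

def world_model_loss(z_pred, z_true, lam=1.0):
    # L = L_prediction + lambda * SIGReg(Z), with lambda the one hyperparameter
    return F.mse_loss(z_pred, z_true) + lam * gaussian_kl_to_isotropic(z_pred)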

Components

Observation (96x96 RGB)
        │
        ▼
┌───────────────┐
│   ViT Encoder │ ── CLS token pooling ──▶ z ∈ R^192 (latent state)
└───────────────┘
        │
        ▼
┌───────────────────────┐
│ Predictor (AdaLN-Zero)│ ── conditioned on action embeddings
│   Transformer         │ ── causal attention
└───────────────────────┘
        │
        ▼
   z' ∈ R^192 (predicted next state)
        │
        ▼
┌───────────────┐
│  CEM Planner  │ ── samples action candidates
│               │ ── rolls out in latent space (no pixels)
│               │ ── refines toward goal
└───────────────┘
        │
        ▼
   Optimal action sequence

  • Encoder — Vision Transformer (ViT) compresses 96x96 RGB images into compact latent vectors via CLS token pooling. ~200x more compact than patch-level representations.
  • Predictor — Transformer with AdaLN-Zero conditioning. Given latent state z and action a, predicts next state z'. Autoregressive for multi-step rollouts.
  • Planner — Cross-Entropy Method (CEM) that searches for optimal actions by "imagining" outcomes entirely in latent space — no rendering, no physics engine needed (sketched below).
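
For intuition, a minimal, self-contained CEM loop over a latent dynamics function looks roughly like this. Here predict_next stands in for WorldKit's predictor, and the population sizes are illustrative defaults, not the library's:

import numpy as np

def cem_plan(z0, z_goal, predict_next, horizon=50, action_dim=2,
             pop=512, elites=64, iters=8):
    """Cross-Entropy Method planning entirely in latent space.

    predict_next(z, a) -> next latent state; a stand-in for the predictor.
    """
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample a population of candidate action sequences.
        candidates = mean + std * np.random.randn(pop, horizon, action_dim)
        costs = np.empty(pop)
        for i, seq in enumerate(candidates):
            z = z0
            for a in seq:                 # roll out dynamics in latent space
                z = predict_next(z, a)
            costs[i] = np.linalg.norm(z - z_goal)   # distance to goal latent
        # Refit the sampling distribution around the best candidates.
        elite = candidates[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-4
    return mean  # optimized action sequence

This is, in outline, the search that model.plan() performs behind its goal-conditioned API.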

CLI

# Train
worldkit train --data ./data.h5 --config base --epochs 100

# Serve as REST API
worldkit serve --model ./model.wk --port 8000

# Export for edge deployment
worldkit export --model ./model.wk --format onnx

# Inspect a model
worldkit info --model ./model.wk

# Convert video data to HDF5
worldkit convert --input ./videos/ --output ./data.h5 --fps 10

# Hub operations
worldkit hub download DilpreetBansi/pusht

REST API

worldkit serve --model ./model.wk --port 8000

Endpoint       Method  Description
/health        GET     Server status and model info
/encode        POST    Encode observation to latent vector
/predict       POST    Predict future latent states from actions
/plan          POST    Plan optimal action sequence to reach a goal
/plausibility  POST    Score physical plausibility of a video
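
A client call might look like the sketch below. The request field names are assumptions for illustration; since the server is FastAPI-based, the authoritative request/response schema is on its auto-generated /docs page:

import numpy as np
import requests

frame = np.zeros((96, 96, 3), dtype=np.uint8)  # placeholder observation

# Field names ("observation", "actions") are assumptions, not confirmed API.
resp = requests.post(
    "http://localhost:8000/predict",
    json={"observation": frame.tolist(), "actions": [[0.1, -0.2]] * 10},
)
resp.raise_for_status()
print(resp.json())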

Examples

Example                  What it shows
01_quickstart.py         Train, predict, plan in 5 lines
02_train_from_gym.py     Record a Gymnasium env and train
03_plan_to_goal.py       Goal-conditioned CEM planning
04_anomaly_detection.py  Detect physically impossible events
05_export_onnx.py        Export to ONNX / TorchScript
06_serve_api.py          Deploy as a REST API
07_latent_probing.py     Visualize what the latent space learns

Project Structure

worldkit/
├── core/           # WorldModel, ViT encoder, predictor, CEM planner, SIGReg loss
├── data/           # HDF5 dataset, env recorder, video converter
├── cli/            # CLI commands (train, serve, export, hub, convert)
├── server/         # FastAPI inference server
├── envs/           # Gymnasium wrappers
├── eval/           # Benchmarks, probing, visualization
├── export/         # ONNX and TorchScript export
└── hub/            # Hugging Face Hub integration

Research & Acknowledgments

WorldKit is an independent open-source project created by Dilpreet Bansi. It is not affiliated with, endorsed by, or sponsored by any of the researchers or institutions listed below.

The concept of learning world models with neural networks was pioneered by:

Recurrent World Models Facilitate Policy Evolution. David Ha, Jürgen Schmidhuber (NeurIPS 2018). Paper | Code

Ha & Schmidhuber demonstrated that agents can learn entirely inside their own "dreams" — training in a learned simulation of the environment and transferring policies back to reality. Their VAE + MDN-RNN architecture is the foundation that all modern world models build upon.

WorldKit v0.1 implements the architecture and training methodology from:

LeWorldModel: Learning World Models with Joint-Embedding Predictive Architectures. Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, Randall Balestriero (2026). Paper | Code

LeWM builds on Ha & Schmidhuber's vision but replaces the generative approach (pixel reconstruction) with a JEPA-based approach (latent prediction), and uses SIGReg to solve the collapse problem with a single hyperparameter.

The JEPA architectural pattern was proposed in:

A Path Towards Autonomous Machine Intelligence. Yann LeCun (2022). Paper

WorldKit also builds on a number of open-source projects; see NOTICE for the complete third-party attributions.

Citation

If you use WorldKit in your research, please cite both WorldKit and the underlying research:

@software{worldkit,
  title   = {WorldKit: The Open-Source World Model SDK},
  author  = {Bansi, Dilpreet},
  year    = {2026},
  url     = {https://github.com/DilpreetBansi/worldkit},
  license = {MIT}
}

@article{lewm2026,
  title   = {LeWorldModel: Learning World Models with Joint-Embedding Predictive Architectures},
  author  = {Maes, Lucas and Le Lidec, Quentin and Scieur, Damien and LeCun, Yann and Balestriero, Randall},
  year    = {2026},
  url     = {https://le-wm.github.io/}
}

@incollection{ha2018worldmodels,
  title     = {Recurrent World Models Facilitate Policy Evolution},
  author    = {Ha, David and Schmidhuber, J{\"u}rgen},
  booktitle = {Advances in Neural Information Processing Systems 31},
  pages     = {2451--2463},
  year      = {2018},
  url       = {https://worldmodels.github.io}
}

Roadmap

WorldKit v0.1 ships with the LeWM architecture. The goal is to become a unified SDK for all world model architectures — same train/predict/plan API, multiple backends.

Version  Architecture                             Type               Status
v0.1     LeWM (JEPA + SIGReg)                     Latent prediction  Available now
v0.2     Ha & Schmidhuber (2018) (VAE + MDN-RNN)  Generative         Planned
v0.3     Dreamer V4 (VAE-based)                   Generative         Planned
v0.4     TD-MPC2 (task-specific MPC)              Latent prediction  Planned
v0.5     DIAMOND (diffusion-based)                Generative         Planned
v0.6     Custom architecture API                  Any                Planned

The vision:

# Today (v0.1) — LeWM is the default and only backend
model = WorldModel.train(data="my_data.h5", config="base")

# Future (v0.2+) — choose your architecture, same API
model = WorldModel.train(data="my_data.h5", arch="lewm", config="base")
model = WorldModel.train(data="my_data.h5", arch="ha2018", config="base")
model = WorldModel.train(data="my_data.h5", arch="dreamer", config="medium")
model = WorldModel.train(data="my_data.h5", arch="td-mpc", config="large")

One API. Any world model. Train on a laptop, deploy anywhere.

Want to help build this? See CONTRIBUTING.md.

Known Limitations

  • Synthetic data only — current pre-trained models are trained on synthetic Push-T environments. Real-world robotics models coming soon.
  • No video decoder — WorldKit predicts in latent space. It does not reconstruct pixel observations from latent states (by design — this is a feature of JEPA, not a limitation).
  • Single-task models — each model is trained on one environment. Multi-task and transfer learning are planned.
  • Limited CUDA testing — training is routinely tested on CPU and Apple MPS; CUDA works but is less extensively tested at this stage.

Disclaimer

This software is provided "as is" without warranty of any kind. WorldKit is an independent open-source project. It is not affiliated with, endorsed by, or connected to Meta, FAIR, NYU, or any other company or research institution.

See NOTICE for complete third-party attribution.

Contributing

Contributions welcome. See CONTRIBUTING.md for guidelines.

License

MIT License — Copyright (c) 2026 Dilpreet Bansi and WorldKit Contributors.


Built by Dilpreet Bansi
