
WorldKit

The open-source world model SDK.
Train, predict, plan, and deploy — on a laptop.

License: MIT | Python 3.9+ | CI | Models on HF

Paper | Models | Docs | Examples | Contributing


What is WorldKit?

WorldKit is a Python SDK for training and deploying lightweight world models — neural networks that learn how environments behave and can imagine future states without interacting with the real world.

from worldkit import WorldModel

# Train a world model from your data
model = WorldModel.train(data="my_data.h5", config="base", epochs=100)

# Imagine the future: given a state and actions, predict what happens next
result = model.predict(current_frame, actions)

# Plan: find the actions that reach a goal state
plan = model.plan(current_frame, goal_frame, max_steps=50)

Why world models matter: Instead of trial-and-error in the real world (slow, expensive, dangerous), a world model lets an agent "think ahead" by simulating outcomes in a learned latent space. This is how robots can plan manipulation sequences, how game AI can anticipate physics, and how anomaly detectors can flag impossible events.

Why WorldKit: Existing world model implementations are research code — coupled to specific environments, hard to train, harder to deploy. WorldKit gives you a clean train → predict → plan → deploy pipeline with one hyperparameter.

Key Features

  • Train in minutes — 13M-param model trains in ~60 seconds on an M4 MacBook
  • One hyperparameter — SIGReg regularization replaces 6+ collapse-prevention hyperparameters
  • Plan in latent space — CEM planner "imagines" thousands of futures without rendering pixels
  • Deploy anywhere — Export to ONNX or TorchScript for edge, mobile, or server
  • Hub integration — Push and pull trained models from Hugging Face

Install

pip install worldkit

Optional extras:

pip install worldkit[train]    # WandB logging, Hydra configs
pip install worldkit[envs]     # Gymnasium environment wrappers
pip install worldkit[serve]    # FastAPI inference server
pip install worldkit[export]   # ONNX / TorchScript export
pip install worldkit[all]      # Everything

Quickstart

Train a model

from worldkit import WorldModel

model = WorldModel.train(
    data="my_data.h5",   # HDF5 with pixels + actions
    config="base",        # nano | base | large | xl
    epochs=100,
)
model.save("my_model.wk")
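
The exact HDF5 schema is covered in the docs. As a rough sketch, assuming the file holds a dataset of frames plus a same-length dataset of actions (the names "pixels" and "actions" below mirror the comment above and are assumptions, not confirmed API), a compatible file could be built with h5py:

import h5py
import numpy as np

# Hypothetical episode: 1000 steps of 96x96 RGB frames and 2-D actions.
frames = np.random.randint(0, 256, size=(1000, 96, 96, 3), dtype=np.uint8)
actions = np.random.uniform(-1.0, 1.0, size=(1000, 2)).astype(np.float32)

with h5py.File("my_data.h5", "w") as f:
    # Dataset names here are assumptions; check the docs for the schema
    # WorldModel.train() actually expects.
    f.create_dataset("pixels", data=frames, compression="gzip")
    f.create_dataset("actions", data=actions, compression="gzip")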

Load a pre-trained model

model = WorldModel.from_hub("DilpreetBansi/pusht")

Predict future states

# Given current observation and a sequence of actions,
# roll out the dynamics model in latent space
result = model.predict(current_frame, actions=[action] * 10)
# result.latent_trajectory: (10, 192) predicted latent states
# result.confidence: prediction confidence score

Plan to reach a goal

# Find an action sequence that takes you from current_frame to goal_frame
plan = model.plan(current_frame, goal_frame, max_steps=50)
# plan.actions: optimized action sequence
# plan.cost: final planning cost (lower = closer to goal)

Detect anomalies

# Score whether a video sequence is physically plausible
score = model.plausibility(video_frames)
# 1.0 = expected behavior, 0.0 = physically impossible
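
Continuing from the snippet above, a natural pattern for longer recordings is to score overlapping windows and flag low-scoring spans. The 16-frame window, stride, and 0.5 threshold are illustrative choices, not library defaults:

# Slide a half-overlapping window over the recording and collect
# any spans the model considers implausible.
WINDOW, THRESHOLD = 16, 0.5

anomalies = []
for start in range(0, len(video_frames) - WINDOW + 1, WINDOW // 2):
    score = model.plausibility(video_frames[start : start + WINDOW])
    if score < THRESHOLD:
        anomalies.append((start, start + WINDOW, score))

for start, end, score in anomalies:
    print(f"frames {start}-{end}: plausibility {score:.2f}")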

Pre-trained Models

Model                     Config  Params  Latent Dim  Task                 Download
DilpreetBansi/pusht       base    13M     192         Push-T manipulation  WorldModel.from_hub("DilpreetBansi/pusht")
DilpreetBansi/pusht-nano  nano    3.5M    128         Push-T manipulation  WorldModel.from_hub("DilpreetBansi/pusht-nano")

Train your own and share it: model.save("my_model.wk"), then upload it to the Hub.

Model Configurations

All configs share the same API. Pick the one that fits your compute budget.

Config  Params  Latent Dim  Encoder    Predictor Depth  Train Time*
nano    ~3.5M   128         ViT-Tiny   2 layers         ~30s
base    ~13M    192         ViT-Small  3 layers         ~60s
large   ~54M    384         ViT-Base   4 layers         ~8 min
xl      ~102M   512         ViT-Large  6 layers         ~20 min

*On an Apple M4 Pro with MPS. Times on other hardware will vary.

Architecture

WorldKit implements a world model using the JEPA (Joint-Embedding Predictive Architecture) pattern — an architecture class proposed by Yann LeCun where prediction happens in latent space rather than pixel space.

JEPA alone is an architecture, not a training method. Many architectures are JEPAs (including Siamese networks from 1993). The critical question is how you prevent representation collapse — how you stop the model from learning a trivial mapping where all inputs produce the same output.

WorldKit uses SIGReg (Sketch Isotropic Gaussian Regularizer), introduced in the LeWorldModel paper, which solves collapse with a single hyperparameter:

L = L_prediction + λ · SIGReg(Z)

where:
  L_prediction = MSE between predicted and actual latent states
  SIGReg(Z)    = KL divergence approximation enforcing Gaussian structure on Z
  λ             = the ONE hyperparameter you tune (default: 1.0)

This replaces the 6+ hyperparameters required by prior methods (VICReg, Barlow Twins, BYOL).
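
For intuition, here is an illustrative stand-in for that objective. The paper's regularizer uses a sketched approximation; this sketch instead computes the closed-form KL divergence between the batch's empirical Gaussian fit and N(0, I), which enforces the same isotropic-Gaussian structure (an approximation for illustration, not LeWM's implementation):

import torch
import torch.nn.functional as F

def gaussian_kl_to_isotropic(z: torch.Tensor) -> torch.Tensor:
    """KL( N(mu, Sigma) || N(0, I) ) for the empirical statistics of z.

    Illustrative stand-in for SIGReg, not the paper's sketched estimator.
    z: (batch, dim) latent embeddings.
    """
    mu = z.mean(dim=0)
    cov = torch.cov(z.T) + 1e-5 * torch.eye(z.shape[1])  # regularized (dim, dim)
    d = z.shape[1]
    return 0.5 * (torch.trace(cov) + mu @ mu - d - torch.logdet(cov))

def world_model_loss(z_pred, z_true, lam=1.0):
    # L = L_prediction + lambda * SIGReg(Z), with lambda the one hyperparameter
    return F.mse_loss(z_pred, z_true) + lam * gaussian_kl_to_isotropic(z_pred)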

Components

Observation (96x96 RGB)
        │
        ▼
┌───────────────┐
│   ViT Encoder │ ── CLS token pooling ──▶ z ∈ R^192 (latent state)
└───────────────┘
        │
        ▼
┌───────────────────────┐
│ Predictor (AdaLN-Zero)│ ── conditioned on action embeddings
│   Transformer         │ ── causal attention
└───────────────────────┘
        │
        ▼
   z' ∈ R^192 (predicted next state)
        │
        ▼
┌───────────────┐
│  CEM Planner  │ ── samples action candidates
│               │ ── rolls out in latent space (no pixels)
│               │ ── refines toward goal
└───────────────┘
        │
        ▼
   Optimal action sequence

  • Encoder — Vision Transformer (ViT) compresses 96x96 RGB images into compact latent vectors via CLS token pooling. ~200x more compact than patch-level representations.
  • Predictor — Transformer with AdaLN-Zero conditioning. Given latent state z and action a, predicts next state z'. Autoregressive for multi-step rollouts.
  • Planner — Cross-Entropy Method (CEM) that searches for optimal actions by "imagining" outcomes entirely in latent space — no rendering, no physics engine needed (sketched below).
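
For intuition, a minimal, self-contained CEM loop over a latent dynamics function looks roughly like this. Here predict_next stands in for WorldKit's predictor, and the population sizes are illustrative defaults, not the library's:

import numpy as np

def cem_plan(z0, z_goal, predict_next, horizon=50, action_dim=2,
             pop=512, elites=64, iters=8):
    """Cross-Entropy Method planning entirely in latent space.

    predict_next(z, a) -> next latent state; a stand-in for the predictor.
    """
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample a population of candidate action sequences.
        candidates = mean + std * np.random.randn(pop, horizon, action_dim)
        costs = np.empty(pop)
        for i, seq in enumerate(candidates):
            z = z0
            for a in seq:                 # roll out dynamics in latent space
                z = predict_next(z, a)
            costs[i] = np.linalg.norm(z - z_goal)   # distance to goal latent
        # Refit the sampling distribution around the best candidates.
        elite = candidates[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-4
    return mean  # optimized action sequence

This is, in outline, the search that model.plan() performs behind its goal-conditioned API.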

CLI

# Train
worldkit train --data ./data.h5 --config base --epochs 100

# Serve as REST API
worldkit serve --model ./model.wk --port 8000

# Export for edge deployment
worldkit export --model ./model.wk --format onnx

# Inspect a model
worldkit info --model ./model.wk

# Convert video data to HDF5
worldkit convert --input ./videos/ --output ./data.h5 --fps 10

# Hub operations
worldkit hub download DilpreetBansi/pusht

REST API

worldkit serve --model ./model.wk --port 8000

Endpoint       Method  Description
/health        GET     Server status and model info
/encode        POST    Encode observation to latent vector
/predict       POST    Predict future latent states from actions
/plan          POST    Plan optimal action sequence to reach a goal
/plausibility  POST    Score physical plausibility of a video
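
A client call might look like the sketch below. The request field names are assumptions for illustration; since the server is FastAPI-based, the authoritative request/response schema is on its auto-generated /docs page:

import numpy as np
import requests

frame = np.zeros((96, 96, 3), dtype=np.uint8)  # placeholder observation

# Field names ("observation", "actions") are assumptions, not confirmed API.
resp = requests.post(
    "http://localhost:8000/predict",
    json={"observation": frame.tolist(), "actions": [[0.1, -0.2]] * 10},
)
resp.raise_for_status()
print(resp.json())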

Examples

Example                  What it shows
01_quickstart.py         Train, predict, plan in 5 lines
02_train_from_gym.py     Record a Gymnasium env and train
03_plan_to_goal.py       Goal-conditioned CEM planning
04_anomaly_detection.py  Detect physically impossible events
05_export_onnx.py        Export to ONNX / TorchScript
06_serve_api.py          Deploy as a REST API
07_latent_probing.py     Visualize what the latent space learns

Project Structure

worldkit/
├── core/           # WorldModel, ViT encoder, predictor, CEM planner, SIGReg loss
├── data/           # HDF5 dataset, env recorder, video converter
├── cli/            # CLI commands (train, serve, export, hub, convert)
├── server/         # FastAPI inference server
├── envs/           # Gymnasium wrappers
├── eval/           # Benchmarks, probing, visualization
├── export/         # ONNX and TorchScript export
└── hub/            # Hugging Face Hub integration

Research & Acknowledgments

WorldKit is an independent open-source project created by Dilpreet Bansi. It is not affiliated with, endorsed by, or sponsored by any of the researchers or institutions listed below.

The concept of learning world models with neural networks was pioneered by:

Recurrent World Models Facilitate Policy Evolution. David Ha, Jürgen Schmidhuber (NeurIPS 2018). Paper | Code

Ha & Schmidhuber demonstrated that agents can learn entirely inside their own "dreams" — training in a learned simulation of the environment and transferring policies back to reality. Their VAE + MDN-RNN architecture is the foundation that all modern world models build upon.

WorldKit v0.1 implements the architecture and training methodology from:

LeWorldModel: Learning World Models with Joint-Embedding Predictive Architectures. Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, Randall Balestriero (2026). Paper | Code

LeWM builds on Ha & Schmidhuber's vision but replaces the generative approach (pixel reconstruction) with a JEPA-based approach (latent prediction), and uses SIGReg to solve the collapse problem with a single hyperparameter.

The JEPA architectural pattern was proposed in:

A Path Towards Autonomous Machine Intelligence. Yann LeCun (2022). Paper

WorldKit also builds on a number of open-source projects; see NOTICE for the complete third-party attributions.

Citation

If you use WorldKit in your research, please cite both WorldKit and the underlying research:

@software{worldkit,
  title   = {WorldKit: The Open-Source World Model SDK},
  author  = {Bansi, Dilpreet},
  year    = {2026},
  url     = {https://github.com/DilpreetBansi/worldkit},
  license = {MIT}
}

@article{lewm2026,
  title   = {LeWorldModel: Learning World Models with Joint-Embedding Predictive Architectures},
  author  = {Maes, Lucas and Le Lidec, Quentin and Scieur, Damien and LeCun, Yann and Balestriero, Randall},
  year    = {2026},
  url     = {https://le-wm.github.io/}
}

@incollection{ha2018worldmodels,
  title     = {Recurrent World Models Facilitate Policy Evolution},
  author    = {Ha, David and Schmidhuber, J{\"u}rgen},
  booktitle = {Advances in Neural Information Processing Systems 31},
  pages     = {2451--2463},
  year      = {2018},
  url       = {https://worldmodels.github.io}
}

Roadmap

WorldKit v0.1 ships with the LeWM architecture. The goal is to become a unified SDK for all world model architectures — same train/predict/plan API, multiple backends.

Version  Architecture                             Type               Status
v0.1     LeWM (JEPA + SIGReg)                     Latent prediction  Available now
v0.2     Ha & Schmidhuber (2018) (VAE + MDN-RNN)  Generative         Planned
v0.3     Dreamer V4 (VAE-based)                   Generative         Planned
v0.4     TD-MPC2 (task-specific MPC)              Latent prediction  Planned
v0.5     DIAMOND (diffusion-based)                Generative         Planned
v0.6     Custom architecture API                  Any                Planned

The vision:

# Today (v0.1) — LeWM is the default and only backend
model = WorldModel.train(data="my_data.h5", config="base")

# Future (v0.2+) — choose your architecture, same API
model = WorldModel.train(data="my_data.h5", arch="lewm", config="base")
model = WorldModel.train(data="my_data.h5", arch="ha2018", config="base")
model = WorldModel.train(data="my_data.h5", arch="dreamer", config="medium")
model = WorldModel.train(data="my_data.h5", arch="td-mpc", config="large")

One API. Any world model. Train on a laptop, deploy anywhere.

Want to help build this? See CONTRIBUTING.md.

Known Limitations

  • Synthetic data only — current pre-trained models are trained on synthetic Push-T environments. Real-world robotics models coming soon.
  • No video decoder — WorldKit predicts in latent space. It does not reconstruct pixel observations from latent states (by design — this is a feature of JEPA, not a limitation).
  • Single-task models — each model is trained on one environment. Multi-task and transfer learning are planned.
  • Limited CUDA testing — training is routinely tested on CPU and Apple MPS; CUDA works but is less extensively tested at this stage.

Disclaimer

This software is provided "as is" without warranty of any kind. WorldKit is an independent open-source project. It is not affiliated with, endorsed by, or connected to Meta, FAIR, NYU, or any other company or research institution.

See NOTICE for complete third-party attribution.

Contributing

Contributions welcome. See CONTRIBUTING.md for guidelines.

License

MIT License — Copyright (c) 2026 Dilpreet Bansi and WorldKit Contributors.


Built by Dilpreet Bansi
