Photoreal Filament PBR rendering for GPU-resident MuJoCo (MJWarp), zero-copy to PyTorch

mujofil-warp

Photoreal PBR rendering for GPU-resident MuJoCo (MJWarp), zero-copy to PyTorch.

MJWarp simulates thousands of parallel MuJoCo worlds entirely on the GPU, but its built-in batch renderer is a deliberately low-fidelity single-hit raycaster (flat Lambertian, no PBR / IBL / reflections, and it cannot load GLB environments).

mujofil-warp pairs MJWarp's GPU-resident physics with Google Filament's physically-based renderer (PBR materials, image-based lighting, soft shadows, SSAO) and delivers each rendered frame straight to PyTorch as a CUDA tensor — no CPU round-trip.

📖 Full documentation: docs/ (getting started, API guide, feature reference, cookbook, and troubleshooting).

🖥️ Running CPU MuJoCo instead? Use the CPU edition, mujofil (photoreal frames as NumPy arrays).

Highlights

  • Zero-copy to torch.cuda. Filament renders into GPU memory that CUDA imports directly; observations arrive as torch.cuda tensors with no GPU→CPU→GPU bounce.
  • GPU-resident pipeline. MJWarp steps physics on the GPU; only a tiny transform array crosses to the host. Pixels never leave the GPU.
  • Photoreal. Full PBR metalness/roughness, IBL, soft shadows, SSAO, MSAA, filmic tone mapping — renders complete GLB environments MJWarp/MuJoCo can't.
  • Two backends. An OpenGL single-sync path and a Vulkan shared-device path, selectable at runtime.

Performance (RTX 4060 Laptop, 8 GiB)

All numbers are env-steps/s (= cameras/s), MJWarp GPU physics → torch.cuda.

vs vanilla MuJoCo, same scene, same workload (ours adds PBR + zero-copy):

                          128px N=512   256px N=512   256px N=1024
mujofil-warp (GL)              10,675         9,949         10,628
vanilla mujoco.Renderer         8,394         4,808          5,021
speedup                         1.27×         2.07×          2.12×

We beat vanilla MuJoCo by 1.27–2.12× on equal work. The gap widens at higher resolution because zero-copy avoids the CPU readback, whose cost scales with pixel count.
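The speedup row can be re-derived from the two throughput rows. The short sketch below does only that arithmetic on the numbers quoted in the table; the dictionary keys and the function name are illustrative and not part of mujofil-warp.

```python
# Throughput in env-steps/s, copied from the comparison table.
ours = {"128px N=512": 10_675, "256px N=512": 9_949, "256px N=1024": 10_628}
vanilla = {"128px N=512": 8_394, "256px N=512": 4_808, "256px N=1024": 5_021}


def speedups(fast, baseline):
    """Return fast/baseline per workload, rounded to two decimals."""
    return {name: round(fast[name] / baseline[name], 2) for name in fast}


result = speedups(ours, vanilla)
for workload, factor in result.items():
    print(f"{workload}: {factor:.2f}x")
```

Running the same function on throughput measured with benchmarks/ on your own hardware gives a like-for-like comparison.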

Full photoreal warehouse (3 GLB meshes + IBL + 16 spotlights + SSAO — geometry vanilla MuJoCo and MJWarp cannot even load): ~3,200 cam/s at 128px, holding flat from N=64 to N=2048.

GL vs Vulkan backend (full warehouse): the GL single-sync path is 1.3× faster and, critically, its sync cost is constant across N (one flushAndWait), whereas the Vulkan path's sync cost grows linearly with batch size.

vs MJWarp's own raycaster: MJWarp scales to ~42,000 cam/s at N=2048 — but that is flat Lambertian on bare objects (no PBR/IBL, no GLB environments). At small N (≤32) mujofil-warp is faster and photoreal; at large N MJWarp wins raw throughput by trading away all visual fidelity. Different categories: MJWarp is a parallel raycaster, this is a photoreal rasterizer.

Quickstart

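import mujoco, mujoco_warp as mjw, warp as wp, torch
from mujofil_warp import WarpRenderer

NWORLD = 32

# Load the model on the host, then put model and batched data on the GPU.
mjm = mujoco.MjModel.from_xml_path("scene.xml")
M = mjw.put_model(mjm)
d = mjw.make_data(mjm, nworld=NWORLD)
# One host-side MjData per world; each only carries geom transforms to the renderer.
host = [mujoco.MjData(mjm) for _ in range(NWORLD)]

r = WarpRenderer(width=256, height=256, batch_size=NWORLD, preset="high")
r.load_model(mjm)

# Step physics on the GPU and wait for the kernels to finish.
mjw.step(M, d)
wp.synchronize()

# Copy only the small transform arrays to the host.
gx = d.geom_xpos.numpy()
gm = d.geom_xmat.numpy().reshape(NWORLD, mjm.ngeom, 9)   # flatten each 3x3 rotation
for i, h in enumerate(host):
    h.geom_xpos[:] = gx[i]
    h.geom_xmat[:] = gm[i]

obs = r.render_batch(mjm, host, cam_id=0)   # (32, 256, 256, 4) uint8 torch.cuda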

See examples/minimal_render.py for a runnable demo.
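To get a feel for how much pixel data one render_batch call produces, the sketch below computes the size of a uint8 observation batch from its shape. It is plain arithmetic on the (N, H, W, 4) layout shown in the quickstart comment; the helper name is illustrative and not part of the package.

```python
def batch_bytes(n_worlds, height, width, channels=4, bytes_per_channel=1):
    """Size in bytes of an (N, H, W, C) observation batch (uint8 by default)."""
    return n_worlds * height * width * channels * bytes_per_channel


quickstart = batch_bytes(32, 256, 256)    # the quickstart shape
large = batch_bytes(1024, 256, 256)       # the largest benchmarked batch at 256px

print(f"{quickstart / 2**20:.0f} MiB")    # 8 MiB
print(f"{large / 2**20:.0f} MiB")         # 256 MiB
```

This is the amount a CPU readback would have to move on every step, which is why avoiding it matters more as resolution and batch size grow.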

Quality toggles

Every fidelity feature is an independent toggle so you can reproduce the throughput/fidelity trade-offs in benchmarks/ on your own hardware:

from mujofil_warp import WarpRenderer, make_config

# keyword toggles
r = WarpRenderer(width=256, batch_size=32, ssao=False, shadows=True, msaa=True)

# or a named preset, optionally overriding individual toggles
r = WarpRenderer(width=256, batch_size=32, preset="fast")          # SSAO off, ~2x
r = WarpRenderer(width=256, batch_size=32, preset="high", bloom=True)

# or an explicit config
cfg = make_config(width=256, height=256, batch_size=32, exposure=1.6)
r = WarpRenderer(config=cfg)

Toggle                Effect                                Notes
ssao                  screen-space ambient occlusion        biggest cost; ~2× faster when off
ssao_quality          SSAO quality low/medium/high/ultra    affects look more than speed
ssao_ssct             SSAO cone tracing (contact shadows)   small extra cost on top of SSAO
shadows               soft shadow maps
msaa / msaa_samples   multi-sample AA                       2 / 4 / 8
bloom                 HDR bloom                             off by default
fxaa                  fast approximate AA                   alternative to MSAA
exposure              linear exposure before tone mapping
tone_mapping          FILMIC vs LINEAR
dithering             temporal dithering                    reduces banding

Presets: high (photoreal, default), medium (high-quality SSAO, no cone tracing), fast (SSAO off, ~2×), ultra (8× MSAA + bloom), raw (no AO/shadows/AA, ~3×).
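One way to picture "a named preset, optionally overriding individual toggles" is a dictionary merge in which keyword overrides win over the preset's values. The sketch below is only a mental model, not the package's implementation: the PRESETS table encodes just what the preset list above states (everything else is left unspecified), and passing ssao_quality as a string is an assumption.

```python
# Illustrative only: values are taken from the preset descriptions in this README.
PRESETS = {
    "high": {},  # photoreal default; its individual values are not listed here
    "medium": {"ssao_quality": "high", "ssao_ssct": False},
    "fast": {"ssao": False},
    "ultra": {"msaa": True, "msaa_samples": 8, "bloom": True},
    "raw": {"ssao": False, "shadows": False, "msaa": False, "fxaa": False},
}


def resolve(preset="high", **overrides):
    """Start from the preset's toggles, then let explicit keywords win."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {sorted(PRESETS)}")
    return {**PRESETS[preset], **overrides}


print(resolve("high", bloom=True))
print(resolve("fast", ssao=True))   # an explicit toggle overrides the preset
```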

Backends

Select at runtime with MUJOFIL_WARP_BACKEND:

  • gl (default): OpenGL single-sync. Renders N worlds into N imported GL textures bracketed by one flushAndWait, then exports via GL↔CUDA interop. Sync cost is constant in N; fastest in the warehouse. Runs headless on surfaceless EGL, so no X display is needed; it falls back to Vulkan only if the GL module fails to initialize.
  • vulkan: shared Vulkan device + exportable swapchain + CUDA external-memory import. Works fully headless (no X), but the 2-frame in-flight cap makes its sync cost grow with batch size.

# default is gl; force a backend explicitly with the env var:
MUJOFIL_WARP_BACKEND=gl     python examples/minimal_render.py --preset high
MUJOFIL_WARP_BACKEND=vulkan python examples/minimal_render.py --preset high
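The same selection can be made from inside a Python script by setting the variable in os.environ. The sketch below also validates the value up front. Two assumptions are baked in: that the package reads MUJOFIL_WARP_BACKEND no earlier than import time (so it must be set before importing mujofil_warp), and that only the two documented values are accepted. The helper itself is hypothetical and not part of the package.

```python
import os

VALID_BACKENDS = ("gl", "vulkan")


def requested_backend(environ=None):
    """Return the backend named by MUJOFIL_WARP_BACKEND, defaulting to 'gl'."""
    environ = os.environ if environ is None else environ
    value = environ.get("MUJOFIL_WARP_BACKEND", "gl").strip().lower()
    if value not in VALID_BACKENDS:
        raise ValueError(f"MUJOFIL_WARP_BACKEND={value!r}; expected one of {VALID_BACKENDS}")
    return value


# Keep any value the user exported; otherwise make the default explicit.
os.environ.setdefault("MUJOFIL_WARP_BACKEND", "gl")
print("backend:", requested_backend())

# Assumption: import mujofil_warp only after this point so it sees the variable.
```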

Installation

pip install mujofil-warp

The wheel is self-contained: Filament and the CUDA runtime are statically baked in, the compiled materials ship inside it, and libc++ is bundled. There is no CUDA toolkit, no Filament, and no mujofil to install — the only hard requirement at runtime is an NVIDIA GPU + driver.

Supported environments

Because the package contains no CUDA device code (only host-side runtime calls), a single wheel is portable across GPUs and driver versions:

Dimension       Support
GPU             Any NVIDIA GPU (Turing / Ampere / Ada / Hopper / …); no compute-capability lock-in
Driver / CUDA   NVIDIA driver ≥ R525 (CUDA 12.0+). One wheel, all newer drivers
OS              Linux x86_64, glibc ≥ 2.34 (Ubuntu 22.04+, Debian 12+, RHEL/Alma/Rocky 9+, Fedora 35+)
Python          CPython 3.10 – 3.13

Not yet supported: aarch64 (Jetson/Grace), glibc < 2.34 (Ubuntu 20.04 / RHEL 8), non-NVIDIA GPUs. These need a from-source Filament build (planned).
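A quick pre-install check of the OS, glibc, and Python rows can be done with the standard library alone. The sketch below is a hypothetical helper, not something shipped with the package, and it does not check the GPU or driver rows.

```python
import platform
import sys


def environment_problems(system, machine, libc_name, libc_version, python_version):
    """Return a list of reasons the prebuilt wheel would not apply (empty if none)."""
    problems = []
    if system != "Linux" or machine != "x86_64":
        problems.append(f"platform {system}/{machine} is not Linux x86_64")
    if libc_name != "glibc":
        problems.append("glibc was not detected")
    else:
        try:
            parsed = tuple(int(part) for part in libc_version.split(".")[:2])
        except ValueError:
            parsed = ()
        if len(parsed) < 2 or parsed < (2, 34):
            problems.append(f"glibc {libc_version or 'unknown'} is older than 2.34")
    if not (3, 10) <= tuple(python_version[:2]) <= (3, 13):
        problems.append("Python is outside CPython 3.10 to 3.13")
    return problems


libc_name, libc_version = platform.libc_ver()
found = environment_problems(
    platform.system(), platform.machine(), libc_name, libc_version, sys.version_info
)
print("\n".join(found) if found else "OS, glibc, and Python look compatible")
```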

PyTorch (zero-copy target)

torch is an optional dependency (pip install "mujofil-warp[torch]"), and you must install a build that matches your GPU's compute capability — the zero-copy DLPack handoff runs CUDA kernels through your torch, not ours.

  • Blackwell (RTX 50-series / sm_120, e.g. 5090): install the CUDA 12.8 torch — pip install torch --index-url https://download.pytorch.org/whl/cu128. A torch+cu124 (or older) build has no sm_120 kernels and fails at runtime with CUDA error: no kernel image is available for execution on the device.
  • Ada / Hopper / Ampere (sm_80–sm_90): the default cu124 torch is fine.

warp-lang and mujoco-warp JIT-compile for the local GPU, so they need no such pinning — only torch ships prebuilt device code.
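The two bullets above reduce to a small lookup from compute capability to torch wheel index. The sketch below encodes only the cases this section states and returns None for anything else; the function name is illustrative. With torch already installed, torch.cuda.get_device_capability() returns the (major, minor) pair to pass in.

```python
def torch_wheel_index(capability):
    """Map a (major, minor) compute capability to the torch index named in this README.

    Returns None for capabilities this section does not cover.
    """
    major, minor = capability
    if (major, minor) >= (12, 0):           # Blackwell RTX 50-series, sm_120
        return "cu128"
    if (8, 0) <= (major, minor) <= (9, 0):  # Ampere / Ada / Hopper, sm_80 to sm_90
        return "cu124"
    return None


for cap in [(8, 6), (8, 9), (9, 0), (12, 0)]:
    print(f"sm_{cap[0]}{cap[1]} -> {torch_wheel_index(cap)}")

# With torch installed (not executed here):
#   import torch
#   torch_wheel_index(torch.cuda.get_device_capability(0))
```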

Headless / display

Both backends are fully headless — no X server, no display, nothing extra to install beyond the NVIDIA driver:

  • GL (default) uses surfaceless EGL, so it renders headless at full speed on a bare GPU server (cloud, cluster, container). This is the recommended path for vision-RL training.
  • Vulkan is also headless (shared device + exportable swapchain).

GL auto-falls back to Vulkan only if the GL module fails to initialize.
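The fallback rule is one-directional: a failed GL initialization falls through to Vulkan, while a failed Vulkan initialization is simply an error. The sketch below models that with stand-in factories; the function and its arguments are hypothetical and do not reflect how the package is actually written.

```python
def init_backend(preferred, factories):
    """Try the preferred backend; fall back to Vulkan only when GL fails.

    factories maps a backend name to a zero-argument callable that returns a
    backend object or raises if initialization fails.
    """
    order = [preferred] + (["vulkan"] if preferred == "gl" else [])
    last_error = None
    for name in order:
        try:
            return name, factories[name]()
        except Exception as exc:  # initialization failures are backend-specific
            last_error = exc
    raise RuntimeError(f"could not initialize backend {preferred!r}") from last_error


def broken_gl():
    raise OSError("EGL initialization failed (simulated)")


name, backend = init_backend("gl", {"gl": broken_gl, "vulkan": lambda: "vulkan-backend"})
print(name)
```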

Building from source

Most users never need this — pip install mujofil-warp ships prebuilt wheels. Build from source only to hack on the C++ or target an unsupported environment.

Prerequisites (the native modules and Filament are built with Clang + libc++):

Tool                                       Debian/Ubuntu                       RHEL/Fedora/Alma
Clang + libc++ dev                         clang libc++-dev libc++abi-dev      clang + libc++ (LLVM release)
CUDA toolkit (headers + static cudart)     nvidia-cuda-toolkit                 cuda-cudart-devel-12-x cuda-driver-devel-12-x
EGL / GL dev headers                       libegl1-mesa-dev libgl1-mesa-dev    mesa-libEGL-devel mesa-libGL-devel
Build tools (source-built Filament only)   git cmake ninja-build               git cmake ninja-build

Then:

git clone https://github.com/tau-intelligence/mujofil-warp
cd mujofil-warp
CC=clang CXX=clang++ pip install .

How Filament is resolved: the GL backend's headless EGL rendering needs a custom EGL-enabled Filament, because Google's prebuilt Linux Filament is GLX-only. CMakeLists.txt tries, in order:

  1. FILAMENT_DIR=/path/to/egl-filament if you set it — used as-is (fastest).
  2. Download a prebuilt EGL Filament artifact (seconds). The default path.
  3. Build from source via packaging/build_filament_egl.sh (~20–30 min) if the download is unavailable — this is the step that needs git/cmake/ninja.

So a plain pip install . is one command; supply FILAMENT_DIR to skip the download/build entirely:

CC=clang CXX=clang++ FILAMENT_DIR=/path/to/egl-filament pip install .
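The three-step resolution order can be summarized as a short decision function. The real logic lives in CMakeLists.txt; the Python below is only a sketch of the ordering, with download and build passed in as hypothetical callables (download returns None when no prebuilt artifact is available).

```python
from pathlib import Path


def resolve_filament(environ, download, build):
    """Return (source, path) following the order: FILAMENT_DIR, download, source build."""
    explicit = environ.get("FILAMENT_DIR")
    if explicit:
        return "FILAMENT_DIR", Path(explicit)   # 1. used as-is
    fetched = download()
    if fetched is not None:
        return "download", Path(fetched)        # 2. prebuilt EGL artifact
    return "source-build", Path(build())        # 3. packaging/build_filament_egl.sh


source, path = resolve_filament(
    {"FILAMENT_DIR": "/path/to/egl-filament"},
    download=lambda: None,
    build=lambda: "./_filament_egl",
)
print(source, path)
```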

The EGL Filament artifact is reproducible from source:

packaging/build_filament_egl.sh ./_filament_egl   # clone + patch + build

Dev rebuilds (no full reinstall)

For iterating on the C++ without a full pip install, the two helper scripts build the modules in place (point FILAMENT_DIR at the EGL Filament build):

bash native/build_gl.sh   # OpenGL single-sync, headless EGL -> _mujofil_warp_gl
bash native/build.sh      # Vulkan zero-copy                  -> _mujofil_warp

Architecture & porting

mujofil-warp is one core with pluggable rendering backends, so new platforms are added as a backend — not a fork.

mujofil_warp/__init__.py     Python API, presets, backend selection   (shared)
native/render_module.cpp     pybind bindings, batching                (shared)
native/vendor/core/          scene / material / light bridge          (shared)
native/renderer_gl.cpp       Linux: surfaceless EGL  + CUDA interop   (backend)
native/renderer_warp.cpp     Linux: Vulkan device    + CUDA interop   (backend)

Everything platform-specific lives behind the vf_mujoco::Renderer interface (context creation, GPU→tensor interop). Adding macOS or Windows means adding one renderer_*.{cpp,mm} implementing that interface — the scene, material, lighting, Python API, and batching layers are reused unchanged.

  • Windows would use a WGL/EGL context + OPAQUE_WIN32 external-memory handles for the CUDA interop.
  • macOS is a different target: there is no CUDA on Apple platforms, so a Mac backend would use Filament's Metal backend and export to PyTorch via MPS (MTLBuffer → torch-MPS) rather than torch.cuda.

These are not yet implemented (they need the respective hardware to develop and validate on), but the codebase is structured so they slot in without a fork.
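The "one core, pluggable backends" idea can be illustrated with an abstract base class. The actual interface is the C++ vf_mujoco::Renderer; the Python class and its two method names below are invented for illustration and only mirror the two responsibilities named above (context creation and GPU→tensor interop).

```python
from abc import ABC, abstractmethod


class Renderer(ABC):
    """Hypothetical Python stand-in for the platform-specific interface."""

    @abstractmethod
    def create_context(self):
        """Set up the platform graphics context."""

    @abstractmethod
    def export_frame(self, world_index):
        """Hand one rendered frame to the tensor side."""


class FakeGLRenderer(Renderer):
    def create_context(self):
        self.context = "surfaceless-egl"

    def export_frame(self, world_index):
        return f"gl-frame-{world_index}"


def render_batch(renderer, batch_size):
    """Shared layer: it only talks to the interface, never to a concrete backend."""
    renderer.create_context()
    return [renderer.export_frame(i) for i in range(batch_size)]


frames = render_batch(FakeGLRenderer(), 2)
print(frames)
```

A new platform would add another subclass; render_batch and everything above it stay unchanged.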

Layout

mujofil_warp/        Python package (WarpRenderer, make_config, presets)
native/              C++ renderer + pybind module + build scripts
  renderer_gl.cpp      OpenGL single-sync zero-copy backend
  renderer_warp.cpp    Vulkan shared-device zero-copy backend
  render_module.cpp    pybind bindings (shared by both backends)
examples/            runnable demos
benchmarks/          the benchmark suite behind the numbers above
spikes/              isolated feasibility proofs (GL↔CUDA, Vulkan↔CUDA, DLPack)
docs/ARCHITECTURE.md design + phased integration plan

Relationship to mujofil

mujofil-warp reuses the CPU-MuJoCo mujofil renderer's scene/material/light source but is a separate build — the published mujofil package is untouched. Use mujofil for high-fidelity CPU-MuJoCo vector-env rendering; use mujofil-warp when you want MJWarp's GPU-resident physics with photoreal, zero-copy observations.

License

Apache-2.0.
