
Photoreal Filament PBR rendering for GPU-resident MuJoCo (MJWarp), zero-copy to PyTorch


mujofil-warp

Photoreal PBR rendering for GPU-resident MuJoCo (MJWarp), zero-copy to PyTorch.

MJWarp simulates thousands of parallel MuJoCo worlds entirely on the GPU, but its built-in batch renderer is a deliberately low-fidelity single-hit raycaster (flat Lambertian, no PBR / IBL / reflections, and it cannot load GLB environments).

mujofil-warp pairs MJWarp's GPU-resident physics with Google Filament's physically-based renderer (PBR materials, image-based lighting, soft shadows, SSAO) and delivers each rendered frame straight to PyTorch as a CUDA tensor — no CPU round-trip.

Highlights

  • Zero-copy to torch.cuda. Filament renders into GPU memory that CUDA imports directly; observations arrive as torch.cuda tensors with no GPU→CPU→GPU bounce.
  • GPU-resident pipeline. MJWarp steps physics on the GPU; only a tiny transform array crosses to the host. Pixels never leave the GPU.
  • Photoreal. Full PBR metalness/roughness, IBL, soft shadows, SSAO, MSAA, filmic tone mapping — renders complete GLB environments MJWarp/MuJoCo can't.
  • Two backends. An OpenGL single-sync path and a Vulkan shared-device path, selectable at runtime.
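To put "only a tiny transform array" in perspective, here is a back-of-the-envelope comparison. It weighs the pose data that crosses to the host each step (one position vec3 plus one 3×3 rotation matrix per geom, which is what the Quickstart copies) against the bytes a CPU readback of the rendered frames would move. It assumes float32 transforms and RGBA8 frames. The geom count of 50 is an arbitrary illustrative value, not a measured scene.

```python
# Back-of-the-envelope sizes per simulation step.
# Assumptions: float32 transforms, RGBA8 frames, 50 geoms per world (illustrative).
FLOAT32_BYTES = 4
FLOATS_PER_GEOM = 3 + 9  # geom_xpos (vec3) + geom_xmat (flattened 3x3)


def transform_bytes(nworld: int, ngeom: int) -> int:
    """Bytes of pose data copied GPU -> host for all worlds."""
    return nworld * ngeom * FLOATS_PER_GEOM * FLOAT32_BYTES


def frame_bytes(nworld: int, width: int, height: int, channels: int = 4) -> int:
    """Bytes a CPU readback of all uint8 frames would have to move."""
    return nworld * width * height * channels


if __name__ == "__main__":
    poses = transform_bytes(nworld=32, ngeom=50)
    pixels = frame_bytes(nworld=32, width=256, height=256)
    print(f"poses: {poses:,} B, pixels: {pixels:,} B, ratio: {pixels / poses:.0f}x")
```

Under these assumptions the 32 frames at 256px are roughly a hundred times larger than the pose data. That gap is why keeping pixels on the GPU matters more than keeping poses there.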

Performance (RTX 4060 Laptop, 8 GiB)

All numbers are env-steps/s (= cameras/s), MJWarp GPU physics → torch.cuda.

vs vanilla MuJoCo, same scene, same workload (ours adds PBR + zero-copy):

| | 128px, N=512 | 256px, N=512 | 256px, N=1024 |
|---|---|---|---|
| mujofil-warp (GL) | 10,675 | 9,949 | 10,628 |
| vanilla mujoco.Renderer | 8,394 | 4,808 | 5,021 |
| speedup | 1.27× | 2.07× | 2.12× |

We beat vanilla MuJoCo by 1.27–2.12× on equal work. The gap widens at higher resolution because zero-copy avoids the CPU readback, whose cost scales with pixel count.
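The speedup row is the ratio of the two throughputs in each column. Recomputing it from the table's own numbers confirms the 1.27× to 2.12× range:

```python
# Throughputs from the comparison table (env-steps/s = cameras/s).
ours = {"128px N=512": 10_675, "256px N=512": 9_949, "256px N=1024": 10_628}
vanilla = {"128px N=512": 8_394, "256px N=512": 4_808, "256px N=1024": 5_021}

# Speedup = mujofil-warp throughput / vanilla mujoco.Renderer throughput.
speedups = {k: round(ours[k] / vanilla[k], 2) for k in ours}
for config, s in speedups.items():
    print(f"{config}: {s:.2f}x")
```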

Full photoreal warehouse (3 GLB meshes + IBL + 16 spotlights + SSAO — geometry vanilla MuJoCo and MJWarp cannot even load): ~3,200 cam/s at 128px, holding flat from N=64 to N=2048.

GL vs Vulkan backend (full warehouse): the GL single-sync path is 1.3× faster. Critically, its sync cost is constant across N (one flushAndWait per batch), whereas the Vulkan path's sync cost grows linearly with batch size.
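A toy model makes the scaling difference concrete. It counts only blocking host waits per batch, not their duration. The GL path issues one flushAndWait for the whole batch. A ring limited to two frames in flight (the Vulkan cap described under Backends) must wait on an older frame before each submission beyond the second, then drain whatever is still pending. This is an illustrative sketch of the counting argument, not the renderer's actual scheduling code.

```python
from collections import deque


def gl_waits(n_worlds: int) -> int:
    """GL single-sync: submit all N renders, then one flushAndWait."""
    return 1 if n_worlds > 0 else 0


def vulkan_waits(n_worlds: int, max_in_flight: int = 2) -> int:
    """Count host waits when at most `max_in_flight` frames may be pending."""
    pending = deque()
    waits = 0
    for frame in range(n_worlds):
        if len(pending) == max_in_flight:
            pending.popleft()  # block until the oldest frame completes
            waits += 1
        pending.append(frame)  # submit this world's frame
    waits += len(pending)  # drain the frames still in flight
    return waits


for n in (64, 512, 2048):
    print(f"N={n}: gl waits={gl_waits(n)}, vulkan waits={vulkan_waits(n)}")
```

In this model the GL count stays at 1 while the capped ring needs one wait per frame, which matches the constant-vs-linear behaviour measured above.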

vs MJWarp's own raycaster: MJWarp scales to ~42,000 cam/s at N=2048, but that is flat Lambertian shading on bare objects (no PBR/IBL, no GLB environments). At small N (≤32) mujofil-warp is both faster and photoreal. At large N MJWarp wins on raw throughput by trading away all visual fidelity. The two are different categories: MJWarp is a parallel raycaster, while mujofil-warp is a photoreal rasterizer.

Quickstart

import mujoco
import mujoco_warp as mjw
import torch
import warp as wp
from mujofil_warp import WarpRenderer

N = 32  # number of parallel worlds (= render batch size)

mjm = mujoco.MjModel.from_xml_path("scene.xml")
M = mjw.put_model(mjm)                          # model on the GPU
d = mjw.make_data(mjm, nworld=N)                # N worlds of state on the GPU
host = [mujoco.MjData(mjm) for _ in range(N)]   # per-world host mirrors (poses only)

r = WarpRenderer(width=256, height=256, batch_size=N, preset="high")
r.load_model(mjm)

# Step physics on the GPU, then copy only the geom poses to the host.
mjw.step(M, d)
wp.synchronize()
gx = d.geom_xpos.numpy()                            # (N, ngeom, 3)
gm = d.geom_xmat.numpy().reshape(N, mjm.ngeom, 9)   # (N, ngeom, 9)
for i, h in enumerate(host):
    h.geom_xpos[:] = gx[i]
    h.geom_xmat[:] = gm[i]

obs = r.render_batch(mjm, host, cam_id=0)   # (N, 256, 256, 4) uint8 torch.cuda tensor

See examples/minimal_render.py for a runnable demo.

Quality toggles

Every fidelity feature is an independent toggle, so you can reproduce the throughput/fidelity trade-offs from benchmarks/ on your own hardware:

from mujofil_warp import WarpRenderer, make_config

# keyword toggles
r = WarpRenderer(width=256, batch_size=32, ssao=False, shadows=True, msaa=True)

# or a named preset, optionally overriding individual toggles
r = WarpRenderer(width=256, batch_size=32, preset="fast")          # SSAO off, ~2x
r = WarpRenderer(width=256, batch_size=32, preset="high", bloom=True)

# or an explicit config
cfg = make_config(width=256, height=256, batch_size=32, exposure=1.6)
r = WarpRenderer(config=cfg)

| Toggle | Effect | Notes |
|---|---|---|
| ssao | screen-space ambient occlusion | biggest cost; ~2× faster when off |
| ssao_quality | SSAO quality: low / medium / high / ultra | affects look more than speed |
| ssao_ssct | SSAO cone tracing (contact shadows) | small extra cost on top of SSAO |
| shadows | soft shadow maps | |
| msaa / msaa_samples | multi-sample AA | 2 / 4 / 8 samples |
| bloom | HDR bloom | off by default |
| fxaa | fast approximate AA | alternative to MSAA |
| exposure | linear exposure | applied before tone mapping |
| tone_mapping | FILMIC vs LINEAR | |
| dithering | temporal dithering | reduces banding |

Presets: high (photoreal, default), medium (high-quality SSAO, no cone tracing), fast (SSAO off, ~2×), ultra (8× MSAA + bloom), raw (no AO/shadows/AA, ~3×).
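One plausible reading of "a named preset, optionally overriding individual toggles" is a base dictionary per preset merged with keyword overrides. The sketch below is hypothetical: resolve_toggles and the exact preset values are assumptions pieced together from the preset descriptions above (for example, the MSAA sample count for high is a guess). It is not make_config's real implementation.

```python
# Hypothetical preset table inferred from the preset descriptions; values the
# README does not state (e.g. msaa_samples for "high") are assumptions.
BASE = {"ssao": True, "ssao_ssct": True, "shadows": True, "msaa": True,
        "msaa_samples": 4, "bloom": False, "fxaa": False}

PRESETS = {
    "high": {},                                   # photoreal default
    "medium": {"ssao_ssct": False},               # SSAO without cone tracing
    "fast": {"ssao": False, "ssao_ssct": False},  # SSAO off
    "ultra": {"msaa_samples": 8, "bloom": True},  # 8x MSAA + bloom
    "raw": {"ssao": False, "ssao_ssct": False, "shadows": False,
            "msaa": False, "fxaa": False},        # no AO / shadows / AA
}


def resolve_toggles(preset: str = "high", **overrides) -> dict:
    """Merge BASE <- preset <- explicit keyword overrides (last one wins)."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}")
    unknown = set(overrides) - set(BASE)
    if unknown:
        raise TypeError(f"unknown toggles: {sorted(unknown)}")
    return {**BASE, **PRESETS[preset], **overrides}


print(resolve_toggles("high", bloom=True))
```

Layering the dictionaries this way keeps every toggle independent: a preset only states what it changes, and an explicit keyword always has the final say.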

Backends

Select at runtime with MUJOFIL_WARP_BACKEND:

  • gl (default): OpenGL single-sync. Renders N worlds into N imported GL textures bracketed by one flushAndWait, then exports via GL↔CUDA interop. Sync cost is constant in N; fastest in the warehouse. Uses surfaceless EGL, so no X display (DISPLAY) is required; it falls back to Vulkan automatically only if the GL module fails to initialize.
  • vulkan: shared Vulkan device + exportable swapchain + CUDA external-memory import. Also fully headless (no X), but the 2-frame in-flight cap makes its sync cost grow with batch size.

# default is gl; force a backend explicitly with the env var:
MUJOFIL_WARP_BACKEND=gl     python examples/minimal_render.py --preset high
MUJOFIL_WARP_BACKEND=vulkan python examples/minimal_render.py --preset high
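The selection rule can be sketched as follows: read MUJOFIL_WARP_BACKEND, default to gl, and fall back to Vulkan only if the GL module fails to initialize (per the Headless / display section). select_backend and its init_gl callback are hypothetical stand-ins; the package's own logic lives in mujofil_warp/__init__.py and may differ, for instance in whether an explicit gl request also falls back.

```python
import os
from typing import Callable, Mapping, Optional

VALID_BACKENDS = ("gl", "vulkan")


def select_backend(init_gl: Callable[[], bool],
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Env var if set, else "gl"; GL falls back to Vulkan if its init fails."""
    env = os.environ if env is None else env
    choice = env.get("MUJOFIL_WARP_BACKEND", "gl").strip().lower()
    if choice not in VALID_BACKENDS:
        raise ValueError(f"MUJOFIL_WARP_BACKEND must be one of {VALID_BACKENDS}, got {choice!r}")
    # init_gl is only called when GL is the candidate (short-circuit `and`).
    # Assumption: the fallback also applies when gl is requested explicitly.
    if choice == "gl" and not init_gl():
        return "vulkan"
    return choice


print(select_backend(init_gl=lambda: True, env={}))
```

Injecting init_gl and env keeps the decision testable without a GPU or a modified process environment.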

Installation

pip install mujofil-warp

The wheel is self-contained: Filament and the CUDA runtime are statically baked in, the compiled materials ship inside it, and libc++ is bundled. There is no CUDA toolkit, no Filament, and no mujofil to install — the only hard requirement at runtime is an NVIDIA GPU + driver.

Supported environments

Because the package contains no CUDA device code (only host-side runtime calls), a single wheel is portable across GPUs and driver versions:

| Dimension | Support |
|---|---|
| GPU | Any NVIDIA GPU (Turing / Ampere / Ada / Hopper / …); no compute-capability lock-in |
| Driver / CUDA | NVIDIA driver ≥ R525 (CUDA 12.0+); one wheel covers all newer drivers |
| OS | Linux x86_64, glibc ≥ 2.34 (Ubuntu 22.04+, Debian 12+, RHEL/Alma/Rocky 9+, Fedora 35+) |
| Python | CPython 3.10 – 3.13 |

Not yet supported: aarch64 (Jetson/Grace), glibc < 2.34 (Ubuntu 20.04 / RHEL 8), non-NVIDIA GPUs. These need a from-source Filament build (planned).
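Before installing on a new machine, a quick preflight against the table above can save a confusing pip failure. check_environment below is a hypothetical helper, not part of the package. It only compares the values you pass in (from platform.machine(), platform.libc_ver(), sys.version_info, and a driver major version you would read from nvidia-smi yourself) against the stated requirements.

```python
import platform
import sys


def _version_tuple(text: str) -> tuple:
    return tuple(int(part) for part in text.split("."))


def check_environment(machine: str, libc: str, libc_version: str,
                      python: tuple, driver_major: int) -> list:
    """Return human-readable problems; an empty list means the wheel should fit."""
    problems = []
    if machine != "x86_64":
        problems.append(f"architecture {machine} is not x86_64")
    if libc != "glibc" or _version_tuple(libc_version) < (2, 34):
        problems.append(f"need glibc >= 2.34, found {libc or 'unknown'} {libc_version}")
    if not (3, 10) <= tuple(python[:2]) <= (3, 13):
        problems.append(f"need Python 3.10-3.13, found {python[0]}.{python[1]}")
    if driver_major < 525:
        problems.append(f"need NVIDIA driver >= R525, found R{driver_major}")
    return problems


if __name__ == "__main__":
    libc, libc_version = platform.libc_ver()
    # The driver version is not available from the stdlib; 535 is a placeholder.
    print(check_environment(platform.machine(), libc, libc_version or "0",
                            sys.version_info, driver_major=535))
```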

Headless / display

Both backends are fully headless — no X server, no display, nothing extra to install beyond the NVIDIA driver:

  • GL (default) uses surfaceless EGL, so it renders headless at full speed on a bare GPU server (cloud, cluster, container). This is the recommended path for vision-RL training.
  • Vulkan is also headless (shared device + exportable swapchain).

GL auto-falls back to Vulkan only if the GL module fails to initialize.

Building from source

Most users never need this — pip install mujofil-warp ships prebuilt wheels. Build from source only to hack on the C++ or target an unsupported environment.

Prerequisites (the native modules and Filament are built with Clang + libc++):

| Tool | Debian/Ubuntu | RHEL/Fedora/Alma |
|---|---|---|
| Clang + libc++ dev | clang libc++-dev libc++abi-dev | clang + libc++ (LLVM release) |
| CUDA toolkit (headers + static cudart) | nvidia-cuda-toolkit | cuda-cudart-devel-12-x cuda-driver-devel-12-x |
| EGL / GL dev headers | libegl1-mesa-dev libgl1-mesa-dev | mesa-libEGL-devel mesa-libGL-devel |
| Build tools (source-built Filament only) | git cmake ninja-build | git cmake ninja-build |

Then:

git clone https://github.com/tau-intelligence/mujofil-warp
cd mujofil-warp
CC=clang CXX=clang++ pip install .

How Filament is resolved: the GL backend's headless EGL rendering needs a custom EGL-enabled Filament build, because Google's prebuilt Linux Filament is GLX-only. CMakeLists.txt tries, in order:

  1. FILAMENT_DIR=/path/to/egl-filament if you set it — used as-is (fastest).
  2. Download a prebuilt EGL Filament artifact (seconds). The default path.
  3. Build from source via packaging/build_filament_egl.sh (~20–30 min) if the download is unavailable — this is the step that needs git/cmake/ninja.

So a plain pip install . is one command; supply FILAMENT_DIR to skip the download/build entirely:

CC=clang CXX=clang++ FILAMENT_DIR=/path/to/egl-filament pip install .
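For readers who would rather not trace the CMake logic, the three-step order CMakeLists.txt follows can be summarized in Python. This is an illustrative sketch only: resolve_filament, try_download, and build_from_source are hypothetical names standing in for the CMake steps and packaging/build_filament_egl.sh.

```python
import os
from typing import Callable, Mapping, Optional, Tuple


def resolve_filament(try_download: Callable[[], Optional[str]],
                     build_from_source: Callable[[], str],
                     env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (how, path) following the order FILAMENT_DIR -> download -> build."""
    env = os.environ if env is None else env
    # 1. An explicit FILAMENT_DIR is used as-is, with no download or build.
    if env.get("FILAMENT_DIR"):
        return "env", env["FILAMENT_DIR"]
    # 2. Default path: fetch the prebuilt EGL Filament artifact.
    path = try_download()
    if path is not None:
        return "download", path
    # 3. Last resort: the ~20-30 min source build (needs git, cmake, ninja).
    return "source", build_from_source()
```

Passing each step as a callable keeps the ordering logic checkable without network access or a compiler; later steps run only when earlier ones produce nothing.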

The EGL Filament artifact is reproducible from source:

packaging/build_filament_egl.sh ./_filament_egl   # clone + patch + build

Dev rebuilds (no full reinstall)

For iterating on the C++ without a full pip install, the two helper scripts build the modules in place (point FILAMENT_DIR at the EGL Filament build):

bash native/build_gl.sh   # OpenGL single-sync, headless EGL -> _mujofil_warp_gl
bash native/build.sh      # Vulkan zero-copy                  -> _mujofil_warp

Architecture & porting

mujofil-warp is one core with pluggable rendering backends, so new platforms are added as a backend — not a fork.

mujofil_warp/__init__.py     Python API, presets, backend selection   (shared)
native/render_module.cpp     pybind bindings, batching                (shared)
native/vendor/core/          scene / material / light bridge          (shared)
native/renderer_gl.cpp       Linux: surfaceless EGL  + CUDA interop   (backend)
native/renderer_warp.cpp     Linux: Vulkan device    + CUDA interop   (backend)

Everything platform-specific lives behind the vf_mujoco::Renderer interface (context creation, GPU→tensor interop). Adding macOS or Windows means adding one renderer_*.{cpp,mm} implementing that interface — the scene, material, lighting, Python API, and batching layers are reused unchanged.

  • Windows would use a WGL/EGL context + OPAQUE_WIN32 external-memory handles for the CUDA interop.
  • macOS is a different target: there is no CUDA on Apple platforms, so a Mac backend would use Filament's Metal backend and export to PyTorch via MPS (MTLBuffer → torch-MPS) rather than torch.cuda.

These are not yet implemented (they need the respective hardware to develop and validate on), but the codebase is structured so they slot in without a fork.
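The backend seam described above is a C++ interface (vf_mujoco::Renderer). As a language-neutral illustration of the same pattern, here is a Python analogue: the shared layers depend only on an abstract interface, and each platform contributes one class. The method names (create_context, render_batch, export), the register decorator, and the EGLRenderer stub are hypothetical and do not mirror the real C++ signatures.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Renderer(ABC):
    """Platform seam: everything backend-specific hides behind these methods."""

    @abstractmethod
    def create_context(self) -> None: ...

    @abstractmethod
    def render_batch(self, n_worlds: int) -> List[str]: ...

    @abstractmethod
    def export(self) -> str: ...


BACKENDS: Dict[str, Type[Renderer]] = {}


def register(name: str):
    """Class decorator: adding a platform means registering one more class."""
    def wrap(cls: Type[Renderer]) -> Type[Renderer]:
        BACKENDS[name] = cls
        return cls
    return wrap


@register("gl")
class EGLRenderer(Renderer):
    def create_context(self) -> None:
        self.context = "surfaceless EGL"

    def render_batch(self, n_worlds: int) -> List[str]:
        return [f"world {i} via {self.context}" for i in range(n_worlds)]

    def export(self) -> str:
        return "GL<->CUDA interop -> torch.cuda"


def run(backend: str, n_worlds: int) -> List[str]:
    """Shared layer (scene, batching, Python API): unaware of the platform."""
    renderer = BACKENDS[backend]()
    renderer.create_context()
    return renderer.render_batch(n_worlds)


print(run("gl", 2))
```

A Windows or macOS port in this picture is one more decorated class; run and everything above it stay unchanged.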

Layout

mujofil_warp/        Python package (WarpRenderer, make_config, presets)
native/              C++ renderer + pybind module + build scripts
  renderer_gl.cpp      OpenGL single-sync zero-copy backend
  renderer_warp.cpp    Vulkan shared-device zero-copy backend
  render_module.cpp    pybind bindings (shared by both backends)
examples/            runnable demos
benchmarks/          the benchmark suite behind the numbers above
spikes/              isolated feasibility proofs (GL↔CUDA, Vulkan↔CUDA, DLPack)
docs/ARCHITECTURE.md design + phased integration plan

Relationship to mujofil

mujofil-warp reuses the CPU-MuJoCo mujofil renderer's scene/material/light source but is a separate build — the published mujofil package is untouched. Use mujofil for high-fidelity CPU-MuJoCo vector-env rendering; use mujofil-warp when you want MJWarp's GPU-resident physics with photoreal, zero-copy observations.

License

Apache-2.0.



Download files


Source Distribution

mujofil_warp-0.1.1.tar.gz (6.4 MB)

Uploaded Source

Built Distributions


mujofil_warp-0.1.1-cp313-cp313-manylinux_2_34_x86_64.whl (10.6 MB)

Uploaded CPython 3.13, manylinux: glibc 2.34+ x86-64

mujofil_warp-0.1.1-cp312-cp312-manylinux_2_34_x86_64.whl (10.6 MB)

Uploaded CPython 3.12, manylinux: glibc 2.34+ x86-64

mujofil_warp-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl (10.6 MB)

Uploaded CPython 3.11, manylinux: glibc 2.34+ x86-64

mujofil_warp-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl (10.6 MB)

Uploaded CPython 3.10, manylinux: glibc 2.34+ x86-64

File details

Details for the file mujofil_warp-0.1.1.tar.gz.

File metadata

  • Download URL: mujofil_warp-0.1.1.tar.gz
  • Upload date:
  • Size: 6.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mujofil_warp-0.1.1.tar.gz
Algorithm Hash digest
SHA256 dde2ff3d0f40134b7f838dc8430fa14884ea53027a361cde9769074b87d3dfbf
MD5 8be0e76b381304ddd98362df04d9c664
BLAKE2b-256 6b529844c1ce6f494a79dbc08d9db40cf088607cdce0a09041b6381eed4fe850


Provenance

The following attestation bundles were made for mujofil_warp-0.1.1.tar.gz:

Publisher: wheels.yml on tau-intelligence/mujofil-warp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mujofil_warp-0.1.1-cp313-cp313-manylinux_2_34_x86_64.whl.


File hashes

Hashes for mujofil_warp-0.1.1-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 0a5da52c72c0bf8e2c32e339fdbce6a0f9ad0bd1e355dcd8bd605b137a0d1e55
MD5 cd80e8cd80b2042033fd503048aa51a9
BLAKE2b-256 c2d95d76b48e1c73bc474f6001e859fdf7d1affeda5d76756da3c5c01bbb3a6d


Provenance

The following attestation bundles were made for mujofil_warp-0.1.1-cp313-cp313-manylinux_2_34_x86_64.whl:

Publisher: wheels.yml on tau-intelligence/mujofil-warp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mujofil_warp-0.1.1-cp312-cp312-manylinux_2_34_x86_64.whl.


File hashes

Hashes for mujofil_warp-0.1.1-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 655ded70633ff55f04a911154671716a4ffd00c03de4ccfed18058ed8793a86e
MD5 abe0741e82d9fed75d1b8f1518680525
BLAKE2b-256 b1c7a614bbe8f797d2020a29d3ba29f90fe869667d377618aaa606259c9eaa49


Provenance

The following attestation bundles were made for mujofil_warp-0.1.1-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: wheels.yml on tau-intelligence/mujofil-warp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mujofil_warp-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl.


File hashes

Hashes for mujofil_warp-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 9e290c47579319b4d05c685f7343d32b3f871157e92cc0ad253ae87145508691
MD5 09c21d0287b12e34cd7324c4743511c0
BLAKE2b-256 426c5d63599f811d78e4a047e3a3e345f5b23c2c0c7289650bf9e42599c3d3f9


Provenance

The following attestation bundles were made for mujofil_warp-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl:

Publisher: wheels.yml on tau-intelligence/mujofil-warp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mujofil_warp-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl.


File hashes

Hashes for mujofil_warp-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 bf2f08bd8486f3529c267f037b373dd360e54fd95c9754e80deed18cba35adbc
MD5 b0e79a684cf169cfd756708a461692a3
BLAKE2b-256 9649b6c76bac1edc5bed34c88d2bef8d32ca4063f47b6ec6f056c57c79188241


Provenance

The following attestation bundles were made for mujofil_warp-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl:

Publisher: wheels.yml on tau-intelligence/mujofil-warp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
