Skip to main content

Photoreal Filament PBR rendering for GPU-resident MuJoCo (MJWarp), zero-copy to PyTorch

Project description

mujofil-warp

Photoreal PBR rendering for GPU-resident MuJoCo (MJWarp), zero-copy to PyTorch.

MJWarp simulates thousands of parallel MuJoCo worlds entirely on the GPU, but its built-in batch renderer is a deliberately low-fidelity single-hit raycaster (flat Lambertian, no PBR / IBL / reflections, and it cannot load GLB environments).

mujofil-warp pairs MJWarp's GPU-resident physics with Google Filament's physically-based renderer (PBR materials, image-based lighting, soft shadows, SSAO) and delivers each rendered frame straight to PyTorch as a CUDA tensor — no CPU round-trip.

Highlights

  • Zero-copy to torch.cuda. Filament renders into GPU memory that CUDA imports directly; observations arrive as torch.cuda tensors with no GPU→CPU→GPU bounce.
  • GPU-resident pipeline. MJWarp steps physics on the GPU; only a tiny transform array crosses to the host. Pixels never leave the GPU.
  • Photoreal. Full PBR metalness/roughness, IBL, soft shadows, SSAO, MSAA, filmic tone mapping — renders complete GLB environments MJWarp/MuJoCo can't.
  • Two backends. An OpenGL single-sync path and a Vulkan shared-device path, selectable at runtime.

Performance (RTX 4060 Laptop, 8 GiB)

All numbers are env-steps/s (= cameras/s), MJWarp GPU physics → torch.cuda.

vs vanilla MuJoCo, same scene, same workload (ours adds PBR + zero-copy):

128px N=512 256px N=512 256px N=1024
mujofil-warp (GL) 10,675 9,949 10,628
vanilla mujoco.Renderer 8,394 4,808 5,021
speedup 1.27× 2.07× 2.12×

We beat vanilla MuJoCo by 1.25–2.12× on equal work — the gap widens at higher resolution because zero-copy avoids the CPU readback that scales with pixels.

Full photoreal warehouse (3 GLB meshes + IBL + 16 spotlights + SSAO — geometry vanilla MuJoCo and MJWarp cannot even load): ~3,200 cam/s at 128px, holding flat from N=64 to N=2048.

GL vs Vulkan backend (full warehouse): the GL single-sync path is 1.3× faster and, critically, its sync cost is constant across N (one flushAndWait), where the Vulkan path's grows linearly with batch size.

vs MJWarp's own raycaster: MJWarp scales to ~42,000 cam/s at N=2048 — but that is flat Lambertian on bare objects (no PBR/IBL, no GLB environments). At small N (≤32) mujofil-warp is faster and photoreal; at large N MJWarp wins raw throughput by trading away all visual fidelity. Different categories: MJWarp is a parallel raycaster, this is a photoreal rasterizer.

Quickstart

import mujoco, mujoco_warp as mjw, warp as wp, torch
from mujofil_warp import WarpRenderer

mjm = mujoco.MjModel.from_xml_path("scene.xml")
M = mjw.put_model(mjm)
d = mjw.make_data(mjm, nworld=32)
host = [mujoco.MjData(mjm) for _ in range(32)]

r = WarpRenderer(width=256, height=256, batch_size=32, preset="high")
r.load_model(mjm)

mjw.step(M, d); wp.synchronize()
gx = d.geom_xpos.numpy(); gm = d.geom_xmat.numpy().reshape(32, mjm.ngeom, 9)
for i, h in enumerate(host):
    h.geom_xpos[:] = gx[i]; h.geom_xmat[:] = gm[i]

obs = r.render_batch(mjm, host, cam_id=0)   # (32, 256, 256, 4) uint8 torch.cuda

See examples/minimal_render.py for a runnable demo.

Quality toggles

Every fidelity feature is an independent toggle so you can reproduce the throughput/fidelity trade-offs in benchmarks/ on your own hardware:

from mujofil_warp import WarpRenderer, make_config

# keyword toggles
r = WarpRenderer(width=256, batch_size=32, ssao=False, shadows=True, msaa=True)

# or a named preset, optionally overriding individual toggles
r = WarpRenderer(width=256, batch_size=32, preset="fast")          # SSAO off, ~2x
r = WarpRenderer(width=256, batch_size=32, preset="high", bloom=True)

# or an explicit config
cfg = make_config(width=256, height=256, batch_size=32, exposure=1.6)
r = WarpRenderer(config=cfg)
Toggle Effect Notes
ssao screen-space ambient occlusion biggest cost — ~2× faster when off
ssao_quality SSAO quality low/medium/high/ultra affects look more than speed
ssao_ssct SSAO cone tracing (contact shadows) small extra cost on top of SSAO
shadows soft shadow maps
msaa / msaa_samples multi-sample AA 2 / 4 / 8
bloom HDR bloom off by default
fxaa fast approximate AA alternative to MSAA
exposure linear exposure before tone mapping
tone_mapping FILMIC vs LINEAR
dithering temporal dithering reduces banding

Presets: high (photoreal, default), medium (high-quality SSAO, no cone tracing), fast (SSAO off, ~2×), ultra (8× MSAA + bloom), raw (no AO/shadows/AA, ~3×).

Backends

Select at runtime with MUJOFIL_WARP_BACKEND:

  • gl (default) — OpenGL single-sync. Renders N worlds into N imported GL textures bracketed by one flushAndWait, then exports via GL↔CUDA interop. Sync cost is constant in N; fastest in the warehouse. Requires an X display (DISPLAY); when none is available it automatically falls back to Vulkan.
  • vulkan — shared Vulkan device + exportable swapchain + CUDA external-memory import. Works fully headless (no X), but the 2-frame in-flight cap makes its sync cost grow with batch size.
# default is gl; force a backend explicitly with the env var:
MUJOFIL_WARP_BACKEND=gl     python examples/minimal_render.py --preset high
MUJOFIL_WARP_BACKEND=vulkan python examples/minimal_render.py --preset high

Installation

pip install mujofil-warp

The wheel is self-contained: Filament and the CUDA runtime are statically baked in, the compiled materials ship inside it, and libc++ is bundled. There is no CUDA toolkit, no Filament, and no mujofil to install — the only hard requirement at runtime is an NVIDIA GPU + driver.

Supported environments

Because the package contains no CUDA device code (only host-side runtime calls), a single wheel is portable across GPUs and driver versions:

Dimension Support
GPU Any NVIDIA GPU (Turing / Ampere / Ada / Hopper / …) — no compute-capability lock-in
Driver / CUDA NVIDIA driver ≥ R525 (CUDA 12.0+). One wheel, all newer drivers
OS Linux x86_64, glibc ≥ 2.34 (Ubuntu 22.04+, Debian 12+, RHEL/Alma/Rocky 9+, Fedora 35+)
Python CPython 3.10 – 3.13

Not yet supported: aarch64 (Jetson/Grace), glibc < 2.34 (Ubuntu 20.04 / RHEL 8), non-NVIDIA GPUs. These need a from-source Filament build (planned).

Headless / display

Both backends are fully headless — no X server, no display, nothing extra to install beyond the NVIDIA driver:

  • GL (default) uses surfaceless EGL, so it renders headless at full speed on a bare GPU server (cloud, cluster, container). This is the recommended path for vision-RL training.
  • Vulkan is also headless (shared device + exportable swapchain).

GL auto-falls back to Vulkan only if the GL module fails to initialize.

Building from source

Most users never need this — pip install mujofil-warp ships prebuilt wheels. Build from source only to hack on the C++ or target an unsupported environment.

Prerequisites (the native modules and Filament are built with Clang + libc++):

Tool Debian/Ubuntu RHEL/Fedora/Alma
Clang + libc++ dev clang libc++-dev libc++abi-dev clang + libc++ (LLVM release)
CUDA toolkit (headers + static cudart) nvidia-cuda-toolkit cuda-cudart-devel-12-x cuda-driver-devel-12-x
EGL / GL dev headers libegl1-mesa-dev libgl1-mesa-dev mesa-libEGL-devel mesa-libGL-devel
Build tools (source-built Filament only) git cmake ninja-build git cmake ninja-build

Then:

git clone https://github.com/tau-intelligence/mujofil-warp
cd mujofil-warp
CC=clang CXX=clang++ pip install .

How Filament is resolved (the GL backend's headless EGL rendering needs a custom EGL-enabled Filament — Google's prebuilt Linux Filament is GLX-only). CMakeLists.txt tries, in order:

  1. FILAMENT_DIR=/path/to/egl-filament if you set it — used as-is (fastest).
  2. Download a prebuilt EGL Filament artifact (seconds). The default path.
  3. Build from source via packaging/build_filament_egl.sh (~20–30 min) if the download is unavailable — this is the step that needs git/cmake/ninja.

So a plain pip install . is one command; supply FILAMENT_DIR to skip the download/build entirely:

CC=clang CXX=clang++ FILAMENT_DIR=/path/to/egl-filament pip install .

The EGL Filament artifact is reproducible from source:

packaging/build_filament_egl.sh ./_filament_egl   # clone + patch + build

Dev rebuilds (no full reinstall)

For iterating on the C++ without a full pip install, the two helper scripts build the modules in place (point FILAMENT_DIR at the EGL Filament build):

bash native/build_gl.sh   # OpenGL single-sync, headless EGL -> _mujofil_warp_gl
bash native/build.sh      # Vulkan zero-copy                  -> _mujofil_warp

Architecture & porting

mujofil-warp is one core with pluggable rendering backends, so new platforms are added as a backend — not a fork.

mujofil_warp/__init__.py     Python API, presets, backend selection   (shared)
native/render_module.cpp     pybind bindings, batching                (shared)
native/vendor/core/          scene / material / light bridge          (shared)
native/renderer_gl.cpp       Linux: surfaceless EGL  + CUDA interop   (backend)
native/renderer_warp.cpp     Linux: Vulkan device    + CUDA interop   (backend)

Everything platform-specific lives behind the vf_mujoco::Renderer interface (context creation, GPU→tensor interop). Adding macOS or Windows means adding one renderer_*.{cpp,mm} implementing that interface — the scene, material, lighting, Python API, and batching layers are reused unchanged.

  • Windows would use a WGL/EGL context + OPAQUE_WIN32 external-memory handles for the CUDA interop.
  • macOS is a different target: there is no CUDA on Apple platforms, so a Mac backend would use Filament's Metal backend and export to PyTorch via MPS (MTLBuffer → torch-MPS) rather than torch.cuda.

These are not yet implemented (they need the respective hardware to develop and validate on), but the codebase is structured so they slot in without a fork.

Layout

mujofil_warp/        Python package (WarpRenderer, make_config, presets)
native/              C++ renderer + pybind module + build scripts
  renderer_gl.cpp      OpenGL single-sync zero-copy backend
  renderer_warp.cpp    Vulkan shared-device zero-copy backend
  render_module.cpp    pybind bindings (shared by both backends)
examples/            runnable demos
benchmarks/          the benchmark suite behind the numbers above
spikes/              isolated feasibility proofs (GL↔CUDA, Vulkan↔CUDA, DLPack)
docs/ARCHITECTURE.md design + phased integration plan

Relationship to mujofil

mujofil-warp reuses the CPU-MuJoCo mujofil renderer's scene/material/light source but is a separate build — the published mujofil package is untouched. Use mujofil for high-fidelity CPU-MuJoCo vector-env rendering; use mujofil-warp when you want MJWarp's GPU-resident physics with photoreal, zero-copy observations.

License

Apache-2.0.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mujofil_warp-0.1.2.tar.gz (7.3 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

mujofil_warp-0.1.2-cp313-cp313-manylinux_2_34_x86_64.whl (11.5 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.34+ x86-64

mujofil_warp-0.1.2-cp312-cp312-manylinux_2_34_x86_64.whl (11.5 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.34+ x86-64

mujofil_warp-0.1.2-cp311-cp311-manylinux_2_34_x86_64.whl (11.5 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.34+ x86-64

mujofil_warp-0.1.2-cp310-cp310-manylinux_2_34_x86_64.whl (11.5 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.34+ x86-64

File details

Details for the file mujofil_warp-0.1.2.tar.gz.

File metadata

  • Download URL: mujofil_warp-0.1.2.tar.gz
  • Upload date:
  • Size: 7.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mujofil_warp-0.1.2.tar.gz
Algorithm Hash digest
SHA256 ea8767e780e743b2ab6f3907da34d3a6af241174f7d6f07a0e485afb540bd6d4
MD5 b680a1c91c02bce2c89121025eeafd16
BLAKE2b-256 ce4d3e4ea5ea7a89312300b6a8e2c89543ca5b964cfa0f6d738586ca3de87d48

See more details on using hashes here.

Provenance

The following attestation bundles were made for mujofil_warp-0.1.2.tar.gz:

Publisher: wheels.yml on tau-intelligence/mujofil-warp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mujofil_warp-0.1.2-cp313-cp313-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for mujofil_warp-0.1.2-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 840d0db886b918995bbb11a5032d63ca92437ef4908d0e89972a85895b1a87e4
MD5 2262eebe5dc05451a5d77de910cfc495
BLAKE2b-256 26cf489d572e5b689ed0cb9341a865a3516a35cc790dc0551db566264946e197

See more details on using hashes here.

Provenance

The following attestation bundles were made for mujofil_warp-0.1.2-cp313-cp313-manylinux_2_34_x86_64.whl:

Publisher: wheels.yml on tau-intelligence/mujofil-warp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mujofil_warp-0.1.2-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for mujofil_warp-0.1.2-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 cbf3c06f8abe34dcd675f7ce521d7d2661aa817097015047eeee1539a2e57ad3
MD5 40fb2101e167a3d5bb56c9c69eca6ac7
BLAKE2b-256 8ff93979c166f47dce87eaf3fb5e6bf30947325428c3f3227fd601f55800265a

See more details on using hashes here.

Provenance

The following attestation bundles were made for mujofil_warp-0.1.2-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: wheels.yml on tau-intelligence/mujofil-warp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mujofil_warp-0.1.2-cp311-cp311-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for mujofil_warp-0.1.2-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 fadd405c285c3c88ba4e96e68a63e4fae459fe4b12afc5ae18497875dd60a499
MD5 45b808aff65077aa17a78d5277895be9
BLAKE2b-256 78137f85cc7ac04f1280124898110dfeb3f5bb5e3502aa8fff98d896cda4e459

See more details on using hashes here.

Provenance

The following attestation bundles were made for mujofil_warp-0.1.2-cp311-cp311-manylinux_2_34_x86_64.whl:

Publisher: wheels.yml on tau-intelligence/mujofil-warp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mujofil_warp-0.1.2-cp310-cp310-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for mujofil_warp-0.1.2-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 613debbab12d2d714e24e91edc4f4e5ec9df5b21c57d4b6212d5d3cce930e8c2
MD5 8f240bda3f076bb688b659427cb89ad8
BLAKE2b-256 42e3f4a77884e493810e7605f404b522da52f7b0dfdd61d606d292cd867eccda

See more details on using hashes here.

Provenance

The following attestation bundles were made for mujofil_warp-0.1.2-cp310-cp310-manylinux_2_34_x86_64.whl:

Publisher: wheels.yml on tau-intelligence/mujofil-warp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page