Photoreal Filament PBR rendering for GPU-resident MuJoCo (MJWarp), zero-copy to PyTorch

mujofil-warp

Photoreal PBR rendering for GPU-resident MuJoCo (MJWarp), zero-copy to PyTorch.

MJWarp simulates thousands of parallel MuJoCo worlds entirely on the GPU, but its built-in batch renderer is a deliberately low-fidelity single-hit raycaster (flat Lambertian, no PBR / IBL / reflections, and it cannot load GLB environments).

mujofil-warp pairs MJWarp's GPU-resident physics with Google Filament's physically-based renderer (PBR materials, image-based lighting, soft shadows, SSAO) and delivers each rendered frame straight to PyTorch as a CUDA tensor — no CPU round-trip.

Highlights

  • Zero-copy to torch.cuda. Filament renders into GPU memory that CUDA imports directly; observations arrive as torch.cuda tensors with no GPU→CPU→GPU bounce.
  • GPU-resident pipeline. MJWarp steps physics on the GPU; only a tiny transform array crosses to the host. Pixels never leave the GPU.
  • Photoreal. Full PBR metalness/roughness, IBL, soft shadows, SSAO, MSAA, filmic tone mapping — renders complete GLB environments MJWarp/MuJoCo can't.
  • Two backends. An OpenGL single-sync path and a Vulkan shared-device path, selectable at runtime.

Performance (RTX 4060 Laptop, 8 GiB)

All numbers are env-steps/s (= cameras/s), MJWarp GPU physics → torch.cuda.

vs vanilla MuJoCo, same scene, same workload (ours adds PBR + zero-copy):

                          128px N=512   256px N=512   256px N=1024
mujofil-warp (GL)              10,675         9,949         10,628
vanilla mujoco.Renderer         8,394         4,808          5,021
speedup                         1.27×         2.07×          2.12×

We beat vanilla MuJoCo by 1.27–2.12× on equal work. The gap widens at higher resolution because zero-copy avoids the CPU readback, whose cost scales with pixel count.
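The speedup row can be re-derived from the two throughput rows. The short check below only repeats that arithmetic on the published figures; it measures nothing, and the dictionary keys are just labels chosen for this sketch.

```python
# Throughput figures (env-steps/s) copied from the comparison table.
mujofil_gl = {"128px N=512": 10_675, "256px N=512": 9_949, "256px N=1024": 10_628}
vanilla = {"128px N=512": 8_394, "256px N=512": 4_808, "256px N=1024": 5_021}


def speedups(ours, baseline):
    """Return ours/baseline per configuration, rounded to two decimals."""
    return {key: round(ours[key] / baseline[key], 2) for key in ours}


ratios = speedups(mujofil_gl, vanilla)
for config, ratio in ratios.items():
    print(f"{config}: {ratio:.2f}x")
```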

Full photoreal warehouse (3 GLB meshes + IBL + 16 spotlights + SSAO — geometry vanilla MuJoCo and MJWarp cannot even load): ~3,200 cam/s at 128px, holding flat from N=64 to N=2048.

GL vs Vulkan backend (full warehouse): the GL single-sync path is 1.3× faster and, critically, its sync cost is constant across N (one flushAndWait), where the Vulkan path's grows linearly with batch size.

vs MJWarp's own raycaster: MJWarp scales to ~42,000 cam/s at N=2048 — but that is flat Lambertian on bare objects (no PBR/IBL, no GLB environments). At small N (≤32) mujofil-warp is faster and photoreal; at large N MJWarp wins raw throughput by trading away all visual fidelity. Different categories: MJWarp is a parallel raycaster, this is a photoreal rasterizer.
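To compare these figures with your own hardware, a timing loop along the following lines may help. It is a minimal sketch, not the harness in benchmarks/: the function name and its parameters are made up for illustration, and it assumes the callable you pass blocks until all GPU work has finished.

```python
import time


def measure_env_steps_per_s(step_fn, batch_size, iters=50, warmup=5):
    """Time `step_fn` and return env-steps/s = batch_size * iters / elapsed.

    `step_fn` must block until its work is finished. For GPU work that means
    synchronizing inside it (e.g. wp.synchronize() or
    torch.cuda.synchronize()); otherwise only the kernel launches are timed.
    """
    for _ in range(warmup):      # let caches, JIT and allocators settle
        step_fn()
    start = time.perf_counter()
    for _ in range(iters):
        step_fn()
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere; replace it with a
    # function that steps physics, renders one batch and synchronizes.
    rate = measure_env_steps_per_s(lambda: time.sleep(0.001), batch_size=32)
    print(f"{rate:,.0f} env-steps/s")
```

Because one call renders one camera per world, env-steps/s and cameras/s coincide here, as in the tables above.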

Quickstart

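import mujoco
import mujoco_warp as mjw
import warp as wp
import torch  # obs below is returned as a torch.cuda tensor
from mujofil_warp import WarpRenderer

N = 32  # number of parallel worlds

mjm = mujoco.MjModel.from_xml_path("scene.xml")
M = mjw.put_model(mjm)                         # model on the GPU
d = mjw.make_data(mjm, nworld=N)               # batched GPU simulation state
host = [mujoco.MjData(mjm) for _ in range(N)]  # host-side copies that receive the transforms

r = WarpRenderer(width=256, height=256, batch_size=N, preset="high")
r.load_model(mjm)

# Step physics on the GPU, then wait for it to finish.
mjw.step(M, d)
wp.synchronize()

# Copy only the geom transforms to the host; pixels stay on the GPU.
gx = d.geom_xpos.numpy()
gm = d.geom_xmat.numpy().reshape(N, mjm.ngeom, 9)
for i, h in enumerate(host):
    h.geom_xpos[:] = gx[i]
    h.geom_xmat[:] = gm[i]

obs = r.render_batch(mjm, host, cam_id=0)   # (32, 256, 256, 4) uint8 torch.cuda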

See examples/minimal_render.py for a runnable demo.

Quality toggles

Every fidelity feature is an independent toggle so you can reproduce the throughput/fidelity trade-offs in benchmarks/ on your own hardware:

from mujofil_warp import WarpRenderer, make_config

# keyword toggles
r = WarpRenderer(width=256, batch_size=32, ssao=False, shadows=True, msaa=True)

# or a named preset, optionally overriding individual toggles
r = WarpRenderer(width=256, batch_size=32, preset="fast")          # SSAO off, ~2x
r = WarpRenderer(width=256, batch_size=32, preset="high", bloom=True)

# or an explicit config
cfg = make_config(width=256, height=256, batch_size=32, exposure=1.6)
r = WarpRenderer(config=cfg)

Toggle               Effect                                   Notes
ssao                 screen-space ambient occlusion           biggest cost; ~2× faster when off
ssao_quality         SSAO quality: low/medium/high/ultra      affects look more than speed
ssao_ssct            SSAO cone tracing (contact shadows)      small extra cost on top of SSAO
shadows              soft shadow maps
msaa / msaa_samples  multi-sample AA                          2 / 4 / 8
bloom                HDR bloom                                off by default
fxaa                 fast approximate AA                      alternative to MSAA
exposure             linear exposure before tone mapping
tone_mapping         FILMIC vs LINEAR
dithering            temporal dithering                       reduces banding

Presets: high (photoreal, default), medium (high-quality SSAO, no cone tracing), fast (SSAO off, ~2×), ultra (8× MSAA + bloom), raw (no AO/shadows/AA, ~3×).
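One way to picture "a named preset, optionally overriding individual toggles" is a layered dictionary merge. The sketch below is not the package's implementation: the toggle names come from the table above, but the default values, the PRESETS table and the resolve_toggles helper are assumptions made purely for illustration.

```python
# Hypothetical defaults: names match the documented toggles, values are guesses.
BASE = {"ssao": True, "ssao_quality": "high", "ssao_ssct": True,
        "shadows": True, "msaa": True, "msaa_samples": 4, "bloom": False}

# Each preset lists only what it changes relative to BASE (illustrative).
PRESETS = {
    "high":   {},
    "medium": {"ssao_ssct": False},
    "fast":   {"ssao": False},
    "ultra":  {"msaa_samples": 8, "bloom": True},
    "raw":    {"ssao": False, "shadows": False, "msaa": False},
}


def resolve_toggles(preset="high", **overrides):
    """Merge base defaults, then the preset, then explicit keyword overrides."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {sorted(PRESETS)}")
    unknown = set(overrides) - set(BASE)
    if unknown:
        raise TypeError(f"unknown toggles: {sorted(unknown)}")
    return {**BASE, **PRESETS[preset], **overrides}


print(resolve_toggles("high", bloom=True))
```

Later layers win, so a keyword such as bloom=True always takes precedence over whatever the preset chose.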

Backends

Select at runtime with MUJOFIL_WARP_BACKEND:

  • gl (default): OpenGL single-sync. Renders N worlds into N imported GL textures bracketed by one flushAndWait, then exports via GL↔CUDA interop. Sync cost is constant in N; fastest in the warehouse. Runs headless through surfaceless EGL (no X display needed); if the GL module fails to initialize, it automatically falls back to Vulkan.
  • vulkan: shared Vulkan device + exportable swapchain + CUDA external-memory import. Works fully headless (no X), but the 2-frame in-flight cap makes its sync cost grow with batch size.

# default is gl; force a backend explicitly with the env var:
MUJOFIL_WARP_BACKEND=gl     python examples/minimal_render.py --preset high
MUJOFIL_WARP_BACKEND=vulkan python examples/minimal_render.py --preset high
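The same variable can be set from inside a Python process. The helper below is a hedged sketch: select_backend is a hypothetical name, and it assumes the package reads MUJOFIL_WARP_BACKEND at import time or when the renderer is constructed, so the variable should be set before either happens.

```python
import os

VALID_BACKENDS = ("gl", "vulkan")   # the two values documented above


def select_backend(name):
    """Set MUJOFIL_WARP_BACKEND for this process, rejecting unknown names."""
    name = name.strip().lower()
    if name not in VALID_BACKENDS:
        raise ValueError(f"backend must be one of {VALID_BACKENDS}, got {name!r}")
    os.environ["MUJOFIL_WARP_BACKEND"] = name
    return name


select_backend("vulkan")
# Assumption: import the package only after the variable is set.
# from mujofil_warp import WarpRenderer
```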

Installation

pip install mujofil-warp

The wheel is self-contained: Filament and the CUDA runtime are statically baked in, the compiled materials ship inside it, and libc++ is bundled. There is no CUDA toolkit, no Filament, and no mujofil to install — the only hard requirement at runtime is an NVIDIA GPU + driver.

Supported environments

Because the package contains no CUDA device code (only host-side runtime calls), a single wheel is portable across GPUs and driver versions:

Dimension      Support
GPU            Any NVIDIA GPU (Turing / Ampere / Ada / Hopper / …), no compute-capability lock-in
Driver / CUDA  NVIDIA driver ≥ R525 (CUDA 12.0+). One wheel, all newer drivers
OS             Linux x86_64, glibc ≥ 2.34 (Ubuntu 22.04+, Debian 12+, RHEL/Alma/Rocky 9+, Fedora 35+)
Python         CPython 3.10 – 3.13

Not yet supported: aarch64 (Jetson/Grace), glibc < 2.34 (Ubuntu 20.04 / RHEL 8), non-NVIDIA GPUs. These need a from-source Filament build (planned).
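Before installing on an unfamiliar machine, the OS, glibc and Python rows can be checked with the standard library alone. This is a sketch under stated assumptions: check_wheel_support is a hypothetical helper that is not shipped with the package, and it does not probe the GPU or driver version.

```python
import platform
import sys


def check_wheel_support(system, machine, libc, py):
    """Return a list of reasons the prebuilt wheel would not fit (empty if OK).

    system, machine: as returned by platform.system() / platform.machine()
    libc: (name, version) as returned by platform.libc_ver()
    py: (major, minor) Python version
    """
    problems = []
    if system != "Linux" or machine != "x86_64":
        problems.append(f"needs Linux x86_64, found {system} {machine}")
    name, version = libc
    # Compare numerically: as strings, "2.4" would sort after "2.34".
    parts = tuple(int(p) for p in version.split(".")[:2] if p.isdigit())
    if name != "glibc" or len(parts) < 2 or parts < (2, 34):
        problems.append(f"needs glibc >= 2.34, found {name or 'unknown'} {version}")
    if not (3, 10) <= tuple(py) <= (3, 13):
        problems.append(f"needs Python 3.10-3.13, found {py[0]}.{py[1]}")
    return problems


if __name__ == "__main__":
    issues = check_wheel_support(platform.system(), platform.machine(),
                                 platform.libc_ver(), sys.version_info[:2])
    print("OK" if not issues else "\n".join(issues))
```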

Headless / display

Both backends are fully headless — no X server, no display, nothing extra to install beyond the NVIDIA driver:

  • GL (default) uses surfaceless EGL, so it renders headless at full speed on a bare GPU server (cloud, cluster, container). This is the recommended path for vision-RL training.
  • Vulkan is also headless (shared device + exportable swapchain).

GL auto-falls back to Vulkan only if the GL module fails to initialize.
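The fallback can be thought of as "try the GL module, and load the Vulkan module only if that fails". The loader below is an illustrative sketch, not the package's code: it treats only ImportError as failure, whereas how the real package detects a failed GL initialization is not described here.

```python
import importlib


def load_first_available(module_names):
    """Import and return (name, module) for the first module that loads."""
    errors = {}
    for name in module_names:
        try:
            return name, importlib.import_module(name)
        except ImportError as exc:   # treat as "backend unavailable"
            errors[name] = exc
    raise ImportError(f"no backend could be loaded: {errors}")


# The native module names appear under "Dev rebuilds"; treating them as
# importable submodules of mujofil_warp is an assumption.
# name, backend = load_first_available(
#     ["mujofil_warp._mujofil_warp_gl", "mujofil_warp._mujofil_warp"])
```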

Building from source

Most users never need this — pip install mujofil-warp ships prebuilt wheels. Build from source only to hack on the C++ or target an unsupported environment.

Prerequisites (the native modules and Filament are built with Clang + libc++):

Tool                                      Debian/Ubuntu                      RHEL/Fedora/Alma
Clang + libc++ dev                        clang libc++-dev libc++abi-dev     clang + libc++ (LLVM release)
CUDA toolkit (headers + static cudart)    nvidia-cuda-toolkit                cuda-cudart-devel-12-x cuda-driver-devel-12-x
EGL / GL dev headers                      libegl1-mesa-dev libgl1-mesa-dev   mesa-libEGL-devel mesa-libGL-devel
Build tools (source-built Filament only)  git cmake ninja-build              git cmake ninja-build

Then:

git clone https://github.com/tau-intelligence/mujofil-warp
cd mujofil-warp
CC=clang CXX=clang++ pip install .

How Filament is resolved (the GL backend's headless EGL rendering needs a custom EGL-enabled Filament — Google's prebuilt Linux Filament is GLX-only). CMakeLists.txt tries, in order:

  1. FILAMENT_DIR=/path/to/egl-filament if you set it — used as-is (fastest).
  2. Download a prebuilt EGL Filament artifact (seconds). The default path.
  3. Build from source via packaging/build_filament_egl.sh (~20–30 min) if the download is unavailable — this is the step that needs git/cmake/ninja.

So a plain pip install . is one command; supply FILAMENT_DIR to skip the download/build entirely:

CC=clang CXX=clang++ FILAMENT_DIR=/path/to/egl-filament pip install .
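The three-step resolution order can be summarized as a small fallback chain. The real logic lives in CMakeLists.txt; the Python below only illustrates the ordering, and the resolve_filament function with its download and build callables is hypothetical.

```python
def resolve_filament(env, download, build):
    """Mirror the documented order: FILAMENT_DIR, then download, then build.

    `download` and `build` are hypothetical callables that return a directory
    path; `download` may return None or raise OSError when unavailable.
    """
    explicit = env.get("FILAMENT_DIR")
    if explicit:
        return "env", explicit        # 1. used as-is
    try:
        path = download()             # 2. prebuilt EGL Filament artifact
        if path:
            return "download", path
    except OSError:
        pass                          # download unavailable, fall through
    return "source", build()          # 3. packaging/build_filament_egl.sh


print(resolve_filament({"FILAMENT_DIR": "/path/to/egl-filament"},
                       download=lambda: None, build=lambda: "./_filament_egl"))
```

Only the last step needs git, cmake and ninja, which is why those tools are listed as required for a source-built Filament only.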

The EGL Filament artifact is reproducible from source:

packaging/build_filament_egl.sh ./_filament_egl   # clone + patch + build

Dev rebuilds (no full reinstall)

For iterating on the C++ without a full pip install, the two helper scripts build the modules in place (point FILAMENT_DIR at the EGL Filament build):

bash native/build_gl.sh   # OpenGL single-sync, headless EGL -> _mujofil_warp_gl
bash native/build.sh      # Vulkan zero-copy                  -> _mujofil_warp

Architecture & porting

mujofil-warp is one core with pluggable rendering backends, so new platforms are added as a backend — not a fork.

mujofil_warp/__init__.py     Python API, presets, backend selection   (shared)
native/render_module.cpp     pybind bindings, batching                (shared)
native/vendor/core/          scene / material / light bridge          (shared)
native/renderer_gl.cpp       Linux: surfaceless EGL  + CUDA interop   (backend)
native/renderer_warp.cpp     Linux: Vulkan device    + CUDA interop   (backend)

Everything platform-specific lives behind the vf_mujoco::Renderer interface (context creation, GPU→tensor interop). Adding macOS or Windows means adding one renderer_*.{cpp,mm} implementing that interface — the scene, material, lighting, Python API, and batching layers are reused unchanged.

  • Windows would use a WGL/EGL context + OPAQUE_WIN32 external-memory handles for the CUDA interop.
  • macOS is a different target: there is no CUDA on Apple platforms, so a Mac backend would use Filament's Metal backend and export to PyTorch via MPS (MTLBuffer → torch-MPS) rather than torch.cuda.

These are not yet implemented (they need the respective hardware to develop and validate on), but the codebase is structured so they slot in without a fork.

Layout

mujofil_warp/        Python package (WarpRenderer, make_config, presets)
native/              C++ renderer + pybind module + build scripts
  renderer_gl.cpp      OpenGL single-sync zero-copy backend
  renderer_warp.cpp    Vulkan shared-device zero-copy backend
  render_module.cpp    pybind bindings (shared by both backends)
examples/            runnable demos
benchmarks/          the benchmark suite behind the numbers above
spikes/              isolated feasibility proofs (GL↔CUDA, Vulkan↔CUDA, DLPack)
docs/ARCHITECTURE.md design + phased integration plan

Relationship to mujofil

mujofil-warp reuses the CPU-MuJoCo mujofil renderer's scene/material/light source but is a separate build — the published mujofil package is untouched. Use mujofil for high-fidelity CPU-MuJoCo vector-env rendering; use mujofil-warp when you want MJWarp's GPU-resident physics with photoreal, zero-copy observations.

License

Apache-2.0.
