mujofil-warp

Photoreal PBR rendering for GPU-resident MuJoCo (MJWarp), zero-copy to PyTorch.

MJWarp simulates thousands of parallel MuJoCo worlds entirely on the GPU, but its built-in batch renderer is a deliberately low-fidelity single-hit raycaster (flat Lambertian, no PBR / IBL / reflections, and it cannot load GLB environments).

mujofil-warp pairs MJWarp's GPU-resident physics with Google Filament's physically-based renderer (PBR materials, image-based lighting, soft shadows, SSAO) and delivers each rendered frame straight to PyTorch as a CUDA tensor — no CPU round-trip.

📖 Full documentation: docs/ (getting started, API guide, feature reference, cookbook, and troubleshooting).

🖥️ Running CPU MuJoCo instead? Use the CPU edition, mujofil (photoreal frames as NumPy arrays).

Highlights

  • Zero-copy to torch.cuda. Filament renders into GPU memory that CUDA imports directly; observations arrive as torch.cuda tensors with no GPU→CPU→GPU bounce.
  • GPU-resident pipeline. MJWarp steps physics on the GPU; only a tiny transform array crosses to the host. Pixels never leave the GPU.
  • Photoreal. Full PBR metalness/roughness, IBL, soft shadows, SSAO, MSAA, filmic tone mapping — renders complete GLB environments MJWarp/MuJoCo can't.
  • Two backends. An OpenGL single-sync path and a Vulkan shared-device path, selectable at runtime.

Performance (RTX 4060 Laptop, 8 GiB)

All numbers are env-steps/s (= cameras/s), MJWarp GPU physics → torch.cuda.

vs vanilla MuJoCo, same scene, same workload (ours adds PBR + zero-copy):

| Renderer | 128px N=512 | 256px N=512 | 256px N=1024 |
|---|---|---|---|
| mujofil-warp (GL) | 10,675 | 9,949 | 10,628 |
| vanilla mujoco.Renderer | 8,394 | 4,808 | 5,021 |
| speedup | 1.27× | 2.07× | 2.12× |

We beat vanilla MuJoCo by 1.27–2.12× on equal work. The gap widens at higher resolution because zero-copy avoids the CPU readback, whose cost scales with pixel count.
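As a quick sanity check, the speedup row can be recomputed from the two throughput rows. The sketch below uses only the figures quoted in the table; the `batch_ms` helper is an illustrative name that converts env-steps/s into wall time per batch of N frames.

```python
# Throughput figures copied from the table above (env-steps/s = cameras/s).
ours = {"128px N=512": 10_675, "256px N=512": 9_949, "256px N=1024": 10_628}
vanilla = {"128px N=512": 8_394, "256px N=512": 4_808, "256px N=1024": 5_021}

# Speedup is the ratio of throughputs for the same workload.
speedups = {k: round(ours[k] / vanilla[k], 2) for k in ours}
for workload, s in speedups.items():
    print(f"{workload}: {s:.2f}x")


def batch_ms(n_worlds: int, env_steps_per_s: float) -> float:
    """Wall time in milliseconds to produce one frame for each of n_worlds."""
    return 1000.0 * n_worlds / env_steps_per_s


print(f"{batch_ms(512, ours['128px N=512']):.1f} ms per 512-world batch at 128px")
```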

Full photoreal warehouse (3 GLB meshes + IBL + 16 spotlights + SSAO — geometry vanilla MuJoCo and MJWarp cannot even load): ~3,200 cam/s at 128px, holding flat from N=64 to N=2048.

GL vs Vulkan backend (full warehouse): the GL single-sync path is 1.3× faster and, critically, its sync cost is constant across N (one flushAndWait), whereas the Vulkan path's sync cost grows linearly with batch size.

vs MJWarp's own raycaster: MJWarp scales to ~42,000 cam/s at N=2048 — but that is flat Lambertian on bare objects (no PBR/IBL, no GLB environments). At small N (≤32) mujofil-warp is faster and photoreal; at large N MJWarp wins raw throughput by trading away all visual fidelity. Different categories: MJWarp is a parallel raycaster, this is a photoreal rasterizer.

Quickstart

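import mujoco
import mujoco_warp as mjw
import warp as wp
import torch
from mujofil_warp import WarpRenderer

N = 32  # number of parallel worlds

# Load the model on the CPU, then mirror it to the GPU for MJWarp.
mjm = mujoco.MjModel.from_xml_path("scene.xml")
M = mjw.put_model(mjm)
d = mjw.make_data(mjm, nworld=N)
# One host-side MjData per world; they only carry geom poses to the renderer.
host = [mujoco.MjData(mjm) for _ in range(N)]

r = WarpRenderer(width=256, height=256, batch_size=N, preset="high")
r.load_model(mjm)

# Step physics on the GPU and wait for the kernels to finish.
mjw.step(M, d)
wp.synchronize()

# Copy only the geom transforms to the host (positions and 3x3 rotations).
gx = d.geom_xpos.numpy()
gm = d.geom_xmat.numpy().reshape(N, mjm.ngeom, 9)
for i, h in enumerate(host):
    h.geom_xpos[:] = gx[i]
    h.geom_xmat[:] = gm[i]

obs = r.render_batch(mjm, host, cam_id=0)   # (32, 256, 256, 4) uint8 torch.cuda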

See examples/minimal_render.py for a runnable demo.
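A common next step is feeding the frames to a vision policy. The sketch below is one possible conversion, assuming the `(N, H, W, 4)` uint8 layout shown in the Quickstart comment and assuming the fourth channel is alpha (the README does not state the channel order). `to_policy_input` is a hypothetical helper, not part of mujofil_warp, and a random tensor stands in for `obs` so the snippet runs without the renderer.

```python
import torch


def to_policy_input(obs: torch.Tensor) -> torch.Tensor:
    """Convert (N, H, W, 4) uint8 frames to (N, 3, H, W) float32 in [0, 1]."""
    if obs.dtype != torch.uint8 or obs.ndim != 4 or obs.shape[-1] != 4:
        raise ValueError(f"expected (N, H, W, 4) uint8, got {tuple(obs.shape)} {obs.dtype}")
    rgb = obs[..., :3]              # drop the fourth channel (assumed to be alpha)
    nchw = rgb.permute(0, 3, 1, 2)  # NHWC -> NCHW; still a view on the same device
    return nchw.float() / 255.0     # the cast allocates a new tensor on that device


# Stand-in for the renderer output so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
fake_obs = torch.randint(0, 256, (32, 256, 256, 4), dtype=torch.uint8, device=device)
x = to_policy_input(fake_obs)
print(tuple(x.shape), x.dtype, x.device)
```

Every operation here stays on the tensor's own device, so with a real `obs` from `render_batch` the frames would not need to leave the GPU.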

Quality toggles

Every fidelity feature is an independent toggle so you can reproduce the throughput/fidelity trade-offs in benchmarks/ on your own hardware:

from mujofil_warp import WarpRenderer, make_config

# keyword toggles
r = WarpRenderer(width=256, batch_size=32, ssao=False, shadows=True, msaa=True)

# or a named preset, optionally overriding individual toggles
r = WarpRenderer(width=256, batch_size=32, preset="fast")          # SSAO off, ~2x
r = WarpRenderer(width=256, batch_size=32, preset="high", bloom=True)

# or an explicit config
cfg = make_config(width=256, height=256, batch_size=32, exposure=1.6)
r = WarpRenderer(config=cfg)

| Toggle | Effect | Notes |
|---|---|---|
| ssao | screen-space ambient occlusion | biggest cost; ~2× faster when off |
| ssao_quality | SSAO quality (low/medium/high/ultra) | affects look more than speed |
| ssao_ssct | SSAO cone tracing (contact shadows) | small extra cost on top of SSAO |
| shadows | soft shadow maps | |
| msaa / msaa_samples | multi-sample AA | 2 / 4 / 8 |
| bloom | HDR bloom | off by default |
| fxaa | fast approximate AA | alternative to MSAA |
| exposure | linear exposure before tone mapping | |
| tone_mapping | FILMIC vs LINEAR | |
| dithering | temporal dithering | reduces banding |

Presets: high (photoreal, default), medium (high-quality SSAO, no cone tracing), fast (SSAO off, ~2×), ultra (8× MSAA + bloom), raw (no AO/shadows/AA, ~3×).
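The "preset plus overrides" behaviour can be pictured as a dictionary merge. The sketch below is not the library's implementation: `BASE`, `PRESETS` and `resolve` are hypothetical names, and the values encode only what the preset descriptions above state (for example, reading "high-quality SSAO" as `ssao_quality="high"` is an assumption).

```python
# Assumed starting point for the default "high" preset; bloom is off by default.
BASE = {"ssao": True, "ssao_ssct": True, "shadows": True, "msaa": True, "bloom": False}

# Each preset lists only how it differs from BASE, per the descriptions above.
PRESETS = {
    "high": {},
    "medium": {"ssao_quality": "high", "ssao_ssct": False},
    "fast": {"ssao": False},
    "ultra": {"msaa_samples": 8, "bloom": True},
    "raw": {"ssao": False, "shadows": False, "msaa": False, "fxaa": False},
}


def resolve(preset: str = "high", **overrides) -> dict:
    """Merge base settings, preset deltas, then explicit keyword overrides."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {sorted(PRESETS)}")
    return {**BASE, **PRESETS[preset], **overrides}


print(resolve("fast"))
print(resolve("high", bloom=True))  # an override wins over the preset
```

Later entries win in the merge, which is why an explicit keyword such as `bloom=True` overrides the value the preset would otherwise pick.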

Backends

Select at runtime with MUJOFIL_WARP_BACKEND:

  • gl (default): OpenGL single-sync. Renders N worlds into N imported GL textures bracketed by one flushAndWait, then exports via GL↔CUDA interop. Sync cost is constant in N; fastest in the warehouse. Runs headless through surfaceless EGL (no X display needed) and falls back to Vulkan automatically only if the GL module fails to initialize.
  • vulkan: shared Vulkan device + exportable swapchain + CUDA external-memory import. Also fully headless (no X), but the 2-frame in-flight cap makes its sync cost grow with batch size.

# default is gl; force a backend explicitly with the env var:
MUJOFIL_WARP_BACKEND=gl     python examples/minimal_render.py --preset high
MUJOFIL_WARP_BACKEND=vulkan python examples/minimal_render.py --preset high
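The same variable can be set from inside a Python script. The README does not say at which point mujofil_warp reads it, so this sketch makes the conservative assumption that it must be set before the package is imported. `select_backend` is a hypothetical helper.

```python
import os

VALID_BACKENDS = ("gl", "vulkan")  # the two values documented above


def select_backend(name: str) -> str:
    """Validate a backend name and export it through MUJOFIL_WARP_BACKEND."""
    name = name.strip().lower()
    if name not in VALID_BACKENDS:
        raise ValueError(f"unknown backend {name!r}; expected one of {VALID_BACKENDS}")
    os.environ["MUJOFIL_WARP_BACKEND"] = name
    return name


select_backend("vulkan")
# Import mujofil_warp only after this point (assumption: the variable is
# read at import time or when the renderer is constructed).
print(os.environ["MUJOFIL_WARP_BACKEND"])
```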

Installation

pip install mujofil-warp

The wheel is self-contained: Filament and the CUDA runtime are statically baked in, the compiled materials ship inside it, and libc++ is bundled. There is no CUDA toolkit, no Filament, and no mujofil to install — the only hard requirement at runtime is an NVIDIA GPU + driver.

Supported environments

Because the package contains no CUDA device code (only host-side runtime calls), a single wheel is portable across GPUs and driver versions:

| Dimension | Support |
|---|---|
| GPU | Any NVIDIA GPU (Turing / Ampere / Ada / Hopper / …); no compute-capability lock-in |
| Driver / CUDA | NVIDIA driver ≥ R525 (CUDA 12.0+). One wheel, all newer drivers |
| OS | Linux x86_64, glibc ≥ 2.34 (Ubuntu 22.04+, Debian 12+, RHEL/Alma/Rocky 9+, Fedora 35+) |
| Python | CPython 3.10 – 3.13 |

Not yet supported: aarch64 (Jetson/Grace), glibc < 2.34 (Ubuntu 20.04 / RHEL 8), non-NVIDIA GPUs. These need a from-source Filament build (planned).
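A preflight check against the table above can be written with the standard library alone. This is a rough sketch: `unsupported_reasons` is a hypothetical helper, and it covers only the OS, architecture, glibc and Python rows. It does not check the GPU or driver version.

```python
import platform
import sys

MIN_GLIBC = (2, 34)
MIN_PY, MAX_PY = (3, 10), (3, 13)


def unsupported_reasons(system, machine, libc, py):
    """Return the reasons (possibly none) why the prebuilt wheel would not fit."""
    reasons = []
    if system != "Linux":
        reasons.append(f"OS is {system or 'unknown'}; wheels target Linux")
    if machine != "x86_64":
        reasons.append(f"architecture is {machine or 'unknown'}; wheels target x86_64")
    libc_name, libc_version = libc
    if libc_name != "glibc":
        reasons.append("C library is not glibc")
    else:
        major, minor = (int(p) for p in libc_version.split(".")[:2])
        if (major, minor) < MIN_GLIBC:
            reasons.append(f"glibc {libc_version} is older than 2.34")
    if not (MIN_PY <= tuple(py[:2]) <= MAX_PY):
        reasons.append(f"Python {py[0]}.{py[1]} is outside 3.10 to 3.13")
    return reasons


problems = unsupported_reasons(
    platform.system(), platform.machine(), platform.libc_ver(), sys.version_info
)
print("looks supported" if not problems else "\n".join(problems))
```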

Headless / display

Both backends are fully headless — no X server, no display, nothing extra to install beyond the NVIDIA driver:

  • GL (default) uses surfaceless EGL, so it renders headless at full speed on a bare GPU server (cloud, cluster, container). This is the recommended path for vision-RL training.
  • Vulkan is also headless (shared device + exportable swapchain).

GL auto-falls back to Vulkan only if the GL module fails to initialize.
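The fallback rule can be sketched as follows. How mujofil_warp actually detects a failed GL initialization is not documented here, so the initializers are passed in as plain callables and the exception types are assumptions. `init_backend` and the stub functions are hypothetical.

```python
from typing import Any, Callable, Tuple


def init_backend(requested: str,
                 init_gl: Callable[[], Any],
                 init_vulkan: Callable[[], Any]) -> Tuple[str, Any]:
    """Initialize the requested backend; only GL falls back, and only to Vulkan."""
    if requested not in ("gl", "vulkan"):
        raise ValueError(f"unknown backend {requested!r}")
    if requested == "gl":
        try:
            return "gl", init_gl()
        except (ImportError, RuntimeError) as exc:  # assumed failure modes
            print(f"GL backend failed to initialize ({exc}); falling back to Vulkan")
    return "vulkan", init_vulkan()


# Stand-in initializers so the sketch runs without a GPU.
def broken_gl():
    raise RuntimeError("no EGL device")


def working_vulkan():
    return object()


name, renderer = init_backend("gl", broken_gl, working_vulkan)
print(name)
```

An explicit request for Vulkan never touches the GL path, and a Vulkan failure is allowed to propagate rather than being masked.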

Building from source

Most users never need this — pip install mujofil-warp ships prebuilt wheels. Build from source only to hack on the C++ or target an unsupported environment.

Prerequisites (the native modules and Filament are built with Clang + libc++):

| Tool | Debian/Ubuntu | RHEL/Fedora/Alma |
|---|---|---|
| Clang + libc++ dev | clang libc++-dev libc++abi-dev | clang + libc++ (LLVM release) |
| CUDA toolkit (headers + static cudart) | nvidia-cuda-toolkit | cuda-cudart-devel-12-x cuda-driver-devel-12-x |
| EGL / GL dev headers | libegl1-mesa-dev libgl1-mesa-dev | mesa-libEGL-devel mesa-libGL-devel |
| Build tools (source-built Filament only) | git cmake ninja-build | git cmake ninja-build |

Then:

git clone https://github.com/tau-intelligence/mujofil-warp
cd mujofil-warp
CC=clang CXX=clang++ pip install .

How Filament is resolved: the GL backend's headless EGL rendering needs a custom EGL-enabled Filament, because Google's prebuilt Linux Filament is GLX-only. CMakeLists.txt tries, in order:

  1. FILAMENT_DIR=/path/to/egl-filament if you set it — used as-is (fastest).
  2. Download a prebuilt EGL Filament artifact (seconds). The default path.
  3. Build from source via packaging/build_filament_egl.sh (~20–30 min) if the download is unavailable — this is the step that needs git/cmake/ninja.

So a plain pip install . is one command; supply FILAMENT_DIR to skip the download/build entirely:

CC=clang CXX=clang++ FILAMENT_DIR=/path/to/egl-filament pip install .
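The three-step resolution order described above can be paraphrased in Python. The real logic lives in CMakeLists.txt; this sketch only mirrors the order of attempts. `resolve_filament` and the stand-in callables are hypothetical, and the download and build steps are stubbed out.

```python
from pathlib import Path
from typing import Callable, Mapping, Optional, Tuple


def resolve_filament(env: Mapping[str, str],
                     download: Callable[[], Optional[Path]],
                     build: Callable[[], Path]) -> Tuple[str, Path]:
    """Return (how it was resolved, where Filament lives), trying cheapest first."""
    explicit = env.get("FILAMENT_DIR")
    if explicit:                      # 1. user-supplied directory, used as-is
        return "FILAMENT_DIR", Path(explicit)
    artifact = download()             # 2. prebuilt EGL artifact, if reachable
    if artifact is not None:
        return "download", artifact
    return "source-build", build()    # 3. slow path: build from source


# Stand-ins: the download is unavailable, the build "succeeds" instantly.
def failed_download() -> Optional[Path]:
    return None


def fake_build() -> Path:
    return Path("_filament_egl")


print(resolve_filament({"FILAMENT_DIR": "/path/to/egl-filament"}, failed_download, fake_build))
print(resolve_filament({}, failed_download, fake_build))
```

Passing `os.environ` as `env` would reproduce the behaviour of setting FILAMENT_DIR on the command line.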

The EGL Filament artifact is reproducible from source:

packaging/build_filament_egl.sh ./_filament_egl   # clone + patch + build

Dev rebuilds (no full reinstall)

For iterating on the C++ without a full pip install, the two helper scripts build the modules in place (point FILAMENT_DIR at the EGL Filament build):

bash native/build_gl.sh   # OpenGL single-sync, headless EGL -> _mujofil_warp_gl
bash native/build.sh      # Vulkan zero-copy                  -> _mujofil_warp

Architecture & porting

mujofil-warp is one core with pluggable rendering backends, so new platforms are added as a backend — not a fork.

mujofil_warp/__init__.py     Python API, presets, backend selection   (shared)
native/render_module.cpp     pybind bindings, batching                (shared)
native/vendor/core/          scene / material / light bridge          (shared)
native/renderer_gl.cpp       Linux: surfaceless EGL  + CUDA interop   (backend)
native/renderer_warp.cpp     Linux: Vulkan device    + CUDA interop   (backend)

Everything platform-specific lives behind the vf_mujoco::Renderer interface (context creation, GPU→tensor interop). Adding macOS or Windows means adding one renderer_*.{cpp,mm} implementing that interface — the scene, material, lighting, Python API, and batching layers are reused unchanged.

  • Windows would use a WGL/EGL context + OPAQUE_WIN32 external-memory handles for the CUDA interop.
  • macOS is a different target: there is no CUDA on Apple platforms, so a Mac backend would use Filament's Metal backend and export to PyTorch via MPS (MTLBuffer → torch-MPS) rather than torch.cuda.

These are not yet implemented (they need the respective hardware to develop and validate on), but the codebase is structured so they slot in without a fork.

Layout

mujofil_warp/        Python package (WarpRenderer, make_config, presets)
native/              C++ renderer + pybind module + build scripts
  renderer_gl.cpp      OpenGL single-sync zero-copy backend
  renderer_warp.cpp    Vulkan shared-device zero-copy backend
  render_module.cpp    pybind bindings (shared by both backends)
examples/            runnable demos
benchmarks/          the benchmark suite behind the numbers above
spikes/              isolated feasibility proofs (GL↔CUDA, Vulkan↔CUDA, DLPack)
docs/ARCHITECTURE.md design + phased integration plan

Relationship to mujofil

mujofil-warp reuses the CPU-MuJoCo mujofil renderer's scene/material/light source but is a separate build — the published mujofil package is untouched. Use mujofil for high-fidelity CPU-MuJoCo vector-env rendering; use mujofil-warp when you want MJWarp's GPU-resident physics with photoreal, zero-copy observations.

License

Apache-2.0.
