
Stream attention / activation heatmaps from lerobot policies to rerun, live or from a recorded dataset.


lerobot_attention_visualizer

See where your lerobot policy is looking, in real time. Per-camera attention overlays stream to rerun next to the raw image while the policy drives the arm — so you can eyeball whether the model is locked onto the block, the gripper, or a stray cable in the background.

Built for debugging vision-language-action policies on real hardware. If your VLA is misbehaving and you suspect the visual grounding rather than the action expert, this is the cheapest way to check. Wrap a policy in a context manager and three rerun streams (raw, heatmap, overlay) appear per camera:

from lerobot_attention_visualizer import SmolVLAAttention

viz = SmolVLAAttention(policy)
with viz:
    actions = policy.predict_action_chunk(obs_frame, ...)
    viz.log_overlay(obs)

That's the whole library surface. Everything in examples/ is one specific eval loop using it.

Demo

SmolVLA — attention rollout across SigLIP ViT layers, replayed from a recorded dataset (no hardware required):

ACT — ResNet-18 final-conv activation magnitude per camera while the arm executes live on hardware:

Attention overlay in the rerun viewer

Compatibility

Targets lerobot v0.5.1+ and LeRobotDataset v3.0. Hardware-agnostic: works with any robot lerobot supports (SO-100, SO-101, Aloha, …). CUDA is preferred for SmolVLA / π0; ACT runs comfortably on CPU.

Policies supported

  • SmolVLA — attention rollout across the SigLIP ViT layers. policies/smolvla.py.
  • π0 / π0.5 / π0-fast — same rollout, ported to PaliGemma's vision tower. One adapter (Pi0Attention) handles all three since they share the paligemma_with_expert.embed_image layout. policies/pi0.py.
  • ACT — per-spatial-cell activation magnitude of the ResNet-18 backbone's final conv stage. policies/act.py.
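The two heatmap sources listed above can be sketched in a few lines of NumPy. This is a minimal illustration of generic attention rollout and of a per-cell activation magnitude, not the code in policies/. The function names are hypothetical, the no-class-token pooling (averaging over query tokens) is an assumption about how a SigLIP-style encoder would be handled, and using the L2 norm over channels as the "magnitude" is likewise an assumption.

```python
import numpy as np


def attention_rollout(attentions):
    """Roll attention through a stack of ViT layers.

    attentions: list of arrays shaped (heads, tokens, tokens), one per layer,
    with each row already softmax-normalized.
    Returns a (tokens, tokens) matrix of accumulated attention.
    """
    num_tokens = attentions[0].shape[-1]
    rollout = np.eye(num_tokens)
    for layer_attn in attentions:
        fused = layer_attn.mean(axis=0)                     # average over heads
        fused = fused + np.eye(num_tokens)                  # account for the residual path
        fused = fused / fused.sum(axis=-1, keepdims=True)   # re-normalize each row
        rollout = fused @ rollout                           # compose with earlier layers
    return rollout


def patch_heatmap(rollout, grid_hw):
    """Collapse a rollout matrix to one score per patch, scaled to [0, 1].

    Assumption: the encoder has no class token, so scores are averaged
    over all query tokens instead of read from a single CLS row.
    """
    scores = rollout.mean(axis=0).reshape(grid_hw)
    span = scores.max() - scores.min()
    return (scores - scores.min()) / (span + 1e-8)


def activation_magnitude(feature_map):
    """L2 norm over channels of a (C, H, W) feature map, scaled to [0, 1]."""
    mag = np.linalg.norm(feature_map, axis=0)
    return mag / (mag.max() + 1e-8)
```

Both paths end in a small 2-D grid (patch grid for the ViT, spatial cells for the ResNet), which is what makes a shared heatmap and overlay stage possible downstream.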

Visualizing your own custom policy? See docs/custom_policies.md. The library depends only on a small interface (HF-style vision encoder + a per-image entry point), and the tutorial walks through three integration paths.

What you get

Three rerun streams per camera per chunk:

attention/<cam>/image       # raw RGB
attention/<cam>/attention   # heatmap (red = high attention)
attention/<cam>/overlay     # blended 50/50
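To make the relationship between the three streams concrete, here is a rough sketch of how a low-resolution heatmap could be turned into the heatmap and overlay images. It is not the library's implementation: the helper names are hypothetical, nearest-neighbour upsampling is an assumption, and the pure red-channel colormap is a simplification chosen only because the listing says red means high attention.

```python
import numpy as np


def upsample_nearest(heatmap, out_hw):
    """Nearest-neighbour resize of a 2-D heatmap to (height, width)."""
    h, w = heatmap.shape
    out_h, out_w = out_hw
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return heatmap[rows[:, None], cols[None, :]]


def red_heatmap(heatmap01):
    """Map scores in [0, 1] to RGB where brighter red means a higher score."""
    rgb = np.zeros(heatmap01.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = np.round(np.clip(heatmap01, 0.0, 1.0) * 255).astype(np.uint8)
    return rgb


def blend(image, heat_rgb, alpha=0.5):
    """Alpha-blend two HWC uint8 images; alpha=0.5 gives a 50/50 overlay."""
    mixed = (1.0 - alpha) * image.astype(np.float32) + alpha * heat_rgb.astype(np.float32)
    return np.round(mixed).astype(np.uint8)
```

Each resulting array could then be sent to its entity path with rerun's `rr.log(path, rr.Image(array))`. The library does this for you, so the sketch is only useful for understanding or reproducing the images offline.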

The streams update once per RTC chunk (roughly every 10–20 control steps) for SmolVLA, and once per ACT-queue refill (every n_action_steps) for ACT. That is often enough to follow what the policy attends to without paying for a capture on every control step.
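The refill cadence is easy to picture with a toy model of an action queue. The sketch below is a simplification written for this README, not lerobot's queue logic: `forward_steps` is a hypothetical helper, and it assumes a synchronous queue that is refilled only when empty (RTC's chunking is asynchronous and will not line up this neatly).

```python
from collections import deque


def forward_steps(num_steps, n_action_steps):
    """Return the control steps on which a fresh forward pass would occur."""
    queue = deque()
    fresh = []
    for step in range(num_steps):
        if not queue:
            # Queue is empty: the policy predicts a new chunk of actions,
            # which is the only moment a new heatmap can be captured.
            queue.extend(range(n_action_steps))
            fresh.append(step)
        queue.popleft()  # execute one buffered action
    return fresh
```

With `n_action_steps=4`, a 10-step episode produces fresh heatmaps on steps 0, 4 and 8; on every other step the previous heatmap is still the most recent one.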

Layout

src/lerobot_attention_visualizer/
├── visualizer/         # shared heatmap math + rerun streams
└── policies/           # per-policy adapters (smolvla, pi0, act)

examples/
├── smolvla_so101_rtc.py          # SmolVLA + RTC on a live SO-101
├── act_so101.py                  # ACT on a live SO-101
└── visualize_smolvla_dataset.py  # offline replay from a LeRobotDataset

docs/                   # tutorials (custom policies, etc.)
docs/resources/         # demo videos and screenshots

Install

Requires Python ≥ 3.12 and lerobot v0.5.1+. Use a fresh conda env so the heavy native deps (torch, cv2, pyrealsense, SDL/pygame) don't fight an existing install:

conda create -n lav python=3.12 -y
conda activate lav

Both routes below install the same extras. Pick the ones matching the policies you intend to visualize, plus any robot/camera extras for real hardware:

Use case                   Extra
SmolVLA                    smolvla
π0 / π0.5 / π0-fast        pi
ACT                        (none; included in core)
SO-100 / SO-101 motors     feetech
Aloha                      aloha
Intel RealSense camera     intelrealsense
All of the above           all

PyPI

The quickest route — no git clone required:

pip install lerobot-attention-visualizer          # ACT only
pip install 'lerobot-attention-visualizer[smolvla]'           # + SmolVLA
pip install 'lerobot-attention-visualizer[smolvla,feetech,intelrealsense]'  # full SO-101 rig

From source

Clone the repo and install in editable mode so local edits take effect immediately:

git clone https://github.com/CursedRock17/lerobot_attention_visualizer
cd lerobot_attention_visualizer
pip install -e '.[smolvla]'                      # replace with your extras

If you need to track a specific lerobot git tag (e.g. during active lerobot development), install lerobot first. When pip then resolves our dependencies, it keeps the already-installed lerobot as long as that version satisfies our requirement:

pip install 'lerobot[smolvla,feetech] @ git+https://github.com/huggingface/lerobot.git@v0.5.1'
pip install -e '.[smolvla,feetech]'
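After installing, a quick sanity check of the two stated requirements (Python ≥ 3.12, lerobot v0.5.1+) can save a confusing import error later. The following is a small standalone sketch using only the standard library. `parse_version` and `check_environment` are hypothetical helpers, not part of this package, and the version parsing is deliberately crude (it ignores pre-release suffixes).

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn '0.5.1' or 'v0.5.1' into (0, 5, 1); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.lstrip("v").split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment(min_python=(3, 12), min_lerobot="0.5.1"):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} < {min_python[0]}.{min_python[1]}")
    try:
        installed = metadata.version("lerobot")
    except metadata.PackageNotFoundError:
        problems.append("lerobot is not installed")
    else:
        if parse_version(installed) < parse_version(min_lerobot):
            problems.append(f"lerobot {installed} < {min_lerobot}")
    return problems
```

Running `print(check_environment())` inside the activated env prints an empty list when both requirements are met.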

Run the examples

No hardware? Start here — replay a recorded dataset and visualize attention frame-by-frame:

python examples/visualize_smolvla_dataset.py   # edit POLICY_PATH + DATASET_REPO_ID at top

Live on a robot — edit the constants at the top of each script (follower port, camera serials, task description) then:

python examples/smolvla_so101_rtc.py   # SmolVLA + RTC + rollout
python examples/act_so101.py           # ACT + ResNet activation

Set ATTENTION_ENABLED = False at the top of either live script to run the same control loop without the capture. This is useful for A/B-comparing the policy's behavior with and without the instrumentation.
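If you want the same on/off switch in your own loop, one common pattern is to swap the adapter for `contextlib.nullcontext()` when capture is disabled. The sketch below shows that pattern only. How the example scripts implement the flag internally is not documented here, and `FakeViz` is a stand-in class that exists purely to keep the snippet runnable without a policy.

```python
from contextlib import nullcontext


class FakeViz:
    """Stand-in for an attention adapter (hypothetical, for illustration only)."""

    def __init__(self):
        self.active = False
        self.logged = 0

    def __enter__(self):
        self.active = True
        return self

    def __exit__(self, *exc_info):
        self.active = False
        return False  # do not swallow exceptions from the control loop

    def log_overlay(self, obs):
        self.logged += 1


def run_loop(viz, num_steps, enabled):
    """Run the same loop with or without instrumentation."""
    capture = viz if enabled else nullcontext()
    with capture:
        for _ in range(num_steps):
            obs = {}  # placeholder for robot.get_observation()
            # ... policy call goes here, unchanged in both modes ...
            if enabled:
                viz.log_overlay(obs)
    return viz.logged
```

Because the loop body is identical in both modes, any behavioral difference between the two runs can be attributed to the instrumentation rather than to divergent control code.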

Integrate into your own project

The whole library surface is one context manager per policy family (SmolVLAAttention, Pi0Attention, ACTAttention); everything else in examples/ is just one user's eval glue. Drop one into any existing lerobot control loop:

from lerobot_attention_visualizer import SmolVLAAttention   # or ACTAttention

viz = SmolVLAAttention(policy)
with viz:
    for step in range(num_steps):
        obs = robot.get_observation()
        # ... build the obs frame, call your policy as usual ...
        actions = policy.predict_action_chunk(obs_frame, ...)
        viz.log_overlay(obs)   # streams image / heatmap / overlay per camera

viz.log_overlay(obs) expects obs to be a dict mapping bare camera names (e.g. "top", not "observation.images.top") to HWC uint8 ndarrays, which matches what robot.get_observation() returns in the loop above. It is a no-op on steps where no fresh forward happened (RTC queue still buffered, ACT queue not yet refilled), so it is safe to call every step.
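If your loop only has a policy-style frame at hand (keys such as "observation.images.top", possibly CHW floats), a small adapter can produce the dict shape described above. This is a hypothetical helper, not something the library ships. It assumes float images are scaled to [0, 1] and that inputs are NumPy arrays (a torch tensor would need `.cpu().numpy()` first).

```python
import numpy as np

IMAGE_PREFIX = "observation.images."


def to_overlay_obs(frame):
    """Pick image entries from a frame and return {camera_name: HWC uint8 array}."""
    out = {}
    for key, value in frame.items():
        if not key.startswith(IMAGE_PREFIX):
            continue  # skip state vectors, task strings, etc.
        img = np.asarray(value)
        # Heuristic: channels-first if the first axis looks like channels
        # and the last axis does not.
        if img.ndim == 3 and img.shape[0] in (1, 3) and img.shape[-1] not in (1, 3):
            img = np.transpose(img, (1, 2, 0))  # CHW -> HWC
        if img.dtype != np.uint8:
            # Assumes float pixel values in [0, 1].
            img = np.round(np.clip(img, 0.0, 1.0) * 255).astype(np.uint8)
        out[key[len(IMAGE_PREFIX):]] = img  # "observation.images.top" -> "top"
    return out
```

The result can be passed straight to `viz.log_overlay(...)` in place of the raw robot observation.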

For visualizing a custom policy that subclasses or borrows from SmolVLA / ACT, see docs/custom_policies.md.
