Stream attention / activation heatmaps from lerobot policies to rerun, live or from a recorded dataset.
lerobot_attention_visualizer
See where your lerobot policy is looking, in real time. Per-camera attention overlays stream to rerun next to the raw image while the policy drives the arm — so you can eyeball whether the model is locked onto the block, the gripper, or a stray cable in the background.
Built for debugging vision-language-action policies on real hardware. If your VLA is misbehaving and you suspect the visual grounding rather than the action expert, this is the cheapest way to check. Wrap a policy in a context manager and three rerun streams (raw, heatmap, overlay) appear per camera:
from lerobot_attention_visualizer import SmolVLAAttention

viz = SmolVLAAttention(policy)
with viz:
    actions = policy.predict_action_chunk(obs_frame, ...)
    viz.log_overlay(obs)
That's the whole library surface. Everything in examples/ is one
specific eval loop using it.
Demo
SmolVLA — attention rollout across SigLIP ViT layers, replayed from a recorded dataset (no hardware required):
ACT — ResNet-18 final-conv activation magnitude per camera while the arm executes live on hardware:
Compatibility
Targets lerobot v0.5.1+ and LeRobotDataset v3.0. Hardware-agnostic: works with any robot lerobot supports (SO-100, SO-101, Aloha, …). CUDA is preferred for SmolVLA / π0; ACT runs comfortably on CPU.
Policies supported
- SmolVLA: attention rollout across the SigLIP ViT layers. See policies/smolvla.py.
- π0 / π0.5 / π0-fast: same rollout, ported to PaliGemma's vision tower. One adapter (Pi0Attention) handles all three since they share the paligemma_with_expert.embed_image layout. See policies/pi0.py.
- ACT: per-spatial-cell activation magnitude of the ResNet-18 backbone's final conv stage. See policies/act.py.
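For readers unfamiliar with the two heatmap sources named above, here is a minimal NumPy sketch of each. It is an illustration under stated assumptions, not the code in policies/: the function names are hypothetical, the rollout follows the common recipe (average heads, add the identity for the residual path, renormalize, multiply through the layers), and the L2 norm over channels is only one plausible reading of "activation magnitude".

```python
import numpy as np


def attention_rollout(attentions):
    """Attention rollout over a stack of ViT self-attention maps.

    attentions: one array per layer, shaped (heads, tokens, tokens), with
    each row already softmax-normalized.
    Returns a (tokens, tokens) matrix; row i estimates how much each input
    token contributes to output token i after all layers.
    """
    num_tokens = attentions[0].shape[-1]
    identity = np.eye(num_tokens)
    rollout = identity.copy()
    for layer_attn in attentions:
        fused = layer_attn.mean(axis=0)                      # average over heads
        fused = fused + identity                             # residual connection
        fused = fused / fused.sum(axis=-1, keepdims=True)    # rows sum to 1 again
        rollout = fused @ rollout                            # compose with earlier layers
    return rollout


def activation_magnitude(features):
    """Per-cell magnitude of a conv feature map shaped (channels, H, W).

    Assumption: L2 norm over channels, scaled so the strongest cell is 1.
    """
    magnitude = np.linalg.norm(features, axis=0)
    peak = magnitude.max()
    return magnitude / peak if peak > 0 else magnitude
```

For an encoder without a class token, one way to turn the rollout matrix into a per-patch heatmap is to average it over output tokens and reshape to the patch grid, e.g. `attention_rollout(layers).mean(axis=0).reshape(grid_h, grid_w)`.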
Visualizing your own custom policy?
See docs/custom_policies.md. The library relies on a small interface
(an HF-style vision encoder plus a per-image entry point), and the
tutorial walks through three integration paths.
What you get
Three rerun streams per camera per chunk:
attention/<cam>/image # raw RGB
attention/<cam>/attention # heatmap (red = high attention)
attention/<cam>/overlay # blended 50/50
Updated once per RTC chunk (~every 10–20 control steps) for SmolVLA, and
once per ACT-queue refill (every n_action_steps) for ACT — enough to
read the story without burning compute.
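To make the three streams concrete, the sketch below builds a red heatmap and a 50/50 overlay from one frame and one coarse score grid. It is an assumption-laden illustration rather than the library's visualizer code: `make_overlay` is a hypothetical name, upsampling is plain nearest-neighbour, and the grid is assumed to divide the image size evenly.

```python
import numpy as np


def make_overlay(image, attention, alpha=0.5):
    """Return (heatmap, overlay) for one camera frame.

    image: HWC uint8 RGB frame.
    attention: 2D array of scores on a coarser patch grid whose sides
    divide the image height and width evenly (an assumption of this sketch).
    """
    height, width = image.shape[:2]
    grid_h, grid_w = attention.shape

    # Scale scores to [0, 1]; a completely flat map stays all-zero.
    attn = attention.astype(np.float32)
    attn = attn - attn.min()
    peak = attn.max()
    if peak > 0:
        attn = attn / peak

    # Nearest-neighbour upsampling from the patch grid to pixel resolution.
    attn = np.repeat(attn, height // grid_h, axis=0)
    attn = np.repeat(attn, width // grid_w, axis=1)

    # The red channel carries the score, so red = high attention.
    heatmap = np.zeros((height, width, 3), dtype=np.uint8)
    heatmap[..., 0] = np.round(attn * 255).astype(np.uint8)

    # alpha = 0.5 gives the 50/50 blend of raw image and heatmap.
    blended = (1.0 - alpha) * image.astype(np.float32) + alpha * heatmap.astype(np.float32)
    overlay = np.round(blended).astype(np.uint8)
    return heatmap, overlay
```

Arrays like these can be sent to rerun with `rr.log(path, rr.Image(array))`, using the three entity paths listed above.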
Layout
src/lerobot_attention_visualizer/
├── visualizer/ # shared heatmap math + rerun streams
└── policies/ # per-policy adapters (smolvla, pi0, act)
examples/
├── smolvla_so101_rtc.py # SmolVLA + RTC on a live SO-101
├── act_so101.py # ACT on a live SO-101
└── visualize_smolvla_dataset.py # offline replay from a LeRobotDataset
docs/ # tutorials (custom policies, etc.)
docs/resources/ # demo videos and screenshots
Install
Requires Python ≥ 3.12 and lerobot v0.5.1+. Use a fresh conda env so the heavy native deps (torch, cv2, pyrealsense, SDL/pygame) don't fight an existing install:
conda create -n lav python=3.12 -y
conda activate lav
Both routes below install the same extras. Pick the ones matching the policies you intend to visualize, plus any robot/camera extras for real hardware:
| Use case | Extra |
|---|---|
| SmolVLA | smolvla |
| π0 / π0.5 / π0-fast | pi |
| ACT | (none — in core) |
| SO-100 / SO-101 motors | feetech |
| Aloha | aloha |
| Intel RealSense camera | intelrealsense |
| All of the above | all |
PyPI
The quickest route — no git clone required:
pip install lerobot-attention-visualizer # ACT only
pip install 'lerobot-attention-visualizer[smolvla]' # + SmolVLA
pip install 'lerobot-attention-visualizer[smolvla,feetech,intelrealsense]' # full SO-101 rig
From source
Clone the repo and install in editable mode so local edits take effect immediately:
git clone https://github.com/CursedRock17/lerobot_attention_visualizer
cd lerobot_attention_visualizer
pip install -e '.[smolvla]' # replace with your extras
If you need to track a specific lerobot git tag (e.g. during active lerobot development), install lerobot first — pip will leave it alone when resolving our deps:
pip install 'lerobot[smolvla,feetech] @ git+https://github.com/huggingface/lerobot.git@v0.5.1'
pip install -e '.[smolvla,feetech]'
Run the examples
No hardware? Start here — replay a recorded dataset and visualize attention frame-by-frame:
python examples/visualize_smolvla_dataset.py # edit POLICY_PATH + DATASET_REPO_ID at top
Live on a robot — edit the constants at the top of each script (follower port, camera serials, task description) then:
python examples/smolvla_so101_rtc.py # SmolVLA + RTC + rollout
python examples/act_so101.py # ACT + ResNet activation
Toggle ATTENTION_ENABLED = False at the top of either live script to
run the same control loop without the capture — useful for A/B-comparing
the policy's behavior with the instrumentation removed.
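If you want the same on/off switch in your own loop, one possible pattern is to swap the visualizer for a do-nothing stand-in so the rest of the loop stays unchanged. The example scripts may implement the toggle differently; `NoOpVisualizer` and `make_visualizer` below are hypothetical helpers, not part of the library.

```python
ATTENTION_ENABLED = False  # flip to True to capture attention


class NoOpVisualizer:
    """Hypothetical stand-in: a context manager whose log_overlay does nothing."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions from the control loop

    def log_overlay(self, obs):
        return None


def make_visualizer(policy, enabled, factory):
    """Return factory(policy) when enabled, else a do-nothing stand-in.

    In a real script, factory would be SmolVLAAttention or ACTAttention.
    """
    return factory(policy) if enabled else NoOpVisualizer()
```

The control loop can then call `with viz:` and `viz.log_overlay(obs)` unconditionally, which keeps the instrumented and uninstrumented runs on the same code path.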
Integrate into your own project
The whole library surface is two context managers; everything else in
examples/ is just one user's eval glue. Drop into any existing lerobot
control loop:
from lerobot_attention_visualizer import SmolVLAAttention  # or ACTAttention

viz = SmolVLAAttention(policy)
with viz:
    for step in range(num_steps):
        obs = robot.get_observation()
        # ... build the obs frame, call your policy as usual ...
        actions = policy.predict_action_chunk(obs_frame, ...)
        viz.log_overlay(obs)  # streams image / heatmap / overlay per camera
viz.log_overlay(obs) expects obs to be a dict mapping bare camera
names (e.g. "top", not "observation.images.top") to HWC uint8
ndarrays — that matches what follower.get_observation() returns. It is
a no-op on steps where no fresh forward happened (RTC queue still
buffered, ACT queue not yet refilled), so it is safe to call every step.
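When frames come from somewhere other than the robot, they may need reshaping into that format first. The sketch below assumes dataset-style frames keyed as "observation.images.<cam>" holding CHW float images in [0, 1]; that layout is an assumption about your data source, and `to_overlay_obs` is a hypothetical helper rather than a library function.

```python
import numpy as np

IMAGE_PREFIX = "observation.images."


def to_overlay_obs(frame, prefix=IMAGE_PREFIX):
    """Map dataset-style image entries to {bare_camera_name: HWC uint8 array}.

    Assumes each image value is CHW with floats in [0, 1].
    """
    obs = {}
    for key, value in frame.items():
        if not key.startswith(prefix):
            continue  # skip state, action, task, and other non-image entries
        chw = np.asarray(value, dtype=np.float32)
        hwc = np.transpose(chw, (1, 2, 0))  # CHW -> HWC
        scaled = np.clip(np.round(hwc * 255.0), 0, 255)
        obs[key[len(prefix):]] = scaled.astype(np.uint8)
    return obs
```

The result can be passed straight to `viz.log_overlay(...)`, for example `viz.log_overlay(to_overlay_obs(frame))`.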
For visualizing a custom policy that subclasses or borrows from
SmolVLA / ACT, see docs/custom_policies.md.