LabelVLA
An Annotation Tool for VLA Tasks
[Screenshot: LabelVLA desktop annotation interface]
[Screenshot: LabelVLA remote (browser) annotation interface — UE-blueprint dark theme]
Why LabelVLA?
VLA (Vision-Language-Action) is a vision-centric paradigm for robotic manipulation tasks. Unlike traditional image/video annotation, VLA data has unique characteristics:
- Multi-modal time-series data: includes multi-camera video streams, robot joint angle sequences, end-effector poses, and more
- Episode-based organization: each episode represents a complete manipulation procedure
- Temporal annotation: requires segmenting the timeline into semantic segments rather than frame-by-frame labeling
There is currently no annotation tool purpose-built for VLA data. LabelVLA fills this gap with native support for the LeRobot v2.1 format and a timeline-centric annotation interface.
Features
- Native LeRobot v2.1 format support — directly reads parquet + mp4 data with no format conversion
- Multi-camera view — simultaneously displays head camera (large) and left/right wrist cameras (side panels)
- Joint angle curve visualization — plots all joint angles over time with per-joint toggle checkboxes
- Timeline segment annotation — divide the timeline into segments, each with a text description
- BBox annotation — draw bounding boxes on the head camera view; boxes automatically propagate to all frames within the same segment
- Moving object tracking — for objects that move within a segment, click on different frames to set keypoints; the system interpolates the motion path automatically
- Persistent annotations — saved as JSON files in the segments/ folder under the dataset directory
- Remote annotation mode — labelvla_rs launches a FastAPI + browser UI so you can annotate datasets that live on a headless server
Supported Data Format
LabelVLA supports the standard LeRobot v2.1 directory structure:
dataset_folder/
├── meta/
│ ├── info.json # Dataset metadata (fps, features, camera list, etc.)
│ ├── episodes.jsonl # Frame count per episode
│ └── tasks.jsonl # Task descriptions
├── data/
│ └── chunk-000/
│ ├── episode_000000.parquet # Joint angles, velocity, actions, etc.
│ ├── episode_000001.parquet
│ └── ...
└── videos/
└── chunk-000/
├── observation.images.head/
│ ├── episode_000000.mp4
│ └── ...
├── observation.images.left_wrist/
│ └── ...
└── observation.images.right_wrist/
└── ...
Installation
Via pip
pip install labelvla
# Launch
labelvla
From source
git clone https://github.com/Kingdroper/labelVLA.git
cd labelVLA
# Using uv (recommended)
uv sync
uv run labelvla
# Or using pip
pip install -e .
labelvla
Dependencies
- Python >= 3.10
- PyQt5
- OpenCV (opencv-python)
- pandas + pyarrow
- matplotlib
- See pyproject.toml for the full list
Quick Start
Step 1: Launch the application
labelvla
# or
uv run labelvla
Step 2: Open a LeRobot dataset
Click the LeRobot button in the toolbar or File menu, then select the dataset folder (the directory containing meta/info.json).
Step 3: Browse data
The LeRobot annotation window opens:
┌─────────────────────────────────────────────────┐
│ Episode: [dropdown ▼] [Save] │
├─────────────────────────────────────────────────┤
│ Joint angle curves (toggle individual joints) │
│ Click on curves to jump to that frame │
├─────────────────────────────────────────────────┤
│ ┌──────────────────┐ ┌───────────┐ │
│ │ Head camera │ │ L. wrist │ │
│ │ (large, bbox │ ├───────────┤ │
│ │ drawing here) │ │ R. wrist │ │
│ └──────────────────┘ └───────────┘ │
├─────────────────────────────────────────────────┤
│ [seg1][ seg2 ][seg3] timeline │
│ [<] ═══════════════════════════════ [>] 42/949 │
└─────────────────────────────────────────────────┘
- Scrub frames: drag the timeline slider or press ← / →
- Switch episodes: use the top dropdown
- Joint curves: click "Joints ▼" to expand the joint selection panel and toggle visibility
Step 4: Create segments
In the right-side Segments panel:
- Click "+ Add": manually enter start frame, end frame, and text description
- Click "+ At Current": quickly create a segment starting at the current frame
Segments appear as colored blocks on the timeline and joint curve plot.
Step 5: Annotate bounding boxes
- Navigate to a frame within a segment
- Left-click and drag on the head camera view to draw a rectangle
- Enter the class name in the popup dialog
- The box applies to all frames in the segment (static objects)
Step 6: Track moving objects
For objects that move within a segment:
- In the right panel, select a segment, then select a bbox within it
- Click "Track Object" to enter tracking mode (button turns orange)
- Navigate to different frames and click on the object's center in the head camera view
- Each click records a keypoint (shown as a red dot); adjacent keypoints are linearly interpolated
- You can click on every frame, or skip frames — the system fills in the gaps
- Press Esc or click the button again to exit tracking mode
- Click "Clear Path" to remove all motion keypoints
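The gap-filling behavior described above can be sketched as plain linear interpolation between keypoints. This is an illustrative reimplementation, not LabelVLA's internal code; it assumes frames before the first keypoint (or after the last) clamp to that keypoint's position:

```python
def interpolate_centers(keypoints, start_frame, end_frame):
    """Linearly interpolate box centers between sparse motion keypoints.

    keypoints: list of {"frame": int, "cx": float, "cy": float}.
    Returns one {"frame", "cx", "cy"} entry per frame in the segment.
    """
    kps = sorted(keypoints, key=lambda k: k["frame"])
    centers = []
    for f in range(start_frame, end_frame + 1):
        if f <= kps[0]["frame"]:          # before first keypoint: clamp
            cx, cy = kps[0]["cx"], kps[0]["cy"]
        elif f >= kps[-1]["frame"]:       # after last keypoint: clamp
            cx, cy = kps[-1]["cx"], kps[-1]["cy"]
        else:
            # find the pair of keypoints bracketing this frame
            for a, b in zip(kps, kps[1:]):
                if a["frame"] <= f <= b["frame"]:
                    t = (f - a["frame"]) / (b["frame"] - a["frame"])
                    cx = a["cx"] + t * (b["cx"] - a["cx"])
                    cy = a["cy"] + t * (b["cy"] - a["cy"])
                    break
        centers.append({"frame": f, "cx": cx, "cy": cy})
    return centers
```

With keypoints at frames 0 and 10, frame 5 lands exactly halfway between the two clicked centers.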
Step 7: Save
- Click the Save button or press Ctrl+S
- Annotations are auto-saved when switching episodes or closing the window
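If you want to write compatible files from your own scripts, the on-disk layout (see "Annotation Output Format" below) takes only a few lines to reproduce. `save_annotation` is a hypothetical helper for this README, not part of the package:

```python
import json
from pathlib import Path


def save_annotation(dataset_dir, episode_index, segments):
    """Write segments to {dataset}/segments/episode_NNNNNN.json, mirroring
    the file layout the app uses (assumed from the documented schema)."""
    out_dir = Path(dataset_dir) / "segments"
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"episode_{episode_index:06d}.json"
    payload = {"episode_index": episode_index, "segments": segments}
    path.write_text(json.dumps(payload, indent=2))
    return path
```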
Remote Annotation (labelvla_rs)
Need to annotate a LeRobot dataset that sits on a headless server? Launch a browser-based frontend with a FastAPI backend:
# On the server (or locally):
labelvla_rs --host 0.0.0.0 --port 8000 \
--dataset /path/to/lerobot_dataset # optional pre-load
Then open http://<server>:8000/ in any browser. If you skip --dataset, the landing page lets you enter a server-side dataset path.
- Same workflow as desktop — the web UI mirrors every feature of the native labelvla: timeline segments, bbox drawing, moving-object tracking, joint curves, keyboard shortcuts (←/→, Ctrl+S, Esc).
- UE-Blueprint style — dark, grid-backed theme that stays readable over long annotation sessions.
- Zero-install client — the browser is the only requirement; no pip install on the annotator's machine.
- Works with tunnels — point ngrok, cloudflared, or an SSH tunnel at the port to annotate from anywhere.
- Shared storage — annotations are written to {dataset}/segments/episode_NNNNNN.json with exactly the same schema as the desktop app, so both entry points interoperate.
The desktop labelvla command is unaffected — remote mode is purely additive.
Annotation Output Format
Annotations are saved to {dataset_dir}/segments/episode_NNNNNN.json:
{
"episode_index": 0,
"segments": [
{
"start_frame": 0,
"end_frame": 120,
"text": "reach for domino",
"bboxes": [
{
"x": 100.0,
"y": 200.0,
"width": 50.0,
"height": 50.0,
"label": "domino",
"keypoints": []
},
{
"x": 300.0,
"y": 150.0,
"width": 40.0,
"height": 40.0,
"label": "gripper",
"keypoints": [
{"frame": 0, "cx": 320.0, "cy": 170.0},
{"frame": 60, "cx": 150.0, "cy": 220.0},
{"frame": 120, "cx": 120.0, "cy": 210.0}
],
"interpolated_centers": [
{"frame": 0, "cx": 320.0, "cy": 170.0},
{"frame": 1, "cx": 317.2, "cy": 170.8},
{"frame": 2, "cx": 314.3, "cy": 171.7},
"... (one entry per frame, 121 total)",
{"frame": 120, "cx": 120.0, "cy": 210.0}
]
}
]
}
]
}
Field reference:
| Field | Description |
|---|---|
| start_frame / end_frame | Start and end frame indices of the segment |
| text | Text description of the segment |
| bboxes[].x/y/width/height | Original position and size of the bounding box |
| bboxes[].label | Object class name |
| bboxes[].keypoints | Motion keypoint list (empty = static object) |
| keypoints[].frame | Keyframe index |
| keypoints[].cx/cy | Box center coordinates at this frame |
| bboxes[].interpolated_centers | Pre-computed per-frame box center coordinates (moving objects only, ready to use without re-interpolation) |
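Downstream code can consume these files directly. The sketch below is illustrative (the helper name is this README's invention); it assumes x/y is the box's top-left corner and that interpolated_centers, when present, gives the box center per frame:

```python
def boxes_at_frame(annotation, frame):
    """List (label, x, y, w, h) for every box visible at `frame`.

    Assumes x/y is the top-left corner (per the schema above). Moving boxes
    are re-centered from interpolated_centers when an entry for the frame
    exists; static boxes keep their original position.
    """
    out = []
    for seg in annotation["segments"]:
        if not (seg["start_frame"] <= frame <= seg["end_frame"]):
            continue
        for box in seg["bboxes"]:
            x, y, w, h = box["x"], box["y"], box["width"], box["height"]
            for c in box.get("interpolated_centers", []):
                if isinstance(c, dict) and c["frame"] == frame:
                    # convert the stored center back to a top-left corner
                    x, y = c["cx"] - w / 2, c["cy"] - h / 2
                    break
            out.append((box["label"], x, y, w, h))
    return out
```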
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| ← / → | Previous / next frame |
| Ctrl+S | Save annotations |
| Ctrl+W | Close window |
| Esc | Exit tracking mode |
Changelog
v0.2.0 (2026-04-25)
- NEW: labelvla_rs — remote annotation server. Boots a FastAPI backend + browser SPA so you can annotate LeRobot datasets that live on a headless server. Same workflow as the desktop app (timeline segments, bbox drawing, moving-object tracking, joint curves, keyboard shortcuts), exposed over HTTP. UI uses a UE-blueprint-inspired dark theme.
- Annotations written by the remote server share the exact same {dataset}/segments/episode_NNNNNN.json schema as the desktop app, so both entry points interoperate.
- Internal: serialize OpenCV VideoCapture access to avoid an ffmpeg threading assertion under uvicorn's threadpool.
- Internal: package metadata lookup now resolves the new labelvla distribution name (fixes a crash on fresh pip install labelvla introduced in v0.1.x).
v0.1.1 (2026-04-19)
- Each bbox now carries a unique id that stays consistent across frames within its segment (both static and moving bboxes).
v0.1.0 (2026-04-18)
- Initial public release. Native LeRobot v2.1 support, multi-camera view, joint-angle curves, timeline segment annotation, bbox annotation with motion keypoint tracking, JSON persistence.
Acknowledgements
LabelVLA is built on top of labelme. We thank the labelme project for providing the foundational framework.
File details
Details for the file labelvla-0.2.0.tar.gz.
File metadata
- Download URL: labelvla-0.2.0.tar.gz
- Upload date:
- Size: 13.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 56af58b323ba259fb61c947313f95da4d5225fe6760d432b290cca40ae8d1f24 |
| MD5 | cc49678b9ee9a6ef3c76b6d0ec10ab75 |
| BLAKE2b-256 | 880a1d0e94f0555ca5ff58d6460b7d898bfe353f6282b367b143604da2e58a83 |
File details
Details for the file labelvla-0.2.0-py3-none-any.whl.
File metadata
- Download URL: labelvla-0.2.0-py3-none-any.whl
- Upload date:
- Size: 614.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 429336519c8c9115bfbc584e575c4822af71dbcb96570637629ac70950974da4 |
| MD5 | a52d05113aad79d76edbfc2e04356165 |
| BLAKE2b-256 | 21b889e178bbabcd90e8b2baf38c9d352e904c27b94d6ab72cc99cf7a352daa0 |