# XR Robot Teleop Server
A high-performance Python server designed to stream 360° panoramic video and full-body tracking data to XR headsets (like Quest 3, Vision Pro) for immersive robot teleoperation. It uses WebRTC for low-latency communication and features a modular architecture for easy customization.
## Key Features

- Low-Latency Streaming: Built on `aiortc` and `FastAPI` for efficient WebRTC communication.
- 360° Video Reprojection: Dynamically transforms 360° equirectangular video into a perspective view based on the user's head orientation, providing an immersive FPV experience.
- Full-Body Tracking: Receives, deserializes, and processes full-body, upper-body, and hand skeleton data from XR clients (e.g., a Unity app).
- Modular & Extensible: Easily swap video sources (e.g., a file, a live camera, or a GoPro stream) and video transformations for your own custom classes.
- Hardware Acceleration: Uses FFmpeg's hardware-acceleration backends (`videotoolbox`, `d3d11va`, `vaapi`, etc.) for efficient video decoding, minimizing CPU load.
- Coordinate-System Handling: Includes utilities to convert between platform conventions (e.g., Unity's left-handed Y-up to a standard right-handed Z-up).
- Rich Data Schemas: Provides clear Python enumerations for OpenXR and OVR skeleton bone IDs, simplifying data interpretation.
- Integrated Visualization: Comes with an example that visualizes the received body-pose data in 3D using the Rerun SDK.
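To illustrate the coordinate-system handling, the sketch below shows one common way to remap a Unity position (left-handed, Y-up, Z-forward) into a right-handed Z-up frame (X-forward, Y-left). The helper name is hypothetical, not the library's actual API; the package ships its own conversion utilities.

```python
import numpy as np


def unity_to_rhs_zup(p_unity: np.ndarray) -> np.ndarray:
    """Map a Unity position (left-handed: x right, y up, z forward)
    to a right-handed Z-up frame (x forward, y left, z up).

    Hypothetical helper for illustration only.
    """
    x, y, z = p_unity
    # forward (z) -> x, right (x) -> -y (target y points left), up (y) -> z.
    # The axis permutation has determinant -1, which flips handedness.
    return np.array([z, -x, y])


# Unity "up" (0, 1, 0) lands on the target frame's z axis.
print(unity_to_rhs_zup(np.array([0.0, 1.0, 0.0])))
```

Rotations need the same treatment (e.g., negating the appropriate quaternion components), which is exactly the bookkeeping the library's utilities handle.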
## System Architecture
The server acts as a bridge between an XR client (e.g., a VR headset running a Unity application) and a robot control system. The client sends head pose and body tracking data to the server, which in turn processes it and sends back a reprojected video stream. The processed body pose can then be used to command a robot or a simulation.
```mermaid
graph TD
    subgraph client ["XR Client (e.g., Unity on Headset)"]
        A[User Input: Head Pose + Body Tracking] --> B{WebRTC Connection};
        B --> C["Send Data Channels (body_pose, camera)"];
        D[Receive Video Stream] --> E[Display to User];
        B --> D;
    end

    subgraph server ["Python Server (xr-robot-teleop-server)"]
        F{WebRTCServer};
        F <--> G;
        F <--> H;

        subgraph G [Data Channel Handlers]
            G1["'body_pose' handler"];
            G2["'camera' handler"];
            G1 --> G3[Deserialize Pose Data];
            G2 --> G4[Update Head Pose State];
        end

        subgraph H [Video Pipeline]
            H1[VideoSource: 360° Video];
            H2[VideoTransform: Reprojection];
            H3[Reprojected Video Track];
            H1 --> H2;
            H2 --> H3;
        end

        G4 --> H2;
    end

    subgraph robot ["Robot Control System (Application Layer)"]
        I["Robot/Simulation (e.g., MuJoCo)"];
        G3 --> I;
    end

    C --> F;
    H3 --> D;

    style client fill:#d4fcd7,stroke:#333,stroke-width:2px
    style server fill:#cde4f7,stroke:#333,stroke-width:2px
    style robot fill:#f7e8cd,stroke:#333,stroke-width:2px
```
## Getting Started

### Installation

```bash
pip install xr-robot-teleop-server
```
### Development

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/xr-robot-teleop-server.git
   cd xr-robot-teleop-server
   ```

2. Install the package. For basic usage, install it in editable mode:

   ```bash
   pip install -e .
   ```

   To include dependencies for the visualization example (`rerun`), install with the `[viz]` extra:

   ```bash
   pip install -e ".[viz]"
   ```
### Running the Examples

This project includes examples to demonstrate its capabilities.

#### Record and Visualize Full Body Pose

This example starts a WebRTC server that listens for full-body tracking data from a client, deserializes it, and visualizes the skeleton in 3D using Rerun.

1. Run the server:

   ```bash
   python examples/record_full_body_pose.py --visualize
   ```

   The server will start and print:

   ```
   INFO Starting xr-robot-teleop-server...
   ```

2. Connect your client: Point your XR client at `http://<your-server-ip>:8080`. Once the WebRTC connection is established, the server will start receiving data.

3. Visualize: A Rerun viewer window spawns, displaying the received skeleton poses in real time.
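The pose messages arrive over the data channel as binary blobs whose layout is defined by the library's schemas. As a purely illustrative sketch of what deserialization involves, assume a hypothetical flat wire format of little-endian `(bone_id, x, y, z)` records; the real format is defined in `xr_robot_teleop_server.schemas`.

```python
import struct

# Hypothetical record layout: int32 bone_id followed by three float32 coords.
RECORD = struct.Struct("<ifff")


def deserialize_pose(blob: bytes) -> dict[int, tuple[float, float, float]]:
    """Unpack a flat pose message into {bone_id: (x, y, z)}."""
    return {
        bone_id: (x, y, z)
        for bone_id, x, y, z in RECORD.iter_unpack(blob)
    }


# Round-trip two bones to show the layout.
blob = RECORD.pack(0, 0.0, 1.5, 0.0) + RECORD.pack(1, 0.25, 1.5, 0.0)
print(deserialize_pose(blob))
```

The bone IDs would then be interpreted against the OpenXR/OVR enumerations the library provides before logging the joints to Rerun.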
## Usage

The core of the server is the `WebRTCServer` class. You instantiate it with a factory for creating video tracks and with handlers for data channels.

```python
import json

import av
import numpy as np
from aiortc import MediaStreamTrack

from xr_robot_teleop_server.sources import FFmpegFileSource
from xr_robot_teleop_server.streaming import WebRTCServer
from xr_robot_teleop_server.transforms import EquilibEqui2Pers


# 1. Define a shared state object (optional).
# This object is shared between the video track and data channel handlers
# for a single peer connection.
class AppState:
    def __init__(self):
        self.pitch = 0.0
        self.yaw = 0.0
        self.roll = 0.0


# 2. Define a handler for an incoming data channel.
def on_camera_message(message: str, state: AppState):
    # Parse the message and update the shared state.
    data = json.loads(message)
    state.pitch = np.deg2rad(float(data.get("pitch", 0.0)))
    state.yaw = np.deg2rad(float(data.get("yaw", 0.0)))


# 3. Define a factory for the video track.
# This factory creates the video source, the transform, and the track itself.
class ReprojectionTrack(MediaStreamTrack):
    kind = "video"

    def __init__(self, state: AppState):
        super().__init__()
        self.state = state
        self.source = FFmpegFileSource("path/to/your/360_video.mp4")
        self.transform = EquilibEqui2Pers(
            output_width=1280, output_height=720, fov_x=90.0
        )

    async def recv(self):
        equi_frame_rgb = next(self.source)
        rot = {
            "pitch": self.state.pitch,
            "yaw": self.state.yaw,
            "roll": self.state.roll,
        }
        perspective_frame = self.transform.transform(frame=equi_frame_rgb, rot=rot)
        # Wrap the ndarray in an av.VideoFrame before returning it
        # (pts/time_base bookkeeping omitted here for brevity).
        return av.VideoFrame.from_ndarray(perspective_frame, format="rgb24")


# 4. Configure and run the server.
if __name__ == "__main__":
    data_handlers = {
        "camera": on_camera_message,
        # Add other handlers for "body_pose", "left_hand", etc.
    }
    server = WebRTCServer(
        state_factory=AppState,
        video_track_factory=ReprojectionTrack,
        datachannel_handlers=data_handlers,
    )
    server.run()
```
## Project Structure

```
.
├── examples/          # Example scripts showing how to use the server
├── src/
│   └── xr_robot_teleop_server/
│       ├── schemas/    # Data schemas for skeletons (OpenXR) and poses
│       ├── sources/    # Pluggable video sources (FFmpeg, OpenCV)
│       ├── streaming/  # Core WebRTC server logic
│       └── transforms/ # Pluggable video transformations (e.g., reprojection)
├── tests/             # Unit and integration tests
└── pyproject.toml     # Project metadata and dependencies
```
## Dependencies

- Core: `aiortc`, `fastapi`, `uvicorn`, `numpy`, `loguru`
- Transforms: `equilib`
- Video Sources: `opencv-python` (optional); `ffmpeg` must be on the system `PATH` for `FFmpegFileSource`
- Visualization: `rerun-sdk`

See `pyproject.toml` for detailed dependency information.
## License

This project is licensed under the terms of the LICENSE file.
## File Details

### Source distribution: xr_robot_teleop_server-0.1.8.tar.gz

- Size: 14.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13

| Algorithm | Hash digest |
|---|---|
| SHA256 | `a4f50e58844d35e3d5dc92a4622720d4224f3e92b0424f9b423d6737e8594fc0` |
| MD5 | `b83cdefa51047a6d08567bb23bf7e27e` |
| BLAKE2b-256 | `fa2acde491f0b29c6d2867a6ec82f8be4eb8fc917930725a160b00941d7a4727` |
### Built distribution: xr_robot_teleop_server-0.1.8-py3-none-any.whl

- Size: 27.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13

| Algorithm | Hash digest |
|---|---|
| SHA256 | `3e2d005fd5b7a18ff94a5705c59afae80005fecfe4b673a5212e0defb3f91960` |
| MD5 | `5161906ab0a489c83a00026e154e6fce` |
| BLAKE2b-256 | `3d032a6a7a2b1ead133fa60774c57b506e86d382dcb9f9b54e287886c7cd4bfa` |