An Efficient and Scalable Data Collection and Management Framework For Robotics Learning

🦊 Fog-RT-X

🦊 Fog-RT-X: An Efficient and Scalable Data Collection and Management Framework For Robotics Learning. Supports Open-X-Embodiment and 🤗 HuggingFace.

🦊 Fog-RT-X targets both speed 🚀 and memory efficiency 📈 by keeping metadata active and loading trajectory data lazily. It supports flexible, distributed dataset partitioning and provides native support for cloud storage.

Design Doc | Dataset Visualization

Install

pip install fog_x

Usage

import fog_x
import torch  # needed for the DataLoader example at the end

# 🦊 Dataset Creation 
# from distributed dataset storage 
dataset = fog_x.Dataset(
    name="demo_ds",
    path="~/test_dataset", # can be AWS S3, Google Bucket! 
)  

# 🦊 Data collection: 
# create a new trajectory
episode = dataset.new_episode()
# collect step data for the episode
episode.add(feature="arm_view", value="image1.jpg")
# Automatically time-aligns and saves the trajectory
episode.close()

# 🦊 Data Loading:
# load from existing RT-X/Open-X datasets
dataset.load_rtx_episodes(
    name="berkeley_autolab_ur5",
    additional_metadata={"collector": "User 2"}
)

# 🦊 Data Management and Analytics: 
# Compute and memory efficient filter, map, aggregate, groupby
episode_info = dataset.get_episode_info()
desired_episodes = episode_info.filter(episode_info["collector"] == "User 2")

# 🦊 Data Sharing and Usage:
# Export and share the dataset in the standard Open-X-Embodiment format
# (Hugging Face and other formats are also supported)
dataset.export(desired_episodes, format="rtx")
# Load with a PyTorch DataLoader
data_loader = torch.utils.data.DataLoader(dataset.as_pytorch_dataset(desired_episodes))

Design

🦊 Fog-RT-X recognizes that most post-processing, analytics, and management work involves trajectory-level data, such as tags, while the actual trajectory steps are rarely read, written, or transformed. Accessing and modifying trajectory step data is expensive and difficult.

As a result, 🦊 Fog-RT-X proposes

  • a user-friendly metadata table backed by a Pandas DataFrame, for speed and flexibility
  • a Polars LazyFrame for the trajectory dataset, which loads and transforms data only when needed
  • Parquet as the storage format, chosen over TensorFlow records (TFRecord) for its distributed-storage and columnar support
  • easy, automatic RT-X/Open-X dataset export and PyTorch data loading
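
The split between an always-available metadata table and lazily loaded trajectory steps can be illustrated with a minimal sketch. It is not Fog-RT-X's implementation: it uses only the Python standard library in place of Pandas and Polars, and `LazyEpisode`, `make_loader`, and `load_calls` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LazyEpisode:
    """Metadata lives in memory; step data is read only on first access."""

    metadata: Dict[str, str]
    loader: Callable[[], List[dict]]  # hypothetical callback that reads steps from storage
    _steps: Optional[List[dict]] = field(default=None, repr=False)

    @property
    def steps(self) -> List[dict]:
        # Load once, then reuse the cached result.
        if self._steps is None:
            self._steps = self.loader()
        return self._steps


load_calls: List[int] = []  # records which episodes were actually read


def make_loader(episode_id: int) -> Callable[[], List[dict]]:
    """Build a stand-in loader that fabricates three steps for one episode."""

    def load() -> List[dict]:
        load_calls.append(episode_id)
        return [{"arm_view": f"image{episode_id}_{t}.jpg"} for t in range(3)]

    return load


episodes = [
    LazyEpisode({"collector": "User 1"}, make_loader(0)),
    LazyEpisode({"collector": "User 2"}, make_loader(1)),
    LazyEpisode({"collector": "User 2"}, make_loader(2)),
]

# Filtering touches only the metadata, so no step data is read here.
desired = [ep for ep in episodes if ep.metadata["collector"] == "User 2"]
loads_after_filter = list(load_calls)

# Step data is loaded only for the episode that is actually accessed.
first_steps = desired[0].steps
```

In this sketch, filtering by `collector` leaves `load_calls` empty, and only the single episode whose `steps` are read triggers a load. This is the behavior the metadata table and LazyFrame combination is meant to provide at scale.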

More Coming Soon!

Currently we see space savings of more than 60% on some existing RT-X datasets. Re-partitioning the dataset can save even more. Our next steps are listed in the planning doc. Feedback is welcome through issues or PRs to the planning doc!

Note that the project is in a beta-testing phase. We make our best effort to stay backward-compatible, but interfaces may be unstable.

Development

Read the CONTRIBUTING.md file.

Download files

Source Distribution

fog_x-0.1.0b3.tar.gz (66.2 kB)

Uploaded Source

Built Distribution

fog_x-0.1.0b3-py3-none-any.whl (89.1 kB)

Uploaded Python 3

File details

Details for the file fog_x-0.1.0b3.tar.gz.

File metadata

  • Download URL: fog_x-0.1.0b3.tar.gz
  • Upload date:
  • Size: 66.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.4

File hashes

Hashes for fog_x-0.1.0b3.tar.gz
Algorithm Hash digest
SHA256 9ac88927688139a12e0b167571782a86197bf485bd0e7c33f52fe8c8f35ac8b1
MD5 ffef611d5e4a4d242f111d536cb32787
BLAKE2b-256 30c2e9f0a9befd283a5a96355eb295f2a3694b4c0bf94a0c51bd0113e1b13a5b
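
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches_published_digest` are illustrative, and the commented usage assumes the archive was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digest copied from the table above for fog_x-0.1.0b3.tar.gz.
EXPECTED_SHA256 = "9ac88927688139a12e0b167571782a86197bf485bd0e7c33f52fe8c8f35ac8b1"

# Usage (assumes the file is in the current directory):
# matches_published_digest(Path("fog_x-0.1.0b3.tar.gz"), EXPECTED_SHA256)
```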


File details

Details for the file fog_x-0.1.0b3-py3-none-any.whl.

File metadata

  • Download URL: fog_x-0.1.0b3-py3-none-any.whl
  • Upload date:
  • Size: 89.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.4

File hashes

Hashes for fog_x-0.1.0b3-py3-none-any.whl
Algorithm Hash digest
SHA256 f887f62b254160d47f7c916bdc218a61e1280460b5c005cd217c3d5463bea3b9
MD5 965a54e555b14a2be1fbf42b5c608a1b
BLAKE2b-256 65ab17c7c6af8dc01ae73e45b9ab44006254fff94ab6d7e900f824930cd059e7

