LIV: Language-Image Representations and Rewards for Robotic Control

International Conference on Machine Learning (ICML), 2023

[Project Page] [arXiv] [Paper] [Dataset] [HuggingFace Model Page]

Jason Yecheng Ma1, Vikash Kumar2, Amy Zhang2, Osbert Bastani1, Dinesh Jayaraman1

1University of Pennsylvania, 2Meta AI

This is the official repository for LIV, an algorithm for pre-training, fine-tuning, and reward learning for language-conditioned robotic control. It contains examples of using the pre-trained LIV model as well as code for training LIV from scratch on any custom video dataset.

Table of Contents

Installation

LIV Usage Examples

Training LIV Representation

Multi-Modal Reward Curve Generation

Installation

Create a conda environment where the packages will be installed.

conda create --name liv-env python=3.9
conda activate liv-env

Then, in the root directory of this repository, run:

pip install -e .;
cd liv/models/clip; pip install -e .;
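After both installs, a quick check that the two packages are importable can save a confusing error later. The snippet below is a minimal sketch, not part of the repository; it assumes the import names are liv and clip, as used in the usage examples in this README, and the helper name check_install is hypothetical.

```python
from importlib.util import find_spec


def check_install(packages=("liv", "clip")):
    """Return {package_name: bool} telling whether each package can be found.

    The default names are taken from the imports used in this README;
    check_install itself is a hypothetical helper, not a LIV function.
    """
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_install().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If either package is reported as missing, re-running the corresponding pip install -e . command inside the liv-env environment is the first thing to try.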

LIV Usage Examples

Quick start:

from liv import load_liv
liv = load_liv()
liv.eval()

The following code snippet loads the model and performs inference on an example image and text (python liv/examples/liv_static.py):

import clip
import torch
import torchvision.transforms as T
from PIL import Image 

from liv import load_liv

device = "cuda" if torch.cuda.is_available() else "cpu"

# loading LIV
liv = load_liv()
liv.eval()
transform = T.Compose([T.ToTensor()])

# pre-process image and text
image = transform(Image.open("sample_video/frame_0000033601.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["open microwave", "close microwave", "wipe floor"]).to(device)

# compute LIV image and text embedding
with torch.no_grad():
    img_embedding = liv(input=image, modality="vision")
    text_embedding = liv(input=text, modality="text")

# compute LIV value
img_text_value = liv.module.sim(img_embedding, text_embedding)
# Output: [ 0.1151, -0.0151, -0.0997]
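To turn these values into a decision, one option is to pair each prompt with its value and pick the highest one. The sketch below is plain Python and uses the numbers from the example output above; it assumes that a higher LIV value indicates a closer image-text match, and the helper name rank_prompts is hypothetical. With a real tensor, the values would first need to be converted to a flat Python list (for example with .flatten().tolist()).

```python
def rank_prompts(prompts, values):
    """Pair each prompt with its value and sort from highest to lowest value."""
    if len(prompts) != len(values):
        raise ValueError("prompts and values must have the same length")
    return sorted(zip(prompts, values), key=lambda pair: pair[1], reverse=True)


prompts = ["open microwave", "close microwave", "wipe floor"]
values = [0.1151, -0.0151, -0.0997]  # copied from the example output above

ranking = rank_prompts(prompts, values)
best_prompt, best_value = ranking[0]
print(best_prompt, best_value)
```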

We also include an example that generates multi-modal reward curves on text-annotated videos. To try it, run:

cd liv/examples
python liv_example.py

This should generate animated reward curves in liv/examples.
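For intuition, a reward curve of this kind can be viewed as the similarity between each frame's embedding and a goal embedding, plotted over time. The sketch below is a toy illustration, not the code in liv_example.py: it uses hand-written 2-D vectors instead of LIV embeddings, cosine similarity as a stand-in for the model's sim function, and it assumes that "multi-modal" means the goal can be either a text embedding or the embedding of the final frame. All names in it are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def reward_curve(frame_embeddings, goal_embedding):
    """One value per frame: similarity of that frame to the goal."""
    return [cosine_similarity(f, goal_embedding) for f in frame_embeddings]


# Toy embeddings (assumption): frames that gradually move toward the goal.
frames = [(1.0, 0.0), (1.0, 0.5), (1.0, 1.0)]
text_goal = (0.0, 1.0)   # stand-in for a text embedding of the task description
image_goal = frames[-1]  # stand-in for the embedding of the final frame

text_curve = reward_curve(frames, text_goal)
image_curve = reward_curve(frames, image_goal)
print("text-goal curve: ", [round(v, 3) for v in text_curve])
print("image-goal curve:", [round(v, 3) for v in image_curve])
```

In this toy setup both curves rise as the frames approach the goal, which is the shape one would hope to see for a video that completes the annotated task.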

Training LIV Representation

Our codebase supports training LIV on both the EpicKitchen dataset, which was used to pre-train our released LIV model, and any custom video dataset. The video dataset directory should use the following structure:

my_dataset_path/
    video0/
        0.png
        1.png
        ...
    video1/
    video2/
    ...
    manifest.csv

The manifest.csv file should contain one row per video with the columns directory, text, and num_frames, which indicate the absolute path, text annotation, and length of each video, respectively.
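A manifest of this shape can be generated from the directory layout above with a few lines of standard-library Python. The following is a minimal sketch under stated assumptions: the helper build_manifest and its arguments are hypothetical, frames are assumed to be .png files, and the header row naming the three columns is an assumption that should be checked against the data loader in this repository.

```python
import csv
import tempfile
from pathlib import Path


def build_manifest(dataset_root, annotations, frame_suffix=".png"):
    """Write manifest.csv with one (directory, text, num_frames) row per video.

    annotations maps a video folder name (e.g. "video0") to its text annotation.
    """
    root = Path(dataset_root).resolve()  # the manifest stores absolute paths
    rows = []
    for video_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if video_dir.name not in annotations:
            continue  # skip folders that have no text annotation
        num_frames = sum(1 for f in video_dir.iterdir() if f.suffix == frame_suffix)
        rows.append((str(video_dir), annotations[video_dir.name], num_frames))

    manifest_path = root / "manifest.csv"
    with manifest_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["directory", "text", "num_frames"])  # header row: assumption
        writer.writerows(rows)
    return manifest_path


# Usage on a throwaway directory that mirrors the layout above.
with tempfile.TemporaryDirectory() as tmp:
    for name, n in (("video0", 3), ("video1", 2)):
        (Path(tmp) / name).mkdir()
        for i in range(n):
            (Path(tmp) / name / f"{i}.png").touch()
    path = build_manifest(tmp, {"video0": "open microwave", "video1": "wipe floor"})
    with path.open(newline="") as fh:
        manifest_rows = list(csv.reader(fh))
print(manifest_rows[0])
```

The annotations here are illustrative only; in practice each video folder needs its own task description.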

Then, you can use LIV to fine-tune a pre-trained vision-language model (e.g., LIV, CLIP) on your dataset by (1) adding a <my_dataset_name>.yaml file in /cfgs/dataset that specifies the dataset name and path, and (2) running:

python train_liv.py training=finetune dataset=my_dataset_name

We have provided an example of LIV fine-tuning using the realrobot dataset we used in the paper.

For EpicKitchen or equivalent large-scale pre-training, we suggest using config pretrain.yaml (the config for the released LIV model):

python train_liv.py training=pretrain dataset=epickitchen

Each training run creates a run folder under train_liv_realrobot, and the reward curves for intermediate model snapshots are saved in the reward_curves subfolder of that run folder.

Multi-Modal Reward Curve Generation

We can also use the same training code to generate only the (animated) reward curves by setting eval=True:

python train_liv.py eval=True dataset=epickitchen animate=True

We can also specify a model path (e.g., snapshot.pt saved in a run folder) and generate reward curves on the dataset that the model was fine-tuned on:

python train_liv.py eval=True load_snap=PATH_TO_LIV_MODEL dataset=realrobot animate=True 

In the run folder, you should see the animated reward curves.

License

The source code in this repository is licensed under the CC BY-NC 4.0 License.

Citation

If you find this repository or paper useful for your research, please cite:

@article{ma2023liv,
  title={LIV: Language-Image Representations and Rewards for Robotic Control},
  author={Ma, Yecheng Jason and Liang, William and Som, Vaidehi and Kumar, Vikash and Zhang, Amy and Bastani, Osbert and Jayaraman, Dinesh},
  journal={arXiv preprint arXiv:2306.00958},
  year={2023}
}

Acknowledgements

Parts of this code are adapted from VIP and CLIP.
