
vima - PyTorch

Project description

Multi-Modality

VIMA

A simple implementation of "VIMA: General Robot Manipulation with Multimodal Prompts"

Original implementation: Link

Appreciation

  • Lucidrains
  • Agorians

Install

pip install vima


Usage

import torch
from vima import Vima

# Generate a random sequence of token ids
x = torch.randint(0, 256, (1, 1024))

# Initialize the VIMA model
model = Vima()

# Move both to the GPU if one is available
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()

# Pass the input sequence through the model
output = model(x)

Multimodal Usage

  • Pass text and image tensors into the model

import torch
from vima.vima import VimaMultiModal

# Random image and text inputs
img = torch.randn(1, 3, 256, 256)
text = torch.randint(0, 20000, (1, 1024))

model = VimaMultiModal()
output = model(text, img)
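To see what "multimodal prompts" means in practice, here is a minimal, self-contained PyTorch sketch (not the VIMA API; the dimensions, embedding size, and patch size are illustrative assumptions) showing how text tokens and image patches can be embedded into one shared token sequence that a transformer then attends over:

```python
import torch
import torch.nn as nn

# Illustrative sketch only: all sizes below are assumptions, not VIMA's.
dim = 64

# Text token ids -> vectors
text_embed = nn.Embedding(20000, dim)

# 256x256 RGB image -> 8x8 grid of patch embeddings (32x32 patches)
patchify = nn.Conv2d(3, dim, kernel_size=32, stride=32)

text = torch.randint(0, 20000, (1, 16))   # 16 text tokens
img = torch.randn(1, 3, 256, 256)         # one RGB image

text_tokens = text_embed(text)                          # (1, 16, dim)
img_tokens = patchify(img).flatten(2).transpose(1, 2)   # (1, 64, dim)

# One interleaved prompt sequence: 16 text + 64 image tokens
prompt = torch.cat([text_tokens, img_tokens], dim=1)    # (1, 80, dim)
```

Concatenating both modalities into a single sequence is what lets one transformer condition on text and images jointly.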

License

MIT

Citations

@inproceedings{jiang2023vima,
  title     = {VIMA: General Robot Manipulation with Multimodal Prompts},
  author    = {Yunfan Jiang and Agrim Gupta and Zichen Zhang and Guanzhi Wang and Yongqiang Dou and Yanjun Chen and Li Fei-Fei and Anima Anandkumar and Yuke Zhu and Linxi Fan},
  booktitle = {Fortieth International Conference on Machine Learning},
  year      = {2023}
}
