palme - Pytorch

These details have not been verified by PyPI

Project links

Homepage

Project description

🌴 PALM-E: A Multi-Modal AI Model


This is an open-source implementation of the state-of-the-art multimodal foundation model from Google's paper "PaLM-E: An Embodied Multimodal Language Model". PALM-E is a single large embodied multimodal model that can address a variety of embodied reasoning tasks, from a variety of observation modalities, on multiple embodiments. It also exhibits positive transfer: the model benefits from diverse joint training across internet-scale language, vision, and visual-language domains.

PAPER LINK: PaLM-E: An Embodied Multimodal Language Model

Note

This is just the model architecture: no pretrained weights and no tokenizer are included.
To actually run inference, you would need to set up a tokenizer for text and images, train the model, and then run inference.
If you are doing research into multimodal models and would like to train this model and release it as open source, join the Agora lab by clicking on the banner!

Appreciation

All the creators in Agora. Join Agora, the community of AI engineers changing the world with their creations.
LucidRains, for inspiring me to devote myself to open source AI.

🚀 Quick Start

Installation 📦

pip install palme

Usage 🎨

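import torch
from palme.model import PalmE

# Dummy inputs: one 256x256 RGB image and one sequence of 1024 token ids
img = torch.randn(1, 3, 256, 256)
caption = torch.randint(0, 20000, (1, 1024))

# Build the model with its default configuration and run a forward pass
model = PalmE()
output = model(img, caption)
print(output.shape)  # (1, 1024, 20000)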

Training

Here is a summary table of the key training hyperparameters mentioned in the paper:

Hyperparameter	Value
Batch size	2048
Learning rate	1.5e-4
Warmup steps	10,000
Gradient accumulation steps	4
Weight decay	0.01
Dropout rate	0.1
Embedding dropout rate	0.1
Attention dropout rate	0.1
Optimizer	AdamW
Gradient clipping	1.0

The key details are:

Batch size of 2048
Learning rate of 1.5e-4 with 10k warmup steps
AdamW optimizer
Dropout of 0.1 on embeddings, attention, and full model
Weight decay of 0.01
Gradient clipping of 1.0
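
The learning-rate entry above can be sketched as a schedule. The table gives only the peak rate (1.5e-4) and the warmup length (10,000 steps); the linear ramp and the constant rate after warmup are assumptions made for illustration, and `warmup_lr` is a hypothetical helper, not part of the palme package.

```python
PEAK_LR = 1.5e-4       # learning rate from the table above
WARMUP_STEPS = 10_000  # warmup steps from the table above


def warmup_lr(step: int, peak_lr: float = PEAK_LR,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup to peak_lr, then constant (assumed schedule shape)."""
    if step < 0:
        raise ValueError("step must be non-negative")
    # Fraction of warmup completed, capped at 1.0 once warmup is over.
    scale = min(1.0, step / warmup_steps)
    return peak_lr * scale


if __name__ == "__main__":
    for step in (0, 2_500, 10_000, 50_000):
        print(step, warmup_lr(step))
```

A real training run would likely pair the warmup with some form of decay afterwards; the source does not say which, so none is shown here.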

The authors used a fairly standard transformer hyperparameter configuration. The large batch size, combined with gradient accumulation, allows them to train huge models.
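
To illustrate why gradient accumulation helps, here is a minimal pure-Python sketch on a toy one-parameter least-squares problem. It shows that averaging gradients over 4 micro-batches reproduces the full-batch gradient, so a large effective batch can be processed in smaller pieces that fit in memory. The toy model, data, and function names are hypothetical, and the source does not state whether the batch size of 2048 is counted before or after accumulation.

```python
ACCUM_STEPS = 4  # gradient accumulation steps from the table above


def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, accum_steps=ACCUM_STEPS):
    """Split the batch into equal micro-batches and average their gradients."""
    if len(batch) % accum_steps != 0:
        raise ValueError("batch size must be divisible by accum_steps")
    micro = len(batch) // accum_steps
    total = 0.0
    for i in range(accum_steps):
        micro_batch = batch[i * micro:(i + 1) * micro]
        # Scale each micro-batch gradient by 1/accum_steps, which mirrors the
        # common practice of dividing the loss before the backward pass.
        total += grad_mse(w, micro_batch) / accum_steps
    return total


# Toy data: y = 3x, eight examples, so each micro-batch holds two.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
full = grad_mse(0.5, data)
accum = accumulated_grad(0.5, data)
```

Here `full` and `accum` agree up to floating-point error, which is the property that lets an optimizer step once per accumulated batch instead of once per micro-batch.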

Set the environment variables:
- ENTITY_NAME: Your wandb project name
- OUTPUT_DIR: Directory to save the weights (e.g., ./weights)
- MASTER_ADDR: Address of the master node for distributed training
- MASTER_PORT: Port of the master node for distributed training
- RANK: Rank of the current process in distributed training
- WORLD_SIZE: Total number of processes (typically one per GPU)

Configure the training:
- Run `accelerate config`
- Enable DeepSpeed ZeRO stage 3
- Run `accelerate launch train.py`
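
The variables above can be read and validated in a launch script before training starts. This is a minimal sketch, not code from the repository: `load_training_env` is a hypothetical helper, and treating `MASTER_PORT`, `RANK`, and `WORLD_SIZE` as integers follows the usual `torch.distributed` convention rather than anything stated here.

```python
import os

REQUIRED_VARS = (
    "ENTITY_NAME", "OUTPUT_DIR", "MASTER_ADDR",
    "MASTER_PORT", "RANK", "WORLD_SIZE",
)


def load_training_env(environ=os.environ):
    """Return the training settings as a dict, failing early if any are missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: environ[name] for name in REQUIRED_VARS}
    # Integer-valued settings (assumed convention, see the note above).
    for name in ("MASTER_PORT", "RANK", "WORLD_SIZE"):
        settings[name] = int(settings[name])
    if not 0 <= settings["RANK"] < settings["WORLD_SIZE"]:
        raise RuntimeError("RANK must be in [0, WORLD_SIZE)")
    return settings
```

Calling `load_training_env()` at the top of a launch script surfaces configuration mistakes before any GPU time is spent.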

For more information, refer to the Training SOP.

Dataset Strategy

Here is a summary table of the key datasets mentioned in the paper:

Dataset	Tasks	Size	Link
TAMP	Robotic manipulation planning, VQA	96,000 scenes	Custom dataset
Language Table	Robotic manipulation planning	Custom dataset	Link
Mobile Manipulation	Robotic navigation and manipulation planning, VQA	2912 sequences	Based on SayCan dataset
WebLI	Image-text retrieval	66M image-caption pairs	Link
VQAv2	Visual question answering	1.1M questions on COCO images	Link
OK-VQA	Visual question answering requiring external knowledge	14,031 questions on COCO images	Link
COCO	Image captioning	330K images with captions	Link
Wikipedia	Text corpus	N/A	Link

The key robotics datasets were collected specifically for this work, while the vision-language datasets (WebLI, VQAv2, OK-VQA, COCO) are standard benchmarks in that field. Dataset sizes range from a few thousand sequences (Mobile Manipulation) and tens of thousands of scenes (TAMP) in the robotics domains to tens of millions of image-caption pairs (WebLI) for the internet-scale vision-language data.

Contribute || Be Part of the PALM-E Adventure 🤝

Your brilliance is needed! Join us, and together, let's make PALM-E even more awe-inspiring:

Get Your Copy: Fork the PALM-E repo.
Make It Local: Clone your fork.
Prep Your Tools: Install the necessities.
Discover & Innovate: Dive into the code.
Craft Your Magic: Branch and code away.
Show & Tell: Push your changes and craft a pull request.

🐞 Fixes, 🎨 enhancements, 📝 docs, or 💡 ideas – all are welcome! Let's shape the future of AI, hand in hand.

Citation

@article{driess2023palme,
  title={PaLM-E: An Embodied Multimodal Language Model},
  author={Driess, Danny and Xia, Fei and Sajjadi, Mehdi S. M. and Lynch, Corey and Chowdhery, Aakanksha and Ichter, Brian and Wahid, Ayzaan and Tompson, Jonathan and Vuong, Quan and Yu, Tianhe and Huang, Wenlong and Chebotar, Yevgen and Sermanet, Pierre and Duckworth, Daniel and Levine, Sergey and Vanhoucke, Vincent and Hausman, Karol and Toussaint, Marc and Greff, Klaus and Zeng, Andy and Mordatch, Igor and Florence, Pete},
  journal={arXiv preprint arXiv:2303.03378},
  year={2023},
  url={https://doi.org/10.48550/arXiv.2303.03378}
}

Roadmap

URGENT: Debug Tokenizer, make sure multi-modal inputs work.
Create Dataset Strategy
Upload Training Documentation
Get Training running with multi-modal

Release history

Version	Release date
0.1.2 (this version)	Jan 29, 2024
0.0.9	Sep 5, 2023
0.0.7	Aug 17, 2023
0.0.6	Aug 15, 2023
0.0.5	Aug 15, 2023
0.0.4	Aug 11, 2023
0.0.3	Aug 9, 2023
0.0.2	Aug 3, 2023
0.0.1	Jul 28, 2023

Download files

Source Distribution

palme-0.1.2.tar.gz (10.7 kB)

Uploaded Jan 29, 2024 Source

Built Distribution

palme-0.1.2-py3-none-any.whl (10.2 kB)

Uploaded Jan 29, 2024 Python 3

File details

Details for the file palme-0.1.2.tar.gz.

File metadata

Download URL: palme-0.1.2.tar.gz
Upload date: Jan 29, 2024
Size: 10.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.3.2 CPython/3.11.0 Darwin/22.4.0

File hashes

Hashes for palme-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`7cdd3bc42cffaa21e311dcd72d5f3666f7a495378cdfa2891a8a5f1eb05bac65`
MD5	`f862965aaa9406c6b1dba8706f56ec66`
BLAKE2b-256	`38bf9f20ac19aa6d9a78ef79f49055c9bc2690e09534f14d6982a70168a15e0d`
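
A downloaded file can be checked against the SHA256 digest listed above. The sketch below uses only Python's standard `hashlib`; `sha256_of` and `matches` are hypothetical helpers, and the local path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for palme-0.1.2.tar.gz
EXPECTED_SHA256 = "7cdd3bc42cffaa21e311dcd72d5f3666f7a495378cdfa2891a8a5f1eb05bac65"


def sha256_of(path, chunk_size=65536):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected digest."""
    return sha256_of(path) == expected.lower()


# Usage (assumes the archive is in the current directory):
# print(matches("palme-0.1.2.tar.gz"))
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for a 10.7 kB archive but makes the helper reusable for larger downloads.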

File details

Details for the file palme-0.1.2-py3-none-any.whl.

File metadata

Download URL: palme-0.1.2-py3-none-any.whl
Upload date: Jan 29, 2024
Size: 10.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.3.2 CPython/3.11.0 Darwin/22.4.0

File hashes

Hashes for palme-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fb48ddd8016b367f0c54de45e70d13ddca35084fd209e3aa88ad41a01db5a560`
MD5	`54cc2aadc335778920006884a279d6d9`
BLAKE2b-256	`417783cd0a4624eb865250eefa434b97086ea7469320b6ad31b3711ed01b112e`

palme 0.1.2
