Pali - PyTorch

Pali: A Multimodal Model

An open-source implementation of the multimodal model from "PaLI: Scaling Language-Image Learning in 100+ Languages".

🌟 Appreciation

Big bear hugs 🐻💖 to LucidRains for the fab x_transformers and for championing the open source AI cause.

🚀 Quick Start

pip install pali-torch

🧙 Usage

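import torch
from pali.model import Pali

# Build the model with its default configuration
model = Pali()

# Dummy inputs: one 256x256 RGB image and 1024-token sequences
img = torch.randn(1, 3, 256, 256)
prompt = torch.randint(0, 256, (1, 1024))       # prompt token ids
mask = torch.ones(1, 1024).bool()               # boolean mask, all positions enabled
output_text = torch.randint(0, 256, (1, 1024))  # target token ids

# Forward pass
result = model.process(img, prompt, output_text, mask)
print(result)

# Backpropagate. backward() returns None, so do not reassign or print its result.
# This assumes `result` is a scalar loss tensor.
result.backward()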
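The usage example above feeds random token ids in the range [0, 256) and an all-ones mask. As a minimal sketch of how a real prompt could be turned into inputs of the same shape, the snippet below assumes a byte-level encoding (which happens to fit a 256-entry vocabulary) and a padding id of 0. Both are illustrative assumptions, and `encode_prompt`, `MAX_LEN`, and `PAD_ID` are hypothetical names that are not part of the library.

```python
from typing import List, Tuple

MAX_LEN = 1024  # matches the sequence length in the usage example
PAD_ID = 0      # hypothetical padding id, not confirmed by the library


def encode_prompt(text: str, max_len: int = MAX_LEN) -> Tuple[List[int], List[bool]]:
    """Byte-level encode `text`, then truncate or pad it to `max_len`."""
    ids = list(text.encode("utf-8"))[:max_len]  # every id lies in [0, 256)
    mask = [True] * len(ids)                    # True marks real tokens
    pad = max_len - len(ids)
    return ids + [PAD_ID] * pad, mask + [False] * pad


ids, mask = encode_prompt("What is in this image?")
print(len(ids), len(mask), sum(mask))
```

To pass these to the model, the lists would be wrapped as batched tensors, for example `torch.tensor(ids).unsqueeze(0)` and `torch.tensor(mask).unsqueeze(0)`, which gives the `(1, 1024)` shapes used above.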

Dataset Strategy

The dataset strategy follows the paper as closely as possible.

The table below summarizes the datasets used, with links to Hugging Face where a public version is available:

Dataset	Description	Size	Languages	Link
WebLI	Large-scale web crawled image-text dataset	10B images, 12B captions	109 languages	Private
CC3M	Conceptual Captions dataset	3M image-text pairs	English	Link
CC3M-35L	Translated version of CC3M to 35 languages	105M image-text pairs	36 languages	Private
VQAv2	VQA dataset built on COCO images	204K images, 1.1M QA pairs	English	Link
VQ2A-CC3M	VQA dataset built from CC3M	3M image-text pairs	English	Private
VQ2A-CC3M-35L	Translated version of VQ2A-CC3M to 35 languages	105M image-text pairs	36 languages	Private
Open Images	Large scale image dataset	9M images with labels	English	Link
Visual Genome	Image dataset with dense annotations	108K images with annotations	English	Link
Objects365	Image dataset for object detection	500K images with labels	English	Private
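For scripting against this list, the table can be mirrored as structured data. The sketch below is one possible layout, not something shipped with the package: `DatasetInfo`, `DATASETS`, and `public_datasets` are hypothetical names, and the `public` flag simply records whether the table shows a link or "Private".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetInfo:
    name: str
    size: str
    languages: str
    public: bool  # True where the table lists a link, False where it says "Private"


# Values copied from the table above
DATASETS: List[DatasetInfo] = [
    DatasetInfo("WebLI", "10B images, 12B captions", "109 languages", False),
    DatasetInfo("CC3M", "3M image-text pairs", "English", True),
    DatasetInfo("CC3M-35L", "105M image-text pairs", "36 languages", False),
    DatasetInfo("VQAv2", "204K images, 1.1M QA pairs", "English", True),
    DatasetInfo("VQ2A-CC3M", "3M image-text pairs", "English", False),
    DatasetInfo("VQ2A-CC3M-35L", "105M image-text pairs", "36 languages", False),
    DatasetInfo("Open Images", "9M images with labels", "English", True),
    DatasetInfo("Visual Genome", "108K images with annotations", "English", True),
    DatasetInfo("Objects365", "500K images with labels", "English", False),
]


def public_datasets(datasets: List[DatasetInfo]) -> List[str]:
    """Return the names of datasets that can be downloaded publicly."""
    return [d.name for d in datasets if d.public]


print(public_datasets(DATASETS))
```

Filtering on `public` makes it easy to see which parts of the training mix can actually be reproduced outside the original setup.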

The key datasets used for pre-training PaLI include:

WebLI: A large-scale multilingual image-text dataset crawled from the web, comprising 10B images and 12B captions in 109 languages.
CC3M-35L: The CC3M (Conceptual Captions) dataset machine-translated into 35 additional languages, totaling 105M image-text pairs in 36 languages.
VQ2A-CC3M-35L: VQA dataset based on CC3M, also translated into 35 languages.
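As a quick sanity check on the 105M figure, the arithmetic below uses the round 3M pair count from the table. It suggests that 105M counts the 35 translated copies only, with the English original bringing the language count to 36. That reading is an inference from the numbers listed here, not something the source states.

```python
CC3M_PAIRS = 3_000_000     # round figure from the table, treated as an assumption
TRANSLATED_LANGUAGES = 35  # additional languages produced by machine translation

translated_pairs = CC3M_PAIRS * TRANSLATED_LANGUAGES
with_english = CC3M_PAIRS * (TRANSLATED_LANGUAGES + 1)

print(f"{translated_pairs / 1e6:.0f}M translated pairs")
print(f"{with_english / 1e6:.0f}M pairs including the English original")
```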

The model was evaluated on diverse tasks using standard datasets such as VQAv2, Open Images, and COCO Captions. Details for the datasets listed in the table are provided above.

Todo

Make a table of the datasets used in the paper
Provide training script
Provide usage/inference scripts

🎉 Features

Double the Power: MT5 for text and ViT for images - Pali's the superhero we didn't know we needed! 💪📖🖼️
Winning Streak: With roots in the tried-and-true MT5 & ViT, success is in Pali's DNA. 🏆
Ready, Set, Go: No fuss, no muss! Get Pali rolling in no time. ⏱️
Easy-Peasy: Leave the heavy lifting to Pali and enjoy your smooth sailing. 🛳️

🌆 Real-World Use-Cases

E-commerce: Jazz up those recs! Understand products inside-out with images & descriptions. 🛍️
Social Media: Be the smart reply guru for posts with pics & captions. 📱
Healthcare: Boost diagnostics with insights from images & textual data. 🏥

📜 License

MIT

📚 Citation

@inproceedings{chen2022pali,
  title={PaLI: Scaling Language-Image Learning in 100+ Languages},
  author={Chen, Xi and Wang, Xiao and others},
  booktitle={Conference on Neural Information Processing Systems (NeurIPS)},
  year={2022}
}

Release history

Version	Date
0.0.9	Mar 20, 2024
0.0.8	Mar 20, 2024
0.0.7	Nov 3, 2023
0.0.6	Nov 3, 2023
0.0.5	Sep 3, 2023
0.0.4 (this version)	Sep 3, 2023
0.0.3	Aug 5, 2023
0.0.2	Aug 3, 2023
