Paper - Pytorch
Palm2 Adapter
Implementation of PaLM2-VAdapter from the multi-modal model paper "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter".
The model uses a perceiver resampler with a depth of 1, followed by a tiny PaLM, to efficiently learn the image features and then map them into the same embedding space as the large model.
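To make the resampler idea concrete, here is a minimal sketch of a one-layer perceiver resampler followed by a projection into a larger model's embedding width. It is not the package's actual code: the class name `ResamplerSketch`, the parameters `feat_dim`, `out_dim`, `num_latents`, and all default values are hypothetical, and the tiny PaLM stage is left out entirely.

```python
import torch
from torch import nn


class ResamplerSketch(nn.Module):
    """Hypothetical one-layer perceiver resampler (illustration only)."""

    def __init__(self, feat_dim=512, out_dim=1024, num_latents=16, heads=8):
        super().__init__()
        # Learned query vectors that summarize the image features.
        self.latents = nn.Parameter(torch.randn(num_latents, feat_dim) * 0.02)
        self.norm_latents = nn.LayerNorm(feat_dim)
        self.norm_feats = nn.LayerNorm(feat_dim)
        self.cross_attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.LayerNorm(feat_dim),
            nn.Linear(feat_dim, 4 * feat_dim),
            nn.GELU(),
            nn.Linear(4 * feat_dim, feat_dim),
        )
        # Assumed projection into the embedding width of the large model.
        self.to_llm = nn.Linear(feat_dim, out_dim)

    def forward(self, feats):
        # feats: (batch, num_patches, feat_dim) from some vision encoder.
        batch = feats.shape[0]
        queries = self.latents.unsqueeze(0).expand(batch, -1, -1)
        keys = self.norm_feats(feats)
        # Latents attend to the image features (single cross-attention layer).
        attn_out, _ = self.cross_attn(self.norm_latents(queries), keys, keys)
        x = queries + attn_out
        x = x + self.ff(x)
        return self.to_llm(x)  # (batch, num_latents, out_dim)


feats = torch.randn(2, 196, 512)  # stand-in for patch features
tokens = ResamplerSketch()(feats)
print(tokens.shape)  # torch.Size([2, 16, 1024])
```

Whatever the number of input patches, the output always has `num_latents` tokens, which is what keeps the visual prefix handed to the language model short.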
Install
$ pip install palm2-vadapter
Usage
import torch
from palm_vadapter.main import PaLM2VAdapter

# Random token IDs: batch of 1, sequence length 32
text = torch.randint(0, 1000, (1, 32), dtype=torch.long)

# Random image tensor: batch of 1, 3 channels, 224x224 pixels
img = torch.randn(1, 3, 224, 224)

# Initialize the PaLM2VAdapter model
model = PaLM2VAdapter(
    tiny_dim=512,
    dim=512,
    num_tokens=10000,
    seq_length=32,
    depth=6,
    heads=8,
    image_size=224,
    patch_size=16,
)

# Forward pass through the model
out = model(text, img)

# Print the shape of the output
print(out.shape)
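As a quick sanity check on the `image_size` and `patch_size` values above: assuming the model splits the image into non-overlapping square patches in the usual ViT fashion (the source does not confirm this), the number of patch tokens per image can be worked out as follows. The variable names are illustrative only.

```python
image_size = 224  # matches image_size in the usage example
patch_size = 16   # matches patch_size in the usage example

# Assumption: non-overlapping square patches, so the image size
# must be divisible by the patch size.
assert image_size % patch_size == 0

patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2
print(patches_per_side, num_patches)  # 14 196
```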
License
MIT
Citation
@misc{xiao2024palm2vadapter,
    title={PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter},
    author={Junfei Xiao and Zheng Xu and Alan Yuille and Shen Yan and Boyu Wang},
    year={2024},
    eprint={2402.10896},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}