Transfusion in Pytorch
Transfusion - Pytorch (wip)
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI.
Once completed, this will also be extended to flow matching, as well as to audio, video, and perhaps even policies.
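For orientation, the Transfusion paper trains a single model with a next-token language-modeling loss on text and a diffusion loss on image latents, combined through a balancing coefficient. The formula below is a sketch in the paper's notation. How this implementation computes or weights the two terms internally is not stated here, so read it as background rather than as a description of the code.

```latex
\mathcal{L}_{\text{Transfusion}} = \mathcal{L}_{\text{LM}} + \lambda \cdot \mathcal{L}_{\text{DDPM}}
```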
Install
$ pip install transfusion-pytorch
Usage
One modality, say images
from torch import randint, randn
from transfusion_pytorch import Transfusion

model = Transfusion(
    num_text_tokens = 256,
    dim_latent = 384,
    transformer = dict(
        dim = 512,
        depth = 8
    )
)

text_and_images = [
    [randint(0, 256, (16,)), randn(4, 384), randint(0, 256, (8,)), randn(6, 384)],
    [randint(0, 256, (16,)), randn(7, 384), randint(0, 256, (5,)), randn(2, 384), randint(0, 256, (9,))]
]

loss = model(text_and_images)
loss.backward()

# after much training

one_multimodal_sample = model.sample()
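The nested-list input format used above can also be generated programmatically, which is handy for smoke tests. The following is a minimal sketch and not part of transfusion-pytorch: `make_random_sample` is a hypothetical helper, and it assumes segments alternate between text and latents, starting with text, as in the example.

```python
import torch

def make_random_sample(segment_lengths, num_text_tokens=256, dim_latent=384):
    """Build one interleaved sample (hypothetical helper, not a library function).

    Even positions become text segments (integer token ids),
    odd positions become latent segments (float tensors).
    """
    sample = []
    for position, length in enumerate(segment_lengths):
        if position % 2 == 0:
            # text: 1-D tensor of token ids in [0, num_text_tokens)
            sample.append(torch.randint(0, num_text_tokens, (length,)))
        else:
            # modality: 2-D float tensor of shape (length, dim_latent)
            sample.append(torch.randn(length, dim_latent))
    return sample

# same segment layout as the two samples in the example above
batch = [
    make_random_sample([16, 4, 8, 6]),
    make_random_sample([16, 7, 5, 2, 9]),
]
```

The resulting `batch` has the same structure as `text_and_images` and could be passed to `model(...)` in its place.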
Multiple different modalities
from torch import randint, randn
from transfusion_pytorch import Transfusion

model = Transfusion(
    num_text_tokens = 256,
    dim_latent = (384, 192), # specify multiple latent dimensions
    transformer = dict(
        dim = 512,
        depth = 8
    )
)

# for float tensors, you can pass a tuple[int, Tensor], with the modality index in the first position

text_images_and_audio = [
    [randint(0, 256, (16,)), (0, randn(4, 384)), randint(0, 256, (8,)), (1, randn(6, 192))],
    [randint(0, 256, (16,)), randn(7, 384), randint(0, 256, (5,)), (1, randn(2, 192)), randint(0, 256, (9,))]
]

loss = model(text_images_and_audio)
loss.backward()

# after much training

one_multimodal_sample = model.sample()
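When mixing tagged and untagged float tensors as above, it can help to check that each tensor's last dimension matches the latent dimension of its modality before calling the model. The sketch below is a hypothetical helper, not part of transfusion-pytorch. It assumes that an untagged float tensor belongs to modality 0. That is an inference from the 384-wide untagged tensor in the second sample, not documented behavior.

```python
import torch

def tag_modalities(sample, dim_latents=(384, 192), default_modality=0):
    """Return a copy of `sample` with every float tensor as an explicit
    (modality_index, tensor) pair, checking its latent dimension.

    Hypothetical helper; `default_modality` is an assumption, not library behavior.
    """
    tagged = []
    for item in sample:
        if isinstance(item, tuple):
            index, tensor = item                    # already tagged
        elif torch.is_floating_point(item):
            index, tensor = default_modality, item  # assumed default for untagged floats
        else:
            tagged.append(item)                     # integer tensor: text tokens, left as-is
            continue

        expected = dim_latents[index]
        if tensor.shape[-1] != expected:
            raise ValueError(
                f"modality {index} expects last dim {expected}, got {tensor.shape[-1]}"
            )
        tagged.append((index, tensor))
    return tagged

sample = [
    torch.randint(0, 256, (16,)),
    torch.randn(7, 384),
    torch.randint(0, 256, (5,)),
    (1, torch.randn(2, 192)),
]
explicit = tag_modalities(sample)
```

Here `explicit` contains the same tensors as `sample`, with the untagged 384-wide tensor wrapped as `(0, tensor)`. A tensor whose width does not match its modality raises `ValueError` early instead of failing inside the model.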
Citations
@inproceedings{Zhou2024TransfusionPT,
    title  = {Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model},
    author = {Chunting Zhou and Lili Yu and Arun Babu and Kushal Tirumala and Michihiro Yasunaga and Leonid Shamis and Jacob Kahn and Xuezhe Ma and Luke Zettlemoyer and Omer Levy},
    year   = {2024},
    url    = {https://api.semanticscholar.org/CorpusID:271909855}
}

@misc{Rubin2024,
    author = {Ohad Rubin},
    url    = {https://medium.com/@ohadrubin/exploring-weight-decay-in-layer-normalization-challenges-and-a-reparameterization-solution-ad4d12c24950}
}

@article{Nguyen2024MinPS,
    title   = {Min P Sampling: Balancing Creativity and Coherence at High Temperature},
    author  = {Minh Nguyen and Andrew Baker and Andreas Kirsch and Clement Neo},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2407.01082},
    url     = {https://api.semanticscholar.org/CorpusID:270870613}
}