Transfusion - Pytorch (wip)
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI.
In this repo, we will substitute diffusion with flow matching, given the success of Flux from Black Forest Labs (but will keep the original paper title, given "Transflow" does not have the same ring to it). This repository will also attempt to extend to any number of modalities.
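For context, the flow matching objective being swapped in can be sketched in a few lines of plain PyTorch. This is an illustrative rectified-flow loss under common assumptions, not this repository's actual implementation: a network is trained to predict the velocity `(data - noise)` along the straight-line interpolation between a noise sample and the data at a random time `t`.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, data):
    # sample noise and a per-sample time t in [0, 1)
    noise = torch.randn_like(data)
    t = torch.rand(data.shape[0], *([1] * (data.ndim - 1)))

    # straight-line interpolation between noise (t=0) and data (t=1)
    noised = t * data + (1. - t) * noise

    # the velocity of that line is constant: data - noise
    target_velocity = data - noise

    # hypothetical velocity predictor (stands in for the transformer)
    pred_velocity = model(noised)

    return F.mse_loss(pred_velocity, target_velocity)

# toy usage with a linear stand-in model
model = torch.nn.Linear(384, 384)
loss = flow_matching_loss(model, torch.randn(8, 384))
loss.backward()
```

At sampling time, one would integrate the predicted velocity field from noise toward data, rather than running a diffusion reverse process.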
Install
```shell
$ pip install transfusion-pytorch
```
Usage
One modality, say images
```python
from torch import randint, randn
from transfusion_pytorch import Transfusion

model = Transfusion(
    num_text_tokens = 256,
    dim_latent = 384,
    transformer = dict(
        dim = 512,
        depth = 8
    )
)

text_and_images = [
    [randint(0, 256, (16,)), randn(4, 384), randint(0, 256, (8,)), randn(6, 384)],
    [randint(0, 256, (16,)), randn(7, 384), randint(0, 256, (5,)), randn(2, 384), randint(0, 256, (9,))]
]

loss = model(text_and_images)
loss.backward()

# after much training

one_multimodal_sample = model.sample()
```
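The interleaved format above is just a Python list per sample: integer tensors are text token ids in `[0, num_text_tokens)`, and float tensors are modality latents whose trailing dimension matches `dim_latent`. A sketch of assembling such a sequence from raw inputs, using hypothetical `tokenize` / `encode_image` stand-ins (a real pipeline would use a proper tokenizer and image encoder):

```python
import torch

NUM_TEXT_TOKENS = 256   # matches num_text_tokens above
DIM_LATENT = 384        # matches dim_latent above

def tokenize(text: str) -> torch.Tensor:
    # stand-in tokenizer: raw bytes as token ids, all within [0, 256)
    return torch.tensor(list(text.encode('utf-8')), dtype=torch.long)

def encode_image(num_patches: int) -> torch.Tensor:
    # stand-in encoder: random latents of shape (patches, DIM_LATENT)
    return torch.randn(num_patches, DIM_LATENT)

# one training sample: text, then an image, then more text
sample = [tokenize('a photo of'), encode_image(4), tokenize('and a caption')]
```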
Multiple different modalities
```python
from torch import randint, randn
from transfusion_pytorch import Transfusion

model = Transfusion(
    num_text_tokens = 256,
    dim_latent = (384, 192),  # specify multiple latent dimensions
    transformer = dict(
        dim = 512,
        depth = 8
    )
)

# then for the Tensors of type float, you can pass a tuple[int, Tensor] and specify the modality index in the first position

text_images_and_audio = [
    [randint(0, 256, (16,)), (0, randn(4, 384)), randint(0, 256, (8,)), (1, randn(6, 192))],
    [randint(0, 256, (16,)), randn(7, 384), randint(0, 256, (5,)), (1, randn(2, 192)), randint(0, 256, (9,))]
]

loss = model(text_images_and_audio)
loss.backward()

# after much training

one_multimodal_sample = model.sample()
```
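The reason each float tensor carries a modality index is that latents of different widths must be projected into the shared transformer dimension before they can be interleaved with text tokens. A minimal sketch of that routing, illustrative only and not this library's internals:

```python
import torch
from torch import nn

DIM_MODEL = 512
DIM_LATENTS = (384, 192)  # one latent dimension per modality, as in the example

# one input projection per modality: latent dim -> transformer dim
to_model_dim = nn.ModuleList([nn.Linear(d, DIM_MODEL) for d in DIM_LATENTS])

def project(modality_index: int, latents: torch.Tensor) -> torch.Tensor:
    # route the latents through the projection for their modality
    return to_model_dim[modality_index](latents)

image_tokens = project(0, torch.randn(4, 384))  # -> shape (4, 512)
audio_tokens = project(1, torch.randn(6, 192))  # -> shape (6, 512)
```

After projection, all modalities live in the same `dim = 512` space and can be concatenated with embedded text tokens into one sequence.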
Citations
```bibtex
@inproceedings{Zhou2024TransfusionPT,
    title   = {Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model},
    author  = {Chunting Zhou and Lili Yu and Arun Babu and Kushal Tirumala and Michihiro Yasunaga and Leonid Shamis and Jacob Kahn and Xuezhe Ma and Luke Zettlemoyer and Omer Levy},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:271909855}
}
```

```bibtex
@misc{Rubin2024,
    author  = {Ohad Rubin},
    url     = {https://medium.com/@ohadrubin/exploring-weight-decay-in-layer-normalization-challenges-and-a-reparameterization-solution-ad4d12c24950}
}
```

```bibtex
@article{Nguyen2024MinPS,
    title   = {Min P Sampling: Balancing Creativity and Coherence at High Temperature},
    author  = {Minh Nguyen and Andrew Baker and Andreas Kirsch and Clement Neo},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2407.01082},
    url     = {https://api.semanticscholar.org/CorpusID:270870613}
}
```