Transfusion in Pytorch
Transfusion - Pytorch (wip)
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI.
Once completed, this will also be extended to flow matching, as well as to audio, video, and perhaps even policies.
Install
$ pip install transfusion-pytorch
Usage
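import torch
from transfusion_pytorch import Transfusion

model = Transfusion(
    num_text_tokens = 256,       # size of the text vocabulary
    transformer = dict(
        dim = 512,
        depth = 8
    )
)

# text token ids drawn from the 256-entry vocabulary, sequence length 1024
text_ids = torch.randint(0, 256, (1, 1024))

# two modality segments, 6 and 4 tokens long, each token of dimension 512
modality_tokens = [[
    torch.randn(1, 6, 512),
    torch.randn(1, 4, 512)
]]

# where each segment sits in the sequence; lengths match the tensors above
modality_positions = [[
    (2, 6),
    (10, 4)
]] # (offset, length)

# the forward pass returns the total loss together with a breakdown
loss, breakdown = model(
    text_ids,
    modality_tokens = modality_tokens,
    modality_positions = modality_positions
)

loss.backward()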
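To make the `(offset, length)` convention used by `modality_positions` concrete, here is a minimal sketch in plain Python that marks which positions of a sequence fall inside a modality segment and which are left as text. The helper `modality_mask` is hypothetical and written only for illustration; it is not part of `transfusion_pytorch`, and how the library actually consumes these positions internally is not shown here.

```python
from typing import List, Tuple


def modality_mask(seq_len: int, positions: List[Tuple[int, int]]) -> List[bool]:
    """Hypothetical helper (not a library function).

    Returns one flag per sequence position: True where an (offset, length)
    segment covers the position, False where the position is left as text.
    """
    mask = [False] * seq_len
    for offset, length in positions:
        end = offset + length
        if offset < 0 or length < 0 or end > seq_len:
            raise ValueError(f"segment ({offset}, {length}) does not fit in length {seq_len}")
        if any(mask[offset:end]):
            raise ValueError(f"segment ({offset}, {length}) overlaps an earlier segment")
        mask[offset:end] = [True] * length
    return mask


# same positions as in the usage example: segments at 2..7 and 10..13
mask = modality_mask(1024, [(2, 6), (10, 4)])
modality_indices = [i for i, flag in enumerate(mask) if flag]
print(modality_indices)
```

With the positions from the usage example, the first segment covers indices 2 through 7 and the second covers 10 through 13, so 10 of the 1024 positions are marked as modality positions. A check like this can be a cheap way to catch overlapping or out-of-range segments before building a batch.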
Citations
@inproceedings{Zhou2024TransfusionPT,
    title  = {Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model},
    author = {Chunting Zhou and Lili Yu and Arun Babu and Kushal Tirumala and Michihiro Yasunaga and Leonid Shamis and Jacob Kahn and Xuezhe Ma and Luke Zettlemoyer and Omer Levy},
    year   = {2024},
    url    = {https://api.semanticscholar.org/CorpusID:271909855}
}