Swarms - Pytorch
## BRAVE, or Swarms of Vision Transformers

Implementation of the paper "BRAVE: Broadening the visual encoding of vision-language models". BRAVE achieves state-of-the-art performance on a broad range of captioning and VQA benchmarks and significantly reduces common failure modes of VLMs, while requiring fewer trainable parameters than existing methods and producing a more compressed representation.
## Install

```shell
pip3 install brave-torch
```
## Usage

```python
import torch
from brave_torch.main import SwarmOfViTs

# Image tensor: (batch, channels, height, width)
x = torch.randn(1, 3, 224, 224)

# Model
model = SwarmOfViTs(
    image_size=224,
    patch_size=32,
    encoder_dim=512,
    encoder_depth=6,
    encoder_heads=8,
    num_of_vits=4
)

# Forward pass
out = model(x)
print(out)
```
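To make the "swarm of encoders" idea concrete, here is a minimal, self-contained sketch in plain PyTorch. It is not the `brave_torch` implementation (the `TinyEncoder` and `EncoderSwarm` classes below are hypothetical stand-ins): it only illustrates the pattern of running several independent patch encoders over the same image and concatenating their token sequences into one broadened visual representation.

```python
import torch
from torch import nn

# Hypothetical sketch of the swarm idea, NOT the brave_torch API:
# several independent encoders embed the same image, and their
# token sequences are concatenated along the sequence axis.

class TinyEncoder(nn.Module):
    """Stand-in for one ViT branch: patchify, linearly embed, normalize."""
    def __init__(self, image_size=224, patch_size=32, dim=512):
        super().__init__()
        # A strided conv is the standard trick for patch embedding.
        self.proj = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # (B, 3, H, W) -> (B, dim, H/ps, W/ps) -> (B, num_patches, dim)
        tokens = self.proj(x).flatten(2).transpose(1, 2)
        return self.norm(tokens)

class EncoderSwarm(nn.Module):
    """Run every encoder on the same image and concatenate the tokens."""
    def __init__(self, num_encoders=4, dim=512):
        super().__init__()
        self.encoders = nn.ModuleList(
            TinyEncoder(dim=dim) for _ in range(num_encoders)
        )

    def forward(self, x):
        return torch.cat([enc(x) for enc in self.encoders], dim=1)

x = torch.randn(1, 3, 224, 224)
swarm = EncoderSwarm(num_encoders=4, dim=512)
out = swarm(x)
# Each encoder yields (224 // 32) ** 2 = 49 tokens; 4 encoders give 196.
print(out.shape)  # torch.Size([1, 196, 512])
```

In BRAVE proper the branches are pre-trained vision encoders with different strengths, and the concatenated features are compressed before being fed to the language model; the sketch above only shows the broadening step.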
## Citations

## Todo
- Citation link
- Citation Bibtex
- Diagram photo
- Implement Andromeda Base LLM architecture
- Provide multi-modal tokenizer
- Train and release the model