Swarms - Pytorch
BRAVE or Swarms of Vision Transformers
Implementation of the paper "BRAVE: Broadening the visual encoding of vision-language models". According to the paper, BRAVE achieves state-of-the-art performance on a broad range of captioning and VQA benchmarks. It also significantly reduces the shortcomings that VLMs inherit from the limited capabilities of their vision encoders, while requiring fewer trainable parameters than existing methods and producing a more compressed representation.
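The "fewer trainable parameters" comparison refers to parameters that receive gradient updates. Frozen parameters are excluded. A minimal PyTorch sketch of how that count is usually taken follows. The toy `encoder`/`head` split is hypothetical and only illustrates freezing; it is not BRAVE's architecture.

```python
from torch import nn


def count_parameters(model: nn.Module) -> tuple[int, int]:
    """Return (trainable, total) parameter counts for a PyTorch module."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


# Hypothetical toy setup: a frozen "encoder" followed by a small trainable head.
encoder = nn.Linear(10, 20)  # 10*20 weights + 20 biases = 220 parameters
head = nn.Linear(20, 5)      # 20*5 weights + 5 biases = 105 parameters
for p in encoder.parameters():
    p.requires_grad_(False)  # frozen: excluded from the trainable count

model = nn.Sequential(encoder, head)
print(count_parameters(model))  # (105, 325)
```

The same helper can be applied to any `nn.Module`, including the model from the usage example below.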
install
pip3 install brave-torch
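After running the command above, you can check which version actually landed in the current environment. The sketch below uses only the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name: str = "brave-torch"):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())  # a version string, or None if brave-torch is not installed
```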
usage
import torch
from brave_torch.main import SwarmOfViTs

# Dummy image batch: (batch, channels, height, width)
x = torch.randn(1, 3, 224, 224)

# Model
model = SwarmOfViTs(
    image_size=224,
    patch_size=32,
    encoder_dim=512,
    encoder_depth=6,
    encoder_heads=8,
    num_of_vits=4,
)

# Forward pass
out = model(x)
print(out)
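The README does not describe what `SwarmOfViTs` does internally. As a rough mental model only, the self-contained sketch below assumes a "swarm" consists of several independent ViT encoders that each process the same image, with their patch-token outputs concatenated along the token axis. `TinyViT` and `ToySwarmOfViTs` are hypothetical names, and the constructor simply mirrors the README's argument names. This is not brave_torch's implementation, and it may differ from it (and from the BRAVE paper) in how features are combined.

```python
import torch
from torch import nn


class TinyViT(nn.Module):
    """Minimal ViT encoder: patchify, add learned positions, run Transformer layers."""

    def __init__(self, image_size, patch_size, dim, depth, heads, channels=3):
        super().__init__()
        assert image_size % patch_size == 0, "image_size must be divisible by patch_size"
        num_patches = (image_size // patch_size) ** 2
        # Non-overlapping patch embedding via a strided convolution.
        self.patch_embed = nn.Conv2d(channels, dim, kernel_size=patch_size, stride=patch_size)
        # Learnable positional embedding, zero-initialized for simplicity.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        # (B, C, H, W) -> (B, dim, H/p, W/p) -> (B, num_patches, dim)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)
        return self.encoder(tokens + self.pos_embed)


class ToySwarmOfViTs(nn.Module):
    """Hypothetical swarm: independent ViTs whose token outputs are concatenated."""

    def __init__(self, image_size, patch_size, encoder_dim, encoder_depth,
                 encoder_heads, num_of_vits):
        super().__init__()
        self.vits = nn.ModuleList([
            TinyViT(image_size, patch_size, encoder_dim, encoder_depth, encoder_heads)
            for _ in range(num_of_vits)
        ])

    def forward(self, x):
        # Each ViT returns (B, num_patches, dim); concatenate along the token axis.
        return torch.cat([vit(x) for vit in self.vits], dim=1)


model = ToySwarmOfViTs(image_size=224, patch_size=32, encoder_dim=512,
                       encoder_depth=6, encoder_heads=8, num_of_vits=4)
out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 196, 512])
```

With these settings each ViT sees (224 / 32)² = 49 patches, so four encoders yield 4 × 49 = 196 tokens of width 512. Concatenation is only one way to merge the encoders' outputs. Stacking or averaging them would be equally valid choices for a sketch.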
Citations
Download files
Source Distribution
brave_torch-4.7.8.tar.gz (4.4 kB)
Built Distribution
brave_torch-4.7.8-py3-none-any.whl

Hashes for brave_torch-4.7.8-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 1949d375e9350d0df15e17db7c9788ca1461600d2a39aae09c5eb69a6be47a1c
MD5 | d999bbd470f21f0b59b4e032e1dd60d9
BLAKE2b-256 | 5833926a4797d488029cd03b2e4cda80ebbc8d67ee926ab0b053dd54b88f3a00
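To check a manually downloaded wheel against the digests listed for brave_torch-4.7.8-py3-none-any.whl, you can recompute them with Python's standard `hashlib`. The sketch below assumes the file has already been saved locally. The helper names `file_digests` and `verify_wheel` are hypothetical.

```python
import hashlib

# SHA256 digest listed for brave_torch-4.7.8-py3-none-any.whl
EXPECTED_SHA256 = "1949d375e9350d0df15e17db7c9788ca1461600d2a39aae09c5eb69a6be47a1c"


def file_digests(path: str, chunk_size: int = 1 << 16) -> dict[str, str]:
    """Compute the SHA256, MD5, and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256 means BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in chunks so large files never need to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_wheel(path: str) -> bool:
    """True if the file's SHA256 matches the digest listed above."""
    return file_digests(path)["SHA256"] == EXPECTED_SHA256


# Usage, assuming the wheel sits in the current directory:
# verify_wheel("brave_torch-4.7.8-py3-none-any.whl")
```

pip can also enforce this check at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).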