
VisionLLaMA - PyTorch


Implementation of VisionLLaMA from the paper "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta. Paper: https://arxiv.org/abs/2403.00522

Install

$ pip install vision-llama

Usage

import torch
from vision_llama.main import VisionLlama

# Random input image tensor: (batch, channels, height, width)
x = torch.randn(1, 3, 224, 224)

# Create an instance of the VisionLlama model with the specified parameters
model = VisionLlama(
    dim=768, depth=12, channels=3, heads=12, num_classes=1000
)

# Pass x through the model and print the shape of the output tensor
print(model(x).shape)
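
The constructor's num_classes argument suggests the forward pass returns class logits. Assuming an output of shape (batch, num_classes), which is an assumption and not confirmed by the package docs, a minimal supervised classification step might look like this sketch:

import torch
import torch.nn.functional as F
from vision_llama.main import VisionLlama

model = VisionLlama(
    dim=768, depth=12, channels=3, heads=12, num_classes=1000
)
x = torch.randn(1, 3, 224, 224)

logits = model(x)                       # assumed shape: (1, 1000)
labels = torch.randint(0, 1000, (1,))   # dummy target class
loss = F.cross_entropy(logits, labels)  # standard classification loss
loss.backward()                         # backprop through the model
print(loss.item())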

License

MIT

Citation

@misc{chu2024visionllama,
    title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks}, 
    author={Xiangxiang Chu and Jianlin Su and Bo Zhang and Chunhua Shen},
    year={2024},
    eprint={2403.00522},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Todo

  • Implement AS2DRoPE; the current implementation is rough, so axial rotary embeddings may be used instead (see the sketch after this list)
  • Improve the GSA attention implementation; a first version exists but needs work
  • Add an ImageNet training script with distributed training support
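
For reference, below is a minimal, self-contained sketch of axial rotary embeddings: 1D RoPE applied separately along the row and column axes of the patch grid. It illustrates the alternative mentioned in the first item; it is not the repo's AS2DRoPE code, and the helper names (rope_1d, rotate, apply_axial_rope) are hypothetical.

import torch


def rope_1d(pos, dim, base=10000.0):
    # (cos, sin) tables for 1D RoPE over integer positions `pos`
    freqs = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    angles = pos.float()[:, None] * freqs[None, :]  # (n, dim // 2)
    return angles.cos(), angles.sin()


def rotate(x, cos, sin):
    # rotate consecutive feature pairs of x by the given angles
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


def apply_axial_rope(q, h, w):
    # q: (batch, heads, h * w, head_dim); rotate the first half of the
    # channels by each token's row index, the second half by its column index
    b, heads, n, d = q.shape
    assert n == h * w and d % 4 == 0
    rows = torch.arange(h).repeat_interleave(w)  # row index of each token
    cols = torch.arange(w).repeat(h)             # column index of each token
    cos_r, sin_r = rope_1d(rows, d // 2)
    cos_c, sin_c = rope_1d(cols, d // 2)
    q_rows, q_cols = q[..., : d // 2], q[..., d // 2:]
    return torch.cat(
        [rotate(q_rows, cos_r, sin_r), rotate(q_cols, cos_c, sin_c)], dim=-1
    )


# Example: 16 x 16 patch grid, 12 heads, head dim 64
q = torch.randn(1, 12, 16 * 16, 64)
print(apply_axial_rope(q, 16, 16).shape)  # torch.Size([1, 12, 256, 64])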
