Vision Llama - Pytorch

Vision LLama

Implementation of VisionLLaMA, from the paper "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" (arXiv:2403.00522), in PyTorch and Zeta.

install

$ pip install vision-llama

usage

import torch
from vision_llama.main import VisionLlama

# Input tensor: a batch of one 3-channel 224x224 image
x = torch.randn(1, 3, 224, 224)

# Create a VisionLlama model with the specified parameters
model = VisionLlama(
    dim=768, depth=12, channels=3, heads=12, num_classes=1000
)

# Pass x through the model and print the output tensor
print(model(x))

License

MIT

Citation

@misc{chu2024visionllama,
    title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks}, 
    author={Xiangxiang Chu and Jianlin Su and Bo Zhang and Chunhua Shen},
    year={2024},
    eprint={2403.00522},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

todo

  • Implement AS2DRoPE properly; the current implementation is poor, so plain axial rotary embeddings may be used instead
  • Rework the GSA attention; it is implemented, but the current version is poor
  • Add an ImageNet training script with distributed training support
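
For the first item, a minimal sketch of what plain axial 2D rotary embeddings could look like in PyTorch. This is not the AS2DRoPE from the paper and not part of this package's API: the function name axial_rope_2d, its arguments, and the choice to split the head dimension in half (rows first, columns second) are all assumptions made for illustration. The example shapes assume a 16x16 patch size on a 224x224 image, which the usage section above does not specify.

```python
import torch


def axial_rope_2d(
    x: torch.Tensor, height: int, width: int, base: float = 10000.0
) -> torch.Tensor:
    """Hypothetical helper: rotate x of shape (batch, heads, height*width, head_dim).

    The first half of head_dim is rotated by the token's row index,
    the second half by its column index (plain axial RoPE, not AS2DRoPE).
    """
    b, h, n, d = x.shape
    assert n == height * width, "sequence length must equal height * width"
    assert d % 4 == 0, "head_dim must be divisible by 4"

    quarter = d // 4
    # One frequency per rotated pair, as in 1D rotary embeddings.
    freqs = base ** (-torch.arange(quarter, dtype=torch.float32) / quarter)

    # Row and column index of every token in row-major order.
    rows = torch.arange(height, dtype=torch.float32).repeat_interleave(width)
    cols = torch.arange(width, dtype=torch.float32).repeat(height)

    # (n, d/2) angles: the first `quarter` pairs use rows, the rest use columns.
    angles = torch.cat([rows[:, None] * freqs, cols[:, None] * freqs], dim=-1)
    cos, sin = angles.cos().to(x.dtype), angles.sin().to(x.dtype)

    # Rotate each (even, odd) pair of channels by its angle.
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out_even = x_even * cos - x_odd * sin
    out_odd = x_even * sin + x_odd * cos

    # Interleave the rotated pairs back into the original channel layout.
    return torch.stack([out_even, out_odd], dim=-1).flatten(-2)


# Assumed shapes: 12 heads, 14x14 patches, head_dim = 768 / 12 = 64.
q = torch.randn(1, 12, 14 * 14, 64)
q_rot = axial_rope_2d(q, height=14, width=14)
print(q_rot.shape)  # torch.Size([1, 12, 196, 64])
```

In an attention layer the same rotation would be applied to both queries and keys before the dot product, so that attention scores depend on the relative row and column offset between patches rather than on their absolute positions.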
