Skip to main content

Qwen VL - Pytorch

Project description

Multi-Modality

Qwen-VL

My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't released model code yet sooo... For more details, please refer to the full paper.

Install

pip3 install qwen


Usage

# Importing the necessary libraries
import torch
from qwen import Qwen

# Creating an instance of the Qwen model
model = Qwen()

# Generating random text and image tensors
text = torch.randint(0, 20000, (1, 1024))
img = torch.randn(1, 3, 256, 256)

# Passing the image and text tensors through the model
out = model(img, text)  # (1, 1024, 20000)

Todo

  • Position aware vision language adapter, compresses image features. Singer layer cross attention module inited randomly => group of trainable embeddings as query vectors + image features from the visual encoder as keys for cross attention ops => OUTPUT: compresses visual feature sequence to a fixed lnegth of 256, 2d absolute positional encodings are integrated into the cross attentions mechanisms query key pairs => compressed feature sequence of length of 256 => fed into decoder llm

  • Bounding Boxes, for any given accurate bounding box, a norm process is applied in the range [0, 1000] and transformed into a string format (Xtope, Ytople)(Xottomright, Ybottomright) -> the string is tokenized as text and does not require positional vocabulary. Detection strings and regular text strings, two special tokens and are added to the beginning and end of the bounding box string. + another sed of special tokens ( and ) is introduced.

Citations

Please use the following to cite this work:

@article{bai2023qwen,
  title={Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities},
  author={Bai, Jinze and Bai, Shuai and Yang, Shusheng and Wang, Shijie and Tan, Sinan and Wang, Peng and Lin, Junyang and Zhou, Chang and Zhou, Jingren},
  journal={arXiv preprint arXiv:2308.12966},
  year={2023},
  url={https://doi.org/10.48550/arXiv.2308.12966}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

qwen-0.1.1.tar.gz (4.4 kB view details)

Uploaded Source

Built Distribution

qwen-0.1.1-py3-none-any.whl (4.3 kB view details)

Uploaded Python 3

File details

Details for the file qwen-0.1.1.tar.gz.

File metadata

  • Download URL: qwen-0.1.1.tar.gz
  • Upload date:
  • Size: 4.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.11.0 Darwin/22.4.0

File hashes

Hashes for qwen-0.1.1.tar.gz
Algorithm Hash digest
SHA256 3aa2d2afd1c2842909f2e59ffce16a53fb6c02ba0993633d128dee17905c6afe
MD5 288143190cff778089d83febc14f526e
BLAKE2b-256 55ec182ead9028328d988eb8f55b1da46d0e90789cfaa733e6cacae0d6c671dc

See more details on using hashes here.

File details

Details for the file qwen-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: qwen-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 4.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.11.0 Darwin/22.4.0

File hashes

Hashes for qwen-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5c18e1e895195079ea7be7ee332c6eb2159a3dfddef2b47ef56daee5bd104d6c
MD5 b4ea28885a28d24779956b2323c1b5eb
BLAKE2b-256 c2ad74d014e77c54a5221a67167184a233b18936cb9fb24ea58e0562ec781aea

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page