
Screen AI - Pytorch



Screen AI

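Implementation of the ScreenAI model from the paper "ScreenAI: A Vision-Language Model for UI and Infographics Understanding". The data flow is: image + text -> image patches -> ViT -> embed + concat -> multimodal encoder (self-attention + FFN) -> decoder (cross-attention + FFN + self-attention) -> output projection. Paper: arXiv eprint 2402.04615 (full entry under Citation below).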
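The flow above can be sketched with stock PyTorch modules. This is a minimal sketch under stated assumptions, not the screenai package's code: the class name `ScreenAISketch`, the vocabulary size `num_tokens`, the learned positional embedding, and the use of `nn.TransformerEncoder` / `nn.TransformerDecoder` for the attention blocks are all hypothetical choices. Note that `nn.TransformerDecoderLayer` runs self-attention, then cross-attention, then the FFN, which may not match the package's own ordering.

```python
import torch
from torch import nn


class ScreenAISketch(nn.Module):
    """Hypothetical sketch of the ScreenAI flow; NOT the screenai package's implementation."""

    def __init__(
        self,
        image_size: int = 224,
        patch_size: int = 16,
        channels: int = 3,
        dim: int = 512,
        heads: int = 8,
        vit_depth: int = 4,
        mm_encoder_depth: int = 4,
        decoder_depth: int = 4,
        ff_mult: int = 4,
        num_tokens: int = 1000,  # hypothetical output vocabulary size
    ):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # img -> patches: a strided conv cuts non-overlapping patch_size x patch_size patches
        self.to_patches = nn.Conv2d(channels, dim, kernel_size=patch_size, stride=patch_size)
        # assumed learned positional embedding, one vector per patch
        self.pos_emb = nn.Parameter(torch.zeros(1, num_patches, dim))

        def encoder(depth: int) -> nn.TransformerEncoder:
            # each layer = self-attention + feed-forward
            layer = nn.TransformerEncoderLayer(
                dim, heads, dim_feedforward=dim * ff_mult, batch_first=True
            )
            return nn.TransformerEncoder(layer, num_layers=depth)

        self.vit = encoder(vit_depth)
        self.mm_encoder = encoder(mm_encoder_depth)
        # each decoder layer = self-attention + cross-attention + feed-forward
        dec_layer = nn.TransformerDecoderLayer(
            dim, heads, dim_feedforward=dim * ff_mult, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=decoder_depth)
        self.to_out = nn.Linear(dim, num_tokens)

    def forward(self, text: torch.Tensor, img: torch.Tensor) -> torch.Tensor:
        # text: (batch, text_len, dim) embeddings; img: (batch, channels, H, W)
        patches = self.to_patches(img).flatten(2).transpose(1, 2)  # (batch, num_patches, dim)
        img_tokens = self.vit(patches + self.pos_emb)
        # embed + concat: image tokens followed by text tokens
        mm = self.mm_encoder(torch.cat([img_tokens, text], dim=1))
        # decoder: text queries cross-attend to the multimodal encoding
        dec = self.decoder(text, mm)
        return self.to_out(dec)  # (batch, text_len, num_tokens)


model = ScreenAISketch()
out = model(torch.randn(1, 1, 512), torch.rand(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1, 1000])
```

Feeding the text as the decoder's query sequence and the concatenated multimodal encoding as its memory is one plausible reading of "cross attn"; the package may wire this differently.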

Install

pip3 install screenai

Usage

import torch
from screenai.main import ScreenAI

# Create a dummy image batch: (batch, channels, height, width)
image = torch.rand(1, 3, 224, 224)

# Create a dummy text tensor: (batch, sequence length, dim)
text = torch.randn(1, 1, 512)

# Create an instance of the ScreenAI model with specified parameters
model = ScreenAI(
    patch_size=16,
    image_size=224,
    dim=512,
    depth=6,
    heads=8,
    vit_depth=4,
    multi_modal_encoder_depth=4,
    llm_decoder_depth=4,
    mm_encoder_ff_mult=4,
)

# Perform forward pass of the model with the given text and image tensors
out = model(text, image)

# Print the shape of the output tensor
print(out.shape)
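Assuming the ViT cuts the image into non-overlapping square patches, as a standard ViT does, the settings in the example above (image_size=224, patch_size=16) give the vision encoder this many patch tokens, each projected to dim = 512:

```latex
N_{\text{patches}} = \left(\frac{\text{image\_size}}{\text{patch\_size}}\right)^{2}
                   = \left(\frac{224}{16}\right)^{2} = 14^{2} = 196
```

Under that assumption the image side enters the model as a (1, 196, 512) sequence.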

License

MIT

Citation

@misc{baechler2024screenai,
    title={ScreenAI: A Vision-Language Model for UI and Infographics Understanding}, 
    author={Gilles Baechler and Srinivas Sunkara and Maria Wang and Fedir Zubach and Hassan Mansoor and Vincent Etter and Victor Cărbune and Jason Lin and Jindong Chen and Abhanshu Sharma},
    year={2024},
    eprint={2402.04615},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Todo

  • Use nn.ModuleList([]) to stack the layers of the encoder and decoder
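The Todo item is about holding the encoder and decoder layers in an nn.ModuleList. A minimal sketch of that pattern, assuming pre-norm residual blocks of self-attention plus feed-forward; the class names `AttnFFBlock` and `Encoder` are hypothetical and not part of screenai:

```python
import torch
from torch import nn


class AttnFFBlock(nn.Module):
    """Hypothetical pre-norm block: self-attention then feed-forward, each with a residual."""

    def __init__(self, dim: int, heads: int, ff_mult: int = 4):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * ff_mult),
            nn.GELU(),
            nn.Linear(dim * ff_mult, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.attn_norm(x)
        # MultiheadAttention returns (output, weights); keep only the output
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ff(x)


class Encoder(nn.Module):
    """Stacks `depth` blocks; nn.ModuleList registers each block as a submodule."""

    def __init__(self, dim: int, depth: int, heads: int, ff_mult: int = 4):
        super().__init__()
        self.layers = nn.ModuleList([AttnFFBlock(dim, heads, ff_mult) for _ in range(depth)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


enc = Encoder(dim=512, depth=4, heads=8)
print(len(enc.layers), enc(torch.randn(1, 10, 512)).shape)  # 4 torch.Size([1, 10, 512])
```

Unlike a plain Python list, nn.ModuleList registers every block, so its parameters appear in `model.parameters()` and move with `.to(device)`.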



Download files

Download the file for your platform.

Source Distribution

screenai-0.0.8.tar.gz (6.9 kB)

Built Distribution

screenai-0.0.8-py3-none-any.whl (6.7 kB)

File details

Details for the file screenai-0.0.8.tar.gz.

File metadata

  • Download URL: screenai-0.0.8.tar.gz
  • Size: 6.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.11.0 Darwin/22.4.0

File hashes

Hashes for screenai-0.0.8.tar.gz

  • SHA256: 88bfa0d00baa0c01cb8bca8010f679d1034ac13c0b4918763bb0e3121151169d
  • MD5: 449d9518ea058b0b1f33050a0f407021
  • BLAKE2b-256: d8c0a9e62577833c187bf32933aa338d288ddada1e8420b273857cc3f6b3e075


File details

Details for the file screenai-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: screenai-0.0.8-py3-none-any.whl
  • Size: 6.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.11.0 Darwin/22.4.0

File hashes

Hashes for screenai-0.0.8-py3-none-any.whl

  • SHA256: 83728637490180adb0c2d8eabc359d82a4f70cea79f53e960a7daed2e0056ef6
  • MD5: 7dcf6222fc66f413f4faf92f86f302df
  • BLAKE2b-256: 842a14fc880153ea6fcd779d30188c46061260c5624f6b6284493ea1b0a4fc3d
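To check a downloaded file against the SHA256 digests listed above, Python's standard-library hashlib is enough. A minimal sketch; the helper names `sha256_of` and `verify` and the `EXPECTED` mapping are illustrative, with digests copied from this page:

```python
import hashlib
from pathlib import Path

# SHA256 digests published for screenai 0.0.8 (copied from the hash lists above)
EXPECTED = {
    "screenai-0.0.8.tar.gz": "88bfa0d00baa0c01cb8bca8010f679d1034ac13c0b4918763bb0e3121151169d",
    "screenai-0.0.8-py3-none-any.whl": "83728637490180adb0c2d8eabc359d82a4f70cea79f53e960a7daed2e0056ef6",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True only if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

pip can enforce the same check at install time: a requirements line such as `screenai==0.0.8 --hash=sha256:<digest>` puts pip into hash-checking mode.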

