Using pretrained encoder and language models to generate captions from multimedia inputs.

ClipCap

ClipCap uses pretrained encoders and language models to generate captions from multimedia inputs. This allows high-fidelity text generation that draws on the rich textual detail already learned by pretrained LMs, for tasks such as image captioning, visual question answering (VQA), audio captioning and more.

More details and results to come soon.

Installation

By default, the encoders are not installed, for ease of access. See the data preprocessing documentation for how to install them.

pip install git+https://github.com/TheoCoombes/ClipCap.git
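After installing, it can help to confirm that the package is visible to the current interpreter. The sketch below uses only the standard library. It assumes the importable module is named `clipcap` (as in the `python3 -m clipcap...` commands in this README) and that the distribution is registered as `ClipCap`; both names are assumptions to adjust if your environment differs. The helper names are hypothetical, and the check does not cover the encoders, which are installed separately.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Example (names are assumptions, see above):
#   is_importable("clipcap")        -> True once the pip install has succeeded
#   installed_version("ClipCap")    -> a version string, or None if not installed
```

Both helpers return a plain value rather than raising, so they can be used in a setup script before calling the preprocessing or training entry points.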

Supported Encoders

  • CLIP for tasks such as Image Captioning, VQA, etc.
  • CLAP for tasks such as Audio Captioning, Audio Question Answering, etc.

Data Preprocessing

You can run the data preprocessing script with the command below, which lists the available options.

python3 -m clipcap.preprocess --help

Training

Once the data has been preprocessed, you can run the training script with the command below, which lists the available options.

python3 -m clipcap.train --help
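If you want to drive the preprocessing and training entry points from a Python script rather than a shell, one option is to call them through `subprocess` using the interpreter that is already running. The following is a minimal sketch: `build_command` and `show_help` are hypothetical helper names, and `--help` is the only flag used because it is the only one shown in this README.

```python
import subprocess
import sys
from typing import List


def build_command(module: str, *args: str) -> List[str]:
    """Build an argument list equivalent to `python3 -m <module> <args...>`."""
    # sys.executable keeps the call inside the current virtual environment.
    return [sys.executable, "-m", module, *args]


def show_help(module: str) -> str:
    """Run `<module> --help` and return whatever it prints to stdout."""
    result = subprocess.run(
        build_command(module, "--help"),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Example (requires ClipCap to be installed):
#   print(show_help("clipcap.preprocess"))
#   print(show_help("clipcap.train"))
```

Passing the command as a list avoids shell quoting issues, which matters once real arguments such as file paths are added.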

Acknowledgments

This repository is heavily based on @rmokady's original implementation of ClipCap and also contains modified versions of @rom1504's clip-inference and embedding-reader libraries. Many thanks to both for their amazing work :)

TODO

Improved documentation and eval + inference scripts to come soon.

Download files

Source Distribution

ClipCap-1.0.0.tar.gz (18.6 kB)

Uploaded Source

Built Distribution

ClipCap-1.0.0-py3-none-any.whl (26.7 kB)

Uploaded Python 3

File details

Details for the file ClipCap-1.0.0.tar.gz.

File metadata

  • Download URL: ClipCap-1.0.0.tar.gz
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.12

File hashes

Hashes for ClipCap-1.0.0.tar.gz
Algorithm Hash digest
SHA256 6ae30a19bc67a5772f479021c9fbe83c478b273c44a249f64cc3bc987f80e63c
MD5 15a496ccf15ad7e07e5e078137e1f84e
BLAKE2b-256 c8cb16900d5360f79fdb99cd6a2a6b64799985e76451ea543eae3cca61f3e805
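A downloaded file can be checked against the SHA256 digest listed above before installing it. The sketch below uses only `hashlib` from the standard library; the function names are hypothetical, and the example path assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example (assumes the file was downloaded to the working directory):
#   matches_published_hash(
#       "ClipCap-1.0.0.tar.gz",
#       "6ae30a19bc67a5772f479021c9fbe83c478b273c44a249f64cc3bc987f80e63c",
#   )
```

Reading in chunks keeps memory use constant regardless of file size; the same approach works for the wheel by substituting its file name and digest.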

File details

Details for the file ClipCap-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: ClipCap-1.0.0-py3-none-any.whl
  • Size: 26.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.12

File hashes

Hashes for ClipCap-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 60587f85ccbce75ed43f3c3dc8dd56add261dba0e6f3401edd280f522e83e95f
MD5 0fa86860af4bcc85c80133e0c98b5173
BLAKE2b-256 1e03e673ce27b2c9fc8f5ba2cd85347936acc25780bc26a3e3a5a6aa3cbbc341
