ClipCap
ClipCap uses pretrained encoders and language models to generate captions from multimedia inputs. It enables high-fidelity text generation by drawing on the rich textual detail already learned by pretrained LMs, for tasks such as image captioning, visual question answering (VQA), audio captioning, and more.
More details and results to come soon.
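As a rough illustration of the idea described above, the sketch below follows the prefix approach of the original ClipCap implementation that this repository is based on: a single encoder embedding is projected into a short sequence of prefix vectors, which are placed in front of the caption token embeddings fed to the language model. It is a toy, dependency-free example and not this package's API. The function names, the plain linear mapping, and all dimensions are assumptions chosen for readability.

```python
from typing import List

Vector = List[float]


def map_to_prefix(embedding: Vector, weights: List[Vector],
                  prefix_length: int, lm_dim: int) -> List[Vector]:
    """Project one encoder embedding into `prefix_length` vectors of size `lm_dim`.

    Hypothetical stand-in for a learned mapping network: a single linear
    layer whose (prefix_length * lm_dim) outputs are reshaped into a
    sequence of prefix vectors.
    """
    if len(weights) != prefix_length * lm_dim:
        raise ValueError("weights must have prefix_length * lm_dim rows")
    if any(len(row) != len(embedding) for row in weights):
        raise ValueError("each weight row must match the embedding size")
    # Matrix-vector product: one output value per weight row.
    flat = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
    # Reshape the flat output into prefix_length vectors of lm_dim values.
    return [flat[i * lm_dim:(i + 1) * lm_dim] for i in range(prefix_length)]


def build_lm_input(prefix: List[Vector],
                   token_embeddings: List[Vector]) -> List[Vector]:
    """Place the prefix vectors in front of the caption token embeddings."""
    return prefix + token_embeddings
```

In a real model the weights would be learned during training and the language model would attend to the prefix while generating the caption; here they are plain lists so the data flow is easy to follow.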
Installation
By default, the encoders are not installed, for ease of access. See the data preprocessing documentation for instructions on installing them.
pip install git+https://github.com/TheoCoombes/ClipCap.git
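To confirm that the install succeeded, a small check along these lines may help. It relies only on the standard library and assumes the importable module is named clipcap, as the commands in this README suggest; the helper name is_installed is made up for this sketch.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "clipcap" is the module name used by the commands in this README.
if is_installed("clipcap"):
    print("clipcap is installed")
else:
    print("clipcap not found; run the pip command above first")
```

This only checks the base package. It says nothing about the optional encoders, which are installed separately.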
Supported Encoders
- CLIP for tasks such as Image Captioning, VQA, etc.
- CLAP for tasks such as Audio Captioning, Audio Question Answering, etc.
Data Preprocessing
Run the data preprocessing script with the command below; the --help flag lists the available options.
python3 -m clipcap.preprocess --help
Training
Run the training script on the preprocessed data with the command below; the --help flag lists the available options.
python3 -m clipcap.train --help
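When driving the preprocessing and training steps from a Python script rather than a shell, the two commands can be wrapped as in the sketch below. It uses only the module names and the --help flag shown in this README; the helper names module_command and run_module are hypothetical, and no other flags are assumed.

```python
import subprocess
import sys
from typing import List


def module_command(module: str, *args: str) -> List[str]:
    """Build the argument list for `python -m <module> <args...>`."""
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, "-m", module, *args]


def run_module(module: str, *args: str) -> int:
    """Run the module as a subprocess and return its exit code."""
    return subprocess.run(module_command(module, *args), check=False).returncode
```

For example, calling run_module("clipcap.preprocess", "--help") and then run_module("clipcap.train", "--help") should print the options for each stage in turn, provided the package is installed.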
Acknowledgments
This repository is heavily based on @rmokady's original implementation of ClipCap and also contains modified versions of @rom1504's clip-inference and embedding-reader libraries. Many thanks to both for their amazing work :)
TODO
Improved documentation and eval + inference scripts to come soon.
File details
Details for the file ClipCap-1.0.0.tar.gz.
File metadata
- Download URL: ClipCap-1.0.0.tar.gz
- Upload date:
- Size: 18.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6ae30a19bc67a5772f479021c9fbe83c478b273c44a249f64cc3bc987f80e63c
MD5 | 15a496ccf15ad7e07e5e078137e1f84e
BLAKE2b-256 | c8cb16900d5360f79fdb99cd6a2a6b64799985e76451ea543eae3cca61f3e805
File details
Details for the file ClipCap-1.0.0-py3-none-any.whl.
File metadata
- Download URL: ClipCap-1.0.0-py3-none-any.whl
- Upload date:
- Size: 26.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 60587f85ccbce75ed43f3c3dc8dd56add261dba0e6f3401edd280f522e83e95f
MD5 | 0fa86860af4bcc85c80133e0c98b5173
BLAKE2b-256 | 1e03e673ce27b2c9fc8f5ba2cd85347936acc25780bc26a3e3a5a6aa3cbbc341
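A downloaded file can be checked against the SHA256 digests listed above with a few lines of standard-library Python. This is a minimal sketch: the helper names are made up here, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, this would be called as matches_published("ClipCap-1.0.0.tar.gz", "6ae30a19bc67a5772f479021c9fbe83c478b273c44a249f64cc3bc987f80e63c"), assuming the archive sits in the current directory.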