A modern implementation of simple image captioning
Modern-Caption
Description:
Modern-Caption ("modern-caption" on PyPI) is a Python library for generating image captions. It uses CLIP (Contrastive Language-Image Pre-training) to encode the image and GPT-2 to generate the caption text. Captions can be generated with different pre-trained models and decoding methods.
Installation:
- Dependencies: torch, clip, transformers, numpy, PIL, scikit-image
- Setup:
- Ensure you have Python installed on your system.
- Dependencies will be installed during setup.
- Setup will not overwrite existing installs of PyTorch or Torchvision.
- Install using pip:
pip install modern-caption
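After installing, it can be useful to confirm that the listed dependencies are importable. The following is a minimal sketch using only the standard library. The import names are assumptions based on the dependency list above (Pillow is assumed to import as PIL and scikit-image as skimage), and check_modules is a hypothetical helper, not part of the library.

```python
from importlib.util import find_spec

# Import names assumed for the listed dependencies; adjust if your
# environment uses different top-level module names.
DEPENDENCY_MODULES = [
    "torch", "clip", "transformers", "numpy", "PIL", "skimage", "mcaption",
]


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(DEPENDENCY_MODULES).items():
        print(f"{name:<14}{'ok' if found else 'MISSING'}")
```

Any module reported as MISSING can then be installed separately with pip.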
Usage:
- Importing the Module:
from mcaption import Caption
- Initializing the Caption Generator:
cap = Caption(model='conceptual', device='cpu', prefix_length=10)
- Generating Captions:
import skimage.io as io
image = io.imread("images/cover.jpg")  # Load the image
caption_conceptual = cap.predict(image, beam=True)  # Generate caption with beam search
print("Conceptual Model Caption:", caption_conceptual)
Explanation:
- The Caption class initializes the image-captioning functionality, with options to specify the model, device, and other parameters.
- Users can create multiple instances of the Caption class to compare different models or share common resources.
- The predict method generates captions for images using the specified model and decoding method.
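To make the difference between the two decoding methods concrete, here is a toy illustration of greedy decoding versus beam search. It does not reflect how the library implements either method internally; TOY_MODEL is a made-up table of next-token probabilities, and greedy_decode and beam_decode are hypothetical names chosen for this sketch.

```python
import math

# Hypothetical next-token probabilities, keyed by the tokens generated so far.
# A prefix with no entry is treated as a finished caption.
TOY_MODEL = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"dog": 0.4, "cat": 0.35, "bird": 0.25},
    ("the",): {"dog": 0.9, "cat": 0.1},
}


def greedy_decode(model):
    """Pick the single most probable token at every step."""
    tokens, log_prob = (), 0.0
    while tokens in model:
        token, prob = max(model[tokens].items(), key=lambda item: item[1])
        tokens += (token,)
        log_prob += math.log(prob)
    return " ".join(tokens), math.exp(log_prob)


def beam_decode(model, beam_width=2):
    """Keep the `beam_width` most probable partial captions at every step."""
    beams = [((), 0.0)]  # (tokens, cumulative log-probability)
    while any(tokens in model for tokens, _ in beams):
        candidates = []
        for tokens, log_prob in beams:
            if tokens not in model:  # finished caption: carry it forward unchanged
                candidates.append((tokens, log_prob))
                continue
            for token, prob in model[tokens].items():
                candidates.append((tokens + (token,), log_prob + math.log(prob)))
        beams = sorted(candidates, key=lambda item: item[1], reverse=True)[:beam_width]
    best_tokens, best_log_prob = beams[0]
    return " ".join(best_tokens), math.exp(best_log_prob)


if __name__ == "__main__":
    print("greedy:", greedy_decode(TOY_MODEL))
    print("beam:  ", beam_decode(TOY_MODEL))
```

In this toy table, greedy decoding commits to "a" because it is the most probable first token, while beam search keeps "the" alive long enough to find the more probable full sequence. This is the usual trade-off: beam search explores more candidates at a higher compute cost.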
Examples:
from mcaption import Caption
import skimage.io as io
# Initialize Caption generator with the 'conceptual' model
cap = Caption(model='conceptual', device='cpu', prefix_length=10)
# Load an image and generate a caption using beam search
image = io.imread("images/cover.jpg")
caption_conceptual = cap.predict(image, beam=True)
print("Conceptual Model Caption:", caption_conceptual)
# Initialize another Caption generator with the 'coco' model, inheriting from the previous one
cap2 = Caption(model='coco', inherit=cap)
# Generate a caption using greedy decoding
caption_coco = cap2.predict(image, beam=False)
print("COCO Model Caption:", caption_coco)
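When comparing several models on the same image, the repeated predict calls can be wrapped in a small helper. This is a sketch under stated assumptions: compare_captions is a hypothetical function, not part of the library, and it relies only on the predict(image, beam=...) call shown above. EchoCaptioner is a stand-in so the sketch runs without downloading model weights; in practice you would pass real Caption instances.

```python
def compare_captions(captioners, image, beam=True):
    """Run every captioner on the same image and collect the results.

    `captioners` maps a label to any object exposing `predict(image, beam=...)`.
    """
    return {label: cap.predict(image, beam=beam) for label, cap in captioners.items()}


class EchoCaptioner:
    """Stand-in for a real Caption instance (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name

    def predict(self, image, beam=True):
        mode = "beam" if beam else "greedy"
        return f"{self.name} caption ({mode})"


if __name__ == "__main__":
    results = compare_captions(
        {"conceptual": EchoCaptioner("conceptual"), "coco": EchoCaptioner("coco")},
        image=None,
        beam=False,
    )
    for label, caption in results.items():
        print(f"{label}: {caption}")
```

With the objects from the example above, the call would look like compare_captions({"conceptual": cap, "coco": cap2}, image).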
Notes:
- The script demonstrates how to initialize and use the image-captioning functionality provided by the "modern-caption" library.
- Users can experiment with different models and decoding methods to obtain varied captions.
- Ensure that the image paths are correct and accessible.
- This library used the following repository for conceptual reference.
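Regarding the note on image paths, one way to fail early with a clear message is to validate the path before loading the image. The helper below is a hypothetical sketch, not part of the library; the set of accepted extensions is an assumption chosen for illustration.

```python
from pathlib import Path

# Extensions assumed for illustration; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def resolve_image_path(path):
    """Return an absolute Path to an existing image file, or raise a clear error."""
    candidate = Path(path).expanduser().resolve()
    if not candidate.is_file():
        raise FileNotFoundError(f"Image not found: {candidate}")
    if candidate.suffix.lower() not in IMAGE_SUFFIXES:
        raise ValueError(f"Unexpected image extension: {candidate.suffix!r}")
    return candidate
```

The result can be passed to the loader used in the examples, for instance io.imread(str(resolve_image_path("images/cover.jpg"))).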
License
This project is licensed under the MIT License - see the LICENSE file for details.