Skip to main content

A modern implementation of simple image captioning

Project description

Modern-Caption


Description:

This Python script provides functionality for generating image captions using the "modern-caption" library. It utilizes CLIP (Contrastive Language-Image Pretraining) and GPT-2 models for image encoding and caption generation, respectively. Users can generate captions for images using different pre-trained models and decoding methods.


Installation:

  1. Dependencies:

    • torch
    • clip
    • transformers
    • numpy
    • PIL
    • scikit-image
  2. Setup:

    • Ensure you have Python installed on your system.
    • Dependencies will be installed during setup.
    • Setup will not overwrite existing installs of PyTorch or Torchvision.
    • Install using pip:
      pip install modern-caption
      

Usage:

  1. Importing the Module:

    from mcaption import Caption
    
  2. Initializing the Caption Generator:

    cap = Caption(model='conceptual', device='cpu', prefix_length=10)
    
  3. Generating Captions:

    import skimage.io as io
    
    image = io.imread("images/cover.jpg")  # Load the image
    caption_conceptual = cap.predict(image, beam=True)  # Generate caption with beam search
    print("Conceptual Model Caption:", caption_conceptual)
    

Explanation:

  • The Caption class initializes the image-captioning functionality with options to specify the model, device, and other parameters.
  • Users can create multiple instances of the Caption class to compare different models or share common resources.
  • The predict method generates captions for images using the specified model and decoding method.

Examples:

from mcaption import Caption
import skimage.io as io

# Initialize Caption generator with the 'conceptual' model
cap = Caption(model='conceptual', device='cpu', prefix_length=10)

# Load an image and generate a caption using beam search
image = io.imread("images/cover.jpg")
caption_conceptual = cap.predict(image, beam=True)
print("Conceptual Model Caption:", caption_conceptual)

# Initialize another Caption generator with the 'coco' model, inheriting from the previous one
cap2 = Caption(model='coco', inherit=cap)

# Generate a caption using greedy decoding
caption_coco = cap2.predict(image, beam=False)
print("COCO Model Caption:", caption_coco)

Notes:

  • The script demonstrates how to initialize and use the image-captioning functionality provided by the "modern-caption" library.
  • Users can experiment with different models and decoding methods to obtain varied captions.
  • Ensure that the image paths are correct and accessible.
  • This library used the following repository for conceptual reference.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mcaption-0.0.6.tar.gz (6.4 kB view hashes)

Uploaded Source

Built Distribution

mcaption-0.0.6-py3-none-any.whl (7.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page