
A modern implementation of simple image captioning

Project description

Modern-Caption


Description:

This Python package provides functionality for generating image captions with the "modern-caption" library. It uses a CLIP (Contrastive Language-Image Pre-training) model to encode images and a GPT-2 model to generate captions. Users can produce captions with different pre-trained models and decoding methods.
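
For orientation, the sketch below illustrates the general CLIP-prefix idea; it is not the library's internal code. CLIP encodes the image into a single embedding, a mapping network (here a hypothetical, untrained linear layer) expands that embedding into prefix_length GPT-2 token embeddings, and GPT-2 decodes a caption from that prefix. The real library ships trained weights for this mapping, so this untrained sketch produces meaningless text and is only meant to show the data flow.

import torch
import clip  # OpenAI CLIP package
from PIL import Image
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

clip_model, preprocess = clip.load("ViT-B/32", device=device)
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

prefix_length = 10
# Hypothetical, untrained stand-in for the learned mapping network.
mapper = torch.nn.Linear(512, prefix_length * gpt2.config.n_embd).to(device)

image = preprocess(Image.open("images/cover.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    clip_embedding = clip_model.encode_image(image).float()     # (1, 512)
    prefix = mapper(clip_embedding).view(1, prefix_length, -1)  # (1, 10, 768)

    # Greedy decoding from the prefix embeddings.
    embeddings, token_ids = prefix, []
    for _ in range(20):
        logits = gpt2(inputs_embeds=embeddings).logits[:, -1, :]
        next_id = logits.argmax(dim=-1)
        token_ids.append(next_id.item())
        next_embedding = gpt2.transformer.wte(next_id).unsqueeze(1)
        embeddings = torch.cat([embeddings, next_embedding], dim=1)

print(tokenizer.decode(token_ids))  # gibberish without a trained mapper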


Installation:

  1. Dependencies:

    • torch
    • clip
    • transformers
    • numpy
    • Pillow (PIL)
    • scikit-image
  2. Setup:

    • Ensure you have Python installed on your system.
    • Dependencies will be installed during setup.
    • Setup will not overwrite existing installs of PyTorch or Torchvision.
    • Install using pip (a quick post-install check is sketched after this list):
      pip install modern-caption
      
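A quick post-install sanity check, assuming the dependencies above installed cleanly, is to import the main pieces and print a couple of version strings:

import torch
import clip
import transformers
import mcaption  # the package installed above

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)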

Usage:

  1. Importing the Module:

    from mcaption import Caption
    
  2. Initializing the Caption Generator:

    cap = Caption(model='conceptual', device='cpu', prefix_length=10)
    
  3. Generating Captions:

    import skimage.io as io
    
    image = io.imread("images/cover.jpg")  # Load the image
    caption_conceptual = cap.predict(image, beam=True)  # Generate caption with beam search
    print("Conceptual Model Caption:", caption_conceptual)
    

Explanation:

  • The Caption class initializes the image-captioning functionality with options to specify the model, device, and other parameters (a device-selection sketch follows this list).
  • Users can create multiple instances of the Caption class to compare different models or share common resources.
  • The predict method generates captions for images using the specified model and decoding method.
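
Because device is an ordinary device string, one natural pattern (a sketch that builds only on the constructor arguments shown above; whether the library accepts a 'cuda' device string is an assumption, not confirmed here) is to pick a GPU when PyTorch reports one:

import torch
from mcaption import Caption

# Fall back to the CPU when no CUDA-capable GPU is available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
cap = Caption(model='conceptual', device=device, prefix_length=10)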

Examples:

from mcaption import Caption
import skimage.io as io

# Initialize Caption generator with the 'conceptual' model
cap = Caption(model='conceptual', device='cpu', prefix_length=10)

# Load an image and generate a caption using beam search
image = io.imread("images/cover.jpg")
caption_conceptual = cap.predict(image, beam=True)
print("Conceptual Model Caption:", caption_conceptual)

# Initialize another Caption generator with the 'coco' model, inheriting from the previous one
cap2 = Caption(model='coco', inherit=cap)

# Generate a caption using greedy decoding
caption_coco = cap2.predict(image, beam=False)
print("COCO Model Caption:", caption_coco)

Notes:

  • The script demonstrates how to initialize and use the image-captioning functionality provided by the "modern-caption" library.
  • Users can experiment with different models and decoding methods to obtain varied captions.
  • Ensure that the image paths are correct and accessible.
  • This library used the following repository as a conceptual reference.

License

This project is licensed under the MIT License - see the LICENSE file for details.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mcaption-0.0.5.tar.gz (6.5 kB)

Uploaded Source

Built Distribution

mcaption-0.0.5-py3-none-any.whl (7.0 kB)

Uploaded Python 3

File details

Details for the file mcaption-0.0.5.tar.gz.

File metadata

  • Download URL: mcaption-0.0.5.tar.gz
  • Upload date:
  • Size: 6.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for mcaption-0.0.5.tar.gz
Algorithm Hash digest
SHA256 153ab33ea5446ead5d7e5453306a669f0dc370fdee482b8363bae700ce088434
MD5 817889dc9cd153e701ebeb214a664964
BLAKE2b-256 4c81c4f3f55657323c7618ecbab0fe41e4ec6d564165cb983357c4e9369a5e7d

See more details on using hashes here.

File details

Details for the file mcaption-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: mcaption-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 7.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for mcaption-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 cb37ea9600fc49674529b11f67a1b9d6318240e571136edf5195241d31c51b0c
MD5 6b6c32523ba44881538450b364929a2d
BLAKE2b-256 0039bd48618d49c124981c06fcfc8539646aae44e71a81ae0f3443f6833b64fe

See more details on using hashes here.
