A modern implementation of simple image captioning

Project description

Modern-Caption


Description:

This Python package provides functionality for generating image captions using the "modern-caption" library. It utilizes CLIP (Contrastive Language-Image Pretraining) for image encoding and GPT-2 for caption generation. Users can generate captions for images using different pre-trained models and decoding methods.
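Under the hood, CLIP-prefix captioning maps a CLIP image embedding into a sequence of `prefix_length` embeddings that GPT-2 then consumes as a prompt. A toy NumPy sketch of that mapping follows; the dimensions and the single linear projection are illustrative assumptions, not the library's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

clip_dim = 512        # typical CLIP image-embedding size (assumption)
gpt2_dim = 768        # GPT-2 hidden size
prefix_length = 10    # matches the prefix_length parameter used below

# A single random linear projection stands in for the real, learned mapping network.
W = rng.standard_normal((clip_dim, prefix_length * gpt2_dim)) * 0.02

def clip_to_prefix(image_embedding: np.ndarray) -> np.ndarray:
    """Project one CLIP embedding to prefix_length GPT-2-sized prefix embeddings."""
    flat = image_embedding @ W                    # (prefix_length * gpt2_dim,)
    return flat.reshape(prefix_length, gpt2_dim)  # (prefix_length, gpt2_dim)

embedding = rng.standard_normal(clip_dim)   # stand-in for a CLIP image embedding
prefix = clip_to_prefix(embedding)
print(prefix.shape)  # (10, 768)
```

The prefix rows play the role of token embeddings, so GPT-2 can condition its generated caption on the image without any change to its vocabulary.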


Installation:

  1. Dependencies:

    • torch
    • clip
    • transformers
    • numpy
    • Pillow (PIL)
    • scikit-image
  2. Setup:

    • Ensure you have Python installed on your system.
    • Dependencies will be installed during setup.
    • Setup will not overwrite existing installs of PyTorch or Torchvision.
    • Install using pip:
      pip install modern-caption
      

Usage:

  1. Importing the Module:

    from mcaption import Caption
    
  2. Initializing the Caption Generator:

    cap = Caption(model='conceptual', device='cpu', prefix_length=10)
    
  3. Generating Captions:

    import skimage.io as io
    
    image = io.imread("images/cover.jpg")  # Load the image
    caption_conceptual = cap.predict(image, beam=True)  # Generate caption with beam search
    print("Conceptual Model Caption:", caption_conceptual)
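`predict` presumably expects the image as an RGB NumPy array, which is exactly what `skimage.io.imread` returns for a color image. A quick sanity check you can run on any loaded image (a dummy array stands in for the real file here):

```python
import numpy as np

# Stand-in for io.imread("images/cover.jpg"): an RGB image loads as an
# (H, W, 3) uint8 array with pixel values in [0, 255].
image = np.zeros((224, 224, 3), dtype=np.uint8)

assert image.ndim == 3 and image.shape[2] == 3, "expected an RGB image"
assert image.dtype == np.uint8, "expected 8-bit pixel values"
print(image.shape)  # (224, 224, 3)
```

If your image loads with an alpha channel (H, W, 4) or as grayscale (H, W), convert it to RGB before passing it to `predict`.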
    

Explanation:

  • The Caption class initializes the image-captioning functionality with options to specify the model, device, and other parameters.
  • Users can create multiple instances of the Caption class to compare different models or share common resources.
  • The predict method generates captions for images using the specified model and decoding method.
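The difference between the two decoding methods (`beam=True` vs. `beam=False`) can be illustrated on a toy language model. This is a generic sketch of greedy decoding versus beam search over fixed next-token probabilities, not the library's implementation:

```python
import math

# Toy next-token log-probabilities: current token -> {next token: log-prob}.
MODEL = {
    "<s>": {"a": math.log(0.6), "the": math.log(0.4)},
    "a":   {"cat": math.log(0.55), "dog": math.log(0.45)},
    "the": {"dog": math.log(0.95), "cat": math.log(0.05)},
    "dog": {"</s>": 0.0},
    "cat": {"</s>": 0.0},
}

def greedy(start="<s>"):
    """Pick the locally best token at every step."""
    seq, tok = [], start
    while tok != "</s>":
        tok = max(MODEL[tok], key=MODEL[tok].get)
        seq.append(tok)
    return seq[:-1]  # drop the end-of-sequence token

def beam_search(start="<s>", width=2):
    """Keep the `width` highest-scoring partial sequences at every step."""
    beams, done = [([start], 0.0)], []
    while beams:
        candidates = []
        for seq, score in beams:
            for tok, lp in MODEL[seq[-1]].items():
                cand = (seq + [tok], score + lp)
                (done if tok == "</s>" else candidates).append(cand)
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:width]
    best_seq, _ = max(done, key=lambda c: c[1])
    return best_seq[1:-1]  # drop the start and end tokens

print(greedy())       # ['a', 'cat']   -- locally best choices
print(beam_search())  # ['the', 'dog'] -- globally best sequence
```

Here greedy decoding commits to "a" (0.6 > 0.4) and ends with the lower-probability sequence "a cat" (0.6 × 0.55 = 0.33), while beam search keeps "the" alive and finds "the dog" (0.4 × 0.95 = 0.38). Beam search typically yields more fluent captions at a higher compute cost.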

Examples:

from mcaption import Caption
import skimage.io as io

# Initialize Caption generator with the 'conceptual' model
cap = Caption(model='conceptual', device='cpu', prefix_length=10)

# Load an image and generate a caption using beam search
image = io.imread("images/cover.jpg")
caption_conceptual = cap.predict(image, beam=True)
print("Conceptual Model Caption:", caption_conceptual)

# Initialize another Caption generator with the 'coco' model, inheriting from the previous one
cap2 = Caption(model='coco', inherit=cap)

# Generate a caption using greedy decoding
caption_coco = cap2.predict(image, beam=False)
print("COCO Model Caption:", caption_coco)
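The `inherit=cap` argument suggests that the second instance reuses resources already loaded by the first (such as the CLIP encoder) rather than loading them again. A minimal sketch of that pattern with hypothetical placeholder classes, not the library's actual internals:

```python
class Encoder:
    """Stand-in for an expensive-to-load model such as CLIP."""
    load_count = 0

    def __init__(self):
        Encoder.load_count += 1  # track how many times weights are "loaded"

class Caption:
    def __init__(self, model, inherit=None):
        self.model = model
        # Reuse the sibling instance's encoder instead of loading a fresh one.
        self.encoder = inherit.encoder if inherit is not None else Encoder()

cap = Caption(model="conceptual")
cap2 = Caption(model="coco", inherit=cap)

print(cap2.encoder is cap.encoder)  # True: the encoder object is shared
print(Encoder.load_count)           # 1: the encoder was loaded only once
```

Sharing the encoder this way keeps memory usage roughly constant as you add model instances, since only the per-model decoding weights differ.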

Notes:

  • The script demonstrates how to initialize and use the image-captioning functionality provided by the "modern-caption" library.
  • Users can experiment with different models and decoding methods to obtain varied captions.
  • Ensure that the image paths are correct and accessible.
  • This library draws on the following repository for conceptual reference.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mcaption-0.0.6.tar.gz (6.4 kB)

Uploaded Source

Built Distribution

mcaption-0.0.6-py3-none-any.whl (7.0 kB)

Uploaded Python 3

File details

Details for the file mcaption-0.0.6.tar.gz.

File metadata

  • Download URL: mcaption-0.0.6.tar.gz
  • Upload date:
  • Size: 6.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for mcaption-0.0.6.tar.gz
Algorithm Hash digest
SHA256 f719e58173559f13bd27618448fdd5eae5a786fdf42709d1b59bb97e199e8884
MD5 a0af1185b2827283f6f2200374d62004
BLAKE2b-256 d2193cdaeb4230cfb637e3ac29e27bf972c7f73a85f0897257309c445d57b310

See more details on using hashes here.

File details

Details for the file mcaption-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: mcaption-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 7.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for mcaption-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 2bdf234dc97fd15d394820ad6da7d4e1442c638c1037ab47b93adf27f713a82e
MD5 2e974b3e8fa4dbd8537ad47bae1421fb
BLAKE2b-256 75165b1586317ce5a9bc4f7e6638649c9ab5f6345417d73e0d405073a125dc77

See more details on using hashes here.
