
A tool that generates captions for use as input to pre-trained image generation models, stylistically aligned with a set of pre-existing captions.

Project description

Synthetic Image Caption Generator

A tool that uses Qwen to generate image captions similar to those in a given dataset. The model learns from example prompts and generates new captions that match their style and structure.

This tool is intended for deep generative dataset augmentation.

Features

  • Uses Qwen models via Hugging Face 🤗 Transformers
  • Supports few-shot learning with configurable number of examples
  • Reads prompts from .txt files in a dataset directory
  • Flexible CLI with multiple configuration options
  • Can generate multiple captions in one run
  • Output to file or stdout

Installation

To install from PyPI:

pip install synthetic-image-caption-generator

To install locally:

pip install -e .

Or install dependencies manually:

pip install "transformers>=4.30.0" "torch>=2.0.0" "accelerate>=0.20.0"

Usage

Basic Usage

Generate a caption using 5 example prompts from the dataset:

generate-captions /path/to/dataset

Specify Number of Examples

Use a different number of example prompts (e.g., 10):

generate-captions /path/to/dataset --num-examples 10

Choose a Model

Select a different Qwen model (e.g., smaller or larger):

# Use the smaller 7B model
generate-captions /path/to/dataset --model qwen2.5-7b

# Use the larger 72B model
generate-captions /path/to/dataset --model qwen2.5-72b

# Use Qwen3 14B
generate-captions /path/to/dataset --model qwen3-14b

Generate Multiple Captions

Generate 5 captions:

generate-captions /path/to/dataset --num-generate 5

Specify Object Information

Include information about what's in the image:

generate-captions /path/to/dataset --object-info "the image contains 3 elephants"
generate-captions /path/to/dataset --object-info "the main crop in the field is soybean"

Save to File

Save generated captions to a file:

generate-captions /path/to/dataset --num-generate 10 --output generated_captions.txt

Advanced Options

generate-captions /path/to/dataset \
  --model qwen2.5-14b \
  --num-examples 8 \
  --num-generate 5 \
  --temperature 0.8 \
  --max-length 300 \
  --object-info "a cityscape with tall buildings" \
  --output captions.txt

Command-Line Arguments

  • dataset_dir (required): Path to directory containing .txt files with caption prompts
  • --model: Qwen model to use (default: qwen2.5-32b). Options: qwen2.5-0.5b, qwen2.5-1.5b, qwen2.5-3b, qwen2.5-7b, qwen2.5-14b, qwen2.5-32b, qwen2.5-72b, qwen3-14b, qwen3-32b
  • --num-examples: Number of example prompts to provide to the model (default: 5)
  • --num-generate: Number of captions to generate (default: 1)
  • --output: Output file to save generated captions (optional, prints to stdout if not specified)
  • --temperature: Temperature for text generation (default: 0.7, higher = more creative)
  • --max-length: Maximum length of generated caption (default: 256)
  • --object-info: Information about objects/content in the image (e.g., "the image contains 3 elephants")
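The argument set above could be wired up with argparse roughly as follows. This is an illustrative reconstruction, not the package's actual source; the flag names and defaults mirror the documentation above:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the generate-captions CLI as documented above."""
    parser = argparse.ArgumentParser(prog="generate-captions")
    parser.add_argument("dataset_dir",
                        help="Directory containing .txt files with caption prompts")
    parser.add_argument("--model", default="qwen2.5-32b",
                        choices=["qwen2.5-0.5b", "qwen2.5-1.5b", "qwen2.5-3b",
                                 "qwen2.5-7b", "qwen2.5-14b", "qwen2.5-32b",
                                 "qwen2.5-72b", "qwen3-14b", "qwen3-32b"])
    parser.add_argument("--num-examples", type=int, default=5)
    parser.add_argument("--num-generate", type=int, default=1)
    parser.add_argument("--output", default=None)
    parser.add_argument("--temperature", type=float, default=0.7)
    parser.add_argument("--max-length", type=int, default=256)
    parser.add_argument("--object-info", default=None)
    return parser


args = build_parser().parse_args(["./dataset", "--num-generate", "5"])
print(args.model, args.num_generate)  # qwen2.5-32b 5
```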

Dataset Format

The dataset directory should contain .txt files, each holding one or more prompts, one prompt per line.

Example dataset structure:

dataset/
├── captions1.txt
├── captions2.txt
└── captions3.txt

Example content of captions1.txt:

A serene landscape with mountains in the background and a lake in the foreground
An urban street scene with people walking and cars passing by
A close-up portrait of a person smiling at the camera
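Collecting and sampling prompts from such a directory can be sketched in a few lines. `load_prompts` and `sample_examples` are hypothetical helpers for illustration, not part of the package's public API:

```python
import random
from pathlib import Path


def load_prompts(dataset_dir: str) -> list[str]:
    """Collect every non-empty line from all .txt files in the dataset directory."""
    prompts: list[str] = []
    for txt_file in sorted(Path(dataset_dir).glob("*.txt")):
        prompts.extend(
            line.strip()
            for line in txt_file.read_text(encoding="utf-8").splitlines()
            if line.strip()
        )
    return prompts


def sample_examples(prompts: list[str], num_examples: int = 5) -> list[str]:
    """Randomly sample example prompts for the few-shot context."""
    return random.sample(prompts, min(num_examples, len(prompts)))
```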

Requirements

  • Python >= 3.8
  • CUDA-capable GPU recommended
  • Sufficient GPU memory depending on chosen model:
    • Small models (0.5B-3B): 4-8GB VRAM
    • Medium models (7B-14B): 12-24GB VRAM
    • Large models (32B): 24-40GB VRAM
    • Extra large models (72B): 40GB+ VRAM or quantization required

How It Works

  1. The script reads all prompts from .txt files in the specified directory
  2. It randomly samples a specified number of prompts as examples
  3. These examples are formatted into a few-shot prompt for the selected Qwen instruct model (Qwen2.5-32B-Instruct by default)
  4. The model generates a new caption that matches the style and structure of the examples
  5. The generated caption(s) are output to stdout or saved to a file

License

MIT


