
Synthetic Image Caption Generator

A tool that uses Qwen language models to generate image captions similar to those in a given dataset. The model learns from example prompts and generates new captions that match their style and structure.

This tool is designed to be used for deep generative dataset augmentation.

Features

  • Uses Qwen models via the Hugging Face 🤗 Transformers library
  • Supports few-shot learning with configurable number of examples
  • Reads prompts from .txt files in a dataset directory
  • Flexible CLI with multiple configuration options
  • Can generate multiple captions in one run
  • Output to file or stdout

Installation

To install from PyPI:

pip install synthetic-image-caption-generator

To install locally:

pip install -e .

Or install dependencies manually:

pip install "transformers>=4.30.0" "torch>=2.0.0" "accelerate>=0.20.0"

Usage

Downloading Models

Before generating captions, you can pre-download a model to the Hugging Face cache for offline use:

# Download the default model (qwen2.5-32b)
download-caption-model

# Download a specific model
download-caption-model --model qwen2.5-7b

# Download a larger model
download-caption-model --model qwen2.5-72b

Basic Usage

Generate a caption using 5 example prompts from the dataset:

generate-captions /path/to/dataset

Specify Number of Examples

Use a different number of example prompts (e.g., 10):

generate-captions /path/to/dataset --num-examples 10

Choose a Model

Select a different Qwen model (e.g., smaller or larger):

# Use the smaller 7B model
generate-captions /path/to/dataset --model qwen2.5-7b

# Use the larger 72B model
generate-captions /path/to/dataset --model qwen2.5-72b

# Use Qwen3 14B
generate-captions /path/to/dataset --model qwen3-14b

Generate Multiple Captions

Generate 5 captions:

generate-captions /path/to/dataset --num-generate 5

Specify Object Information

Include information about what's in the image:

generate-captions /path/to/dataset --object-info "the image contains 3 elephants"
generate-captions /path/to/dataset --object-info "the main crop in the field is soybean"

Save to File

Save generated captions to a file:

generate-captions /path/to/dataset --num-generate 10 --output generated_captions.txt

Advanced Options

generate-captions /path/to/dataset \
  --model qwen2.5-14b \
  --num-examples 8 \
  --num-generate 5 \
  --temperature 0.8 \
  --max-length 300 \
  --object-info "a cityscape with tall buildings" \
  --output captions.txt

Command-Line Arguments

generate-captions

  • dataset_dir (required): Path to directory containing .txt files with caption prompts
  • --model: Qwen model to use (default: qwen2.5-32b). Options: qwen2.5-0.5b, qwen2.5-1.5b, qwen2.5-3b, qwen2.5-7b, qwen2.5-14b, qwen2.5-32b, qwen2.5-72b, qwen3-14b, qwen3-32b
  • --num-examples: Number of example prompts to provide to the model (default: 5)
  • --num-generate: Number of captions to generate (default: 1)
  • --output: Output file to save generated captions (optional, prints to stdout if not specified)
  • --temperature: Temperature for text generation (default: 0.7, higher = more creative)
  • --max-length: Maximum length of generated caption (default: 256)
  • --object-info: Information about objects/content in the image (e.g., "the image contains 3 elephants")

download-caption-model

  • --model: Qwen model to download (default: qwen2.5-32b). Options: qwen2.5-0.5b, qwen2.5-1.5b, qwen2.5-3b, qwen2.5-7b, qwen2.5-14b, qwen2.5-32b, qwen2.5-72b, qwen3-14b, qwen3-32b

Dataset Format

The dataset directory should contain .txt files, each with one or more prompts. Each prompt should be on its own line.

Example dataset structure:

dataset/
├── captions1.txt
├── captions2.txt
└── captions3.txt

Example content of captions1.txt:

A serene landscape with mountains in the background and a lake in the foreground
An urban street scene with people walking and cars passing by
A close-up portrait of a person smiling at the camera
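The prompt-loading step can be sketched as follows. This is a minimal sketch, and `load_prompts` is a hypothetical helper name, not the package's actual API; it simply collects one prompt per non-empty line from every .txt file in the dataset directory:

```python
from pathlib import Path

def load_prompts(dataset_dir):
    """Collect one prompt per non-empty line from every .txt file
    in dataset_dir, in sorted filename order."""
    prompts = []
    for txt_file in sorted(Path(dataset_dir).glob("*.txt")):
        for line in txt_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:  # skip blank lines
                prompts.append(line)
    return prompts
```

Blank lines are skipped, so files may use empty lines for readability without producing empty prompts.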

Requirements

  • Python >= 3.8
  • CUDA-capable GPU recommended
  • Sufficient GPU memory depending on chosen model:
    • Small models (0.5B-3B): 4-8GB VRAM
    • Medium models (7B-14B): 12-24GB VRAM
    • Large models (32B): 24-40GB VRAM
    • Extra large models (72B): 40GB+ VRAM or quantization required
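The VRAM figures above can be approximated with a common rule of thumb: weights take roughly 2 bytes per parameter in fp16/bf16 (or 1 byte at 8-bit quantization), plus around 20% overhead for activations and the KV cache. This sketch is an approximation, not a calculation performed by the package, and the table's figures for 32B+ models implicitly assume reduced precision:

```python
def estimate_vram_gb(num_params_billion, bytes_per_param=2, overhead=1.2):
    """Rough inference VRAM estimate: model weights at the given
    precision plus ~20% overhead for activations and KV cache."""
    return num_params_billion * bytes_per_param * overhead

# A 7B model in fp16: 7 * 2 * 1.2 = 16.8 GB (within the 12-24 GB range above)
# A 32B model at 8-bit: 32 * 1 * 1.2 = 38.4 GB (within the 24-40 GB range above)
```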

How It Works

  1. The script reads all prompts from .txt files in the specified directory
  2. It randomly samples a specified number of prompts as examples
  3. These examples are formatted into a few-shot prompt for the selected Qwen model (default: Qwen2.5-32B-Instruct)
  4. The model generates a new caption that matches the style and structure of the examples
  5. The generated caption(s) are output to stdout or saved to a file
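Steps 2 and 3 can be sketched as below. Both the function name and the prompt template are hypothetical illustrations; the package's exact wording may differ:

```python
import random

def build_few_shot_prompt(prompts, num_examples=5, object_info=None, seed=None):
    """Randomly sample example captions and format them into a
    few-shot instruction prompt (illustrative template only)."""
    rng = random.Random(seed)
    examples = rng.sample(prompts, min(num_examples, len(prompts)))
    lines = ["Here are some example image captions:"]
    lines += [f"- {ex}" for ex in examples]
    if object_info:
        lines.append(f"Additional context: {object_info}")
    lines.append("Write one new caption in the same style and structure.")
    return "\n".join(lines)
```

The resulting string would then be wrapped in the model's chat template and passed to the Transformers generation API with the configured temperature and maximum length.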

License

MIT

