
A tool that generates captions for use as input to pre-trained image generation models, stylistically aligned with a set of pre-existing captions.

Project description

Synthetic Image Caption Generator

A tool that uses Qwen to generate image captions similar to those in a given dataset. The model learns from example prompts and generates new captions that match their style and structure.

This tool is intended for deep generative dataset augmentation.

Features

  • Uses Qwen models via Hugging Face 🤗 Transformers
  • Supports few-shot learning with configurable number of examples
  • Reads prompts from .txt files in a dataset directory
  • Flexible CLI with multiple configuration options
  • Can generate multiple captions in one run
  • Output to file or stdout

Installation

To install from PyPI:

pip install synthetic-image-caption-generator

To install locally:

pip install -e .

Or install dependencies manually:

pip install "transformers>=4.30.0" "torch>=2.0.0" "accelerate>=0.20.0"

Usage

Basic Usage

Generate a caption using 5 example prompts from the dataset:

generate-captions /path/to/dataset

Specify Number of Examples

Use a different number of example prompts (e.g., 10):

generate-captions /path/to/dataset --num-examples 10

Choose a Model

Select a different Qwen model (e.g., smaller or larger):

# Use the smaller 7B model
generate-captions /path/to/dataset --model qwen2.5-7b

# Use the larger 72B model
generate-captions /path/to/dataset --model qwen2.5-72b

# Use Qwen3 14B
generate-captions /path/to/dataset --model qwen3-14b

Generate Multiple Captions

Generate 5 captions:

generate-captions /path/to/dataset --num-generate 5

Specify Object Information

Include information about what's in the image:

generate-captions /path/to/dataset --object-info "the image contains 3 elephants"
generate-captions /path/to/dataset --object-info "the main crop in the field is soybean"

Save to File

Save generated captions to a file:

generate-captions /path/to/dataset --num-generate 10 --output generated_captions.txt

Advanced Options

generate-captions /path/to/dataset \
  --model qwen2.5-14b \
  --num-examples 8 \
  --num-generate 5 \
  --temperature 0.8 \
  --max-length 300 \
  --object-info "a cityscape with tall buildings" \
  --output captions.txt

Command-Line Arguments

  • dataset_dir (required): Path to directory containing .txt files with caption prompts
  • --model: Qwen model to use (default: qwen2.5-32b). Options: qwen2.5-0.5b, qwen2.5-1.5b, qwen2.5-3b, qwen2.5-7b, qwen2.5-14b, qwen2.5-32b, qwen2.5-72b, qwen3-14b, qwen3-32b
  • --num-examples: Number of example prompts to provide to the model (default: 5)
  • --num-generate: Number of captions to generate (default: 1)
  • --output: Output file to save generated captions (optional, prints to stdout if not specified)
  • --temperature: Temperature for text generation (default: 0.7, higher = more creative)
  • --max-length: Maximum length of generated caption (default: 256)
  • --object-info: Information about objects/content in the image (e.g., "the image contains 3 elephants")
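The argument set above could be wired up with argparse roughly as follows. This is an illustrative reconstruction, not the package's actual source; the flag names and defaults mirror the documentation above:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the generate-captions CLI as documented above."""
    parser = argparse.ArgumentParser(prog="generate-captions")
    parser.add_argument("dataset_dir",
                        help="Directory containing .txt files with caption prompts")
    parser.add_argument("--model", default="qwen2.5-32b",
                        choices=["qwen2.5-0.5b", "qwen2.5-1.5b", "qwen2.5-3b",
                                 "qwen2.5-7b", "qwen2.5-14b", "qwen2.5-32b",
                                 "qwen2.5-72b", "qwen3-14b", "qwen3-32b"])
    parser.add_argument("--num-examples", type=int, default=5)
    parser.add_argument("--num-generate", type=int, default=1)
    parser.add_argument("--output", default=None)
    parser.add_argument("--temperature", type=float, default=0.7)
    parser.add_argument("--max-length", type=int, default=256)
    parser.add_argument("--object-info", default=None)
    return parser


args = build_parser().parse_args(["./dataset", "--num-generate", "5"])
print(args.model, args.num_generate)  # qwen2.5-32b 5
```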

Dataset Format

The dataset directory should contain .txt files, each holding one or more prompts, one prompt per line.

Example dataset structure:

dataset/
├── captions1.txt
├── captions2.txt
└── captions3.txt

Example content of captions1.txt:

A serene landscape with mountains in the background and a lake in the foreground
An urban street scene with people walking and cars passing by
A close-up portrait of a person smiling at the camera
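Collecting and sampling prompts from such a directory can be sketched in a few lines. `load_prompts` and `sample_examples` are hypothetical helpers for illustration, not part of the package's public API:

```python
import random
from pathlib import Path


def load_prompts(dataset_dir: str) -> list[str]:
    """Collect every non-empty line from all .txt files in the dataset directory."""
    prompts: list[str] = []
    for txt_file in sorted(Path(dataset_dir).glob("*.txt")):
        prompts.extend(
            line.strip()
            for line in txt_file.read_text(encoding="utf-8").splitlines()
            if line.strip()
        )
    return prompts


def sample_examples(prompts: list[str], num_examples: int = 5) -> list[str]:
    """Randomly sample example prompts for the few-shot context."""
    return random.sample(prompts, min(num_examples, len(prompts)))
```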

Requirements

  • Python >= 3.8
  • CUDA-capable GPU recommended
  • Sufficient GPU memory depending on chosen model:
    • Small models (0.5B-3B): 4-8GB VRAM
    • Medium models (7B-14B): 12-24GB VRAM
    • Large models (32B): 24-40GB VRAM
    • Extra large models (72B): 40GB+ VRAM or quantization required

How It Works

  1. The script reads all prompts from .txt files in the specified directory
  2. It randomly samples a specified number of prompts as examples
  3. These examples are formatted into a few-shot prompt for the selected Qwen instruct model (Qwen2.5-32B-Instruct by default)
  4. The model generates a new caption that matches the style and structure of the examples
  5. The generated caption(s) are output to stdout or saved to a file

License

MIT


