A tool that generates captions to use as input for pre-trained image generation models, aligned in style with a set of pre-existing captions.
Project description
Synthetic Image Caption Generator
A tool that uses Qwen to generate image captions similar to those in a given dataset. The model conditions on example prompts (few-shot prompting) and generates new captions that match their style and structure.
This tool is designed for deep generative dataset augmentation.
Features
- Uses Qwen models via Hugging Face 🤗 Transformers
- Supports few-shot prompting with a configurable number of examples
- Reads prompts from .txt files in a dataset directory
- Flexible CLI with multiple configuration options
- Can generate multiple captions in one run
- Output to file or stdout
Installation
To install from PyPI:
pip install synthetic-image-caption-generator
To install locally:
pip install -e .
Or install dependencies manually:
pip install "transformers>=4.30.0" "torch>=2.0.0" "accelerate>=0.20.0"
Usage
Downloading Models
Before generating captions, you can pre-download a model to the Hugging Face cache for offline use:
# Download the default model (qwen2.5-32b)
download-caption-model
# Download a specific model
download-caption-model --model qwen2.5-7b
# Download a larger model
download-caption-model --model qwen2.5-72b
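If you prefer to script the download, the snippet below is a minimal sketch using huggingface_hub directly. The repo id is an assumption: the CLI's model aliases presumably resolve to the official instruct checkpoints on the Hub (e.g., qwen2.5-7b to Qwen/Qwen2.5-7B-Instruct); check the package source for the exact mapping.

from huggingface_hub import snapshot_download

# Download all files of the checkpoint into the local Hugging Face cache
# (~/.cache/huggingface/hub by default) so later runs can load it offline.
# "Qwen/Qwen2.5-7B-Instruct" is an assumed repo id for the qwen2.5-7b alias.
snapshot_download(repo_id="Qwen/Qwen2.5-7B-Instruct")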
Basic Usage
Generate a caption using 5 example prompts from the dataset:
generate-captions /path/to/dataset
Specify Number of Examples
Use a different number of example prompts (e.g., 10):
generate-captions /path/to/dataset --num-examples 10
Choose a Model
Select a different Qwen model (e.g., smaller or larger):
# Use the smaller 7B model
generate-captions /path/to/dataset --model qwen2.5-7b
# Use the larger 72B model
generate-captions /path/to/dataset --model qwen2.5-72b
# Use Qwen3 14B
generate-captions /path/to/dataset --model qwen3-14b
Generate Multiple Captions
Generate 5 captions:
generate-captions /path/to/dataset --num-generate 5
Specify Object Information
Include information about what's in the image:
generate-captions /path/to/dataset --object-info "the image contains 3 elephants"
generate-captions /path/to/dataset --object-info "the main crop in the field is soybean"
Save to File
Save generated captions to a file:
generate-captions /path/to/dataset --num-generate 10 --output generated_captions.txt
Advanced Options
generate-captions /path/to/dataset \
--model qwen2.5-14b \
--num-examples 8 \
--num-generate 5 \
--temperature 0.8 \
--max-length 300 \
--object-info "a cityscape with tall buildings" \
--output captions.txt
Command-Line Arguments
generate-captions
- dataset_dir (required): Path to the directory containing .txt files with caption prompts
- --model: Qwen model to use (default: qwen2.5-32b). Options: qwen2.5-0.5b, qwen2.5-1.5b, qwen2.5-3b, qwen2.5-7b, qwen2.5-14b, qwen2.5-32b, qwen2.5-72b, qwen3-14b, qwen3-32b
- --num-examples: Number of example prompts to provide to the model (default: 5)
- --num-generate: Number of captions to generate (default: 1)
- --output: Output file for generated captions (optional; prints to stdout if not specified)
- --temperature: Sampling temperature for text generation (default: 0.7; higher = more creative)
- --max-length: Maximum length of a generated caption (default: 256)
- --object-info: Information about objects/content in the image (e.g., "the image contains 3 elephants")
download-caption-model
--model: Qwen model to download (default: qwen2.5-32b). Options: qwen2.5-0.5b, qwen2.5-1.5b, qwen2.5-3b, qwen2.5-7b, qwen2.5-14b, qwen2.5-32b, qwen2.5-72b, qwen3-14b, qwen3-32b
Dataset Format
The dataset directory should contain .txt files, each with one or more prompts. Each prompt should be on its own line.
Example dataset structure:
dataset/
├── captions1.txt
├── captions2.txt
└── captions3.txt
Example content of captions1.txt:
A serene landscape with mountains in the background and a lake in the foreground
An urban street scene with people walking and cars passing by
A close-up portrait of a person smiling at the camera
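In code, collecting the example pool from such a directory amounts to reading every non-empty line of every .txt file. A minimal sketch (not the package's actual implementation):

from pathlib import Path

def load_prompts(dataset_dir: str) -> list[str]:
    """Collect one prompt per non-empty line from every .txt file in the directory."""
    prompts = []
    for txt_file in sorted(Path(dataset_dir).glob("*.txt")):
        for line in txt_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:
                prompts.append(line)
    return prompts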
Requirements
- Python >= 3.8
- CUDA-capable GPU recommended
- Sufficient GPU memory depending on the chosen model:
  - Small models (0.5B-3B): 4-8 GB VRAM
  - Medium models (7B-14B): 12-24 GB VRAM
  - Large models (32B): 24-40 GB VRAM
  - Extra-large models (72B): 40 GB+ VRAM, or quantization required (see the sketch below)
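If VRAM is tight, one common workaround (not a built-in feature of this package) is to load the model in 4-bit via bitsandbytes when using Transformers directly, roughly as follows; the 72B repo id is an assumption for the qwen2.5-72b alias.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization roughly quarters the weight memory; it requires
# the bitsandbytes package and a CUDA-capable GPU.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-72B-Instruct",  # assumed repo id for the 72B alias
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPUs
)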
How It Works
- The script reads all prompts from .txt files in the specified directory
- It randomly samples a specified number of prompts as examples
- These examples are formatted into a few-shot prompt for the selected Qwen instruct model (Qwen2.5-32B-Instruct by default)
- The model generates a new caption that matches the style and structure of the examples
- The generated caption(s) are output to stdout or saved to a file
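The core loop can be approximated in a few lines of Transformers code. This is a simplified sketch of the steps above, not the package's exact implementation: the prompt wording is assumed, the small 0.5B checkpoint is used only for illustration, and load_prompts is the helper sketched in the Dataset Format section.

import random
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # small checkpoint, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Randomly sample example prompts from the dataset pool.
examples = random.sample(load_prompts("dataset/"), k=5)

# Format the examples into a few-shot chat prompt (wording assumed).
user_msg = (
    "Here are example image captions:\n"
    + "\n".join(f"- {e}" for e in examples)
    + "\nWrite one new caption in the same style and structure."
)
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": user_msg}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Sample a new caption and decode only the newly generated tokens.
output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
caption = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(caption)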
License
MIT
Download files
File details
Details for the file synthetic_image_caption_generator-0.0.2.tar.gz.
File metadata
- Download URL: synthetic_image_caption_generator-0.0.2.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 40be6b8fc563845bd0067a32d6477f3f2429e183a9c28031215dd625206da13f |
| MD5 | 577816259030601b68394ca5f688fe5a |
| BLAKE2b-256 | d3710228af582106f9527ed1c786c903ef24e790e7ae382d069983aea33a4deb |
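To check a downloaded archive against the digests above, you can recompute the hash locally, for example:

import hashlib
from pathlib import Path

# Compare the local file's SHA256 with the published digest.
data = Path("synthetic_image_caption_generator-0.0.2.tar.gz").read_bytes()
print(hashlib.sha256(data).hexdigest())
# Expected: 40be6b8fc563845bd0067a32d6477f3f2429e183a9c28031215dd625206da13f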
Provenance
The following attestation bundles were made for synthetic_image_caption_generator-0.0.2.tar.gz:
Publisher: pypi-publish.yml on alexsenden/synthetic-image-caption-generator
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: synthetic_image_caption_generator-0.0.2.tar.gz
- Subject digest: 40be6b8fc563845bd0067a32d6477f3f2429e183a9c28031215dd625206da13f
- Sigstore transparency entry: 845156166
- Permalink: alexsenden/synthetic-image-caption-generator@9d9ad74238aae1192ec884572560e7b6fdf9025d
- Branch / Tag: refs/tags/v0.0.2
- Owner: https://github.com/alexsenden
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@9d9ad74238aae1192ec884572560e7b6fdf9025d
- Trigger Event: push
File details
Details for the file synthetic_image_caption_generator-0.0.2-py3-none-any.whl.
File metadata
- Download URL: synthetic_image_caption_generator-0.0.2-py3-none-any.whl
- Upload date:
- Size: 8.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0975715312be267afd5971bc6de15756014c885507b01d0f0127a7b2e6f1ed6d |
| MD5 | 7e5d55b7ab4ef27e29c7485f003ecf84 |
| BLAKE2b-256 | fa34143aa0ea19d730e92177ccb1df2604ba8aefd9a9c3e75aa9e1fb312fcaf9 |
Provenance
The following attestation bundles were made for synthetic_image_caption_generator-0.0.2-py3-none-any.whl:
Publisher: pypi-publish.yml on alexsenden/synthetic-image-caption-generator
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: synthetic_image_caption_generator-0.0.2-py3-none-any.whl
- Subject digest: 0975715312be267afd5971bc6de15756014c885507b01d0f0127a7b2e6f1ed6d
- Sigstore transparency entry: 845156176
- Permalink: alexsenden/synthetic-image-caption-generator@9d9ad74238aae1192ec884572560e7b6fdf9025d
- Branch / Tag: refs/tags/v0.0.2
- Owner: https://github.com/alexsenden
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@9d9ad74238aae1192ec884572560e7b6fdf9025d
- Trigger Event: push