Uses a VLM to caption images from a dataset.

VLM Captioner

Uses a vision-language model (VLM), initially configured as Qwen2.5-VL-32B-Instruct, to caption images from a dataset.

Dataset Structure

Each image directory uses a single VLM prompt, stored in a prompt.txt file inside that directory, for every image it contains. Captions are written to a mirrored file structure under a root with the suffix _caption (by default `<input_dir>_caption`). Each image gets its own .txt caption file whose filename matches that of the image.

dataset/
└── top_level_folder_1/
    ├── image_folder_1/  (prompt.txt applies to the entire folder)
    │   ├── prompt.txt
    │   ├── image_1.png
    │   ├── image_2.png
    │   └── ...
    └── ...
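
The layout above can be sketched in Python to show how an image path might map to its caption path. This is a minimal illustration, not the package's actual code: the helper names `caption_path_for` and `plan_captions` and the `IMAGE_SUFFIXES` set are assumptions made for this example. Only the prompt.txt convention and the `<input_dir>_caption` default come from this README.

```python
from pathlib import Path

# Assumed set of image extensions; the README only shows .png files.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def caption_path_for(image_path, input_dir, output_dir=None):
    """Return the .txt caption path that mirrors image_path under the output root."""
    input_dir = Path(input_dir)
    if output_dir is not None:
        output_root = Path(output_dir)
    else:
        # Documented default: the input directory name with a _caption suffix.
        output_root = input_dir.with_name(input_dir.name + "_caption")
    relative = Path(image_path).relative_to(input_dir)
    return (output_root / relative).with_suffix(".txt")


def plan_captions(input_dir, output_dir=None):
    """Yield (prompt, image_path, caption_path) for every image beside a prompt.txt."""
    input_dir = Path(input_dir)
    for prompt_file in sorted(input_dir.rglob("prompt.txt")):
        # One prompt is shared by all images in the same folder.
        prompt = prompt_file.read_text(encoding="utf-8").strip()
        for image in sorted(prompt_file.parent.iterdir()):
            if image.suffix.lower() in IMAGE_SUFFIXES:
                yield prompt, image, caption_path_for(image, input_dir, output_dir)
```

Under these assumptions, `dataset/top_level_folder_1/image_folder_1/image_1.png` with an input directory of `dataset` maps to `image_1.txt` in the same relative location under `dataset_caption`.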

Running

First, install the required packages:

pip install -r requirements.txt

Then, run the script:

python vlm_caption_cli.py --input_dir=<input_dir> [--model=<vlm_model>]

Command Line Args

Required Args:
--input_dir=<input_dir> || Path to the input directory containing the images to caption.
Optional Args:
--model=<vlm_model> || VLM used to generate captions.
--max_length=<max_new_tokens> || Maximum number of new tokens to generate before the caption is truncated.
--ignore_substring=<ignore_substring> || Ignore files and directories containing this substring.
--num_captions=<number_of_captions> || Number of captions to generate per image.
--overwrite=<True/False> || If True, overwrite captions that already exist.
--output_dir=<output_dir> || Root directory of the caption file structure. Defaults to `<input_dir>_caption`.
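
To make the flag semantics concrete, here is a hedged argparse sketch of how such an interface could be declared. It is not taken from vlm_caption_cli.py. The functions `str_to_bool`, `build_parser`, and `parse_args` are hypothetical, and every default other than the documented `<input_dir>_caption` is left as None because this README does not state it.

```python
import argparse


def str_to_bool(value):
    """Parse the True/False strings accepted by --overwrite (assumed spellings)."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Declare the flags listed in this README; undocumented defaults stay None."""
    parser = argparse.ArgumentParser(prog="vlm_caption_cli.py")
    parser.add_argument("--input_dir", required=True,
                        help="Input directory containing images to caption.")
    parser.add_argument("--model", default=None,
                        help="VLM used to generate captions.")
    parser.add_argument("--max_length", type=int, default=None,
                        help="Maximum number of new tokens before truncation.")
    parser.add_argument("--ignore_substring", default=None,
                        help="Ignore files/directories containing this substring.")
    parser.add_argument("--num_captions", type=int, default=None,
                        help="Number of captions to generate per image.")
    parser.add_argument("--overwrite", type=str_to_bool, default=None,
                        help="If True, overwrite captions that already exist.")
    parser.add_argument("--output_dir", default=None,
                        help="Root of the caption file structure.")
    return parser


def parse_args(argv=None):
    """Parse argv and apply the documented <input_dir>_caption default."""
    args = build_parser().parse_args(argv)
    if args.output_dir is None:
        args.output_dir = args.input_dir.rstrip("/\\") + "_caption"
    return args
```

For example, `parse_args(["--input_dir=dataset", "--num_captions=2"])` would yield an `output_dir` of `dataset_caption` in this sketch; the real script may validate or default its arguments differently.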
