VLM Captioner

Uses a VLM (initially configured to use Qwen2.5-VL-32B-Instruct) to caption images from a dataset.

Dataset Structure

A single VLM prompt, stored in that directory's prompt.txt, is used for every image in an image directory. The captions are written to a mirrored directory structure rooted, by default, at the input directory name with the suffix _caption. This structure contains one .txt caption file per image, with a filename matching that of its image counterpart.

dataset/
└── top_level_folder_1/
    ├── image_folder_1/
    │   ├── prompt.txt  (prompt used for the entire folder)
    │   ├── image_1.png
    │   ├── image_2.png
    │   └── ...
    └── ...
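
The mapping from images to caption files described above can be sketched in Python. This is only an illustration, not the package's actual code: `caption_path_for`, `iter_caption_jobs`, and `IMAGE_SUFFIXES` are hypothetical names, and the sketch assumes that a caption keeps its image's filename stem with a .txt extension and that images are identified by a small set of file extensions.

```python
from pathlib import Path
from typing import Iterator, Optional, Tuple, Union

PathLike = Union[str, Path]

# Assumption: which extensions count as images is not stated in the README.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def caption_path_for(image_path: PathLike, input_dir: PathLike,
                     output_dir: Optional[PathLike] = None) -> Path:
    """Hypothetical helper: map an image path to its mirrored caption path."""
    input_root = Path(input_dir)
    if output_dir is not None:
        output_root = Path(output_dir)
    else:
        # Documented default root: `<input_dir>_caption`.
        output_root = input_root.with_name(input_root.name + "_caption")
    relative = Path(image_path).relative_to(input_root)
    # Assumption: the caption keeps the image's stem and uses a .txt extension.
    return output_root / relative.with_suffix(".txt")


def iter_caption_jobs(input_dir: PathLike,
                      output_dir: Optional[PathLike] = None
                      ) -> Iterator[Tuple[Path, str, Path]]:
    """Yield (image, prompt, caption_path) for every image next to a prompt.txt."""
    for prompt_file in sorted(Path(input_dir).rglob("prompt.txt")):
        # One prompt is shared by every image in the same directory.
        prompt = prompt_file.read_text().strip()
        for image in sorted(prompt_file.parent.iterdir()):
            if image.suffix.lower() in IMAGE_SUFFIXES:
                yield image, prompt, caption_path_for(image, input_dir, output_dir)
```

Under these assumptions, `dataset/top_level_folder_1/image_folder_1/image_1.png` would map to `dataset_caption/top_level_folder_1/image_folder_1/image_1.txt`.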

Running

First, install the required packages:

pip install -r requirements.txt

Then, run the script:

python vlm_caption_cli.py --input_dir=<input_dir> [--model=<vlm_model>]

Command Line Args

Required Args:
--input_dir=<input_dir> || The path of the input directory containing the images to be captioned.
Optional Args:
--model=<vlm_model> || The VLM used to generate captions.
--max_length=<max_new_tokens> || The maximum number of new tokens to generate before the caption is truncated.
--ignore_substring=<ignore_substring> || Ignore files and directories containing this substring.
--num_captions=<number_of_captions> || The number of captions to generate per image.
--overwrite=<True/False> || If True, overwrites captions that already exist.
--output_dir=<output_dir> || The directory to act as the root of the caption file structure. Defaults to `<input_dir>_caption`.
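
To make the shape of these arguments concrete, here is a minimal `argparse` sketch that accepts the same flags. It is an assumption-laden illustration and not the parser used by vlm_caption_cli.py: `str_to_bool`, `build_parser`, and `parse_args` are hypothetical names, the argument types are guesses from the descriptions above, and every default other than the documented `<input_dir>_caption` is left as `None` because the README does not state it.

```python
import argparse
from typing import List, Optional


def str_to_bool(value: str) -> bool:
    """Parse 'True'/'False' style strings (case-insensitive)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(
        description="Caption images from a dataset with a VLM.")
    parser.add_argument("--input_dir", required=True)
    # None means "use the tool's own default"; the real defaults are not documented here.
    parser.add_argument("--model", default=None)
    parser.add_argument("--max_length", type=int, default=None)
    parser.add_argument("--ignore_substring", default=None)
    parser.add_argument("--num_captions", type=int, default=None)
    parser.add_argument("--overwrite", type=str_to_bool, default=None)
    parser.add_argument("--output_dir", default=None)
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.output_dir is None:
        # Documented default: `<input_dir>_caption`.
        args.output_dir = args.input_dir.rstrip("/") + "_caption"
    return args
```

For example, `parse_args(["--input_dir=dataset", "--overwrite=True"])` would yield `output_dir == "dataset_caption"` and `overwrite is True` in this sketch.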

Download files

Source Distribution

vlm_dataset_captioner-0.0.4.tar.gz (4.4 kB)

Built Distribution

vlm_dataset_captioner-0.0.4-py3-none-any.whl (5.8 kB)

File details

Details for the file vlm_dataset_captioner-0.0.4.tar.gz.

File metadata

  • Download URL: vlm_dataset_captioner-0.0.4.tar.gz
  • Size: 4.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vlm_dataset_captioner-0.0.4.tar.gz
Algorithm Hash digest
SHA256 4b712901663042f88c94b01cf08aa3c13855074dd85916f53d96a6e09d8f3b06
MD5 62999bd0e12c741e48b1bf9bad882438
BLAKE2b-256 8bbf3099d837f07e6cda400fa22f09315e9997b2eb4e7531fc6025411f73249b

Provenance

The following attestation bundles were made for vlm_dataset_captioner-0.0.4.tar.gz:

Publisher: pypi-publish.yml on alexsenden/vlm-dataset-captioner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vlm_dataset_captioner-0.0.4-py3-none-any.whl.

File hashes

Hashes for vlm_dataset_captioner-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 e05058223215301563035d4263c61384c1fc6a9f9e44588a62c3eaa215c5b864
MD5 fc773665a30fb76f5a942956f9554e7f
BLAKE2b-256 ae5ae8e9b87a905822e7566aaa90e929b6a8388a104872a86b177dd54162fc95

Provenance

The following attestation bundles were made for vlm_dataset_captioner-0.0.4-py3-none-any.whl:

Publisher: pypi-publish.yml on alexsenden/vlm-dataset-captioner
