
Uses a VLM to caption images from a dataset.


VLM Captioner

Uses a vision-language model (VLM), initially configured to use Qwen2.5-VL-32B-Instruct, to caption the images in a dataset.

Dataset Structure

Each image directory is captioned with a single VLM prompt, read from the prompt.txt file in that directory. Captions are written to a mirrored file structure with the suffix _caption (by default rooted at `<input_dir>_caption`; see --output_dir below). This structure contains individual .txt caption files whose filenames match those of their image counterparts.

dataset/
└── top_level_folder_1/
    ├── image_folder_1/
    │   ├── prompt.txt        (prompt used for every image in this folder)
    │   ├── image_1.png
    │   ├── image_2.png
    │   └── ...
    └── ...
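The layout above can be illustrated with a short sketch that pairs each image with its folder's prompt and the path its caption would be written to. This is not the package's own code: the helper name plan_captions, the set of recognised image extensions, and the exact caption naming (image base name plus .txt, mirrored under `<input_dir>_caption` unless an output directory is given) are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: image extensions to caption; the README only shows .png files.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def plan_captions(input_dir, output_dir=None):
    """Hypothetical helper returning (image_path, prompt, caption_path) triples.

    Each directory holding a prompt.txt applies that one prompt to all of its
    images. Caption paths mirror the input tree under output_dir, which
    defaults to "<input_dir>_caption"; each caption keeps its image's base
    name with a .txt extension (assumed naming).
    """
    input_root = Path(input_dir).resolve()
    if output_dir is None:
        output_root = input_root.with_name(input_root.name + "_caption")
    else:
        output_root = Path(output_dir)

    plan = []
    for prompt_file in sorted(input_root.rglob("prompt.txt")):
        folder = prompt_file.parent
        prompt = prompt_file.read_text(encoding="utf-8").strip()
        for image in sorted(folder.iterdir()):
            if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
                # e.g. top_level_folder_1/image_folder_1/image_1.png -> .../image_1.txt
                relative = image.relative_to(input_root).with_suffix(".txt")
                plan.append((image, prompt, output_root / relative))
    return plan
```

For example, iterating over `plan_captions("dataset")` would yield dataset/top_level_folder_1/image_folder_1/image_1.png paired with that folder's prompt and dataset_caption/top_level_folder_1/image_folder_1/image_1.txt.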

Running

First, install the required packages:

pip install -r requirements.txt

Then, run the script:

python vlm_caption_cli.py --input_dir=<input_dir> [--model=<vlm_model>]

Command Line Args

Required args:
  --input_dir=<input_dir>                  Path of the input directory containing the images to caption.

Optional args:
  --model=<vlm_model>                      VLM used to generate the captions.
  --max_length=<max_new_tokens>            Maximum number of new tokens per caption; longer output is truncated.
  --ignore_substring=<ignore_substring>    Ignore files and directories containing this substring.
  --num_captions=<number_of_captions>      Number of captions to generate per image.
  --overwrite=<True/False>                 If True, overwrite captions that already exist.
  --output_dir=<output_dir>                Root directory of the caption file structure. Defaults to `<input_dir>_caption`.
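The flags listed above could be wired up with Python's argparse roughly as follows. This is a hypothetical sketch, not the parser in vlm_caption_cli.py: the defaults for --model, --max_length and --num_captions, the accepted spellings for --overwrite, and the helper names build_parser, resolve_output_dir and should_caption are all assumptions. It also shows one plausible reading of --ignore_substring (match anywhere in the path) and --overwrite (skip images whose caption file already exists).

```python
import argparse
from pathlib import Path


def str_to_bool(value):
    """Parse "True"/"False" style strings (assumed format for --overwrite)."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring the documented flags; defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Caption images with a VLM (sketch).")
    parser.add_argument("--input_dir", required=True,
                        help="Input directory containing the images to caption.")
    # Assumed default: the Hugging Face id of the model the project starts with.
    parser.add_argument("--model", default="Qwen/Qwen2.5-VL-32B-Instruct",
                        help="VLM used to generate captions.")
    parser.add_argument("--max_length", type=int, default=256,
                        help="Maximum number of new tokens per caption.")
    parser.add_argument("--ignore_substring", default=None,
                        help="Skip files/directories containing this substring.")
    parser.add_argument("--num_captions", type=int, default=1,
                        help="Number of captions to generate per image.")
    parser.add_argument("--overwrite", type=str_to_bool, default=False,
                        help="Overwrite captions that already exist.")
    parser.add_argument("--output_dir", default=None,
                        help="Root of the caption tree (default: <input_dir>_caption).")
    return parser


def resolve_output_dir(args):
    """Apply the documented default for --output_dir: <input_dir>_caption."""
    return args.output_dir or args.input_dir.rstrip("/\\") + "_caption"


def should_caption(image_path, caption_path, args):
    """Hypothetical filter honouring --ignore_substring and --overwrite."""
    if args.ignore_substring and args.ignore_substring in str(image_path):
        return False
    # Without --overwrite=True, images that already have a caption are skipped.
    return args.overwrite or not Path(caption_path).exists()
```

For example, `build_parser().parse_args(["--input_dir=dataset", "--overwrite=True"])` gives overwrite=True, and with no --output_dir, resolve_output_dir returns "dataset_caption".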

Download files

Source Distribution

vlm_dataset_captioner-0.0.1.tar.gz (4.1 kB)

Uploaded Source

Built Distribution

vlm_dataset_captioner-0.0.1-py3-none-any.whl (5.3 kB)

Uploaded Python 3

File details

Details for the file vlm_dataset_captioner-0.0.1.tar.gz.

File metadata

  • Download URL: vlm_dataset_captioner-0.0.1.tar.gz
  • Upload date:
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vlm_dataset_captioner-0.0.1.tar.gz
Algorithm Hash digest
SHA256 82c37e0b3504e5f8b1bf57fd2a23ea94cd0cfeffd3a4b6e564b15cdd1689f0a1
MD5 0dfb0c5b89281e08925d7be30b201884
BLAKE2b-256 b71674a55c2b6fac686bfb100ea4ac33c0fc958396c9ee06f9f329b806cddca8

Provenance

The following attestation bundles were made for vlm_dataset_captioner-0.0.1.tar.gz:

Publisher: pypi-publish.yml on alexsenden/vlm-dataset-captioner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vlm_dataset_captioner-0.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for vlm_dataset_captioner-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2fcdeefbd6d821cbac9aab197a02eab2423ba76ecfc3eabbb10804d4c872ef8d
MD5 9f05075211a5d6811bcf0bb2dfd5f02d
BLAKE2b-256 01389e6c689eb01212593261c272c01ac47aff87212efda31894a3c26b77f4bd

Provenance

The following attestation bundles were made for vlm_dataset_captioner-0.0.1-py3-none-any.whl:

Publisher: pypi-publish.yml on alexsenden/vlm-dataset-captioner
