VLM Captioner

Uses a VLM (initially configured to use Qwen2.5-VL-32B-Instruct) to caption images from a dataset.

Dataset Structure

Each image directory uses a single VLM prompt, stored in that directory's prompt.txt, for every image it contains. Captions are written to a mirrored directory structure whose root carries the suffix _caption. That structure contains one .txt caption file per image, named to match its image counterpart.

dataset/
└── top_level_folder_1/
    ├── image_folder_1/    (prompt.txt applies to the entire folder)
    │   ├── prompt.txt
    │   ├── image_1.png
    │   ├── image_2.png
    │   └── ...
    └── ...
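
As a rough illustration of the layout above, the sketch below pairs each image with its folder's prompt.txt and the caption path it would map to in the mirrored tree. It is not the package's own code: `iter_caption_jobs`, `default_output_dir`, and the `IMAGE_EXTENSIONS` set are hypothetical names, and the choices to skip folders without a prompt and to replace the image extension with .txt are assumptions.

```python
from pathlib import Path
from typing import Iterator, Optional, Tuple

# Assumption: these extensions count as images; the tool's real list may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def default_output_dir(input_dir: Path) -> Path:
    """Return `<input_dir>_caption`, the documented default output root."""
    return input_dir.parent / (input_dir.name + "_caption")


def iter_caption_jobs(
    input_dir: Path, output_dir: Optional[Path] = None
) -> Iterator[Tuple[Path, Path, Path]]:
    """Yield (image, prompt, caption) path triples for images under input_dir."""
    output_root = output_dir or default_output_dir(input_dir)
    for image in sorted(input_dir.rglob("*")):
        if not image.is_file() or image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        prompt = image.parent / "prompt.txt"
        if not prompt.is_file():
            continue  # assumption: folders without a prompt.txt are skipped
        # Mirror the image's relative path under the output root, as a .txt file.
        relative = image.relative_to(input_dir)
        yield image, prompt, (output_root / relative).with_suffix(".txt")
```

Calling `list(iter_caption_jobs(Path("dataset")))` on the tree above would list every image next to the prompt and caption file it corresponds to, which can be a quick way to check a dataset before running the captioner.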

Running

First, install the required packages:

pip install -r requirements.txt

Then, run the script:

python vlm_caption_cli.py --input_dir=<input_dir> [--model=<vlm_model>]

Command Line Args

Required Args:
--input_dir=<input_dir> || Path to the input directory containing the images to be captioned.
Optional Args:
--model=<vlm_model> || VLM used to generate captions.
--max_length=<max_new_tokens> || Maximum number of new tokens to generate before the caption is truncated.
--ignore_substring=<ignore_substring> || Ignore files and directories containing this substring.
--num_captions=<number_of_captions> || Number of captions to generate per image.
--overwrite=<True/False> || If True, overwrite captions that already exist.
--output_dir=<output_dir> || Root directory of the caption file structure. Defaults to `<input_dir>_caption`.
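
When the captioner is driven from another Python script, it can help to assemble the flags above programmatically. The helper below is a minimal sketch and not part of the package: `build_command` is a hypothetical name, it assumes `vlm_caption_cli.py` is in the current working directory, and it leaves out any option that is not set so the tool's own defaults apply.

```python
import sys
from typing import List, Optional


def build_command(
    input_dir: str,
    model: Optional[str] = None,
    max_length: Optional[int] = None,
    ignore_substring: Optional[str] = None,
    num_captions: Optional[int] = None,
    overwrite: Optional[bool] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for vlm_caption_cli.py; unset options are omitted."""
    command = [sys.executable, "vlm_caption_cli.py", f"--input_dir={input_dir}"]
    optional = {
        "model": model,
        "max_length": max_length,
        "ignore_substring": ignore_substring,
        "num_captions": num_captions,
        "overwrite": overwrite,  # rendered as True/False, matching the documented form
        "output_dir": output_dir,
    }
    for flag, value in optional.items():
        if value is not None:
            command.append(f"--{flag}={value}")
    return command
```

The resulting list can be passed to `subprocess.run(build_command("dataset", num_captions=2, overwrite=True), check=True)` to launch a captioning run.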

Source Distribution

vlm_dataset_captioner-0.0.3.tar.gz (4.3 kB)

Built Distribution

vlm_dataset_captioner-0.0.3-py3-none-any.whl (5.7 kB)

File details

Details for the file vlm_dataset_captioner-0.0.3.tar.gz.

File metadata

  • Download URL: vlm_dataset_captioner-0.0.3.tar.gz
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vlm_dataset_captioner-0.0.3.tar.gz
Algorithm      Hash digest
SHA256         0fe76405601509c8edd0f281042ea93beeee9c167079766c9efe8f500bffde4c
MD5            0459fe59bd8e130aea63d1cd1ba93665
BLAKE2b-256    fa7c2f7209c054f844e29acaebcde20e877ebf1b594a35dbf9013b7621b397e6

Provenance

The following attestation bundles were made for vlm_dataset_captioner-0.0.3.tar.gz:

Publisher: pypi-publish.yml on alexsenden/vlm-dataset-captioner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vlm_dataset_captioner-0.0.3-py3-none-any.whl.

File hashes

Hashes for vlm_dataset_captioner-0.0.3-py3-none-any.whl
Algorithm      Hash digest
SHA256         7ca61bf64292bc0bd1456ba59d1d37eb15279c7976aff8c30618009019f3f7e2
MD5            82e8179df502fae55a4b39b0b34347af
BLAKE2b-256    e53c732e1d0e866ed7bc3ca8ad843f388e9677f51caedd60cbe8a54e8312865b

Provenance

The following attestation bundles were made for vlm_dataset_captioner-0.0.3-py3-none-any.whl:

Publisher: pypi-publish.yml on alexsenden/vlm-dataset-captioner
