Uses a VLM to caption images from a dataset.

These details have not been verified by PyPI

Project description

VLM Captioner

Uses a VLM (initially configured to use Qwen2.5-VL-32B-Instruct) to caption images from a dataset.

Dataset Structure

Each image directory uses a single VLM prompt, stored in a prompt.txt file inside that directory and applied to every image in it. Captions are written to a mirrored directory structure with the suffix _caption (by default rooted at `<input_dir>_caption`). This structure contains one .txt caption file per image, with a filename matching that of the image.

dataset/
└── top_level_folder_1/
    ├── image_folder_1/    (contains the prompt for the entire folder)
    │   ├── prompt.txt
    │   ├── image_1.png
    │   ├── image_2.png
    │   └── ...
    └── ...
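
The layout above can be sketched in Python as follows. This is a minimal illustration, not the package's actual implementation: the function names, the set of image extensions, and the assumption that the caption root is a sibling directory named `<input_dir>_caption` (taken from the documented `--output_dir` default) are all hypothetical.

```python
from pathlib import Path

# Assumption: the README only shows .png files; the other extensions are a guess.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def default_output_root(input_dir):
    """Return the sibling directory `<input_dir>_caption` (hypothetical helper)."""
    input_dir = Path(input_dir)
    return input_dir.parent / (input_dir.name + "_caption")


def plan_captions(input_dir, output_dir=None):
    """Yield (image_path, prompt_text, caption_path) for each image next to a prompt.txt."""
    input_dir = Path(input_dir)
    output_root = Path(output_dir) if output_dir else default_output_root(input_dir)
    # Every directory that holds a prompt.txt is treated as one image directory.
    for prompt_file in sorted(input_dir.rglob("prompt.txt")):
        folder = prompt_file.parent
        prompt_text = prompt_file.read_text(encoding="utf-8").strip()
        for image_path in sorted(folder.iterdir()):
            if not image_path.is_file():
                continue
            if image_path.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            # Mirror the relative path under the caption root and swap the extension.
            relative = image_path.relative_to(input_dir)
            yield image_path, prompt_text, (output_root / relative).with_suffix(".txt")
```

Under these assumptions, `dataset/top_level_folder_1/image_folder_1/image_1.png` would map to `dataset_caption/top_level_folder_1/image_folder_1/image_1.txt`.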

Running

First, install the required packages:

pip install -r requirements.txt

Then, run the script:

python vlm_caption_cli.py --input_dir=<input_dir> [--model=<vlm_model>]

Command Line Args

Required Args:

--input_dir=<input_dir> || The path of the input directory containing images to be captioned.

Optional Args:

--model=<vlm_model> || The VLM used to generate captions.
--max_length=<max_new_tokens> || Maximum number of new tokens to generate before the caption is truncated.
--ignore_substring=<ignore_substring> || Ignore files and directories whose names contain this substring.
--num_captions=<number_of_captions> || Number of captions to generate per image.
--overwrite=<True/False> || If True, overwrites captions that already exist.
--output_dir=<output_dir> || The directory to act as the root of the caption file structure. Defaults to `<input_dir>_caption`.
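
For readers who want to see how flags like these are typically parsed, here is a sketch using Python's standard `argparse` module. It mirrors the documented flag names only; the helper names, the `None` defaults, the default of 1 for `--num_captions`, and the default of False for `--overwrite` are placeholders, since the README does not state the real defaults or how `vlm_caption_cli.py` parses its arguments.

```python
import argparse


def str_to_bool(value):
    """Parse the documented True/False strings into a bool."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative parser for the documented flags.")
    parser.add_argument("--input_dir", required=True, help="Directory containing images to caption.")
    # Placeholder defaults below: the real defaults are not documented.
    parser.add_argument("--model", default=None, help="VLM used to generate captions.")
    parser.add_argument("--max_length", type=int, default=None, help="Maximum number of new tokens.")
    parser.add_argument("--ignore_substring", default=None, help="Skip paths containing this substring.")
    parser.add_argument("--num_captions", type=int, default=1, help="Captions per image.")
    parser.add_argument("--overwrite", type=str_to_bool, default=False, help="Overwrite existing captions.")
    parser.add_argument("--output_dir", default=None, help="Root of the caption file structure.")
    return parser


def parse_args(argv=None):
    args = build_parser().parse_args(argv)
    if args.output_dir is None:
        # Documented default: `<input_dir>_caption`.
        args.output_dir = args.input_dir.rstrip("/\\") + "_caption"
    return args
```

With this sketch, `parse_args(["--input_dir=dataset", "--overwrite=True"])` yields an `output_dir` of `dataset_caption` and `overwrite` set to `True`.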

Release history

0.0.4 (Feb 3, 2026)
0.0.3 (Jan 30, 2026, this version)
0.0.2 (Jan 29, 2026)
0.0.1 (Jan 29, 2026)

Download files

Source Distribution

vlm_dataset_captioner-0.0.3.tar.gz (4.3 kB)

Uploaded Jan 30, 2026 Source

Built Distribution

vlm_dataset_captioner-0.0.3-py3-none-any.whl (5.7 kB)

Uploaded Jan 30, 2026 Python 3

File details

Details for the file vlm_dataset_captioner-0.0.3.tar.gz.

File metadata

Download URL: vlm_dataset_captioner-0.0.3.tar.gz
Upload date: Jan 30, 2026
Size: 4.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vlm_dataset_captioner-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`0fe76405601509c8edd0f281042ea93beeee9c167079766c9efe8f500bffde4c`
MD5	`0459fe59bd8e130aea63d1cd1ba93665`
BLAKE2b-256	`fa7c2f7209c054f844e29acaebcde20e877ebf1b594a35dbf9013b7621b397e6`
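
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The sketch below is illustrative: the function names and the local file path in the usage note are hypothetical, and the same check applies to the wheel with its own digest.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the sdist was saved to the current directory, `matches_expected("vlm_dataset_captioner-0.0.3.tar.gz", "0fe76405601509c8edd0f281042ea93beeee9c167079766c9efe8f500bffde4c")` should return `True` for an unmodified download.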

Provenance

The following attestation bundles were made for vlm_dataset_captioner-0.0.3.tar.gz:

Publisher: pypi-publish.yml on alexsenden/vlm-dataset-captioner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vlm_dataset_captioner-0.0.3.tar.gz
- Subject digest: 0fe76405601509c8edd0f281042ea93beeee9c167079766c9efe8f500bffde4c
- Sigstore transparency entry: 871703375
- Sigstore integration time: Jan 30, 2026
Source repository:
- Permalink: alexsenden/vlm-dataset-captioner@04014cbf9927e03ba8836ed74bd357083c952353
- Branch / Tag: refs/tags/v0.0.3
- Owner: https://github.com/alexsenden
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@04014cbf9927e03ba8836ed74bd357083c952353
- Trigger Event: push

File details

Details for the file vlm_dataset_captioner-0.0.3-py3-none-any.whl.

File metadata

Download URL: vlm_dataset_captioner-0.0.3-py3-none-any.whl
Upload date: Jan 30, 2026
Size: 5.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vlm_dataset_captioner-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7ca61bf64292bc0bd1456ba59d1d37eb15279c7976aff8c30618009019f3f7e2`
MD5	`82e8179df502fae55a4b39b0b34347af`
BLAKE2b-256	`e53c732e1d0e866ed7bc3ca8ad843f388e9677f51caedd60cbe8a54e8312865b`

Provenance

The following attestation bundles were made for vlm_dataset_captioner-0.0.3-py3-none-any.whl:

Publisher: pypi-publish.yml on alexsenden/vlm-dataset-captioner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vlm_dataset_captioner-0.0.3-py3-none-any.whl
- Subject digest: 7ca61bf64292bc0bd1456ba59d1d37eb15279c7976aff8c30618009019f3f7e2
- Sigstore transparency entry: 871703389
- Sigstore integration time: Jan 30, 2026
Source repository:
- Permalink: alexsenden/vlm-dataset-captioner@04014cbf9927e03ba8836ed74bd357083c952353
- Branch / Tag: refs/tags/v0.0.3
- Owner: https://github.com/alexsenden
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@04014cbf9927e03ba8836ed74bd357083c952353
- Trigger Event: push

vlm-dataset-captioner 0.0.3
