Self-contained distributed community captioning system

CaptionFlow

scalable, fault-tolerant vLLM-powered image captioning.

a fast websocket-based orchestrator hands work to lightweight gpu workers, which batch their requests through vLLM for high throughput.

orchestrator: hands out work in chunked shards, collects captions, checkpoints progress, and keeps simple stats.
workers (vLLM): connect to the orchestrator, stream in image samples, batch them, and generate 1..N captions per image using prompts supplied by the orchestrator.
config-driven: all components read YAML config; flags can override.
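
the "flags can override" behaviour can be pictured as a layered merge. a minimal sketch, assuming the YAML has already been parsed into a nested dict and that unset flags arrive as None; the function name and the dotted-key convention are illustrative, not CaptionFlow's actual internals:

```python
from copy import deepcopy


def apply_overrides(config: dict, overrides: dict) -> dict:
    """Return a copy of config with every non-None override applied.

    Override keys are dotted paths such as "worker.gpu_id" (a convention
    assumed for this sketch).
    """
    merged = deepcopy(config)
    for dotted_key, value in overrides.items():
        if value is None:  # flag was not given on the command line
            continue
        *parents, leaf = dotted_key.split(".")
        node = merged
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return merged


# values that would normally come from my-worker.yaml and from the CLI
yaml_config = {"worker": {"gpu_id": 0, "token": "example-token"}}
cli_flags = {"worker.gpu_id": 1, "worker.token": None}

effective = apply_overrides(yaml_config, cli_flags)
print(effective)
```

the YAML value survives wherever no flag was passed, and the original dict is left untouched.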

no conda. just venv + pip.

install

python -m venv .venv
source .venv/bin/activate  # windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -e .  # installs the `caption-flow` command

quickstart (single box)

copy + edit the sample configs

cp examples/orchestrator/local_image_files.yaml my-orchestrator.yaml
cp examples/worker.yaml my-worker.yaml
cp examples/monitor.yaml my-monitor.yaml   # optional terminal interface

set a unique shared token in both my-orchestrator.yaml and my-worker.yaml (see auth.worker_tokens in the orchestrator config and worker.token in the worker config).
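
one way to produce a hard-to-guess token is Python's standard secrets module. this is only a sketch; the helper name is made up, and any sufficiently random string should work equally well:

```python
import secrets


def new_shared_token(num_bytes: int = 32) -> str:
    """Return a URL-safe random token built from num_bytes of randomness."""
    return secrets.token_urlsafe(num_bytes)


token = new_shared_token()
print(token)  # paste the same value into both YAML files
```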

if you use private hugging face datasets/models, export HUGGINGFACE_HUB_TOKEN before starting anything.

start the orchestrator

caption-flow orchestrator --config my-orchestrator.yaml

start one or more vLLM workers

# gpu 0 on the same host
caption-flow worker --config my-worker.yaml --gpu-id 0

# your second GPU
caption-flow worker --config my-worker.yaml --gpu-id 1

# on a remote host
caption-flow worker --config my-worker.yaml --server ws://your.hostname.address:8765

(optional) start the monitor

caption-flow monitor --config my-monitor.yaml

export the data

% caption-flow export --help
Usage: caption-flow export [OPTIONS]

  Export caption data to various formats.

Options:
  --format [jsonl|json|csv|txt|huggingface_hub|all] Export format (default: jsonl)

jsonl: creates a JSON Lines file at the specified --output path
csv: writes the CSV-compatible columns to the --output path; metadata is incomplete
json: creates one .json file per sample inside the --output subdirectory, containing complete metadata; useful for webdatasets
txt: creates one .txt file per sample inside the --output subdirectory, containing ONLY captions
huggingface_hub: creates a dataset on the Hugging Face Hub; add --private and/or --nsfw where necessary
all: creates every export format in the specified --output directory
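
a jsonl export can be read back one record per line. a minimal sketch using only the standard library; the field names in the demo file ("id", "captions") are placeholders, since the real column names depend on your run:

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one dict per non-blank line of a JSON Lines file."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# demo on a throwaway file; point iter_jsonl at your own --output path instead
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "captions.jsonl"
    sample.write_text(
        '{"id": "a", "captions": ["a cat"]}\n\n{"id": "b", "captions": []}\n',
        encoding="utf-8",
    )
    records = list(iter_jsonl(sample))

print(len(records))
```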

how it’s wired

orchestrator

websocket server (default 0.0.0.0:8765) with three client roles: workers, data-feeders, and admin.
dataset control: the orchestrator centrally defines the dataset (huggingface or local) and version/name. it chunk-slices shards and assigns work.
data serving to remote workers: the orchestrator automatically serves local files to remote workers that don't have access to them, so those workers can still caption them.
vLLM config broadcast: model, tp size, dtype, max seq len, memory targets, batching, sampling params, and inference prompts are all pushed to workers; workers can apply many changes without a model reload.
storage + checkpoints: captions buffer to disk with periodic checkpoints. chunk state is tracked so restarts don’t double-work.
auth: token lists for worker, monitor, and admin roles.
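
the chunk bookkeeping described above can be sketched as follows. this is an illustration of the idea only (slice a shard into chunks, hand out pending ones, requeue anything left assigned after a crash); every class and function name here is hypothetical and not taken from the CaptionFlow source:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ChunkState(Enum):
    PENDING = "pending"
    ASSIGNED = "assigned"
    COMPLETED = "completed"


@dataclass
class Chunk:
    shard: str
    start: int
    size: int
    state: ChunkState = ChunkState.PENDING


def slice_shard(shard: str, num_items: int, chunk_size: int) -> List[Chunk]:
    """Cut a shard of num_items samples into chunks of at most chunk_size."""
    return [
        Chunk(shard, start, min(chunk_size, num_items - start))
        for start in range(0, num_items, chunk_size)
    ]


def next_pending(chunks: List[Chunk]) -> Optional[Chunk]:
    """Mark the first pending chunk as assigned and return it, or None."""
    for chunk in chunks:
        if chunk.state is ChunkState.PENDING:
            chunk.state = ChunkState.ASSIGNED
            return chunk
    return None


def reset_abandoned(chunks: List[Chunk]) -> int:
    """After a restart, requeue chunks that were assigned but never finished."""
    count = 0
    for chunk in chunks:
        if chunk.state is ChunkState.ASSIGNED:
            chunk.state = ChunkState.PENDING
            count += 1
    return count


chunks = slice_shard("shard-0000", num_items=2500, chunk_size=1000)
first = next_pending(chunks)
first.state = ChunkState.COMPLETED   # a worker finished this one
second = next_pending(chunks)        # left assigned to simulate a crash
requeued = reset_abandoned(chunks)
print([(c.start, c.size, c.state.value) for c in chunks], requeued)
```

completed chunks are never handed out again, which is what keeps a restart from double-working.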

vLLM worker

one process per gpu. select the device with --gpu-id (or worker.gpu_id in YAML).
gets its marching orders from the orchestrator: dataset info, model, prompts, batch size, and sampling.
resilient: detects disconnects, abandons the current chunk cleanly, clears queues, reconnects, and resumes.
batched generate(): images are resized down for consistent batching; each image can get multiple captions (one per prompt).
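
the "one caption per prompt" batching can be pictured like this. a rough sketch: generate is a stand-in callable for the real batched vLLM call (its signature is an assumption of this example), and empty generations are dropped:

```python
from typing import Callable, Dict, List, Sequence, Tuple

Request = Tuple[str, str]  # (image id, prompt)


def caption_batch(
    image_ids: Sequence[str],
    prompts: Sequence[str],
    generate: Callable[[List[Request]], List[str]],
) -> Dict[str, List[str]]:
    """Run every (image, prompt) pair through one batched call."""
    requests = [(image_id, prompt) for image_id in image_ids for prompt in prompts]
    outputs = generate(requests)  # one output string per request, same order
    captions: Dict[str, List[str]] = {image_id: [] for image_id in image_ids}
    for (image_id, _prompt), text in zip(requests, outputs):
        text = text.strip()
        if text:  # keep only non-empty generations
            captions[image_id].append(text)
    return captions


def fake_generate(requests: List[Request]) -> List[str]:
    """Stub model: returns blank text for one pair to show the filtering."""
    return [
        "  " if (image_id, prompt) == ("img-2", "tags") else f"{prompt} {image_id}"
        for image_id, prompt in requests
    ]


result = caption_batch(["img-1", "img-2"], ["describe", "tags"], fake_generate)
print(result)
```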

dataset formats

URL-list datasets, hosted on the hugging face hub or stored locally, that are compatible with the datasets library
webdataset shards containing full image data; these can also be hosted on the hub
local folder filled with images; orchestrator will serve the data to workers

configuration path

config discovery order

for any component, the CLI looks for config in this order (first match wins):

--config /path/to/file.yaml
./<component>.yaml (current directory)
~/.caption-flow/<component>.yaml
$XDG_CONFIG_HOME/caption-flow/<component>.yaml
/etc/caption-flow/<component>.yaml
any $XDG_CONFIG_DIRS entries under caption-flow/
./examples/<component>.yaml (fallback)
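
the lookup order above, written out as a sketch. this is not the CLI's actual code: the function names are made up, and two behaviours are assumptions of this example (an unset XDG_CONFIG_HOME is simply skipped, and a missing --config file falls through to the next candidate):

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_paths(
    component: str,
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
) -> List[Path]:
    """List config locations for a component, highest priority first."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home
    name = f"{component}.yaml"

    paths: List[Path] = []
    if explicit:
        paths.append(Path(explicit))                      # --config
    paths.append(cwd / name)                              # current directory
    paths.append(home / ".caption-flow" / name)
    if env.get("XDG_CONFIG_HOME"):
        paths.append(Path(env["XDG_CONFIG_HOME"]) / "caption-flow" / name)
    paths.append(Path("/etc/caption-flow") / name)
    for entry in env.get("XDG_CONFIG_DIRS", "").split(":"):
        if entry:
            paths.append(Path(entry) / "caption-flow" / name)
    paths.append(cwd / "examples" / name)                 # fallback
    return paths


def discover_config(component: str, **kwargs) -> Optional[Path]:
    """Return the first candidate that exists on disk (first match wins)."""
    for path in candidate_paths(component, **kwargs):
        if path.is_file():
            return path
    return None


order = candidate_paths(
    "worker",
    env={"XDG_CONFIG_HOME": "/xdg"},
    cwd=Path("/work"),
    home=Path("/home/me"),
)
for path in order:
    print(path)
```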

tls / certificates

use the built-in helpers during development:

# self-signed certs for quick local testing
caption-flow generate_cert --self-signed --domain localhost --output-dir ./certs

# inspect any certificate file
caption-flow inspect_cert ./certs/fullchain.pem

then point the orchestrator at the resulting cert/key (or run --no-ssl for dev-only ws://).
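
for a sense of what skipping verification against a self-signed dev certificate means on the client side, here is the equivalent in Python's standard ssl module. a sketch only; whether the CLI's --no-verify-ssl flag is implemented exactly this way is an assumption:

```python
import ssl


def client_ssl_context(verify: bool = True) -> ssl.SSLContext:
    """Build a TLS client context, optionally without certificate checks."""
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be turned off before verify_mode can be CERT_NONE
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


dev_context = client_ssl_context(verify=False)   # dev only: accepts any cert
strict_context = client_ssl_context()            # default: full verification
print(dev_context.verify_mode, strict_context.verify_mode)
```

an unverified context protects against passive eavesdropping but not against impersonation, so keep it to local testing.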

tips & notes

multi-gpu: start one worker process per gpu (set --gpu-id or worker.gpu_id).
throughput: tune vllm.batch_size in the orchestrator config (or override with --batch-size at worker start). higher isn’t always better; watch VRAM.
prompts: add more strings under vllm.inference_prompts to get multiple captions per image; the worker returns only non-empty generations.
private HF: if your dataset/model needs auth, export HUGGINGFACE_HUB_TOKEN before caption-flow worker ....
self-signed ssl: pass --no-verify-ssl to workers/monitors in dev.
recovery: if you hard-crash mid-run, caption-flow scan_chunks --fix can reset abandoned chunks so the orchestrator can reissue them cleanly.

roadmap

hot config reload via the admin websocket path.
dedicated data-feeder clients (separate from gpu workers) that push samples into the orchestrator.
richer monitor TUI.

PRs welcome. keep it simple and fast.

architecture

┌─────────────┐     WebSocket      ┌─────────────┐
│   Worker    │◄──────────────────►│             │
│             │                    │             │     ┌──────────────┐
│             │◄───────────────────│             │────►│Arrow/Parquet │
└─────────────┘   HTTP (img data)  │ Orchestrator│     │   Storage    │
                                   │             │     └──────────────┘
┌─────────────┐                    │             │
│   Worker    │◄──────────────────►│             │
│             │                    │             │
│             │◄───────────────────│             │
└─────────────┘   HTTP (img data)  └─────────────┘
                                           ▲
┌─────────────┐                           │
│   Monitor   │◄──────────────────────────┘
└─────────────┘

Community Clusters

To contribute compute to a cluster:

Install caption-flow: pip install caption-flow
Get a worker token from the project maintainer
Run: caption-flow worker --server wss://project.domain.com:8765 --token YOUR_TOKEN

Your contributions will be tracked and attributed in the final dataset!

License

AGPLv3

Release history

0.4.2: Sep 12, 2025
0.4.1: Sep 11, 2025
0.4.0: Sep 10, 2025
0.3.4: Sep 5, 2025
0.3.3: Sep 4, 2025
0.3.2: Sep 4, 2025 (this version)
0.3.1: Sep 4, 2025
0.2.4: Aug 27, 2025
0.2.3: Aug 26, 2025
0.2.2: Aug 20, 2025
0.2.1: Aug 19, 2025
0.2.0: Aug 18, 2025
0.1.0: Aug 12, 2025

Download files

Source Distribution

caption_flow-0.3.2.tar.gz (106.0 kB)

Uploaded Sep 4, 2025 Source

Built Distribution

caption_flow-0.3.2-py3-none-any.whl (116.8 kB)

Uploaded Sep 4, 2025 Python 3

File details

Details for the file caption_flow-0.3.2.tar.gz.

File metadata

Download URL: caption_flow-0.3.2.tar.gz
Upload date: Sep 4, 2025
Size: 106.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for caption_flow-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`30a47c366a57ce65c94646f39f44f10899bc945f78e68251a0f5419d924378ec`
MD5	`bf2d2ea75f39dbfac92c1aeeca030b9e`
BLAKE2b-256	`064f913b3ebddc096f67acbe14bebf84d1810c022ce1d9e8c5d93e053e2215e5`

File details

Details for the file caption_flow-0.3.2-py3-none-any.whl.

File metadata

Download URL: caption_flow-0.3.2-py3-none-any.whl
Upload date: Sep 4, 2025
Size: 116.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for caption_flow-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0ddbef67e232ec23e3eb372f6d32538051549aefeccb4368ebde59d42f77d2a3`
MD5	`594e386ac42bde8f1748f6f26bbf8d78`
BLAKE2b-256	`0cf3288ca6a36f2c7c5677e7c5b728e7dd98cf3944f504fcc8245c39106bb0ad`
