Skip to main content

Self-contained distributed community captioning system

Project description

CaptionFlow

scalable, fault-tolerant vLLM-powered image captioning.

a fast websocket-based orchestrator paired with lightweight gpu workers achieves exceptional performance for batched requests through vLLM.

  • orchestrator: hands out work in chunked shards, collects captions, checkpoints progress, and keeps simple stats.
  • workers (vLLM): connect to the orchestrator, stream in image samples, batch them, and generate 1..N captions per image using prompts supplied by the orchestrator.
  • config-driven: all components read YAML config; flags can override.

no conda. just venv + pip.


install

python -m venv .venv
source .venv/bin/activate  # windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -e .  # installs the `caption-flow` command

quickstart (single box)

  1. copy + edit the sample configs
cp examples/orchestrator/local_image_files.yaml my-orchestrator.yaml
cp examples/worker.yaml my-worker.yaml
cp examples/monitor.yaml my-monitor.yaml   # optional terminal interface

set a unique shared token in both my-orchestrator.yaml and my-worker.yaml (see auth.worker_tokens in the orchestrator config and worker.token in the worker config).

if you use private hugging face datasets/models, export HUGGINGFACE_HUB_TOKEN before starting anything.

  1. start the orchestrator
caption-flow orchestrator --config my-orchestrator.yaml
  1. start one or more vLLM workers
# gpu 0 on the same host
caption-flow worker --config my-worker.yaml --gpu-id 0

# your second GPU
caption-flow worker --config my-worker.yaml --gpu-id 1

# on a remote host
caption-flow worker --config my-worker.yaml --server ws://your.hostname.address:8765
  1. (optional) start the monitor
caption-flow monitor --config my-monitor.yaml
  1. export the data
% caption-flow export --help                                                                                                                                      
Usage: caption-flow export [OPTIONS]

  Export caption data to various formats.

Options:
  --format [jsonl|json|csv|txt|huggingface_hub|all] Export format (default: jsonl)
  • jsonl: create JSON line file in the specified --output path
  • csv: exports CSV-compatible data columns to the --output path containing incomplete metadata
  • json: creates a .json file for each sample inside the --output subdirectory containing complete metadata; useful for webdatasets
  • txt: creates .txt file for each sample inside the --output subdirectory containing ONLY captions
  • huggingface_hub: creates a dataset on Hugging Face Hub, possibly --private and --nsfw where necessary
  • all: creates all export formats in a specified --output directory

how it’s wired

orchestrator

  • websocket server (default 0.0.0.0:8765) with three client roles: workers, data-feeders, and admin.
  • dataset control: the orchestrator centrally defines the dataset (huggingface or local) and version/name. it chunk-slices shards and assigns work.
  • data serving to remote workers: local files can be captioned by remote workers that don't have access to the same files, automatically.
  • vLLM config broadcast: model, tp size, dtype, max seq len, memory targets, batching, sampling params, and inference prompts are all pushed to workers; workers can apply many changes without a model reload.
  • storage + checkpoints: captions buffer to disk with periodic checkpoints. chunk state is tracked so restarts don’t double-work.
  • auth: token lists for worker, monitor, and admin roles.

vLLM worker

  • one process per gpu. select the device with --gpu-id (or worker.gpu_id in YAML).
  • gets its marching orders from the orchestrator: dataset info, model, prompts, batch size, and sampling.
  • resilient: detects disconnects, abandons the current chunk cleanly, clears queues, reconnects, and resumes.
  • batched generate(): images are resized down for consistent batching; each image can get multiple captions (one per prompt).

dataset formats

  • huggingface hub or local based URL list datasets that are compatible with the datasets library
  • webdatasets shards containing full image data; also can be hosted on the hub
  • local folder filled with images; orchestrator will serve the data to workers

configuration path

config discovery order

for any component, the CLI looks for config in this order (first match wins):

  1. --config /path/to/file.yaml
  2. ./<component>.yaml (current directory)
  3. ~/.caption-flow/<component>.yaml
  4. $XDG_CONFIG_HOME/caption-flow/<component>.yaml
  5. /etc/caption-flow/<component>.yaml
  6. any $XDG_CONFIG_DIRS entries under caption-flow/
  7. ./examples/<component>.yaml (fallback)

tls / certificates

use the built-in helpers during development:

# self-signed certs for quick local testing
caption-flow generate_cert --self-signed --domain localhost --output-dir ./certs

# inspect any certificate file
caption-flow inspect_cert ./certs/fullchain.pem

then point the orchestrator at the resulting cert/key (or run --no-ssl for dev-only ws://).


tips & notes

  • multi-gpu: start one worker process per gpu (set --gpu-id or worker.gpu_id).
  • throughput: tune vllm.batch_size in the orchestrator config (or override with --batch-size at worker start). higher isn’t always better; watch VRAM.
  • prompts: add more strings under vllm.inference_prompts to get multiple captions per image; the worker returns only non-empty generations.
  • private HF: if your dataset/model needs auth, export HUGGINGFACE_HUB_TOKEN before caption-flow worker ....
  • self-signed ssl: pass --no-verify-ssl to workers/monitors in dev.
  • recovery: if you hard-crash mid-run, caption-flow scan_chunks --fix can reset abandoned chunks so the orchestrator can reissue them cleanly.

roadmap

  • hot config reload via the admin websocket path.
  • dedicated data-feeder clients (separate from gpu workers) that push samples into the orchestrator.
  • richer monitor TUI.

PRs welcome. keep it simple and fast.

architecture

┌─────────────┐     WebSocket      ┌─────────────┐
│   Worker    │◄──────────────────►│             │
│             │                    │             │     ┌──────────────┐
│             │◄───────────────────│             │────►│Arrow/Parquet │
└─────────────┘   HTTP (img data)  │ Orchestrator│     │   Storage    │
                                   │             │     └──────────────┘
┌─────────────┐                    │             │
│   Worker    │◄──────────────────►│             │
│             │                    │             │
│             │◄───────────────────│             │
└─────────────┘   HTTP (img data)  └─────────────┘
                                           ▲
┌─────────────┐                           │
│   Monitor   │◄──────────────────────────┘
└─────────────┘

Community Clusters

To contribute compute to a cluster:

  1. Install caption-flow: pip install caption-flow
  2. Get a worker token from the project maintainer
  3. Run: caption-flow worker --server wss://project.domain.com:8765 --token YOUR_TOKEN

Your contributions will be tracked and attributed in the final dataset!

License

AGPLv3

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

caption_flow-0.3.1.tar.gz (105.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

caption_flow-0.3.1-py3-none-any.whl (116.0 kB view details)

Uploaded Python 3

File details

Details for the file caption_flow-0.3.1.tar.gz.

File metadata

  • Download URL: caption_flow-0.3.1.tar.gz
  • Upload date:
  • Size: 105.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for caption_flow-0.3.1.tar.gz
Algorithm Hash digest
SHA256 a266f79211180282da4e156b8c2bcdb4ada794d34ff75b533f98ddd5e70eee76
MD5 38a58e55dd46e8be52bb99c37ccb786a
BLAKE2b-256 858b2a6b232dc48256bb2bc3b16c12c11edb056188ec2a15afa0928aad038494

See more details on using hashes here.

File details

Details for the file caption_flow-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: caption_flow-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 116.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for caption_flow-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5c566e1750c33fc5d7dc52c966212a6949a41319d755a9ccf544410d13572c3b
MD5 d93c601e9fa2bb9559f817cc47e5e1f6
BLAKE2b-256 775256277ef2983a29cacedda2375542374e8b929ff35c1514a2762e73324a90

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page