Image captioning library combining Florence2, WD14, and VLM for comprehensive image analysis

capflow

A multi-model image captioning library that combines Florence2, WD14, and Vision Language Models for comprehensive image analysis.

Features

Florence2: Detailed image captions with conservative generation parameters
WD14: Danbooru-style tagging for anime and illustrations
VLM: Async-first verification layer that resolves contradictions between models
EXIF extraction: Camera, lens, and capture settings metadata
Cross-platform support (CUDA, MPS, CPU)
Optimized prompts for AI image generation (Stable Diffusion, Midjourney, FLUX)

Installation

pip install capflow

Or with uv:

uv add capflow

Quick Start

Async (Recommended)

import asyncio
import os
import capflow as cf

async def main():
    # Initialize models (an API key is required for VLM)
    florence2 = cf.Florence2()
    wd14 = cf.WD14()
    vlm = cf.VLM(api_key=os.getenv("OPENAI_API_KEY"))

    image_path = "your_image.jpg"

    # Florence2: detailed caption (sync)
    caption = florence2.generate_caption(image_path, task="more_detailed_caption")

    # WD14: Danbooru-style tags (sync)
    tags = wd14.generate_tags(image_path)

    # VLM: verified and enhanced description (async, so it must be awaited inside a coroutine)
    description = await vlm.generate_caption(
        image_path,
        context=f"Florence2: {caption}\nWD14 Tags: {', '.join(tags)}"
    )

    # Extract EXIF metadata (sync)
    exif_data = cf.extract_exif(image_path)
    camera_info = cf.get_camera_info(exif_data)
    settings = cf.get_capture_settings(exif_data)

    print(description)

# Run the coroutine from a normal script
asyncio.run(main())

Synchronous

import os
import capflow as cf

# Initialize models
florence2 = cf.Florence2()
wd14 = cf.WD14()
vlm = cf.VLM(api_key=os.getenv("OPENAI_API_KEY"))

image_path = "your_image.jpg"

# All synchronous calls
caption = florence2.generate_caption(image_path, task="more_detailed_caption")
tags = wd14.generate_tags(image_path)

# Use _sync methods for VLM
description = vlm.generate_caption_sync(
    image_path,
    context=f"Florence2: {caption}\nWD14 Tags: {', '.join(tags)}"
)

API Keys

API keys must be passed explicitly from your application:

import os
import capflow as cf

# Pass API key from environment variable
vlm = cf.VLM(api_key=os.getenv("OPENAI_API_KEY"))

# Or pass directly (not recommended for production)
vlm = cf.VLM(api_key="your-api-key")

VLM Async/Sync Methods

VLM provides both async (primary) and sync methods:

# Async methods (recommended)
await vlm.generate_caption(image)
await vlm.generate_caption_with_tags(image, tags)
await vlm.refine_caption(image, draft)

# Sync methods (work even when called from an async context)
vlm.generate_caption_sync(image)
vlm.generate_caption_with_tags_sync(image, tags)
vlm.refine_caption_sync(image, draft)

The _sync methods detect whether they are being called from inside a running event loop and handle that case, so they do not raise "event loop already running" errors.
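One common way to build such a wrapper: if no event loop is running in the current thread, call asyncio.run directly; otherwise run the coroutine on a fresh loop in a worker thread. This is a sketch of that technique under that assumption, not capflow's actual implementation; `run_sync` and `fake_caption` are hypothetical names.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code (hypothetical helper)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)
    # A loop is already running here (e.g. in Jupyter or inside another coroutine),
    # where asyncio.run would raise; run the coroutine on its own loop in a worker thread.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fake_caption(image: str) -> str:
    # Hypothetical stand-in for an async call such as vlm.generate_caption
    await asyncio.sleep(0)
    return f"caption for {image}"


# Called from plain synchronous code
print(run_sync(fake_caption("a.jpg")))


async def main() -> str:
    # Called from inside a running loop: no "event loop already running" error
    return run_sync(fake_caption("b.jpg"))


print(asyncio.run(main()))
```

The trade-off of the worker-thread branch is that `.result()` blocks the outer event loop until the coroutine finishes, which is acceptable for a convenience wrapper but is one reason the async methods remain the recommended path.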

Windows Ctrl+C Support

For proper Ctrl+C handling on Windows, use the run_async() helper:

import os
import capflow as cf

vlm = cf.VLM(api_key=os.getenv("OPENAI_API_KEY"))
images = ["image1.jpg", "image2.jpg"]  # paths of the images to caption

async def process_images():
    for image in images:
        caption = await vlm.generate_caption(image)
        print(caption)

# Works correctly with Ctrl+C on Windows
cf.run_async(process_images())

This helper sets the proper event loop policy on Windows to ensure KeyboardInterrupt is handled correctly.
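The loop above awaits each caption before starting the next. Because the VLM calls are async, several requests can also be kept in flight at once. A minimal sketch using asyncio.gather with a semaphore to cap concurrency; `fake_generate_caption` is a hypothetical stand-in for `vlm.generate_caption`, and the limit of 4 is an arbitrary assumption you would tune to your API rate limits:

```python
import asyncio
from typing import List


async def fake_generate_caption(image: str) -> str:
    # Hypothetical stand-in for `await vlm.generate_caption(image)`
    await asyncio.sleep(0.01)
    return f"caption for {image}"


async def caption_all(images: List[str], max_concurrency: int = 4) -> List[str]:
    # Created inside the coroutine so it belongs to the running loop
    semaphore = asyncio.Semaphore(max_concurrency)

    async def caption_one(image: str) -> str:
        # At most `max_concurrency` requests run at the same time
        async with semaphore:
            return await fake_generate_caption(image)

    # gather returns results in the same order as the input list
    return list(await asyncio.gather(*(caption_one(img) for img in images)))


captions = asyncio.run(caption_all(["a.jpg", "b.jpg", "c.jpg"]))
for caption in captions:
    print(caption)
```

In an application you would pass the coroutine to cf.run_async(...) instead of asyncio.run(...) to keep the Windows Ctrl+C behavior described above.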

How It Works

capflow uses a three-stage pipeline:

WD14: Generates high-confidence tags (e.g., 1girl, solo, outdoors)
Florence2: Creates detailed natural language captions
VLM: Examines the image directly and resolves contradictions between WD14 and Florence2, converting tags to natural English
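The Quick Start hands the first two stages to the VLM as a plain-text context string. A minimal sketch of that hand-off, assuming the tagger produces a tag-to-confidence mapping that is filtered by a threshold before joining; `select_tags`, `build_context`, the 0.35 threshold, and the scores below are hypothetical and made up for illustration, not capflow's API or real model output:

```python
from typing import Dict, List


def select_tags(scores: Dict[str, float], threshold: float = 0.35) -> List[str]:
    """Keep tags whose confidence meets the threshold, most confident first."""
    kept = [(tag, score) for tag, score in scores.items() if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in kept]


def build_context(caption: str, tags: List[str]) -> str:
    """Format the Florence2 caption and WD14 tags the same way the Quick Start does."""
    return f"Florence2: {caption}\nWD14 Tags: {', '.join(tags)}"


# Made-up confidence scores for illustration only
scores = {"1girl": 0.98, "solo": 0.95, "outdoors": 0.81, "motion_blur": 0.42, "hat": 0.12}
tags = select_tags(scores)
print(build_context("A young woman in an orange dress jumping in the air.", tags))
```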

Example

Input: Image of a girl jumping outdoors

WD14 Output:

1girl, solo, dress, outdoors, jumping, motion_blur

Florence2 Output:

A young woman in an orange dress jumping in the air. There are a few people walking around.

VLM Final Output:

A photograph of a single woman mid-air, jumping in a rust-orange dress, barefoot with curly auburn hair,
dynamic extended-arms pose captured with slight motion blur, main subject left-of-center, low-angle wide shot,
foreground white marble steps, manicured hedges and formal garden behind her, large historic white stone
cathedral in the midground, overcast cloudy sky, natural daylight, joyful carefree atmosphere.

Note: VLM correctly identified "solo" from WD14 and ignored Florence2's hallucinated "a few people walking around".

License

MIT

Release history

0.4.0 (Oct 21, 2025)
0.3.2 (Oct 20, 2025)
0.3.1 (Oct 20, 2025)
0.3.0 (Oct 20, 2025)
0.2.1 (Oct 15, 2025; this version)
0.2.0 (Oct 15, 2025)
0.1.0 (Oct 15, 2025)

Download files

Source Distribution

capflow-0.2.1.tar.gz (562.4 kB)

Uploaded Oct 15, 2025 Source

Built Distribution

capflow-0.2.1-py3-none-any.whl (17.3 kB)

Uploaded Oct 15, 2025 Python 3

File details

Details for the file capflow-0.2.1.tar.gz.

File metadata

Download URL: capflow-0.2.1.tar.gz
Upload date: Oct 15, 2025
Size: 562.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for capflow-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`97705de3715001c35c74f4d5ee357cdde7e17cc5ec1f91a275673ea018b36ad3`
MD5	`f3689f370c0f510d0e26a291e9bdf41e`
BLAKE2b-256	`132821d4063e8bcb54e8a6ada2f7ec93963e3863545798f5f1f6b3038f9688ad`

File details

Details for the file capflow-0.2.1-py3-none-any.whl.

File metadata

Download URL: capflow-0.2.1-py3-none-any.whl
Upload date: Oct 15, 2025
Size: 17.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for capflow-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f32e559c4da5a181a9dde80bef47554590c50c81a99ad5f216518598e6121746`
MD5	`35e129368c778091e075429c1f1a7182`
BLAKE2b-256	`e700e5f8b5d3717ed2dabdf6962e54dc5fa73c0d3ed97bd2121d312ffe128f51`
