
Argus Lens

Structured image captioning for training and generation.

Quick Start

pip install argus-lens[openai]

from argus_lens import ArgusLens

engine = ArgusLens(backend="openai", api_key="sk-...")
result = engine.caption("photo.jpg", trigger_word="sks_person")

print(result.final_caption)
print(result.caption_variants["training"])
print(result.caption_variants["zeroshot"])

Features

  • Multi-model backends: WD14, Florence-2 (local GPU/CPU) + OpenAI, HuggingFace, Replicate, NVIDIA NIM (cloud API)
  • Structured captions: Category-bucketed variants (identity, wardrobe, pose, setting, lighting, action)
  • Training-optimised: Tiered tag protection, omission cycles, CLIP/T5 token budgets, identity suppression
  • Zero-shot variant: Identity-first, prose-preferred captions for generation without LoRA
  • Hybrid pipelines: Mix local + cloud backends (e.g. WD14 tags + GPT-4o prose)
  • Backend-aware budgets: Automatic token limits for SDXL (60), Flux (200), SD3 (200)
  • CLI + Server: Command-line tool and optional FastAPI micro-server
  • Export formats: .txt sidecars, JSON, JSONL, CSV
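The backend-aware token budgets listed above (SDXL 60, Flux/SD3 200) can be sketched in a few lines. The budget table and `truncate_caption` helper below are illustrative only, not argus-lens's actual API:

```python
# Illustrative per-model token budgets, taken from the feature list above.
TOKEN_BUDGETS = {"sdxl": 60, "flux": 200, "sd3": 200}

def truncate_caption(tokens, target="sdxl"):
    """Clip a token list to the target model's budget (hypothetical helper)."""
    budget = TOKEN_BUDGETS.get(target, 200)
    return tokens[:budget]

caption_tokens = ["sks_person", "red", "jacket"] * 30  # 90 pseudo-tokens
print(len(truncate_caption(caption_tokens, "sdxl")))   # clipped to 60 for SDXL
print(len(truncate_caption(caption_tokens, "flux")))   # fits within 200, so 90
```

The same caption therefore survives intact for Flux/SD3 but gets clipped for SDXL's tighter CLIP budget.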

Installation

pip handles all Python dependencies through extras. Pick the extras that match your use case:

# Assembly engine only (no model deps)
pip install argus-lens

# Local backends (GPU inference)
pip install argus-lens[local]      # WD14 + Florence-2
pip install argus-lens[wd14]       # WD14 only (CPU, no torch)
pip install argus-lens[torch]      # Florence-2 only

# Cloud backends (no GPU needed)
pip install argus-lens[openai]     # GPT-4o vision
pip install argus-lens[replicate]  # Replicate API

# Server (FastAPI + uvicorn)
pip install argus-lens[server,local,openai]

# Everything
pip install argus-lens[all]

If you're adding argus-lens to an existing project, add the appropriate extra (e.g. argus-lens[openai]) to your requirements.txt; pip resolves all transitive dependencies automatically.
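For instance, a requirements.txt line for a cloud-only install (pin to whichever version you've tested; 0.1.0 is the release described on this page):

```
argus-lens[openai]==0.1.0
```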

System dependencies for local GPU backends

Cloud-only users ([openai], [replicate]) need no system packages -- skip this section.

Local backends ([local], [wd14], [torch]) require system libraries for image processing and (optionally) CUDA for GPU acceleration. On Ubuntu/Debian:

sudo apt install -y \
    libgl1 libglib2.0-0 libxcb1 libsm6 libxext6 libxrender1

For GPU inference, you also need:

  • NVIDIA GPU drivers (check with nvidia-smi)
  • CUDA runtime (the Dockerfile.gpu-base in this repo uses nvidia/cuda:12.4.1-runtime-ubuntu22.04 as a reference)
  • NVIDIA Container Toolkit (for Docker deployment only)

If you already have torch and CUDA working in your environment, you're set -- the pip extras handle the rest.
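Before installing the local extras, a quick stdlib-only environment check can tell you whether the GPU path is even available. Nothing here is argus-lens specific; it just probes for nvidia-smi:

```python
import shutil
import subprocess

def has_nvidia_driver():
    """Return True if nvidia-smi is on PATH and exits cleanly."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    try:
        return subprocess.run([smi], capture_output=True).returncode == 0
    except OSError:
        return False

if has_nvidia_driver():
    print("GPU driver detected -- the [local] extras can use CUDA.")
else:
    print("No NVIDIA driver found -- consider [wd14] for CPU-only tagging.")
```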

Usage

Python API

Import and use directly in your code. This is the primary interface.

from argus_lens import ArgusLens

# Cloud backend -- works anywhere, no GPU
engine = ArgusLens(backend="openai", api_key="sk-...")
result = engine.caption("photo.jpg", trigger_word="sks_person")

# Local backend -- needs torch + GPU/CPU
engine = ArgusLens(backend="hybrid")
result = engine.caption("photo.jpg", trigger_word="sks_person")

# Batch processing
results = engine.caption_directory("./images/", output_format="txt")
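As a sketch of consuming result objects, the snippet below routes each caption variant to its own .txt sidecar. Only the attributes shown in the Quick Start (final_caption, caption_variants) are assumed; the FakeResult stand-in exists purely so the example runs without a backend:

```python
from pathlib import Path

class FakeResult:
    """Stand-in for an ArgusLens result, with the attributes from Quick Start."""
    final_caption = "sks_person, red jacket, studio lighting"
    caption_variants = {
        "training": "sks_person, red jacket",
        "zeroshot": "a person in a red jacket under studio lighting",
    }

def write_sidecars(image_path, result, out_dir="."):
    """Write one .txt sidecar per caption variant, named after the image stem."""
    stem = Path(image_path).stem
    for variant, text in result.caption_variants.items():
        Path(out_dir, f"{stem}.{variant}.txt").write_text(text)

write_sidecars("photo.jpg", FakeResult())  # -> photo.training.txt, photo.zeroshot.txt
```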

CLI

# Caption a single image
argus-lens caption photo.jpg --trigger sks_person --backend openai

# Caption a directory, output as txt sidecars
argus-lens caption ./images/ --format txt --backend hybrid

# List available backends
argus-lens backends

HTTP Server

Run the built-in FastAPI server for frontend consumers (e.g. argus-vision-demo):

pip install argus-lens[server,local]
argus-lens serve --cors --port 8080

Endpoints:

  • POST /caption -- multipart file upload
  • POST /caption/url -- JSON body with image URL
  • POST /caption/batch -- multiple file upload
  • POST /caption/stream -- NDJSON streaming for batch
  • GET /backends -- list available backends
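A minimal stdlib client for the /caption/url endpoint might look like the following. The JSON field name "url" is an assumption here; check the server's OpenAPI docs (FastAPI serves them at /docs) for the actual schema:

```python
import json
import urllib.request

def build_caption_request(base="http://localhost:8080", image_url=""):
    """Build (but don't send) a JSON POST to the /caption/url endpoint."""
    body = json.dumps({"url": image_url}).encode()
    return urllib.request.Request(
        f"{base}/caption/url",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_caption_request(image_url="https://example.com/photo.jpg")
# urllib.request.urlopen(req) would dispatch it to a running server
print(req.full_url, req.method)
```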

Docker

For fresh hosts or isolated deployment with GPU passthrough. No pip install needed on the host.

# Build and run
./build-docker.sh
docker compose up

This builds a CUDA 12.4 base image, installs all extras into it, and runs argus-lens serve on port 8080.

Configuration

Copy or create a .env file for the Docker deployment:

| Variable | Default | Description |
|---|---|---|
| ARGUS_BACKEND | hybrid | Captioning backend (hybrid, wd14, florence2, openai, etc.) |
| ARGUS_API_KEY | -- | API key for cloud backends |
| ARGUS_PORT | 8080 | Host port for the server |
| WD14_MODEL_DIR | ~/.cache/wd14_tagger/ | WD14 ONNX model directory (auto-downloads on first use) |
| HF_HOME | ~/.cache/huggingface | HuggingFace model cache (auto-downloads on first use) |
| HF_TRUST_REMOTE_CODE | false | Only needed for legacy microsoft/Florence-2-* weights. See Security |
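For example, a minimal .env for a cloud-backed deployment (the API key value is a placeholder):

```
ARGUS_BACKEND=openai
ARGUS_API_KEY=sk-...
ARGUS_PORT=8080
```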

GPU prerequisites

# Verify NVIDIA driver
nvidia-smi

# Install container toolkit (if not already)
sudo apt install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
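The repo's docker-compose.yaml handles GPU passthrough; for reference, a typical Compose GPU reservation stanza looks like this (illustrative, not necessarily the exact file in the repo):

```yaml
services:
  argus-lens:
    image: argus-lens:latest
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```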

Model caching

The docker-compose.yaml bind-mounts ~/.cache/wd14_tagger and ~/.cache/huggingface from the host so models persist across container rebuilds. Models auto-download on first use if not already cached.

Security

trust_remote_code and Florence-2

By default, the Florence-2 backend uses florence-community/Florence-2-base weights which are natively supported in transformers -- no trust_remote_code needed.

The legacy microsoft/Florence-2-base weights require HF_TRUST_REMOTE_CODE=true, which executes arbitrary Python from the model repository at load time. Only enable this for models you trust. WD14 uses a static ONNX model and never runs remote code.
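If you gate remote-code execution on the HF_TRUST_REMOTE_CODE variable yourself, a conservative parse treats only explicit affirmative values as opt-in. The variable name comes from the configuration table above; the helper itself is illustrative:

```python
import os

def trust_remote_code_enabled():
    """Only explicit affirmative values opt in; anything else stays off."""
    value = os.environ.get("HF_TRUST_REMOTE_CODE", "false").strip().lower()
    return value in ("1", "true", "yes")

# Hypothetical usage with transformers (commented out; requires torch):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "florence-community/Florence-2-base",
#     trust_remote_code=trust_remote_code_enabled(),
# )
print(trust_remote_code_enabled())
```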

License

MIT

Download files

Download the file for your platform.

Source Distribution

argus_lens-0.1.0.tar.gz (39.3 kB)

Uploaded Source

Built Distribution


argus_lens-0.1.0-py3-none-any.whl (48.6 kB)

Uploaded Python 3

File details

Details for the file argus_lens-0.1.0.tar.gz.

File metadata

  • Download URL: argus_lens-0.1.0.tar.gz
  • Upload date:
  • Size: 39.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for argus_lens-0.1.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | ea2f9f8191c67cea97d422411d14d3b258ba169bfc3935d57ef974b5a5c57a25 |
| MD5 | 46d2b57761dcbe1c50a6006fcf71c15e |
| BLAKE2b-256 | 438f7ebb553d4d07402a36e64c7b6e7b326fcd233b49746fe1c472f18c4edc58 |


Provenance

The following attestation bundles were made for argus_lens-0.1.0.tar.gz:

Publisher: release.yml on smk762/argus-lens

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file argus_lens-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: argus_lens-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 48.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for argus_lens-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 66c408fe9a49ece797fa72f6afdb528fe1eed0556a7382f745b788e8e4765bc1 |
| MD5 | 1369cc90a5621cc6e0ab5167b5171b25 |
| BLAKE2b-256 | 8d6ffdcf119402c95758143e921fd60f05d24e817b63e1d0e784a003c894288d |


Provenance

The following attestation bundles were made for argus_lens-0.1.0-py3-none-any.whl:

Publisher: release.yml on smk762/argus-lens

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
