# Argus Lens

Structured image captioning for training and generation.
## Quick Start

```shell
pip install argus-lens[openai]
```

```python
from argus_lens import ArgusLens

engine = ArgusLens(backend="openai", api_key="sk-...")
result = engine.caption("photo.jpg", trigger_word="sks_person")

print(result.final_caption)
print(result.caption_variants["training"])
print(result.caption_variants["zeroshot"])
```
## Features
- Multi-model backends: WD14, Florence-2 (local GPU/CPU) + OpenAI, HuggingFace, Replicate, NVIDIA NIM (cloud API)
- Structured captions: Category-bucketed variants (identity, wardrobe, pose, setting, lighting, action)
- Training-optimised: Tiered tag protection, omission cycles, CLIP/T5 token budgets, identity suppression
- Zero-shot variant: Identity-first, prose-preferred captions for generation without LoRA
- Hybrid pipelines: Mix local + cloud backends (e.g. WD14 tags + GPT-4o prose)
- Backend-aware budgets: Automatic token limits for SDXL (60), Flux (200), SD3 (200)
- CLI + Server: Command-line tool and optional FastAPI micro-server
- Export formats: `.txt` sidecars, JSON, JSONL, CSV
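To illustrate the `.txt` sidecar convention (a sketch of the format, not the library's own exporter): each image gets a same-named `.txt` file containing its caption, which is the layout most LoRA training tools expect.

```python
from pathlib import Path

def write_txt_sidecar(image_path: str, caption: str) -> Path:
    """Write `caption` to a .txt file next to the image (photo.jpg -> photo.txt)."""
    sidecar = Path(image_path).with_suffix(".txt")
    sidecar.write_text(caption, encoding="utf-8")
    return sidecar

# e.g. write_txt_sidecar("dataset/photo.jpg", "sks_person wearing a red jacket, outdoors")
```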
## Installation

pip handles all Python dependencies through extras. Pick the extras that match your use case:

```shell
# Assembly engine only (no model deps)
pip install argus-lens

# Local backends (GPU inference)
pip install argus-lens[local]      # WD14 + Florence-2
pip install argus-lens[wd14]       # WD14 only (CPU, no torch)
pip install argus-lens[torch]      # Florence-2 only

# Cloud backends (no GPU needed)
pip install argus-lens[openai]     # GPT-4o vision
pip install argus-lens[replicate]  # Replicate API

# Server (FastAPI + uvicorn)
pip install argus-lens[server,local,openai]

# Everything
pip install argus-lens[all]
```
If you're adding argus-lens to an existing project, just add e.g. `argus-lens[openai]` to your requirements.txt -- pip resolves all transitive dependencies automatically.
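For example, a minimal requirements.txt entry (the version pin is illustrative):

```text
# requirements.txt
argus-lens[openai]>=0.1.0
```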
### System dependencies for local GPU backends

Cloud-only users ([openai], [replicate]) need no system packages -- skip this section.

Local backends ([local], [wd14], [torch]) require system libraries for image processing and (optionally) CUDA for GPU acceleration. On Ubuntu/Debian:

```shell
sudo apt install -y \
    libgl1 libglib2.0-0 libxcb1 libsm6 libxext6 libxrender1
```
For GPU inference, you also need:

- NVIDIA GPU drivers (check with `nvidia-smi`)
- CUDA runtime (the `Dockerfile.gpu-base` in this repo uses `nvidia/cuda:12.4.1-runtime-ubuntu22.04` as a reference)
- NVIDIA Container Toolkit (for Docker deployment only)
If you already have torch and CUDA working in your environment, you're set -- the pip extras handle the rest.
## Usage

### Python API

Import and use directly in your code. This is the primary interface.

```python
from argus_lens import ArgusLens

# Cloud backend -- works anywhere, no GPU
engine = ArgusLens(backend="openai", api_key="sk-...")
result = engine.caption("photo.jpg", trigger_word="sks_person")

# Local backend -- needs torch + GPU/CPU
engine = ArgusLens(backend="hybrid")
result = engine.caption("photo.jpg", trigger_word="sks_person")

# Batch processing
results = engine.caption_directory("./images/", output_format="txt")
```
### CLI

```shell
# Caption a single image
argus-lens caption photo.jpg --trigger sks_person --backend openai

# Caption a directory, output as txt sidecars
argus-lens caption ./images/ --format txt --backend hybrid

# List available backends
argus-lens backends
```
### HTTP Server

Run the built-in FastAPI server for frontend consumers (e.g. argus-vision-demo):

```shell
pip install argus-lens[server,local]
argus-lens serve --cors --port 8080
```

Endpoints:

- `POST /caption` -- multipart file upload
- `POST /caption/url` -- JSON body with image URL
- `POST /caption/batch` -- multiple file upload
- `POST /caption/stream` -- NDJSON streaming for batch
- `GET /backends` -- list available backends
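As a hedged example of calling the multipart endpoint with curl (the form field name `file` is an assumption; FastAPI serves interactive API docs at `/docs`, so check there for the exact schema):

```shell
curl -X POST http://localhost:8080/caption \
  -F "file=@photo.jpg"
```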
## Docker

For fresh hosts or isolated deployment with GPU passthrough. No pip install needed on the host.

```shell
# Build and run
./build-docker.sh
docker compose up
```

This builds a CUDA 12.4 base image, installs all extras into it, and runs `argus-lens serve` on port 8080.
### Configuration

Copy or create a `.env` file for the Docker deployment:

| Variable | Default | Description |
|---|---|---|
| `ARGUS_BACKEND` | `hybrid` | Captioning backend (`hybrid`, `wd14`, `florence2`, `openai`, etc.) |
| `ARGUS_API_KEY` | -- | API key for cloud backends |
| `ARGUS_PORT` | `8080` | Host port for the server |
| `WD14_MODEL_DIR` | `~/.cache/wd14_tagger/` | WD14 ONNX model directory (auto-downloads on first use) |
| `HF_HOME` | `~/.cache/huggingface` | HuggingFace model cache (auto-downloads on first use) |
| `HF_TRUST_REMOTE_CODE` | `false` | Only needed for legacy `microsoft/Florence-2-*` weights. See Security |
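A minimal `.env` for a cloud-only deployment might look like this (values are placeholders):

```text
ARGUS_BACKEND=openai
ARGUS_API_KEY=sk-...
ARGUS_PORT=8080
```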
### GPU prerequisites

```shell
# Verify NVIDIA driver
nvidia-smi

# Install container toolkit (if not already)
sudo apt install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
### Model caching

The `docker-compose.yaml` bind-mounts `~/.cache/wd14_tagger` and `~/.cache/huggingface` from the host so models persist across container rebuilds. Models auto-download on first use if not already cached.
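A bind mount of that shape in a compose file would look roughly like this sketch (the service name and container-side paths are assumptions; the repo's actual `docker-compose.yaml` is authoritative):

```yaml
services:
  argus-lens:
    volumes:
      - ~/.cache/wd14_tagger:/root/.cache/wd14_tagger
      - ~/.cache/huggingface:/root/.cache/huggingface
```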
## Security

### trust_remote_code and Florence-2

By default, the Florence-2 backend uses `florence-community/Florence-2-base` weights, which are natively supported in transformers -- no `trust_remote_code` needed.

The legacy `microsoft/Florence-2-base` weights require `HF_TRUST_REMOTE_CODE=true`, which executes arbitrary Python from the model repository at load time. Only enable this for models you trust. WD14 uses a static ONNX model and never runs remote code.
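As a sketch of the opt-in pattern (not the library's actual code), an environment flag like this is typically parsed so that remote code stays disabled unless explicitly requested:

```python
import os

def trust_remote_code_enabled() -> bool:
    """Return True only if HF_TRUST_REMOTE_CODE is explicitly set truthy.

    Defaults to False so remote code never runs by accident.
    """
    value = os.environ.get("HF_TRUST_REMOTE_CODE", "false")
    return value.strip().lower() in {"1", "true", "yes"}
```

The default-deny direction is the important part: an unset or malformed variable must resolve to `False`.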
## License

MIT