# Argus Lens

Structured image captioning for training and generation.
## Quick Start

```shell
pip install argus-lens[openai]
```

```python
from argus_lens import ArgusLens

engine = ArgusLens(backend="openai", api_key="sk-...")
result = engine.caption("photo.jpg", trigger_word="sks_person")

print(result.final_caption)
print(result.caption_variants["training"])
print(result.caption_variants["zeroshot"])
```
## Features
- Multi-model backends: WD14, Florence-2 (local GPU/CPU) + OpenAI, HuggingFace, Replicate, NVIDIA NIM (cloud API)
- Structured captions: Category-bucketed variants (identity, wardrobe, pose, setting, lighting, action)
- Training-optimised: Tiered tag protection, omission cycles, CLIP/T5 token budgets, identity suppression
- Zero-shot variant: Identity-first, prose-preferred captions for generation without LoRA
- Hybrid pipelines: Mix local + cloud backends (e.g. WD14 tags + GPT-4o prose)
- Backend-aware budgets: Automatic token limits for SDXL (60), Flux (200), SD3 (200)
- CLI + Server: Command-line tool and optional FastAPI micro-server
- Export formats: `.txt` sidecars, JSON, JSONL, CSV
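To illustrate the `.txt` sidecar convention (a sketch of the format, not the library's own exporter): each image gets a same-named `.txt` file containing its caption, which is the layout most LoRA training tools expect.

```python
from pathlib import Path

def write_txt_sidecar(image_path: str, caption: str) -> Path:
    """Write `caption` to a .txt file next to the image (photo.jpg -> photo.txt)."""
    sidecar = Path(image_path).with_suffix(".txt")
    sidecar.write_text(caption, encoding="utf-8")
    return sidecar

# e.g. write_txt_sidecar("dataset/photo.jpg", "sks_person wearing a red jacket, outdoors")
```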
## Installation

pip handles all Python dependencies through extras. Pick the extras that match your use case:

```shell
# Assembly engine only (no model deps)
pip install argus-lens

# Local backends (GPU inference)
pip install argus-lens[local]      # WD14 + Florence-2
pip install argus-lens[wd14]       # WD14 only (CPU, no torch)
pip install argus-lens[torch]      # Florence-2 only

# Cloud backends (no GPU needed)
pip install argus-lens[openai]     # GPT-4o vision
pip install argus-lens[replicate]  # Replicate API

# Server (FastAPI + uvicorn)
pip install argus-lens[server,local,openai]

# Everything
pip install argus-lens[all]
```
If you're adding argus-lens to an existing project, just add e.g. `argus-lens[openai]` to your requirements.txt -- pip resolves all transitive dependencies automatically.
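For example, a minimal requirements.txt entry (the version pin is illustrative):

```text
# requirements.txt
argus-lens[openai]>=0.1.0
```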
### System dependencies for local GPU backends

Cloud-only users ([openai], [replicate]) need no system packages -- skip this section.

Local backends ([local], [wd14], [torch]) require system libraries for image processing and (optionally) CUDA for GPU acceleration. On Ubuntu/Debian:

```shell
sudo apt install -y \
    libgl1 libglib2.0-0 libxcb1 libsm6 libxext6 libxrender1
```
For GPU inference, you also need:

- NVIDIA GPU drivers (check with `nvidia-smi`)
- CUDA runtime (the `Dockerfile.gpu-base` in this repo uses `nvidia/cuda:12.4.1-runtime-ubuntu22.04` as a reference)
- NVIDIA Container Toolkit (for Docker deployment only)
If you already have torch and CUDA working in your environment, you're set -- the pip extras handle the rest.
## Usage

### Python API

Import and use directly in your code. This is the primary interface.

```python
from argus_lens import ArgusLens

# Cloud backend -- works anywhere, no GPU
engine = ArgusLens(backend="openai", api_key="sk-...")
result = engine.caption("photo.jpg", trigger_word="sks_person")

# Local backend -- needs torch + GPU/CPU
engine = ArgusLens(backend="hybrid")
result = engine.caption("photo.jpg", trigger_word="sks_person")

# Batch processing
results = engine.caption_directory("./images/", output_format="txt")
```
### CLI

```shell
# Caption a single image
argus-lens caption photo.jpg --trigger sks_person --backend openai

# Caption a directory, output as txt sidecars
argus-lens caption ./images/ --format txt --backend hybrid

# List available backends
argus-lens backends
```
### HTTP Server

Run the built-in FastAPI server for frontend consumers (e.g. argus-vision-demo):

```shell
pip install argus-lens[server,local]
argus-lens serve --cors --port 8080
```

Endpoints:

- `POST /caption` -- multipart file upload
- `POST /caption/url` -- JSON body with image URL
- `POST /caption/batch` -- multiple file upload
- `POST /caption/stream` -- NDJSON streaming for batch
- `GET /backends` -- list available backends
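As a hedged example of calling the multipart endpoint with curl (the form field name `file` is an assumption; FastAPI serves interactive API docs at `/docs`, so check there for the exact schema):

```shell
curl -X POST http://localhost:8080/caption \
  -F "file=@photo.jpg"
```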
## Docker

For fresh hosts or isolated deployment with GPU passthrough. No pip install needed on the host.

```shell
# Build and run
./build-docker.sh
docker compose up
```

This builds a CUDA 12.4 base image, installs all extras into it, and runs `argus-lens serve` on port 8080.
### Configuration

Copy or create a `.env` file for the Docker deployment:

| Variable | Default | Description |
|---|---|---|
| `ARGUS_BACKEND` | `hybrid` | Captioning backend (`hybrid`, `wd14`, `florence2`, `openai`, etc.) |
| `ARGUS_API_KEY` | -- | API key for cloud backends |
| `ARGUS_PORT` | `8080` | Host port for the server |
| `WD14_MODEL_DIR` | `~/.cache/wd14_tagger/` | WD14 ONNX model directory (auto-downloads on first use) |
| `HF_HOME` | `~/.cache/huggingface` | HuggingFace model cache (auto-downloads on first use) |
| `HF_TRUST_REMOTE_CODE` | `false` | Only needed for legacy `microsoft/Florence-2-*` weights. See Security |
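A minimal `.env` for a cloud-only deployment might look like this (values are placeholders):

```text
ARGUS_BACKEND=openai
ARGUS_API_KEY=sk-...
ARGUS_PORT=8080
```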
### GPU prerequisites

```shell
# Verify NVIDIA driver
nvidia-smi

# Install container toolkit (if not already)
sudo apt install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
### Model caching

The `docker-compose.yaml` bind-mounts `~/.cache/wd14_tagger` and `~/.cache/huggingface` from the host so models persist across container rebuilds. Models auto-download on first use if not already cached.
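A bind mount of that shape in a compose file would look roughly like this sketch (the service name and container-side paths are assumptions; the repo's actual `docker-compose.yaml` is authoritative):

```yaml
services:
  argus-lens:
    volumes:
      - ~/.cache/wd14_tagger:/root/.cache/wd14_tagger
      - ~/.cache/huggingface:/root/.cache/huggingface
```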
## Security

### trust_remote_code and Florence-2

By default, the Florence-2 backend uses `florence-community/Florence-2-base` weights, which are natively supported in transformers -- no `trust_remote_code` needed.

The legacy `microsoft/Florence-2-base` weights require `HF_TRUST_REMOTE_CODE=true`, which executes arbitrary Python from the model repository at load time. Only enable this for models you trust. WD14 uses a static ONNX model and never runs remote code.
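As a sketch of the opt-in pattern (not the library's actual code), an environment flag like this is typically parsed so that remote code stays disabled unless explicitly requested:

```python
import os

def trust_remote_code_enabled() -> bool:
    """Return True only if HF_TRUST_REMOTE_CODE is explicitly set truthy.

    Defaults to False so remote code never runs by accident.
    """
    value = os.environ.get("HF_TRUST_REMOTE_CODE", "false")
    return value.strip().lower() in {"1", "true", "yes"}
```

The default-deny direction is the important part: an unset or malformed variable must resolve to `False`.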
## License

MIT