Simple local web UI for captioning image/video datasets with optional local VLM auto-captioning
nori-captioner
Local Vision Caption Studio for Images and Video.
A web UI for captioning image/video datasets in-place using local VLMs.
Captions are saved as sidecar .txt files next to each media file.
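Because captions are plain sidecar files, other tools can read or write them without going through the UI. The following is a minimal sketch, assuming the sidecar shares the media file's directory and stem (for example `clip.mp4` and `clip.txt`); the helper names are hypothetical and not part of nori-captioner.

```python
from pathlib import Path


def caption_path(media_path: Path) -> Path:
    # Assumed naming: same directory and stem as the media file, with a .txt suffix.
    return media_path.with_suffix(".txt")


def read_caption(media_path: Path) -> str:
    """Return the caption text, or an empty string if no sidecar exists."""
    sidecar = caption_path(media_path)
    if not sidecar.is_file():
        return ""
    return sidecar.read_text(encoding="utf-8").strip()


def write_caption(media_path: Path, text: str) -> None:
    """Write the caption next to the media file."""
    caption_path(media_path).write_text(text.strip() + "\n", encoding="utf-8")
```

A training script could call `read_caption(Path("dataset/clip.mp4"))` to pick up whatever was saved from the web UI, provided the naming assumption holds.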
Quick start
uv run nori-captioner
Scans the current directory (or a given path) recursively for images and videos and opens a local web UI.
uv run nori-captioner /path/to/dataset
Features
- Recursive directory scan — hidden directories are excluded
- Per-file metadata display: resolution, duration, frame count, fps
- Manual caption editing with autosave
- Upload images/videos via file picker or drag-and-drop
- Delete files (removes media and sidecar caption together)
- Auto-captioning queue with single-file and batch modes
- Configurable user prompt — editable in the UI and persisted to disk
- Pagination and filter by caption state (all / captioned / uncaptioned / queued)
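The recursive scan and the "uncaptioned" filter listed above could be approximated as follows. This is a sketch under stated assumptions, not the tool's actual code: the extension set is hypothetical, and "uncaptioned" is taken to mean a missing or empty sidecar `.txt` file.

```python
import os
from pathlib import Path

# Hypothetical extension set; the tool's real list may differ.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".mp4", ".mov", ".webm"}


def scan_media(root: Path) -> list[Path]:
    """Collect media files under root, skipping hidden directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if Path(name).suffix.lower() in MEDIA_EXTENSIONS:
                found.append(Path(dirpath) / name)
    return sorted(found)


def uncaptioned(files: list[Path]) -> list[Path]:
    """Keep files whose sidecar .txt is missing or empty (an assumption)."""
    result = []
    for f in files:
        sidecar = f.with_suffix(".txt")
        if not sidecar.is_file() or not sidecar.read_text(encoding="utf-8").strip():
            result.append(f)
    return result
```

Pruning `dirnames` in place is what keeps `os.walk` out of directories such as `.git`, which matches the "hidden directories are excluded" behavior described in the feature list.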
Auto-captioning with local VLMs
Install VLM extras:
uv sync --extra vlm
Note: Qwen3-VL requires `torchvision`, which is included in the `vlm` extra. On Linux x86_64, CUDA 12.8 wheels for `torch` and `torchvision` are used automatically.
Optional 4-bit / 8-bit quantization:
uv sync --extra vlm --extra quantize
Run with a built-in model alias:
uv run nori-captioner --model qwen3-vl:8b
Or pass any Hugging Face model ID directly:
uv run nori-captioner --model your-org/your-vlm
Model aliases
| Alias | Model |
|---|---|
| `qwen3-vl:2b` | `Qwen/Qwen3-VL-2B-Instruct` |
| `qwen3-vl:4b` | `Qwen/Qwen3-VL-4B-Instruct` |
| `qwen3-vl:8b` | `Qwen/Qwen3-VL-8B-Instruct` |
| `qwen3-vl:32b` | `Qwen/Qwen3-VL-32B-Instruct` |
| `qwen3-vl:30b` | `Qwen/Qwen3-VL-30B-A3B-Instruct` |
| `qwen2.5-vl:3b` | `Qwen/Qwen2.5-VL-3B-Instruct` |
| `qwen2.5-vl:7b` | `Qwen/Qwen2.5-VL-7B-Instruct` |
| `qwen2.5-vl:72b` | `Qwen/Qwen2.5-VL-72B-Instruct` |
| `qwen2-vl:2b` | `Qwen/Qwen2-VL-2B-Instruct` |
| `qwen2-vl:7b` | `Qwen/Qwen2-VL-7B-Instruct` |
| `qwen2-vl:72b` | `Qwen/Qwen2-VL-72B-Instruct` |
| `gemma3:4b` | `google/gemma-3-4b-it` |
| `gemma3:12b` | `google/gemma-3-12b-it` |
| `gemma3:27b` | `google/gemma-3-27b-it` |
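The `--model` option accepts either an alias or a full Hugging Face model ID, which suggests a lookup with a pass-through fallback. The sketch below illustrates that idea with a subset of the aliases; the dictionary and function names are hypothetical, and exact-match lookup is an assumption about how aliases are compared.

```python
# A subset of the alias table; the dictionary name is illustrative only.
MODEL_ALIASES = {
    "qwen3-vl:8b": "Qwen/Qwen3-VL-8B-Instruct",
    "qwen2.5-vl:7b": "Qwen/Qwen2.5-VL-7B-Instruct",
    "gemma3:4b": "google/gemma-3-4b-it",
}


def resolve_model(name: str) -> str:
    """Map a known alias to its Hugging Face ID; pass anything else through."""
    return MODEL_ALIASES.get(name, name)
```

With this shape, `resolve_model("qwen3-vl:8b")` yields the Qwen model ID, while `resolve_model("your-org/your-vlm")` is returned unchanged.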
CLI options
| Option | Default | Description |
|---|---|---|
| `directory` | `.` | Directory to scan |
| `--model` | none | Model alias or HF model ID |
| `--quantize` | none | 4 or 8 bit quantization |
| `--device` | `auto` | `auto`, `cuda`, `mps`, or `cpu` |
| `--frames` | `8` | Video frames sampled per auto-caption |
| `--system-prompt` | built-in | System prompt for model behavior |
| `--prompt` | built-in | Captioning prompt (also editable in UI) |
| `--host` | `127.0.0.1` | Server bind address |
| `--port` | `8765` | Server port |
| `--no-browser` | false | Suppress automatic browser open |
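The `--frames` option controls how many video frames are sampled per auto-caption, but how those frames are chosen is not documented here. One common approach is to spread them evenly across the clip. The sketch below shows that approach as an assumption only; `sample_frame_indices` is a hypothetical helper, not a nori-captioner function.

```python
def sample_frame_indices(total_frames: int, num_samples: int = 8) -> list[int]:
    """Pick evenly spaced frame indices (assumed strategy, not confirmed)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if total_frames <= num_samples:
        # Short clip: use every frame.
        return list(range(total_frames))
    step = total_frames / num_samples
    # Take the midpoint of each of num_samples equal segments.
    return [int(step * i + step / 2) for i in range(num_samples)]
```

Sampling segment midpoints avoids always picking the very first frame, which is often a black or title frame; for an 80-frame clip with the default of 8 samples, the indices are 5, 15, and so on up to 75.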
Prompt persistence
The user prompt edited in the web UI is saved to .nori-captioner.settings.json in the scanned
directory and automatically restored on next launch.
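The persistence described here amounts to a small JSON round trip. The following is a minimal sketch, assuming the file stores a JSON object with a `"prompt"` key; the key name and the helper functions are hypothetical, since the real schema is not documented in this README.

```python
import json
from pathlib import Path

SETTINGS_FILENAME = ".nori-captioner.settings.json"


def load_prompt(dataset_dir: Path, default: str) -> str:
    """Return the saved prompt, or the default if none can be read."""
    path = dataset_dir / SETTINGS_FILENAME
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return default
    if not isinstance(data, dict):
        return default
    # "prompt" is a hypothetical key name; the actual schema may differ.
    return data.get("prompt", default)


def save_prompt(dataset_dir: Path, prompt: str) -> None:
    """Persist the prompt in the scanned directory."""
    path = dataset_dir / SETTINGS_FILENAME
    path.write_text(json.dumps({"prompt": prompt}, indent=2), encoding="utf-8")
```

Keeping the file inside the scanned directory means each dataset carries its own prompt, so switching datasets also switches prompts.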