Resumable multimodal-LLM annotator and embedder for folders of audio or image files.
mllm-annotator
A small, resumable tool for sending folders of audio or image files to a multimodal LLM for automatic annotation, plus an embedding + UMAP visualization workflow. Gemini is the current backend; the design keeps the provider behind a thin seam so others can be added later.
It ships both a command-line tool and a desktop GUI.
Install
# CLI only
pip install mllm-annotator
# with the desktop GUI and the embed/visualize feature
pip install "mllm-annotator[ui,viz]"
Or, for development from a clone:
uv sync --extra ui --extra viz
The embed/visualize feature also needs ffmpeg on your PATH to handle
audio formats Gemini can't embed directly (e.g. .aac, .opus). It is an
optional system dependency, not a pip package; without it those files are
skipped.
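The ffmpeg fallback described above can be pictured with a short Python sketch. It is not the package's own code: the function name `plan_embedding`, the `NEEDS_CONVERSION` set, and the returned dictionary layout are all hypothetical, and the set only holds the two extensions this README names as examples.

```python
import shutil
from pathlib import Path

# Assumption: only the two example extensions named in this README.
# The real list of formats that need conversion may be longer.
NEEDS_CONVERSION = {".aac", ".opus"}


def plan_embedding(paths, ffmpeg_available=None):
    """Sort file names into 'direct', 'convert', and 'skipped' buckets."""
    if ffmpeg_available is None:
        # shutil.which returns None when the executable is not on PATH.
        ffmpeg_available = shutil.which("ffmpeg") is not None

    plan = {"direct": [], "convert": [], "skipped": []}
    for path in map(Path, paths):
        if path.suffix.lower() not in NEEDS_CONVERSION:
            plan["direct"].append(path.name)
        elif ffmpeg_available:
            plan["convert"].append(path.name)
        else:
            # Without ffmpeg these files cannot be converted, so they are skipped.
            plan["skipped"].append(path.name)
    return plan
```

Calling `plan_embedding(["a.wav", "b.opus"])` with no second argument checks the local PATH, so the result depends on whether ffmpeg is installed on the machine.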
API key
Provide a Gemini API key in any one of these ways (checked in this order):
- environment variable GEMINI_API_KEY (or GOOGLE_API_KEY): $env:GEMINI_API_KEY="your_api_key"
- a .env file in the current working directory: GEMINI_API_KEY=your_api_key
- saved from inside the GUI: click Set API Key in the top-left corner (it also opens automatically on first launch if no key is found), paste the key, and it is stored securely in your OS keyring (Windows Credential Manager / macOS Keychain / Linux Secret Service). No plaintext file is written.
.env is ignored by git, and keys are never written into the built package.
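As a rough illustration of the lookup order listed above, here is a minimal Python sketch. It is not taken from the package: `read_dotenv`, `resolve_api_key`, and the `keyring_lookup` callable are hypothetical names, the keyring step is injected as a plain function so the sketch needs only the standard library, and trying GEMINI_API_KEY before GOOGLE_API_KEY is an assumption.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are ignored."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    dotenv_path: Path = Path(".env"),
    keyring_lookup: Optional[Callable[[], Optional[str]]] = None,
) -> Optional[str]:
    """Return the first key found, or None if no source provides one."""
    # 1. Environment variables (assumed order: GEMINI_API_KEY, then GOOGLE_API_KEY).
    for name in ("GEMINI_API_KEY", "GOOGLE_API_KEY"):
        if env.get(name):
            return env[name]
    # 2. A .env file; only GEMINI_API_KEY is shown for this source in the README.
    from_file = read_dotenv(dotenv_path).get("GEMINI_API_KEY")
    if from_file:
        return from_file
    # 3. The OS keyring, represented here by a caller-supplied function.
    if keyring_lookup is not None:
        return keyring_lookup() or None
    return None
```

Passing the environment and the keyring lookup as arguments keeps the order easy to test without touching real credentials.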
Command line
mllm-annotator --help
Examples
Horse cough annotation:
mllm-annotator `
--input-folder "C:\data\horse_audio" `
--media-type audio `
--instruction "Annotate if the audio contains a horse cough or another sound such as the horse smacking the microphone." `
--daily-limit 500
Swiss German transcription validation:
mllm-annotator `
--input-folder "C:\data\swiss_german_audio" `
--media-type audio `
--labels-csv "C:\data\transcriptions.csv" `
--instruction "Confirm whether the attached Swiss German audio matches the associated transcription. If it is wrong, rewrite the correct transcription." `
--daily-limit 500
Image captioning:
mllm-annotator `
--input-folder "C:\data\images" `
--media-type image `
--instruction "Caption the attached image." `
--daily-limit 500
Desktop GUI
mllm-annotator-ui
The GUI lets you browse for the data folder, choose audio or image mode,
optionally select a filename,label CSV, write the instruction, preview the
file table, and start or resume processing. It shows the rewritten prompt and
updates each row as Gemini responses arrive, using the same JSONL result/state
files as the CLI. A second tab embeds the media and shows an interactive 2-D
UMAP projection (zoom/pan toolbar, hover a point for its file name).
CSV format
The optional labels CSV must contain exactly one row per media file and these columns:
filename,label
audio_001.wav,expected transcription or label
audio_002.wav,another label
For --recursive, filename must be the relative path with forward slashes,
for example speaker_a/audio_001.wav.
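To check a labels CSV against these rules before a run, something like the following Python sketch could be used. It is an independent illustration, not the tool's validator: `check_labels_csv`, the `MEDIA_SUFFIXES` set, and the wording of the messages are hypothetical, and the extensions are placeholders because this README does not list the accepted ones.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumption: placeholder extensions; the tool's real list is not given here.
MEDIA_SUFFIXES = {".wav", ".mp3", ".png", ".jpg"}


def check_labels_csv(csv_path, input_folder, recursive=False, suffixes=MEDIA_SUFFIXES):
    """Return a list of problems; an empty list means CSV and folder agree."""
    csv_path, input_folder = Path(csv_path), Path(input_folder)
    with csv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if not {"filename", "label"} <= set(reader.fieldnames or []):
            return ["missing required columns: filename,label"]
        names = [(row["filename"] or "").strip() for row in reader]

    problems = []
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"{name}: listed {count} times")
        if "\\" in name:
            problems.append(f"{name}: use forward slashes")

    # Relative POSIX-style paths, e.g. speaker_a/audio_001.wav.
    pattern = "**/*" if recursive else "*"
    on_disk = {
        p.relative_to(input_folder).as_posix()
        for p in input_folder.glob(pattern)
        if p.is_file() and p.suffix.lower() in suffixes
    }
    listed = set(names)
    problems += [f"{n}: in CSV but not found on disk" for n in sorted(listed - on_disk)]
    problems += [f"{n}: on disk but missing from CSV" for n in sorted(on_disk - listed)]
    return problems
```

Comparing the two sets in both directions is what enforces the one-row-per-file rule: a missing row and an extra row are reported separately.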
Resume behavior
By default, results are appended to runs/results.jsonl and progress is saved
in runs/state.json. If the daily limit is reached or the API returns a
quota or rate-limit error, run the same command again later or the next day;
files that were already processed are skipped.
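The append-and-skip pattern behind this behavior might look roughly like the Python sketch below. It is a simplified illustration under stated assumptions, not the tool's implementation: the `run_resumable` function, the `{"done": [...]}` state layout, and the record fields are hypothetical, `annotate` stands in for the API call, and the limit is counted per invocation rather than per calendar day.

```python
import json
from pathlib import Path


def run_resumable(files, annotate, run_dir="runs", daily_limit=500):
    """Process files not yet recorded as done; return how many were handled."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    results_path = run_dir / "results.jsonl"
    state_path = run_dir / "state.json"

    # Hypothetical state layout: a list of file names that are finished.
    state = {"done": []}
    if state_path.exists():
        state = json.loads(state_path.read_text(encoding="utf-8"))
    done = set(state["done"])

    processed = 0
    for name in files:
        if name in done:
            continue  # finished in an earlier run
        if processed >= daily_limit:
            break  # stop here; the next run picks up the remaining files
        record = {"file": name, "response": annotate(name)}
        # Append one JSON object per line so earlier results are never rewritten.
        with results_path.open("a", encoding="utf-8") as out:
            out.write(json.dumps(record) + "\n")
        done.add(name)
        # Save progress after every file so an interruption loses at most one call.
        state_path.write_text(json.dumps({"done": sorted(done)}), encoding="utf-8")
        processed += 1
    return processed
```

Writing the state after each file trades a little I/O for the guarantee that rerunning the same command never repeats a completed call.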
The first run rewrites your natural-language instruction with gemini-3.5-flash
and stores it in the state file. The media files are processed with
gemini-3.1-flash-lite. Use --no-rewrite to skip the rewrite call.
Before spending API calls, you can validate the folder and optional CSV:
mllm-annotator `
--input-folder "C:\data\images" `
--media-type image `
--instruction "Caption the attached image." `
--dry-run