Resumable multimodal-LLM annotator and embedder for folders of audio or image files.


mllm-annotator

A small, resumable tool for sending folders of audio or image files to a multimodal LLM for automatic annotation, plus an embedding and UMAP visualization workflow. Gemini is the current backend; the design keeps the provider behind a thin interface so other backends can be added later.

It ships both a command-line tool and a desktop GUI.
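Keeping the provider behind a thin layer could look like the minimal Python sketch below, which uses `typing.Protocol`. `AnnotationProvider`, `EchoProvider`, and `annotate_all` are hypothetical names chosen for illustration, not mllm-annotator's actual classes, and the real interface may differ.

```python
from typing import Protocol


class AnnotationProvider(Protocol):
    """Hypothetical provider interface; the package's real seam may differ."""

    def annotate(self, media: bytes, mime_type: str, instruction: str) -> str:
        """Return the model's annotation for one media file."""
        ...


class EchoProvider:
    """Offline stand-in backend, used only to exercise the interface."""

    def annotate(self, media: bytes, mime_type: str, instruction: str) -> str:
        return f"{mime_type}: {len(media)} bytes, task={instruction!r}"


def annotate_all(provider: AnnotationProvider, items, instruction: str) -> list[str]:
    # The caller depends only on the Protocol, so any class with a matching
    # `annotate` method can be passed in place of a Gemini-backed one.
    return [provider.annotate(data, mime, instruction) for data, mime in items]


if __name__ == "__main__":
    print(annotate_all(EchoProvider(), [(b"\x00\x01", "audio/wav")], "Caption it."))
```

Under this design, adding a backend means writing one new class; the calling code stays unchanged.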

Install

```
# CLI only
pip install mllm-annotator

# with the desktop GUI and the embed/visualize feature
pip install "mllm-annotator[ui,viz]"
```

Or, for development from a clone:

```
uv sync --extra ui --extra viz
```

The embed/visualize feature also needs ffmpeg on your PATH to handle audio formats Gemini can't embed directly (e.g. .aac, .opus). It is an optional system dependency, not a pip package; without it those files are skipped.
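The ffmpeg fallback can be pictured as a small triage step. This sketch assumes `.aac` and `.opus` are the only formats that need conversion (the README lists them only as examples) and that files are converted to WAV; `plan_audio` and `ffmpeg_to_wav_cmd` are hypothetical helpers, not part of the package.

```python
import shutil
from pathlib import Path

# Assumption: the README gives these two only as examples ("e.g."),
# so the tool's real list may be longer.
NEEDS_CONVERSION = {".aac", ".opus"}


def plan_audio(files, ffmpeg_path=None):
    """Split files into (embed directly, convert first, skipped)."""
    direct, convert, skipped = [], [], []
    for f in map(Path, files):
        if f.suffix.lower() not in NEEDS_CONVERSION:
            direct.append(f)
        elif ffmpeg_path:
            convert.append(f)
        else:
            skipped.append(f)  # no ffmpeg on PATH: these files are skipped
    return direct, convert, skipped


def ffmpeg_to_wav_cmd(ffmpeg_path, src, dst):
    # -y overwrites dst if it exists; WAV as the target format is an assumption.
    return [ffmpeg_path, "-y", "-i", str(src), str(dst)]


if __name__ == "__main__":
    ffmpeg = shutil.which("ffmpeg")  # None when ffmpeg is not on PATH
    d, c, s = plan_audio(["a.wav", "b.aac", "c.OPUS"], ffmpeg)
    print("direct:", d, "convert:", c, "skipped:", s)
```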

API key

Provide a Gemini API key in any one of these ways (checked in this order):

1. The environment variable GEMINI_API_KEY (or GOOGLE_API_KEY):
```
# PowerShell
$env:GEMINI_API_KEY="your_api_key"

# bash / zsh
export GEMINI_API_KEY="your_api_key"
```
2. A .env file in the current working directory:
```
GEMINI_API_KEY=your_api_key
```
3. A key saved from inside the GUI: click Set API Key in the top-left corner (the dialog also opens automatically on first launch if no key is found) and paste the key. It is stored securely in your OS keyring (Windows Credential Manager, macOS Keychain, or Linux Secret Service); no plaintext file is written.

.env is ignored by git, and keys are never written into the built package.
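The lookup order above can be sketched in plain Python. `resolve_api_key` and `read_dotenv` are hypothetical helpers; the precedence of GEMINI_API_KEY over GOOGLE_API_KEY is an assumption, and because the keyring service and user names the tool uses are unknown, the keyring step is passed in as a callable.

```python
import os
from pathlib import Path

KEY_NAMES = ("GEMINI_API_KEY", "GOOGLE_API_KEY")


def read_dotenv(path):
    """Minimal KEY=VALUE parser that skips blank lines and # comments."""
    values = {}
    p = Path(path)
    if not p.is_file():
        return values
    for line in p.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_api_key(environ=None, cwd=".", keyring_get=None):
    """Return the first key found: environment, then .env, then keyring."""
    env = os.environ if environ is None else environ
    # Assumption: GEMINI_API_KEY is preferred when both variables are set.
    for name in KEY_NAMES:
        if env.get(name):
            return env[name]
    # The README only shows GEMINI_API_KEY inside .env, so only that name is read.
    dotenv_key = read_dotenv(Path(cwd) / ".env").get("GEMINI_API_KEY")
    if dotenv_key:
        return dotenv_key
    # keyring_get is a zero-argument callable standing in for something like
    # keyring.get_password(service, user); the tool's real names are unknown.
    return keyring_get() if keyring_get else None
```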

Command line

```
mllm-annotator --help
```

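Examples

The examples below use PowerShell, where a trailing backtick continues a command on the next line; in bash or zsh, use a trailing backslash instead.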

Horse cough annotation:

```
mllm-annotator `
  --input-folder "C:\data\horse_audio" `
  --media-type audio `
  --instruction "Annotate if the audio contains a horse cough or another sound such as the horse smacking the microphone." `
  --daily-limit 500
```

Swiss German transcription validation:

```
mllm-annotator `
  --input-folder "C:\data\swiss_german_audio" `
  --media-type audio `
  --labels-csv "C:\data\transcriptions.csv" `
  --instruction "Confirm whether the attached Swiss German audio matches the associated transcription. If it is wrong, rewrite the correct transcription." `
  --daily-limit 500
```

Image captioning:

```
mllm-annotator `
  --input-folder "C:\data\images" `
  --media-type image `
  --instruction "Caption the attached image." `
  --daily-limit 500
```
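To drive several runs from a Python script instead of a shell, you could build the argument list yourself. This sketch uses only flags that appear in this README; `build_command` and `run_annotator` are hypothetical helper names, not part of the package.

```python
import shutil
import subprocess


def build_command(input_folder, media_type, instruction, daily_limit=None,
                  labels_csv=None, dry_run=False):
    """Assemble an mllm-annotator argv from flags documented in this README."""
    cmd = ["mllm-annotator", "--input-folder", str(input_folder),
           "--media-type", media_type, "--instruction", instruction]
    if labels_csv:
        cmd += ["--labels-csv", str(labels_csv)]
    if daily_limit is not None:
        cmd += ["--daily-limit", str(daily_limit)]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_annotator(cmd):
    # Resolve the console script first so a missing install fails clearly.
    exe = shutil.which(cmd[0])
    if exe is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH")
    return subprocess.run([exe, *cmd[1:]], check=True)


if __name__ == "__main__":
    print(build_command("data/images", "image", "Caption the attached image.",
                        daily_limit=500, dry_run=True))
```

Passing a list to `subprocess.run` avoids shell quoting and the line-continuation differences between PowerShell and bash.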

Desktop GUI

```
mllm-annotator-ui
```

The GUI lets you browse for the data folder, choose audio or image mode, optionally select a labels CSV (filename,label), write the instruction, preview the file table, and start or resume processing. It shows the rewritten prompt and updates each row as Gemini responses arrive, using the same JSONL results file and JSON state file as the CLI. A second tab embeds the media and shows an interactive 2-D UMAP projection (zoom/pan toolbar; hover over a point to see its file name).

CSV format

The optional labels CSV must contain exactly one row per media file and these columns:

```
filename,label
audio_001.wav,expected transcription or label
audio_002.wav,another label
```

With --recursive, the filename column must contain the path relative to the input folder, using forward slashes, for example speaker_a/audio_001.wav.
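A check along these lines can catch most CSV mistakes before a run. `validate_labels_csv` is a hypothetical helper, not the tool's own validator (the built-in --dry-run covers that); it assumes extra columns beyond filename and label are tolerated.

```python
import csv
from pathlib import Path


def validate_labels_csv(csv_path, media_files, input_folder):
    """Return a list of problems; an empty list means the CSV matches the folder."""
    problems = []
    # Expected keys are relative POSIX paths, e.g. "speaker_a/audio_001.wav".
    expected = {Path(f).relative_to(input_folder).as_posix() for f in media_files}
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        if reader.fieldnames is None or not {"filename", "label"} <= set(reader.fieldnames):
            return ["CSV must have 'filename' and 'label' columns"]
        seen = set()
        for row in reader:
            name = row["filename"]
            if "\\" in name:
                problems.append(f"use forward slashes: {name}")
            elif name in seen:
                problems.append(f"duplicate row: {name}")
            elif name not in expected:
                problems.append(f"no such media file: {name}")
            seen.add(name)
    # Exactly one row per media file: report files with no row at all.
    problems += [f"missing row: {n}" for n in sorted(expected - seen)]
    return problems
```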

Resume behavior

By default, results are appended to runs/results.jsonl and progress is saved in runs/state.json. If the daily limit is reached or the API returns a quota or rate-limit error, run the same command again later (or the next day); files that were already processed are skipped.
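The skip-on-resume behavior can be sketched with an append-only JSONL file. The record layout (`{"file": ..., "annotation": ...}`) and the helper names are assumptions for illustration, and this simplified daily cap applies per invocation rather than being tracked per calendar day in the state file.

```python
import json
from pathlib import Path


def load_done(results_path):
    """Collect the file names already recorded in results.jsonl."""
    done = set()
    path = Path(results_path)
    if not path.exists():
        return done
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                # Assumption: each record stores its file name under "file".
                done.add(json.loads(line)["file"])
            except json.JSONDecodeError:
                continue  # tolerate a truncated last line from an interrupted run
    return done


def next_batch(all_files, results_path, daily_limit):
    """Unprocessed files in their original order, capped at daily_limit.

    Simplification: the cap applies per invocation, not per calendar day.
    """
    done = load_done(results_path)
    return [f for f in all_files if f not in done][:daily_limit]


def append_result(results_path, file, annotation):
    # One JSON object per line, appended and closed per file, so an
    # interrupted run keeps every result written before the interruption.
    path = Path(results_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"file": file, "annotation": annotation}) + "\n")
```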

The first run rewrites your natural-language instruction with gemini-3.5-flash and stores it in the state file. The media files are processed with gemini-3.1-flash-lite. Use --no-rewrite to skip the rewrite call.
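One way to make the instruction rewrite happen only once is to cache its result in the state file, as in the sketch below. `get_prompt` and the `"instruction"`/`"prompt"` keys are hypothetical, and `rewrite` stands in for the call to the rewrite model; the sketch also assumes a changed instruction should trigger a fresh rewrite.

```python
import json
from pathlib import Path


def get_prompt(state_path, instruction, rewrite, no_rewrite=False):
    """Return the prompt to send with every file, calling `rewrite` at most once."""
    if no_rewrite:
        return instruction  # like --no-rewrite: use the instruction verbatim
    path = Path(state_path)
    state = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Hypothetical keys; reuse the cached prompt only for the same instruction.
    if state.get("instruction") == instruction and "prompt" in state:
        return state["prompt"]
    prompt = rewrite(instruction)  # e.g. one call to the rewrite model
    state.update(instruction=instruction, prompt=prompt)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return prompt
```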

Before spending API calls, you can validate the folder and optional CSV:

```
mllm-annotator `
  --input-folder "C:\data\images" `
  --media-type image `
  --instruction "Caption the attached image." `
  --dry-run
```
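At its core, a dry run discovers and counts the files that would be sent, without making any API call. This sketch uses illustrative extension lists for each media type and the recursive matching implied by --recursive; `dry_run` and `EXTENSIONS` are hypothetical names, and the tool's accepted formats may differ.

```python
from pathlib import Path

# Assumption: illustrative extension sets, not the tool's actual list.
EXTENSIONS = {
    "audio": {".wav", ".mp3", ".flac", ".ogg", ".aac", ".opus"},
    "image": {".jpg", ".jpeg", ".png", ".webp"},
}


def dry_run(input_folder, media_type, recursive=False):
    """List matching files (relative POSIX paths) without calling any API."""
    root = Path(input_folder)
    pattern = "**/*" if recursive else "*"  # "**" also covers the root folder
    exts = EXTENSIONS[media_type]
    files = sorted(
        p.relative_to(root).as_posix()
        for p in root.glob(pattern)
        if p.is_file() and p.suffix.lower() in exts
    )
    print(f"{len(files)} {media_type} file(s) found in {root}")
    return files
```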

Project details


Maintainers

BoiMat


Release history

0.1.1 (this version): Jun 11, 2026

0.1.0: Jun 11, 2026

Download files


Source Distribution

mllm_annotator-0.1.1.tar.gz (30.8 kB view details)

Uploaded Jun 11, 2026 Source

Built Distribution


mllm_annotator-0.1.1-py3-none-any.whl (31.6 kB view details)

Uploaded Jun 11, 2026 Python 3

File details

Details for the file mllm_annotator-0.1.1.tar.gz.

File metadata

Download URL: mllm_annotator-0.1.1.tar.gz
Upload date: Jun 11, 2026
Size: 30.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mllm_annotator-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`17dc6cbaf29812f1a627111345df125723b8e63f70a7378cf9b2b3eca0247125`
MD5	`4baa5ffacd0bbc00237b1e51d4399898`
BLAKE2b-256	`9f9d6f622e514fe7fda058abb040b09d571429d98e8455c83ef1c9bfb483d1b4`


Provenance

The following attestation bundles were made for mllm_annotator-0.1.1.tar.gz:

Publisher: publish.yml on BoiMat/mllm-annotator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mllm_annotator-0.1.1.tar.gz
- Subject digest: 17dc6cbaf29812f1a627111345df125723b8e63f70a7378cf9b2b3eca0247125
- Sigstore transparency entry: 1789767390
- Sigstore integration time: Jun 11, 2026
Source repository:
- Permalink: BoiMat/mllm-annotator@6c8d4b7898fa7f3cfc03d751a26175ce64925531
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/BoiMat
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6c8d4b7898fa7f3cfc03d751a26175ce64925531
- Trigger Event: push

File details

Details for the file mllm_annotator-0.1.1-py3-none-any.whl.

File metadata

Download URL: mllm_annotator-0.1.1-py3-none-any.whl
Upload date: Jun 11, 2026
Size: 31.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mllm_annotator-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`085088586c46b6779247c15b9a4407dffbdd88e5b5832711b1d4d04e6eac8f22`
MD5	`29c28a0b058a99c3d303115f97ac547f`
BLAKE2b-256	`a2d038161363a15304df0975ee09d560238059dedf2d8ab0fb15c2d3a523b9c1`


Provenance

The following attestation bundles were made for mllm_annotator-0.1.1-py3-none-any.whl:

Publisher: publish.yml on BoiMat/mllm-annotator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mllm_annotator-0.1.1-py3-none-any.whl
- Subject digest: 085088586c46b6779247c15b9a4407dffbdd88e5b5832711b1d4d04e6eac8f22
- Sigstore transparency entry: 1789767423
- Sigstore integration time: Jun 11, 2026
Source repository:
- Permalink: BoiMat/mllm-annotator@6c8d4b7898fa7f3cfc03d751a26175ce64925531
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/BoiMat
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6c8d4b7898fa7f3cfc03d751a26175ce64925531
- Trigger Event: push
