Resumable multimodal-LLM annotator and embedder for folders of audio or image files.

mllm-annotator

A small, resumable tool for sending folders of audio or image files to a multimodal LLM for automatic annotation, plus an embedding + UMAP visualization workflow. Gemini is the current backend; the design keeps the provider behind a thin seam so others can be added later.

It ships both a command-line tool and a desktop GUI.

Install

# CLI only
pip install mllm-annotator

# with the desktop GUI and the embed/visualize feature
pip install "mllm-annotator[ui,viz]"

Or, for development from a clone:

uv sync --extra ui --extra viz

The embed/visualize feature also needs ffmpeg on your PATH to handle audio formats Gemini can't embed directly (e.g. .aac, .opus). It is an optional system dependency, not a pip package; without it those files are skipped.
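As a rough illustration of the behavior described above, the sketch below decides per file whether it can be embedded directly, needs an ffmpeg conversion first, or has to be skipped. The function name plan_audio, the NEEDS_CONVERSION set, and the choice of .wav as the conversion target are assumptions made for this example; they are not taken from the package.

```python
import shutil
from pathlib import Path

# Assumption: only the two extensions named in the README need conversion.
NEEDS_CONVERSION = {".aac", ".opus"}


def plan_audio(path, ffmpeg=None):
    """Return ('direct', None), ('convert', command) or ('skip', None).

    Hypothetical helper: the real package may organize this differently.
    """
    path = Path(path)
    if path.suffix.lower() not in NEEDS_CONVERSION:
        return "direct", None
    # Look ffmpeg up on PATH unless the caller supplied a location.
    ffmpeg = shutil.which("ffmpeg") if ffmpeg is None else ffmpeg
    if not ffmpeg:
        # No ffmpeg available: the file cannot be converted, so skip it.
        return "skip", None
    target = path.with_suffix(".wav")  # assumed target format
    return "convert", [ffmpeg, "-y", "-i", str(path), str(target)]
```

The returned command list is in the form expected by subprocess.run, so a caller could execute it only when the action is "convert".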

API key

Provide a Gemini API key in any one of these ways (checked in this order):

  1. environment variable GEMINI_API_KEY (or GOOGLE_API_KEY), shown here in PowerShell syntax:

    $env:GEMINI_API_KEY="your_api_key"
    
  2. a .env file in the current working directory:

    GEMINI_API_KEY=your_api_key
    
  3. saved from inside the GUI: click Set API Key in the top-left corner (the dialog also opens automatically on first launch if no key is found) and paste the key. It is stored in your OS keyring (Windows Credential Manager / macOS Keychain / Linux Secret Service), so no plaintext file is written.

.env is ignored by git, and keys are never written into the built package.
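The lookup order above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: resolve_api_key, read_dotenv, and the keyring_lookup callback are hypothetical names, and two details are assumptions: that GEMINI_API_KEY wins over GOOGLE_API_KEY when both are set, and that the .env file may also use either name.

```python
import os
from pathlib import Path

KEY_NAMES = ("GEMINI_API_KEY", "GOOGLE_API_KEY")  # assumed precedence


def read_dotenv(path):
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values = {}
    path = Path(path)
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(env=None, dotenv_path=None, keyring_lookup=None):
    """Return the first key found: environment, then .env, then keyring."""
    env = os.environ if env is None else env
    for name in KEY_NAMES:
        if env.get(name):
            return env[name]
    dotenv = read_dotenv(dotenv_path or Path.cwd() / ".env")
    for name in KEY_NAMES:
        if dotenv.get(name):
            return dotenv[name]
    # keyring_lookup is a hypothetical callback standing in for the OS keyring.
    if keyring_lookup is not None:
        return keyring_lookup()
    return None
```

Passing the keyring access in as a callback keeps the sketch free of third-party dependencies; a real implementation would query the OS keyring at that point.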

Command line

mllm-annotator --help

Examples (PowerShell; the trailing backtick continues a command onto the next line)

Horse cough annotation:

mllm-annotator `
  --input-folder "C:\data\horse_audio" `
  --media-type audio `
  --instruction "Annotate if the audio contains a horse cough or another sound such as the horse smacking the microphone." `
  --daily-limit 500

Swiss German transcription validation:

mllm-annotator `
  --input-folder "C:\data\swiss_german_audio" `
  --media-type audio `
  --labels-csv "C:\data\transcriptions.csv" `
  --instruction "Confirm whether the attached Swiss German audio matches the associated transcription. If it is wrong, rewrite the correct transcription." `
  --daily-limit 500

Image captioning:

mllm-annotator `
  --input-folder "C:\data\images" `
  --media-type image `
  --instruction "Caption the attached image." `
  --daily-limit 500

Desktop GUI

mllm-annotator-ui

The GUI lets you browse for the data folder, choose audio or image mode, optionally select a labels CSV (with filename and label columns), write the instruction, preview the file table, and start or resume processing. It shows the rewritten prompt and updates each row as Gemini responses arrive, using the same JSONL result and state files as the CLI. A second tab embeds the media and shows an interactive 2-D UMAP projection with a zoom/pan toolbar; hovering over a point shows its file name.

CSV format

The optional labels CSV must contain exactly one row per media file and these columns:

filename,label
audio_001.wav,expected transcription or label
audio_002.wav,another label

With --recursive, each filename value must be the file's relative path written with forward slashes, for example speaker_a/audio_001.wav.
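To make these rules concrete, here is a small sketch of a check you could run yourself before starting a job. It is not the package's validator (the --dry-run flag covers that); check_labels_csv and its parameters are hypothetical, and the assumption that paths are relative to the input folder is ours.

```python
import csv
from pathlib import Path


def check_labels_csv(csv_path, folder, extensions, recursive=False):
    """Return a list of problems; an empty list means CSV and folder agree."""
    folder = Path(folder)
    pattern = "**/*" if recursive else "*"
    # Media files as forward-slash paths relative to the folder (assumption).
    media = {
        p.relative_to(folder).as_posix()
        for p in folder.glob(pattern)
        if p.is_file() and p.suffix.lower() in extensions
    }
    problems = []
    seen = set()
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fields = set(reader.fieldnames or [])
        if not {"filename", "label"} <= fields:
            return ["CSV must have 'filename' and 'label' columns"]
        for row in reader:
            name = (row["filename"] or "").strip()
            if "\\" in name:
                problems.append(f"use forward slashes: {name}")
            if name in seen:
                problems.append(f"duplicate row: {name}")
            seen.add(name)
    # Exactly one row per media file: report both directions of mismatch.
    problems += [f"no CSV row for file: {n}" for n in sorted(media - seen)]
    problems += [f"CSV row without file: {n}" for n in sorted(seen - media)]
    return problems
```

For example, check_labels_csv("transcriptions.csv", "swiss_german_audio", {".wav"}, recursive=True) would list every file that lacks a row and every row that lacks a file.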

Resume behavior

By default, results are appended to runs/results.jsonl and progress is saved in runs/state.json. If the daily limit is reached or the API returns a quota or rate-limit error, run the same command again later or the next day. Files that were already processed are skipped.
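The append-and-skip pattern behind this can be sketched as below. It is a simplified illustration under stated assumptions: the "file" and "response" field names are invented for the example, the set of finished files is derived from the results file alone rather than from runs/state.json, and the limit is counted per run rather than per calendar day. The annotate argument is a hypothetical callback standing in for the API call.

```python
import json
from pathlib import Path


def load_done(results_path):
    """Collect the file names already recorded in the JSONL results."""
    done = set()
    path = Path(results_path)
    if path.is_file():
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    done.add(json.loads(line)["file"])  # assumed field name
    return done


def run(files, annotate, results_path, daily_limit):
    """Process pending files and return how many were handled in this run."""
    path = Path(results_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    done = load_done(path)
    processed = 0
    with path.open("a", encoding="utf-8") as out:
        for name in files:
            if name in done:
                continue  # finished in an earlier run
            if processed >= daily_limit:
                break  # stop here; the next run picks up the rest
            record = {"file": name, "response": annotate(name)}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            out.flush()  # keep completed work on disk if the run is interrupted
            processed += 1
    return processed
```

Because each result is written as soon as it arrives, rerunning with the same arguments only touches files that have no line in the results file yet.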

The first run rewrites your natural-language instruction with gemini-3.5-flash and stores the rewritten prompt in the state file. The media files themselves are processed with gemini-3.1-flash-lite. Use --no-rewrite to skip the rewrite call.

Before spending API calls, you can validate the folder and optional CSV:

mllm-annotator `
  --input-folder "C:\data\images" `
  --media-type image `
  --instruction "Caption the attached image." `
  --dry-run
