Resumable multimodal-LLM annotator and embedder for folders of audio or image files.

mllm-annotator

A small, resumable tool for sending folders of audio or image files to a multimodal LLM for automatic annotation, plus an embedding + UMAP visualization workflow. Gemini is the current backend; the design keeps the provider behind a thin seam so others can be added later.

It ships both a command-line tool and a desktop GUI.

Install

# CLI only
pip install mllm-annotator

# with the desktop GUI and the embed/visualize feature
pip install "mllm-annotator[ui,viz]"

Or, for development from a clone:

uv sync --extra ui --extra viz

The embed/visualize feature also needs ffmpeg on your PATH to handle audio formats Gemini can't embed directly (e.g. .aac, .opus). It is an optional system dependency, not a pip package; without it those files are skipped.
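A minimal sketch of how such a check could look, using only the standard library. The function name `split_by_ffmpeg_need` and the extension set are assumptions for illustration (only .aac and .opus are named above); this is not the package's actual code.

```python
import shutil
from pathlib import Path

# Assumption: only the two extensions named above; the real list may differ.
NEEDS_FFMPEG = {".aac", ".opus"}


def split_by_ffmpeg_need(paths, ffmpeg_available=None):
    """Split paths into (usable, skipped) depending on whether ffmpeg is on PATH."""
    if ffmpeg_available is None:
        # shutil.which returns None when the executable is not found on PATH.
        ffmpeg_available = shutil.which("ffmpeg") is not None
    usable, skipped = [], []
    for path in map(Path, paths):
        if path.suffix.lower() in NEEDS_FFMPEG and not ffmpeg_available:
            skipped.append(path)  # cannot be converted without ffmpeg
        else:
            usable.append(path)
    return usable, skipped
```

Running `split_by_ffmpeg_need(files)` before an embedding job shows up front which files would be skipped on the current machine.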

API key

Provide a Gemini API key in any one of these ways (checked in this order):

  1. environment variable GEMINI_API_KEY (or GOOGLE_API_KEY):

    $env:GEMINI_API_KEY="your_api_key"
    
  2. a .env file in the current working directory:

    GEMINI_API_KEY=your_api_key
    
  3. saved from inside the GUI — click API Key, paste it, and it is stored securely in your OS keyring (Windows Credential Manager / macOS Keychain / Linux Secret Service). No plaintext file is written.

.env is ignored by git, and keys are never written into the built package.
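The lookup order above can be sketched roughly as follows. This is an illustration, not the package's implementation: `read_dotenv` and `resolve_api_key` are hypothetical names, the .env parser handles only simple KEY=value lines, and the keyring step is passed in as a callable because the service name the GUI stores the key under is not documented here.

```python
import os
from pathlib import Path


def read_dotenv(path):
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    file = Path(path)
    if not file.is_file():
        return values
    for line in file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(environ=None, dotenv_path=".env", keyring_lookup=None):
    """Return the first key found: environment, then .env, then keyring."""
    environ = os.environ if environ is None else environ
    # 1. Environment variables.
    for name in ("GEMINI_API_KEY", "GOOGLE_API_KEY"):
        if environ.get(name):
            return environ[name]
    # 2. A .env file (assumption: only GEMINI_API_KEY is read from it).
    key = read_dotenv(dotenv_path).get("GEMINI_API_KEY")
    if key:
        return key
    # 3. OS keyring, via a caller-supplied lookup (hypothetical hook).
    if keyring_lookup is not None:
        return keyring_lookup() or None
    return None
```

With the third-party keyring package installed, the hook could be something like `lambda: keyring.get_password(service, username)`, where the service and username values are whatever the GUI uses.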

Command line

mllm-annotator --help

Examples

Horse cough annotation:

mllm-annotator `
  --input-folder "C:\data\horse_audio" `
  --media-type audio `
  --instruction "Annotate if the audio contains a horse cough or another sound such as the horse smacking the microphone." `
  --daily-limit 500

Swiss German transcription validation:

mllm-annotator `
  --input-folder "C:\data\swiss_german_audio" `
  --media-type audio `
  --labels-csv "C:\data\transcriptions.csv" `
  --instruction "Confirm whether the attached Swiss German audio matches the associated transcription. If it is wrong, rewrite the correct transcription." `
  --daily-limit 500

Image captioning:

mllm-annotator `
  --input-folder "C:\data\images" `
  --media-type image `
  --instruction "Caption the attached image." `
  --daily-limit 500

Desktop GUI

mllm-annotator-ui

The GUI lets you browse for the data folder, choose audio or image mode, optionally select a labels CSV (with filename,label columns), write the instruction, preview the file table, and start or resume processing. It shows the rewritten prompt and updates each row as Gemini responses arrive, using the same results.jsonl and state.json files as the CLI. A second tab embeds the media and shows an interactive 2-D UMAP projection with a zoom/pan toolbar; hover over a point to see its file name.

CSV format

The optional labels CSV must contain exactly one row per media file and these columns:

filename,label
audio_001.wav,expected transcription or label
audio_002.wav,another label

For --recursive, filename must be the relative path with forward slashes, for example speaker_a/audio_001.wav.
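To check a CSV against these rules by hand, something like the following sketch may help. The helper names `list_media` and `check_labels` are hypothetical, and the sketch treats every file in the folder as media, whereas the real tool presumably filters by extension.

```python
import csv
from collections import Counter
from pathlib import Path


def list_media(folder, recursive=False):
    """Return relative paths with forward slashes, as the CSV expects."""
    folder = Path(folder)
    walker = folder.rglob("*") if recursive else folder.glob("*")
    return {p.relative_to(folder).as_posix() for p in walker if p.is_file()}


def check_labels(csv_path, media_names):
    """Return a list of problems; an empty list means the CSV looks valid."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if not {"filename", "label"} <= set(reader.fieldnames or []):
            return ["CSV needs 'filename' and 'label' columns"]
        names = [(row["filename"] or "").strip() for row in reader]

    problems = []
    counts = Counter(names)
    for name, n in sorted(counts.items()):
        if "\\" in name:
            problems.append(f"{name}: use forward slashes")
        if n > 1:
            problems.append(f"{name}: listed {n} times")
        if name not in media_names:
            problems.append(f"{name}: no such media file")
    # Every media file needs exactly one row, so report the ones without any.
    for name in sorted(set(media_names) - set(counts)):
        problems.append(f"{name}: missing from CSV")
    return problems
```

For example, `check_labels("labels.csv", list_media("data", recursive=True))` returns an empty list when every file has exactly one row.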

Resume behavior

By default, results are appended to runs/results.jsonl and progress is saved in runs/state.json. If the daily limit is reached or the API returns a quota or rate-limit error, run the same command again later or the next day; files that were already processed are skipped.
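The append-and-skip pattern behind this kind of resume can be sketched as below. It is a simplified illustration under stated assumptions: the record keys ("file", "response") and the `run` function are hypothetical, the per-run counter stands in for the real daily limit, and the tool itself also tracks progress in the state file, which this sketch leaves out.

```python
import json
from pathlib import Path


def load_done(results_path):
    """Collect the names of files that already have a result line."""
    done = set()
    path = Path(results_path)
    if path.is_file():
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    done.add(json.loads(line)["file"])  # hypothetical key
    return done


def run(files, annotate, results_path="runs/results.jsonl", daily_limit=500):
    """Annotate files not yet in the results file, up to daily_limit per call."""
    results = Path(results_path)
    results.parent.mkdir(parents=True, exist_ok=True)
    done = load_done(results)
    processed = 0
    with results.open("a", encoding="utf-8") as out:
        for name in files:
            if name in done:
                continue  # already processed in an earlier run
            if processed >= daily_limit:
                break  # stop here; the next run picks up the rest
            record = {"file": name, "response": annotate(name)}
            out.write(json.dumps(record) + "\n")
            out.flush()  # keep finished work on disk if the run is interrupted
            processed += 1
    return processed
```

Because each result is written as its own line as soon as it arrives, an interrupted run loses at most the file that was in flight.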

The first run rewrites your natural-language instruction with gemini-3.5-flash and stores it in the state file. The media files are processed with gemini-3.1-flash-lite. Use --no-rewrite to skip the rewrite call.
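One way to picture the rewrite-once behavior is the sketch below. The state keys ("instruction", "rewritten_prompt") and the `get_prompt` function are assumptions for illustration, and `rewrite` stands in for the model call; the actual state file layout may differ.

```python
import json
from pathlib import Path


def get_prompt(instruction, state_path, rewrite, no_rewrite=False):
    """Return the prompt to send, rewriting the instruction at most once."""
    if no_rewrite:
        return instruction  # mirrors --no-rewrite: use the text as written

    path = Path(state_path)
    state = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else {}

    # Reuse the stored rewrite when the instruction has not changed.
    if state.get("instruction") == instruction and "rewritten_prompt" in state:
        return state["rewritten_prompt"]

    prompt = rewrite(instruction)  # one extra API call on the first run
    state.update({"instruction": instruction, "rewritten_prompt": prompt})
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return prompt
```

Storing the rewritten prompt next to the original instruction means later runs send the same prompt without repeating the rewrite call.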

Before spending API calls, you can validate the folder and optional CSV:

mllm-annotator `
  --input-folder "C:\data\images" `
  --media-type image `
  --instruction "Caption the attached image." `
  --dry-run
