Kakitori

Audio transcription with speaker diarization using Gemini Flash.

Overview

Kakitori is a CLI tool that transcribes audio files with speaker diarization using Google's Gemini 2.5 Flash model. Its interactive workflow plays audio snippets for each detected speaker so that you can assign a name to each one.

Features

  • Audio Transcription: Upload and transcribe audio files using Gemini 2.5 Flash
  • Speaker Diarization: Automatically identify different speakers in the audio
  • Interactive Speaker Identification: Listen to audio snippets and assign names to speakers
  • Multiple Audio Formats: Supports MP3, WAV, M4A, OGG, FLAC
  • Plain Text Output: Generate clean, readable transcripts with timestamps

Requirements

  • Python 3.11+
  • mpv media player installed on your system
    • macOS: brew install mpv
    • Linux (Arch): pacman -S mpv
    • Linux (Debian/Ubuntu): apt install mpv
  • Gemini API key (available from Google AI Studio)

Installation

As a uv tool (recommended)

uv tool install kakitori

From source

git clone <repository-url>
cd kakitori
uv sync

Configuration

Create a .env file in your working directory with your Gemini API key:

GEMINI_API_KEY=your-api-key-here

Or set it as an environment variable:

export GEMINI_API_KEY=your-api-key-here

Usage

Basic usage

Transcribe an audio file and print to stdout:

kakitori recording.mp3

Save to file

Save the transcript to a file:

kakitori recording.mp3 -o transcript.txt

Skip speaker identification

Skip the interactive speaker identification step and keep generic labels:

kakitori recording.mp3 --skip-speaker-id
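
Because --skip-speaker-id removes the interactive step, the command lends itself to batch runs. The sketch below drives the CLI from Python using only the flags shown above. The function names and folder layout are hypothetical, and it assumes that -o and --skip-speaker-id can be combined in one invocation.

```python
import subprocess
from pathlib import Path

# Extensions matching the formats listed under Features.
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def build_command(audio: Path, out_dir: Path) -> list[str]:
    """Build a non-interactive kakitori invocation for one audio file."""
    transcript = out_dir / f"{audio.stem}.txt"
    return ["kakitori", str(audio), "-o", str(transcript), "--skip-speaker-id"]


def transcribe_folder(folder: Path, out_dir: Path) -> None:
    """Transcribe every supported audio file in `folder`, one at a time."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for audio in sorted(folder.iterdir()):
        if audio.suffix.lower() in AUDIO_SUFFIXES:
            # check=True stops the batch on the first failing file.
            subprocess.run(build_command(audio, out_dir), check=True)
```

For example, transcribe_folder(Path("recordings"), Path("transcripts")) would write one transcript per recording with generic speaker labels.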

Interactive Speaker Identification

When you run kakitori without --skip-speaker-id, the tool will:

  1. Upload and transcribe your audio file
  2. Identify unique speakers (Speaker 1, Speaker 2, etc.)
  3. For each speaker:
    • Play 3-5 audio snippets where that speaker talks
    • Show a preview of what they said
    • Prompt you to assign a name
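
The snippet step above could be sketched as follows. This is an illustration under stated assumptions rather than Kakitori's real code: the Segment type, the "longest segments first" selection rule, and the 8-second cap are all hypothetical. The mpv options used (--no-video, --really-quiet, --start, --length) are standard mpv flags.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds from the beginning of the recording
    end: float     # seconds from the beginning of the recording
    speaker: str   # generic label such as "Speaker 1"
    text: str


def pick_snippets(segments: list[Segment], speaker: str, limit: int = 3) -> list[Segment]:
    """Pick up to `limit` of the speaker's longest segments, in time order."""
    own = [s for s in segments if s.speaker == speaker]
    longest = sorted(own, key=lambda s: s.end - s.start, reverse=True)[:limit]
    return sorted(longest, key=lambda s: s.start)


def mpv_command(audio_path: str, segment: Segment, max_length: float = 8.0) -> list[str]:
    """Build an mpv call that plays only the given segment, capped in length."""
    length = min(segment.end - segment.start, max_length)
    return [
        "mpv", "--no-video", "--really-quiet",
        f"--start={segment.start:.2f}",
        f"--length={length:.2f}",
        audio_path,
    ]


def play_snippets(audio_path: str, segments: list[Segment], speaker: str) -> None:
    """Print a preview of each snippet, then play it through mpv."""
    for seg in pick_snippets(segments, speaker):
        print(f'  "{seg.text}"')
        subprocess.run(mpv_command(audio_path, seg), check=False)
```

Keeping command construction separate from playback makes the selection logic testable on machines without mpv installed.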

Example interaction:

Speaker 1:
--------------------------------------------------

Playing 3 snippet(s) for Speaker 1...

Snippet 1/3: [00:15]
  "Hello everyone, welcome to today's meeting..."

[Audio plays]

Snippet 2/3: [02:30]
  "I think we should focus on the quarterly goals..."

[Audio plays]

Who is Speaker 1? (press Enter to replay, or type name): John
✓ Speaker 1 identified as: John

Output Format

The transcript is formatted as plain text with timestamps and speaker names:

[00:15] John: Hello everyone, welcome to today's meeting.
[00:32] Jane: Thanks for having me. I have some updates to share.
[01:05] John: Great, let's hear them.
[01:10] Jane: First, the project timeline has been extended...
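
A minimal sketch of how lines in this format can be produced from timed segments and a label-to-name mapping. The function names and the (start, speaker, text) tuple shape are hypothetical. How Kakitori renders recordings longer than an hour is not documented here, so this sketch simply lets the minutes field grow past 59.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset as MM:SS (minutes may exceed 59)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_transcript(
    segments: list[tuple[float, str, str]],
    names: dict[str, str],
) -> str:
    """Format (start_seconds, speaker_label, text) tuples as transcript lines."""
    lines = []
    for start, speaker, text in segments:
        # Fall back to the generic label when no name was assigned.
        name = names.get(speaker, speaker)
        lines.append(f"[{format_timestamp(start)}] {name}: {text}")
    return "\n".join(lines)
```

Speakers without an assigned name keep their generic label, which mirrors the behavior described for --skip-speaker-id.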

Development

Running locally

# Install dependencies
uv sync

# Run the tool
uv run kakitori recording.mp3

Building

uv build

License

MIT

Credits

Built with:
