Kakitori
Audio transcription with speaker diarization using Gemini Flash.
Overview
Kakitori is a CLI tool that transcribes audio files with speaker diarization using Google's Gemini 2.5 Flash model. It includes an interactive workflow that plays audio snippets for each detected speaker so that you can assign a name to each one.
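For orientation, a transcription request of this kind could look roughly like the sketch below. It assumes the google-genai Python SDK. The prompt wording and the function names build_prompt and transcribe are hypothetical and are not taken from Kakitori's source, which may work differently.

```python
import os


def build_prompt() -> str:
    """Hypothetical prompt; Kakitori's real prompt is not documented here."""
    return (
        "Transcribe this audio. Label each speaker as 'Speaker 1', "
        "'Speaker 2', and so on, and start every line with an [MM:SS] timestamp."
    )


def transcribe(path: str, model: str = "gemini-2.5-flash") -> str:
    """Upload an audio file and ask Gemini for a diarized transcript."""
    # Imported lazily so this module loads even without the SDK installed.
    from google import genai

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    audio = client.files.upload(file=path)  # upload via the Files API
    response = client.models.generate_content(
        model=model,
        contents=[build_prompt(), audio],
    )
    return response.text
```

Calling transcribe("recording.mp3") needs network access and a valid GEMINI_API_KEY. The text that comes back depends on the model, so real code should validate it before parsing.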
Features
- Audio Transcription: Upload and transcribe audio files using Gemini 2.5 Flash
- Speaker Diarization: Automatically identify different speakers in the audio
- Interactive Speaker Identification: Listen to audio snippets and assign names to speakers
- Multiple Audio Formats: Supports MP3, WAV, M4A, OGG, FLAC
- Plain Text Output: Generate clean, readable transcripts with timestamps
Requirements
- Python 3.11+
- mpv media player installed on your system
  - macOS: brew install mpv
  - Linux (Arch): pacman -S mpv
  - Linux (Debian/Ubuntu): apt install mpv
- Gemini API key (available from Google AI Studio)
Installation
As a uv tool (recommended)
uv tool install kakitori
From source
git clone <repository-url>
cd kakitori
uv sync
Configuration
Create a .env file in your working directory with your Gemini API key:
GEMINI_API_KEY=your-api-key-here
Or set it as an environment variable:
export GEMINI_API_KEY=your-api-key-here
Usage
Basic usage
Transcribe an audio file and print to stdout:
kakitori recording.mp3
Save to file
Save the transcript to a file:
kakitori recording.mp3 -o transcript.txt
Skip speaker identification
Skip the interactive speaker identification step and keep generic labels:
kakitori recording.mp3 --skip-speaker-id
Interactive Speaker Identification
When you run kakitori without --skip-speaker-id, the tool will:
- Upload and transcribe your audio file
- Identify unique speakers (Speaker 1, Speaker 2, etc.)
- For each speaker:
  - Play 3-5 audio snippets where that speaker talks
  - Show a preview of what they said
  - Prompt you to assign a name
Example interaction:
Speaker 1:
--------------------------------------------------
Playing 3 snippet(s) for Speaker 1...
Snippet 1/3: [00:15]
"Hello everyone, welcome to today's meeting..."
[Audio plays]
Snippet 2/3: [02:30]
"I think we should focus on the quarterly goals..."
[Audio plays]
Who is Speaker 1? (press Enter to replay, or type name): John
✓ Speaker 1 identified as: John
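The loop shown in this interaction can be sketched as follows. This is an illustration under stated assumptions, not Kakitori's code: the helper names, the (timestamp, speaker, text) segment shape, the five-second snippet length, and the limit of three snippets are all hypothetical. The mpv options used (--no-video, --start, --length) are standard mpv flags.

```python
import subprocess


def to_seconds(stamp: str) -> int:
    """Convert an 'MM:SS' or 'HH:MM:SS' timestamp (brackets optional) to seconds."""
    seconds = 0
    for part in stamp.strip("[]").split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def pick_snippets(segments, speaker, limit=3):
    """Return up to `limit` (timestamp, text) pairs spoken by `speaker`."""
    own = [(ts, text) for ts, who, text in segments if who == speaker]
    return own[:limit]


def mpv_command(path: str, start: int, length: int = 5) -> list[str]:
    """Build an mpv invocation that plays `length` seconds from `start`."""
    return ["mpv", "--no-video", f"--start={start}", f"--length={length}", path]


def identify(path, segments, speaker, play=subprocess.run, ask=input):
    """Play the speaker's snippets until a name is typed; empty input replays."""
    snippets = pick_snippets(segments, speaker)
    while True:
        for ts, text in snippets:
            print(f'[{ts}] "{text}"')
            play(mpv_command(path, to_seconds(ts)), check=False)
        name = ask(f"Who is {speaker}? (press Enter to replay, or type name): ").strip()
        if name:
            return name
```

Passing play and ask as parameters keeps the loop testable without mpv or a terminal. By default they are subprocess.run and input.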
Output Format
The transcript is formatted as plain text with timestamps and speaker names:
[00:15] John: Hello everyone, welcome to today's meeting.
[00:32] Jane: Thanks for having me. I have some updates to share.
[01:05] John: Great, let's hear them.
[01:10] Jane: First, the project timeline has been extended...
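Because each line follows the same "[timestamp] speaker: text" shape, a saved transcript is easy to post-process. The sketch below parses such lines and swaps generic labels for names. The regular expression and the function names parse_line and rename_speakers are illustrative assumptions based only on the sample output above.

```python
import re

# Matches "[MM:SS] Name: text" and also "[HH:MM:SS] Name: text".
LINE = re.compile(
    r"^\[(?P<ts>\d{1,2}(?::\d{2}){1,2})\] (?P<speaker>[^:]+): (?P<text>.*)$"
)


def parse_line(line: str):
    """Return (timestamp, speaker, text), or None if the line does not match."""
    m = LINE.match(line)
    return (m["ts"], m["speaker"], m["text"]) if m else None


def rename_speakers(lines, names):
    """Replace speaker labels using the `names` mapping; keep other lines as-is."""
    out = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            out.append(line)
            continue
        ts, speaker, text = parsed
        out.append(f"[{ts}] {names.get(speaker, speaker)}: {text}")
    return out
```

For example, rename_speakers(lines, {"Speaker 1": "John"}) rewrites only the lines attributed to Speaker 1, which is useful after a run with --skip-speaker-id.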
Development
Running locally
# Install dependencies
uv sync
# Run the tool
uv run kakitori recording.mp3
Building
uv build
License
MIT