wishcribe ✍️
Multi-speaker audio/video transcription — Whisper large + pyannote.audio, fully offline after first run.
Sample output:

```text
[SPEAKER_00] 00:00:01
Selamat datang di rapat hari ini.
[SPEAKER_01] 00:00:05
Terima kasih. Mari kita mulai.
[SPEAKER_00] 00:00:10
Baik, topik pertama adalah anggaran kuartal ini.
```
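The bracketed `[SPEAKER_XX] HH:MM:SS` layout above is easy to post-process. A minimal parser sketch — the regex, tuple shape, and function name are my own for illustration, not part of wishcribe's API:

```python
import re

# Matches header lines like "[SPEAKER_00] 00:00:01" in wishcribe-style text output.
HEADER = re.compile(r"^\[(?P<speaker>SPEAKER_\d+)\]\s+(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})$")

def parse_transcript(text: str) -> list[tuple[str, int, str]]:
    """Return (speaker, start_seconds, text) tuples from the transcript text."""
    segments = []
    speaker, start = None, 0
    for line in text.splitlines():
        line = line.strip()
        m = HEADER.match(line)
        if m:
            # New segment header: remember who speaks and when.
            speaker = m["speaker"]
            start = int(m["h"]) * 3600 + int(m["m"]) * 60 + int(m["s"])
        elif line and speaker is not None:
            # Non-header line: the spoken text for the current segment.
            segments.append((speaker, start, line))
    return segments

sample = """[SPEAKER_00] 00:00:01
Selamat datang di rapat hari ini.
[SPEAKER_01] 00:00:05
Terima kasih. Mari kita mulai."""
print(parse_transcript(sample))
# → [('SPEAKER_00', 1, 'Selamat datang di rapat hari ini.'),
#    ('SPEAKER_01', 5, 'Terima kasih. Mari kita mulai.')]
```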
Installation
```bash
pip install wishcribe
```

ffmpeg is also required (one-time system install):

```bash
brew install ffmpeg      # macOS
sudo apt install ffmpeg  # Ubuntu/Debian
```
Quick start
Step 1 — download all models (run once)
```bash
wishcribe download --hf-token hf_xxx
```

This downloads and caches:

- Whisper large (~2.9 GB) → `~/.cache/whisper/large.pt`
- pyannote diarization (~1 GB) → `~/.cache/huggingface/hub/...`
Output:
```text
📦 WISHCRIBE — MODEL DOWNLOADER
══════════════════════════════════════════
Whisper model : large
Diarization   : HuggingFace download (token provided)
══════════════════════════════════════════
📥 Downloading Whisper 'large' model (2.9 GB)...
✅ Whisper 'large' downloaded and cached (2.9 GB)
📥 Downloading pyannote diarization model (~1 GB)...
✅ Diarization model downloaded and cached
🎉 All models cached! wishcribe now works fully offline.
```
Run transcription with:

```bash
wishcribe --video meeting.mp4
```

Step 2 — transcribe (fully offline, forever)

```bash
wishcribe --video meeting.mp4
```

That's it. No token, no internet, no extra flags.
Usage — CLI
Download command
```bash
# Download default model (large)
wishcribe download --hf-token hf_xxx

# Download a specific model size
wishcribe download --hf-token hf_xxx --model medium

# Use a local pyannote model folder (no HuggingFace needed)
wishcribe download --model-path /path/to/pyannote-model
```
Run / transcribe command
```bash
# Basic (Whisper large by default)
wishcribe --video meeting.mp4
wishcribe run --video meeting.mp4   # same thing

# With language + speaker count
wishcribe --video meeting.mp4 --bahasa id --speakers 3

# Override Whisper model
wishcribe --video meeting.mp4 --model medium
wishcribe --video meeting.mp4 --model small

# Use OpenAI API for transcription (diarization still offline)
wishcribe --video meeting.mp4 --use-api --api-key sk-xxx

# Custom output folder + save JSON
wishcribe --video meeting.mp4 --output ./results --json
```
All run options
| Argument | Description | Default |
|---|---|---|
| `--video` | Path to video or audio file (required) | — |
| `--hf-token` | HuggingFace token — first-time only | — |
| `--model-path` | Path to local pyannote model folder | — |
| `--model` | `tiny`/`base`/`small`/`medium`/`large` | `large` |
| `--bahasa` | Language code, e.g. `id`, `en` | auto-detect |
| `--speakers` | Number of speakers (optional) | auto |
| `--output` | Output folder | same as input |
| `--use-api` | Use OpenAI Whisper API | `False` |
| `--api-key` | OpenAI API key (with `--use-api`) | — |
| `--json` | Also save `.json` | `False` |
| `--no-txt` | Skip `.txt` output | `False` |
| `--no-srt` | Skip `.srt` output | `False` |
Usage — Python
```python
from wishcribe import download, transcribe

# Step 1 — download models once
download(hf_token="hf_xxx")

# Step 2 — transcribe offline
segments = transcribe("meeting.mp4")

# With options
segments = transcribe(
    "meeting.mp4",
    model="large",      # default — best accuracy
    language="id",
    num_speakers=3,
    output_dir="./out",
)

for seg in segments:
    print(f"[{seg.speaker}] {seg.start:.1f}s {seg.text}")
```
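The loop above prints one line per segment; the `.txt` output instead groups consecutive lines from the same speaker. A sketch of that grouping, using a stand-in dataclass with the same `speaker`/`start`/`text` fields since no real wishcribe segments are at hand (`Seg` and `group_by_speaker` are illustrative names, not part of the library):

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass
class Seg:
    # Stand-in for a wishcribe segment: same fields as used in the loop above.
    speaker: str
    start: float
    text: str

def group_by_speaker(segments):
    """Merge consecutive segments from the same speaker into one block."""
    blocks = []
    # groupby only merges adjacent items, which preserves turn-taking order.
    for speaker, run in groupby(segments, key=lambda s: s.speaker):
        run = list(run)
        blocks.append((speaker, run[0].start, " ".join(s.text for s in run)))
    return blocks

segs = [
    Seg("SPEAKER_00", 0.5, "Selamat datang."),
    Seg("SPEAKER_00", 2.0, "Mari mulai."),
    Seg("SPEAKER_01", 5.0, "Terima kasih."),
]
print(group_by_speaker(segs))
# → [('SPEAKER_00', 0.5, 'Selamat datang. Mari mulai.'),
#    ('SPEAKER_01', 5.0, 'Terima kasih.')]
```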
How offline mode works
| Cache location | What's stored |
|---|---|
| `~/.cache/whisper/large.pt` | Whisper large model weights (2.9 GB) |
| `~/.cache/huggingface/hub/models--pyannote--...` | Diarization model (~1 GB) |
Once cached, both load instantly from disk — no internet ever needed.
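Assuming the cache locations in the table above are fixed, a quick offline-readiness check could look like this (the helper names are mine, and the pyannote glob pattern is a guess based on the truncated path shown above):

```python
from pathlib import Path

def whisper_cached(model: str = "large") -> bool:
    """True if the Whisper weights for `model` are already on disk."""
    return (Path.home() / ".cache" / "whisper" / f"{model}.pt").exists()

def pyannote_cached() -> bool:
    """True if any pyannote snapshot exists in the HuggingFace hub cache."""
    hub = Path.home() / ".cache" / "huggingface" / "hub"
    # glob on a missing directory simply yields nothing, so this is safe pre-download.
    return any(hub.glob("models--pyannote--*"))

if whisper_cached() and pyannote_cached():
    print("Fully offline: both models are cached.")
else:
    print("Run `wishcribe download` first.")
```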
Whisper model guide
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| `tiny` | 75 MB | Very fast | Fair |
| `base` | 139 MB | Fast | Good |
| `small` | 461 MB | Moderate | Better |
| `medium` | 1.4 GB | Slow | Very good |
| `large` | 2.9 GB | Slowest | Best ⭐ (default) |
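Since accuracy in the table grows with size, picking a model for a disk or bandwidth budget reduces to "largest that fits". A small sketch of that rule — the size numbers are copied from the table above, and the function is illustrative, not part of wishcribe:

```python
# Approximate download sizes from the model guide above, in MB.
SIZES_MB = {"tiny": 75, "base": 139, "small": 461, "medium": 1400, "large": 2900}

def largest_model_within(budget_mb: int) -> str:
    """Pick the most accurate Whisper model whose download fits the budget."""
    fitting = [m for m, size in SIZES_MB.items() if size <= budget_mb]
    if not fitting:
        raise ValueError(f"No model fits within {budget_mb} MB")
    # Bigger download = better accuracy, per the table above.
    return max(fitting, key=SIZES_MB.get)

print(largest_model_within(500))   # → small
print(largest_model_within(3000))  # → large
```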
HuggingFace setup (for download command)
1. Sign up at https://huggingface.co
2. Accept the license: https://huggingface.co/pyannote/speaker-diarization-3.1
3. Create a Read token: https://huggingface.co/settings/tokens

This is only needed once, for `wishcribe download`.
Output files
| File | Description |
|---|---|
| `<n>_transcript.txt` | Plain text grouped by speaker |
| `<n>_transcript.srt` | SRT subtitles with speaker labels |
| `<n>_transcript.json` | Raw JSON array (opt-in) |
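To illustrate the `.srt` output format, here is a sketch of building one SRT cue with a speaker label. The helper names and the exact label placement are my assumptions about how wishcribe formats its subtitles; the `HH:MM:SS,mmm` timestamp syntax itself is standard SRT:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_block(index: int, start: float, end: float, speaker: str, text: str) -> str:
    """One numbered SRT cue with a speaker label prefixed to the text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n[{speaker}] {text}\n"

print(srt_block(1, 1.0, 4.5, "SPEAKER_00", "Selamat datang di rapat hari ini."))
# → 1
#   00:00:01,000 --> 00:00:04,500
#   [SPEAKER_00] Selamat datang di rapat hari ini.
```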
Publishing
```bash
make build    # build dist/
make publish  # upload to PyPI → pip install wishcribe
```
License
MIT