Multi-speaker audio/video transcription — Whisper large + pyannote.audio (offline)
Wishcribe
Multi-speaker audio/video transcription — Whisper large + pyannote.audio, fully offline after first run.
[SPEAKER_00] 00:00:01
Selamat datang di rapat hari ini.
[SPEAKER_01] 00:00:05
Terima kasih. Mari kita mulai.
[SPEAKER_00] 00:00:10
Baik, topik pertama adalah anggaran kuartal ini.
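The sample above suggests a simple layout: a `[SPEAKER_xx] HH:MM:SS` header line followed by the spoken text. Below is a minimal Python sketch for reading that layout back into records. It assumes the `.txt` output follows the sample exactly; the function name `parse_transcript` is hypothetical and not part of wishcribe.

```python
import re

# Matches a header line such as "[SPEAKER_00] 00:00:01".
HEADER = re.compile(r"^\[(?P<speaker>[^\]]+)\]\s+(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})$")


def parse_transcript(text):
    """Return a list of (speaker, start_seconds, text) tuples."""
    records = []
    speaker, start, lines = None, None, []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # A new header closes the previous speaker turn, if any.
            if speaker is not None:
                records.append((speaker, start, " ".join(lines)))
            speaker = match["speaker"]
            start = int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
            lines = []
        elif line and speaker is not None:
            lines.append(line)
    if speaker is not None:
        records.append((speaker, start, " ".join(lines)))
    return records
```

Lines that appear before the first header are ignored, which keeps the sketch tolerant of a title or blank lines at the top of the file.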
Requirements
- Python 3.9 or higher
- ffmpeg
- 4 GB free disk space (for model weights)
- Internet connection (first run only)
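The first three requirements can be checked from Python before installing anything. This is a rough sketch using only the standard library; the helper name `check_requirements` is hypothetical, and the `path` argument is an assumption, since this README does not say where the model weights are cached.

```python
import shutil
import sys


def check_requirements(path=".", min_free_gb=4):
    """Return a dict mapping each requirement to True/False for this machine."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python>=3.9": sys.version_info >= (3, 9),
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "free disk space": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Point `path` at the drive where you expect the models to be stored if it differs from the current directory.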
Installing Python
Windows
- Click "Download Python 3.x.x" (latest version)
- Run the installer
- ⚠️ Important: On the first screen, check "Add Python to PATH" before clicking Install
- Click "Install Now"
- Once done, open Command Prompt and verify:
python --version
pip --version
Both should print a version number.
Tip for Windows: Use Command Prompt or PowerShell to run wishcribe commands.
To open Command Prompt: press Win + R, type cmd, press Enter.
macOS
# Check if Python is already installed
python3 --version
# If not installed, use Homebrew
brew install python
If you don't have Homebrew: https://brew.sh
Ubuntu / Debian Linux
sudo apt update
sudo apt install python3 python3-pip
Installing ffmpeg
ffmpeg is required to extract audio from video files.
Windows
- Go to https://ffmpeg.org/download.html
- Click "Windows" → "Windows builds by BtbN"
- Download ffmpeg-master-latest-win64-gpl.zip
- Extract the zip file to C:\ffmpeg
- Add ffmpeg to PATH:
  - Press Win + S → search "Environment Variables"
  - Click "Edit the system environment variables"
  - Click "Environment Variables"
  - Under "System variables", find Path → click Edit
  - Click New → type C:\ffmpeg\bin
  - Click OK on all windows
- Open a new Command Prompt and verify:
ffmpeg -version
macOS
brew install ffmpeg
Ubuntu / Debian
sudo apt install ffmpeg
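To illustrate what "extracting audio from a video file" involves, here is a sketch of a manual extraction from Python. It assumes a 16 kHz mono WAV target, which is the format Whisper works with internally. The exact ffmpeg invocation wishcribe uses is not documented here, so treat the function names and flag choices as illustrative only.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video, wav=None, sample_rate=16000):
    """Build an ffmpeg command that extracts a mono WAV track from `video`."""
    video = Path(video)
    wav = Path(wav) if wav else video.with_suffix(".wav")
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(video),          # input file
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the requested rate
        str(wav),
    ]


def extract_audio(video, wav=None):
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    cmd = build_ffmpeg_command(video, wav)
    subprocess.run(cmd, check=True, capture_output=True)
    return cmd[-1]
```

Running `extract_audio("meeting.mp4")` by hand is also a quick way to confirm that ffmpeg is installed and can read a given file.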
Installation
Once Python and ffmpeg are installed:
pip install wishcribe
Windows users: If pip is not found, try pip3 or python -m pip install wishcribe
HuggingFace setup (required once)
Wishcribe uses pyannote.audio for speaker detection. You need to accept two model licenses on HuggingFace before downloading.
- Sign up at https://huggingface.co/join
- Accept license (diarization model): https://huggingface.co/pyannote/speaker-diarization-3.1
- Accept license (segmentation model): https://huggingface.co/pyannote/segmentation-3.0
- Create a Read token: https://huggingface.co/settings/tokens
⚠️ Both licenses must be accepted. The diarization model depends on the segmentation model internally — skipping either one will cause the download to fail.
Quick start
Step 1 — Download all models once
wishcribe download --hf-token hf_xxx
This downloads and caches:
- Whisper large (~2.9 GB) → saved locally
- pyannote diarization (~1 GB) → saved locally
Step 2 — Transcribe
wishcribe --video meeting.mp4 --bahasa id --speakers 2 --hf-token hf_xxx
⚠️ Your token is required on every run — pyannote verifies access to the segmentation model each time, even when loading from local cache.
Avoid typing --hf-token every time
Set your token as an environment variable once and wishcribe will read it automatically:
macOS / Linux
# Add this to your ~/.zshrc or ~/.bash_profile
export WISHCRIBE_HF_TOKEN="hf_xxx"
# Reload
source ~/.zshrc
Windows
# In Command Prompt (current session only)
set WISHCRIBE_HF_TOKEN=hf_xxx
# Or permanently via System Environment Variables:
# Win + S → "Environment Variables" → New → Name: WISHCRIBE_HF_TOKEN, Value: hf_xxx
After setting it, run without --hf-token:
wishcribe --video meeting.mp4 --bahasa id --speakers 2
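The lookup described above can be pictured as a small fallback: use the explicit token if one was given, otherwise read `WISHCRIBE_HF_TOKEN`. The sketch below is an assumption about that order of precedence, not wishcribe's actual code, and `resolve_hf_token` is a hypothetical name.

```python
import os


def resolve_hf_token(cli_token=None, env=None):
    """Return the explicit token if given, else WISHCRIBE_HF_TOKEN, else None."""
    env = os.environ if env is None else env
    token = cli_token or env.get("WISHCRIBE_HF_TOKEN")
    # Treat empty or whitespace-only values as "not set".
    return token.strip() if token and token.strip() else None
```

Passing `env` explicitly makes the function easy to test without touching the real environment.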
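🔒 An environment variable lives on your machine, not in your project files, so it is not committed to Git or uploaded to GitHub unless you store it in a tracked file (for example, a dotfiles repository that contains your ~/.zshrc).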
Usage — CLI
Download command
# Download default Whisper large model
wishcribe download --hf-token hf_xxx
# Download a smaller/faster model instead
wishcribe download --hf-token hf_xxx --model medium
# Use a manually downloaded pyannote model folder
wishcribe download --model-path /path/to/pyannote-model
Transcribe command
# Basic — Whisper large by default (token from WISHCRIBE_HF_TOKEN env var)
wishcribe --video meeting.mp4
# With explicit token + language + speaker count
wishcribe --video meeting.mp4 --bahasa id --speakers 2 --hf-token hf_xxx
# Override Whisper model
wishcribe --video meeting.mp4 --model medium
# Use OpenAI API for transcription
wishcribe --video meeting.mp4 --use-api --api-key sk-xxx
# Save to a custom folder + include JSON
wishcribe --video meeting.mp4 --output ./results --json
All options
| Argument | Description | Default |
|---|---|---|
| --video | Path to video or audio file (required) | (none) |
| --hf-token | HuggingFace token (or set WISHCRIBE_HF_TOKEN env var) | (none) |
| --model-path | Path to local pyannote model folder | (none) |
| --model | tiny/base/small/medium/large | large |
| --bahasa | Language code e.g. id, en | auto-detect |
| --speakers | Number of speakers (optional) | auto |
| --output | Output folder | same as input |
| --use-api | Use OpenAI Whisper API | False |
| --api-key | OpenAI API key (with --use-api) | (none) |
| --json | Also save .json | False |
| --no-txt | Skip .txt output | False |
| --no-srt | Skip .srt output | False |
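To make the defaults in the table concrete, here is an `argparse` sketch that mirrors the transcribe options. It is not wishcribe's real parser: the program name `wishcribe-sketch` is made up, the `download` subcommand is left out, and `None` stands in for "auto-detect", "auto", and "same as input".

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented transcribe options."""
    parser = argparse.ArgumentParser(prog="wishcribe-sketch")
    parser.add_argument("--video", required=True, help="Path to video or audio file")
    parser.add_argument("--hf-token", default=None, help="HuggingFace token")
    parser.add_argument("--model-path", default=None, help="Local pyannote model folder")
    parser.add_argument("--model", default="large",
                        choices=["tiny", "base", "small", "medium", "large"])
    parser.add_argument("--bahasa", default=None, help="Language code; None = auto-detect")
    parser.add_argument("--speakers", type=int, default=None, help="None = auto")
    parser.add_argument("--output", default=None, help="None = same folder as input")
    parser.add_argument("--use-api", action="store_true", help="Use OpenAI Whisper API")
    parser.add_argument("--api-key", default=None, help="OpenAI API key")
    parser.add_argument("--json", action="store_true", help="Also save .json")
    parser.add_argument("--no-txt", action="store_true", help="Skip .txt output")
    parser.add_argument("--no-srt", action="store_true", help="Skip .srt output")
    return parser
```

For example, `build_parser().parse_args(["--video", "meeting.mp4"])` yields `model="large"` with every boolean switch set to `False`, matching the Default column.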
Usage — Python
from wishcribe import download, transcribe
# Step 1 — download models once
download(hf_token="hf_xxx")
# Step 2 — transcribe
segments = transcribe("meeting.mp4", hf_token="hf_xxx")
# With options
segments = transcribe(
    "meeting.mp4",
    hf_token="hf_xxx",   # or set WISHCRIBE_HF_TOKEN env var
    model="large",       # default, best accuracy
    language="id",
    num_speakers=2,
    output_dir="./out",
)
for seg in segments:
    print(f"[{seg.speaker}] {seg.start:.1f}s {seg.text}")
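The returned segments can be post-processed like any other Python sequence. As an example, the sketch below merges consecutive segments from the same speaker into one turn. It relies only on the `speaker`, `start`, and `text` attributes used in the loop above; the `Segment` dataclass is a hypothetical stand-in for testing, not wishcribe's own type.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical stand-in exposing only the attributes used in the loop above.
    speaker: str
    start: float
    text: str


def merge_turns(segments):
    """Merge consecutive same-speaker segments into (speaker, start, text) turns."""
    turns = []
    for seg in segments:
        if turns and turns[-1][0] == seg.speaker:
            # Same speaker keeps talking: extend the current turn.
            speaker, start, text = turns[-1]
            turns[-1] = (speaker, start, f"{text} {seg.text}")
        else:
            turns.append((seg.speaker, seg.start, seg.text))
    return turns
```

Each turn keeps the start time of its first segment, so the result can still be printed with timestamps.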
Using a virtual environment (recommended)
To avoid conflicts with other Python packages on your system:
Windows
python -m venv wishcribe-env
wishcribe-env\Scripts\activate
pip install wishcribe
macOS / Linux
python3 -m venv wishcribe-env
source wishcribe-env/bin/activate
pip install wishcribe
Every time you open a new terminal, activate the environment first:
# Windows
wishcribe-env\Scripts\activate
# macOS / Linux
source wishcribe-env/bin/activate
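If you are unsure whether the environment is active, Python can tell you. This small sketch uses the standard `sys.prefix` and `sys.base_prefix` comparison, which works for environments created with `python -m venv`; the helper name `in_virtualenv` is just for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```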
Whisper model guide
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| tiny | 75 MB | Very fast | Fair |
| base | 139 MB | Fast | Good |
| small | 461 MB | Moderate | Better |
| medium | 1.4 GB | Slow | Very good |
| large | 2.9 GB | Slowest | Best ⭐ (default) |
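One way to use the table is to pick the largest model that fits a size budget. The sketch below does this with the download sizes listed above, rounded to whole megabytes. Note that these are file sizes, not memory requirements, and `pick_model` is a hypothetical helper rather than a wishcribe function.

```python
# Approximate download sizes in MB, taken from the table above.
MODEL_SIZES_MB = {"tiny": 75, "base": 139, "small": 461, "medium": 1400, "large": 2900}


def pick_model(budget_mb):
    """Return the largest model whose download size fits in budget_mb, or None."""
    fitting = [name for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    return max(fitting, key=MODEL_SIZES_MB.get) if fitting else None
```

The returned name can be passed to `--model` on the command line or to the `model` argument in Python.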
Output files
| File | Description |
|---|---|
| <n>_transcript.txt | Plain text grouped by speaker |
| <n>_transcript.srt | SRT subtitles with speaker labels |
| <n>_transcript.json | Raw JSON array (opt-in) |
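For scripting, it can help to compute in advance which files a run should produce. The sketch below assumes that `<n>` is the input file name without its extension and that the output folder defaults to the input's folder, as the options table suggests. `expected_outputs` and its parameter names are hypothetical.

```python
from pathlib import Path


def expected_outputs(media, output_dir=None, txt=True, srt=True, as_json=False):
    """List the transcript files expected for `media`, given the output switches."""
    media = Path(media)
    folder = Path(output_dir) if output_dir else media.parent
    wanted = {"txt": txt, "srt": srt, "json": as_json}
    return [folder / f"{media.stem}_transcript.{ext}"
            for ext, enabled in wanted.items() if enabled]
```

Here `txt=False` corresponds to `--no-txt`, `srt=False` to `--no-srt`, and `as_json=True` to `--json`.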
Supported formats
Video: mp4, mkv, avi, mov, webm, and more
Audio: mp3, wav, m4a, flac, ogg, aac, opus, and more
Languages: 90+ (Whisper auto-detects if --bahasa not set)
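When working through a folder of recordings, a short filter on file extensions is often enough to find candidates. The sketch below covers only the extensions named above; since the lists end with "and more", a file it skips may still be usable. The name `find_media` is hypothetical.

```python
from pathlib import Path

# Extensions explicitly listed above; the real set of supported formats is larger.
VIDEO_EXTS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac", ".opus"}


def find_media(folder):
    """Return sorted paths in `folder` whose extension appears in the lists above."""
    known = VIDEO_EXTS | AUDIO_EXTS
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in known
    )
```

Each returned path can then be passed to `wishcribe --video` or to `transcribe()` in a loop.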
Troubleshooting
401 Client Error / Access to model pyannote/segmentation-3.0 is restricted
Your token must be passed on every run, or set as the WISHCRIBE_HF_TOKEN environment variable:
wishcribe --video meeting.mp4 --bahasa id --speakers 2 --hf-token hf_xxx
# or set once: export WISHCRIBE_HF_TOKEN=hf_xxx
Also make sure both licenses are accepted:
- https://huggingface.co/pyannote/speaker-diarization-3.1
- https://huggingface.co/pyannote/segmentation-3.0
wishcribe: command not found
pip install wishcribe --upgrade
# or on Windows:
python -m wishcribe --video meeting.mp4
ffmpeg not found
Follow the ffmpeg installation steps above for your OS.
Dependency conflicts (e.g. with tensorflow)
Use a virtual environment (see section above) to isolate wishcribe cleanly.
Out of memory with large model
Switch to a smaller model:
wishcribe --video meeting.mp4 --model medium
License
MIT — free to use, modify, and distribute.