Multi-speaker audio/video transcription — Whisper large + pyannote.audio (offline)

Wishcribe

Multi-speaker audio/video transcription — Whisper large + pyannote.audio, fully offline after first run.

[SPEAKER_00] 00:00:01
  Selamat datang di rapat hari ini.

[SPEAKER_01] 00:00:05
  Terima kasih. Mari kita mulai.

[SPEAKER_00] 00:00:10
  Baik, topik pertama adalah anggaran kuartal ini.

Or without speaker labels (no HuggingFace token needed):

00:00:01
  Selamat datang di rapat hari ini.

00:00:05
  Terima kasih. Mari kita mulai.

Requirements

Python 3.9 or higher
ffmpeg
4 GB free disk space (for model weights)
Internet connection (first run only)
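
The requirements above can be checked from Python before installing anything. The snippet below is a minimal sketch using only the standard library; the function name `preflight` is made up for illustration, and checking free space in the current folder is an assumption, since the folder where wishcribe stores model weights is not documented here.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)            # "Python 3.9 or higher"
MIN_FREE_BYTES = 4 * 1024**3   # "4 GB free disk space"


def preflight(path="."):
    """Return a dict mapping each requirement to True/False for this machine.

    `path` is the folder whose free space is measured (assumption: the
    current folder; the real model cache may live elsewhere).
    """
    return {
        "python": sys.version_info >= MIN_PYTHON,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "disk": shutil.disk_usage(path).free >= MIN_FREE_BYTES,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name:8s} {'ok' if ok else 'MISSING'}")
```

Any line reported as MISSING points to the matching installation section below.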

Installing Python

Windows

Go to https://www.python.org/downloads/windows/
Click "Download Python 3.x.x" (latest version)
Run the installer
⚠️ Important: On the first screen, check "Add Python to PATH" (labelled "Add python.exe to PATH" in recent installers) before clicking Install
Click "Install Now"
Once done, open Command Prompt and verify:
```
python --version
pip --version
```
Both should print a version number.

Tip for Windows: Use Command Prompt or PowerShell to run wishcribe commands.
To open Command Prompt: press Win + R, type cmd, press Enter.

macOS

# Check if Python is already installed
python3 --version

# If not installed, use Homebrew
brew install python

If you don't have Homebrew: https://brew.sh

Ubuntu / Debian Linux

sudo apt update
sudo apt install python3 python3-pip python3-venv

Installing ffmpeg

ffmpeg is required to extract audio from video files.
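
To make the audio extraction step concrete, here is a rough sketch of how a video's audio track can be pulled out with ffmpeg from Python. It is not wishcribe's actual code: the function names are hypothetical, and the 16 kHz mono WAV target is chosen because that is the sample rate Whisper works with. The flags used (`-y`, `-i`, `-vn`, `-ac`, `-ar`) are standard ffmpeg options.

```python
import subprocess
from pathlib import Path


def build_extract_cmd(src, dst, sample_rate=16000):
    """Build an ffmpeg command that writes a mono WAV file from `src`."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input video or audio file
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample to the target rate
        str(dst),
    ]


def extract_audio(src, dst):
    """Run ffmpeg; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(build_extract_cmd(src, dst), check=True, capture_output=True)
    return Path(dst)
```

Calling `extract_audio("meeting.mp4", "meeting.wav")` requires ffmpeg on PATH, which is what the steps below set up.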

Windows

Go to https://ffmpeg.org/download.html
Click "Windows" → "Windows builds by BtbN"
Download ffmpeg-master-latest-win64-gpl.zip
Extract the zip file to C:\ffmpeg
Add ffmpeg to PATH:
- Press Win + S → search "Environment Variables"
- Click "Edit the system environment variables"
- Click "Environment Variables"
- Under "System variables", find Path → click Edit
- Click New → type C:\ffmpeg\bin
- Click OK on all windows
Open a new Command Prompt and verify:
```
ffmpeg -version
```

macOS

brew install ffmpeg

Ubuntu / Debian

sudo apt install ffmpeg

Installation

Once Python and ffmpeg are installed:

pip install wishcribe

Windows users: If pip is not found, try pip3 install wishcribe or python -m pip install wishcribe

Two modes of transcription

Wishcribe supports two modes:

| Mode | Command | HuggingFace token? | Output |
| --- | --- | --- | --- |
| With speaker labels | (default) | ✅ Required | `[SPEAKER_00]`, `[SPEAKER_01]` … |
| Transcription only | `--no-diarize` | ❌ Not needed | Timestamps only, no speaker labels |

Use --no-diarize if you just want a fast transcript without identifying who speaks when.

HuggingFace setup (required for speaker labels)

Skip this section if you only want to use --no-diarize mode.

Wishcribe uses pyannote.audio for speaker detection. You need to accept the model license on HuggingFace before downloading.

Sign up at https://huggingface.co/join
Accept license: https://huggingface.co/pyannote/speaker-diarization-community-1
Create a Read token: https://huggingface.co/settings/tokens

⚠️ The license must be accepted before running wishcribe download. Without it, the download will fail with a 401 error.

Quick start

With speaker labels (full mode)

# Step 1 — Download all models once
wishcribe download --hf-token hf_xxx

# Step 2 — Transcribe
wishcribe --video meeting.mp4 --bahasa id --speakers 2

Without speaker labels (no token needed)

# No download step needed — just transcribe
wishcribe --video meeting.mp4 --bahasa id --no-diarize

Avoid typing --hf-token every time

Set your token as an environment variable once and wishcribe will read it automatically:

macOS / Linux

# Add this to your shell startup file (~/.zshrc for zsh, ~/.bashrc or ~/.bash_profile for bash)
export WISHCRIBE_HF_TOKEN="hf_xxx"

# Reload the file you edited, for example:
source ~/.zshrc

Windows

REM In Command Prompt (current session only)
set WISHCRIBE_HF_TOKEN=hf_xxx

REM Or permanently via System Environment Variables:
REM Win + S → "Environment Variables" → New → Name: WISHCRIBE_HF_TOKEN, Value: hf_xxx

After setting it, run without --hf-token:

wishcribe --video meeting.mp4 --bahasa id --speakers 2

🔒 Environment variables stay on your machine and are not part of your project files, so the token is not committed to Git or uploaded to GitHub unless you copy it into a tracked file.
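
The lookup described in this section can be pictured as follows. This is only a sketch: the helper name `resolve_token` is hypothetical, and the precedence (an explicit `--hf-token` value wins over `WISHCRIBE_HF_TOKEN`) is an assumption, not something the documentation confirms.

```python
import os

ENV_VAR = "WISHCRIBE_HF_TOKEN"  # variable name taken from the section above


def resolve_token(cli_token=None, env=None):
    """Return the token to use, or None if neither source provides one.

    Assumed precedence: explicit value first, then the environment variable.
    """
    env = os.environ if env is None else env
    token = cli_token or env.get(ENV_VAR)
    return token.strip() if token else None


if __name__ == "__main__":
    # Report only whether a token is set; never print the secret itself.
    print("token found" if resolve_token() else f"{ENV_VAR} is not set")
```

Running the file in a new terminal is a quick way to confirm the variable was picked up after editing your shell startup file.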

Usage — CLI

Download command

# Download default Whisper large model + pyannote diarization model
wishcribe download --hf-token hf_xxx

# Download a smaller/faster Whisper model instead
wishcribe download --hf-token hf_xxx --model medium

# Use a manually downloaded pyannote model folder
wishcribe download --model-path /path/to/pyannote-model

Transcribe command

# With speaker labels (default)
wishcribe --video meeting.mp4 --bahasa id --speakers 2

# Without speaker labels — no HuggingFace token needed, faster
wishcribe --video meeting.mp4 --bahasa id --no-diarize

# Override Whisper model
wishcribe --video meeting.mp4 --model medium

# Use OpenAI API for transcription
wishcribe --video meeting.mp4 --use-api --api-key sk-xxx

# Save to a custom folder + include JSON
wishcribe --video meeting.mp4 --output ./results --json

All options

| Argument | Description | Default |
| --- | --- | --- |
| `--video` | Path to video or audio file (required) | none |
| `--hf-token` | HuggingFace token (or set `WISHCRIBE_HF_TOKEN` env var) | none |
| `--no-diarize` | Skip speaker diarization; no token needed | `False` |
| `--model-path` | Path to local pyannote model folder | none |
| `--model` | Whisper model size: `tiny`/`base`/`small`/`medium`/`large` | `large` |
| `--bahasa` | Language code, e.g. `id`, `en` | auto-detect |
| `--speakers` | Number of speakers (optional, ignored with `--no-diarize`) | auto |
| `--output` | Output folder | same folder as the input file |
| `--use-api` | Use OpenAI Whisper API | `False` |
| `--api-key` | OpenAI API key (with `--use-api`) | none |
| `--json` | Also save `.json` | `False` |
| `--no-txt` | Skip `.txt` output | `False` |
| `--no-srt` | Skip `.srt` output | `False` |
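
When driving the CLI from a script, it can help to assemble the argument list in one place. The sketch below uses only flags from the table above; the function `build_argv` and its parameter names are hypothetical and are not part of the wishcribe package.

```python
def build_argv(video, language=None, speakers=None, model=None,
               output=None, diarize=True, json_out=False):
    """Return a wishcribe command line as a list of strings."""
    argv = ["wishcribe", "--video", str(video)]
    if language:
        argv += ["--bahasa", language]
    if model:
        argv += ["--model", model]
    if output:
        argv += ["--output", str(output)]
    if not diarize:
        argv.append("--no-diarize")
    elif speakers is not None:
        # --speakers is ignored with --no-diarize, so only add it here
        argv += ["--speakers", str(speakers)]
    if json_out:
        argv.append("--json")
    return argv


if __name__ == "__main__":
    print(" ".join(build_argv("meeting.mp4", language="id", speakers=2)))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting problems with file names that contain spaces.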

Usage — Python

from wishcribe import download, transcribe

# ── With speaker labels ────────────────────────────────────────
# Step 1 — download models once
download(hf_token="hf_xxx")

# Step 2 — transcribe with speaker labels
segments = transcribe(
    "meeting.mp4",
    hf_token="hf_xxx",     # or set WISHCRIBE_HF_TOKEN env var
    model="large",          # default — best accuracy
    language="id",
    num_speakers=2,
    output_dir="./out",
)

# ── Without speaker labels ─────────────────────────────────────
# No download step needed, no token needed
segments = transcribe(
    "meeting.mp4",
    diarize=False,
    language="id",
)

for seg in segments:
    if seg.speaker:
        print(f"[{seg.speaker}] {seg.start:.1f}s  {seg.text}")
    else:
        print(f"{seg.start:.1f}s  {seg.text}")
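
As a follow-up to the loop above, the sketch below renders segments in the same layout as the sample transcript at the top of this page. `Seg` is a hypothetical stand-in with only the three attributes used above (`speaker`, `start`, `text`); the objects returned by `transcribe` may carry more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Seg:
    """Stand-in for a wishcribe segment (illustration only)."""
    start: float
    text: str
    speaker: Optional[str] = None


def hms(seconds):
    """Format seconds as HH:MM:SS, dropping the fractional part."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


def render(segments):
    """Return one block per segment, with a speaker label when present."""
    blocks = []
    for seg in segments:
        head = f"[{seg.speaker}] {hms(seg.start)}" if seg.speaker else hms(seg.start)
        blocks.append(f"{head}\n  {seg.text.strip()}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(render([Seg(1.0, "Hello", "SPEAKER_00"), Seg(5.2, "Hi")]))
```

Because `render` only reads `speaker`, `start`, and `text`, it should work unchanged on the `segments` list from either call above, assuming those attributes behave as shown in the loop.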

Using a virtual environment (recommended)

To avoid conflicts with other Python packages on your system:

Windows

python -m venv wishcribe-env
wishcribe-env\Scripts\activate
pip install wishcribe

macOS / Linux

python3 -m venv wishcribe-env
source wishcribe-env/bin/activate
pip install wishcribe

Every time you open a new terminal, activate the environment first:

# Windows
wishcribe-env\Scripts\activate

# macOS / Linux
source wishcribe-env/bin/activate
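
If you are unsure whether the environment is active, Python itself can tell you. This small check relies on the standard `sys.prefix` and `sys.base_prefix` attributes, which differ inside an environment created with `python -m venv`; the function name is just for illustration.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    state = "active" if in_virtualenv() else "not active"
    print(f"virtual environment: {state} ({sys.prefix})")
```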

Whisper model guide

| Model | Size | Speed | Accuracy |
| --- | --- | --- | --- |
| `tiny` | 75 MB | Very fast | Fair |
| `base` | 139 MB | Fast | Good |
| `small` | 461 MB | Moderate | Better |
| `medium` | 1.4 GB | Slow | Very good |
| `large` | 2.9 GB | Slowest | Best ⭐ (default) |
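
One way to use the table is to pick the largest model that fits a download budget. The sketch below copies the sizes from the table (1.4 GB and 2.9 GB rounded to 1400 MB and 2900 MB); the helper is hypothetical, and it compares download size only, not the memory needed at run time.

```python
# (model name, approximate download size in MB), smallest to largest
MODELS = [
    ("tiny", 75),
    ("base", 139),
    ("small", 461),
    ("medium", 1400),
    ("large", 2900),
]


def largest_model_within(budget_mb):
    """Return the biggest model whose download fits in budget_mb, else None."""
    fitting = [name for name, size in MODELS if size <= budget_mb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for budget in (100, 500, 3000):
        print(budget, "MB ->", largest_model_within(budget))
```

The chosen name can then be passed to `--model` on the command line or `model=` in Python.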

Output files

| File | Description |
| --- | --- |
| `<n>_transcript.txt` | Plain text grouped by speaker (or by time if `--no-diarize`) |
| `<n>_transcript.srt` | SRT subtitles with speaker labels (or without if `--no-diarize`) |
| `<n>_transcript.json` | Raw JSON array (opt-in with `--json`) |
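
For readers unfamiliar with SRT, the sketch below shows how timed text maps onto the format: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, then the text. It is not wishcribe's writer. The input tuples, the presence of an end time, and the `[SPEAKER_00]` prefix inside the subtitle text are all assumptions made for illustration.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp, e.g. 1.5 -> 00:00:01,500."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start, end, text, speaker_or_None) tuples."""
    out = []
    for index, (start, end, text, speaker) in enumerate(cues, 1):
        line = f"[{speaker}] {text}" if speaker else text
        out.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{line}\n")
    return "\n".join(out)


if __name__ == "__main__":
    print(to_srt([(1.0, 4.5, "Hello", "SPEAKER_00"), (5.0, 9.0, "Hi", None)]))
```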

Supported formats

Video: mp4, mkv, avi, mov, webm, and more
Audio: mp3, wav, m4a, flac, ogg, aac, opus, and more
Languages: 90+ (Whisper auto-detects if --bahasa not set)
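
To transcribe a whole folder, a script first needs to find the media files in it. The sketch below filters by the extensions named above; the list is not exhaustive (the documentation says "and more"), and `find_media` is a hypothetical helper.

```python
from pathlib import Path

# Extensions listed under "Supported formats"; extend as needed
MEDIA_EXTS = {
    ".mp4", ".mkv", ".avi", ".mov", ".webm",                   # video
    ".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac", ".opus",  # audio
}


def find_media(folder):
    """Return the media files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_EXTS
    )


if __name__ == "__main__":
    for path in find_media("."):
        print(path)
```

Each returned path can be passed to `wishcribe --video`, or to `transcribe` in Python.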

Troubleshooting

401 Client Error / Access to model is restricted
Make sure the license is accepted and your token is valid:

wishcribe --video meeting.mp4 --bahasa id --hf-token hf_xxx
# or set once: export WISHCRIBE_HF_TOKEN=hf_xxx

Accept the license here: https://huggingface.co/pyannote/speaker-diarization-community-1

Want to skip the HuggingFace setup entirely?
Use --no-diarize — no token needed, works immediately after pip install wishcribe.

wishcribe: command not found

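# Reinstall or upgrade, then open a new terminal
pip install wishcribe --upgrade
# or run it as a module, which does not depend on PATH (use python3 on macOS / Linux):
python -m wishcribe --video meeting.mp4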
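
A "command not found" error usually means the folder where pip places executables is not on PATH. The sketch below reports both pieces of information using the standard library; `locate_command` is a hypothetical helper, and installs made with `pip install --user` place scripts in a different folder from the one shown.

```python
import shutil
import sysconfig


def locate_command(name="wishcribe"):
    """Report where `name` resolves on PATH and where pip installs scripts."""
    return {
        "on_path": shutil.which(name),                 # None if not found
        "scripts_dir": sysconfig.get_path("scripts"),  # for this interpreter
    }


if __name__ == "__main__":
    info = locate_command()
    print("found at:", info["on_path"] or "not on PATH")
    print("pip scripts folder:", info["scripts_dir"])
```

If the command is not found but the scripts folder exists, adding that folder to PATH or activating the virtual environment is the likely fix.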

ffmpeg not found
Follow the ffmpeg installation steps above for your OS.

Dependency conflicts (e.g. with tensorflow)
Use a virtual environment (see section above) to isolate wishcribe cleanly.

Out of memory with large model
Switch to a smaller model:

wishcribe --video meeting.mp4 --model medium
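
In Python, the same advice can be automated by retrying with smaller models. This is a sketch under assumptions: it takes the transcribe function as an argument, relies only on the `model=` keyword shown in the Python usage section, and guesses that memory failures surface as `MemoryError` or as a `RuntimeError` whose message contains "out of memory" (as PyTorch reports them). Wishcribe may report them differently.

```python
FALLBACK_ORDER = ["large", "medium", "small", "base", "tiny"]


def is_out_of_memory(err):
    """Heuristic: MemoryError, or any error whose message mentions OOM."""
    return isinstance(err, MemoryError) or "out of memory" in str(err).lower()


def transcribe_with_fallback(transcribe_fn, path, start="large", **kwargs):
    """Try `start`, then each smaller model, until one fits in memory.

    Returns (model_name, result). Errors unrelated to memory are re-raised.
    """
    last_error = None
    for model in FALLBACK_ORDER[FALLBACK_ORDER.index(start):]:
        try:
            return model, transcribe_fn(path, model=model, **kwargs)
        except (MemoryError, RuntimeError) as err:
            if not is_out_of_memory(err):
                raise  # unrelated failure: do not hide it
            last_error = err
    raise RuntimeError("all model sizes ran out of memory") from last_error
```

With wishcribe installed, a call might look like `transcribe_with_fallback(transcribe, "meeting.mp4", language="id")` after `from wishcribe import transcribe`.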

License

MIT — free to use, modify, and distribute.

Release history

| Version | Released |
| --- | --- |
| 1.4.1 | Mar 25, 2026 |
| 1.4.0 | Mar 15, 2026 |
| 1.3.2 | Mar 15, 2026 |
| 1.3.1 | Mar 15, 2026 |
| 1.3.0 | Mar 15, 2026 |
| 1.2.1 | Mar 13, 2026 |
| 1.2.0 | Mar 13, 2026 |
| 1.1.0 | Mar 12, 2026 |
| 1.0.12 | Mar 10, 2026 |
| 1.0.11 (this version) | Mar 10, 2026 |
| 1.0.10 | Mar 10, 2026 |
| 1.0.8 | Mar 10, 2026 |
| 1.0.7 | Mar 9, 2026 |
| 1.0.6 | Mar 9, 2026 |
| 1.0.5 | Mar 9, 2026 |
| 1.0.4 | Mar 9, 2026 |
| 1.0.3 | Mar 9, 2026 |
| 1.0.2 | Mar 9, 2026 |
| 1.0.1 | Mar 9, 2026 |
| 1.0.0 | Mar 8, 2026 |
