Fast multi-speaker audio/video transcription — faster-whisper + pyannote.audio

These details have not been verified by PyPI

Project links

Project description

wishcribe

Fast multi-speaker audio/video transcription — faster-whisper + pyannote.audio, fully offline after first run.

v1.1.0 upgrades the transcription backend to faster-whisper (CTranslate2), giving 4–8× faster transcription than openai-whisper at the same accuracy, with batched inference and VAD silence filtering built in.

Example output for a two-speaker meeting in Indonesian (--bahasa id):

[SPEAKER_00] 00:00:01
  Selamat datang di rapat hari ini.

[SPEAKER_01] 00:00:05
  Terima kasih. Mari kita mulai.

[SPEAKER_00] 00:00:10
  Baik, topik pertama adalah anggaran kuartal ini.

Or without speaker labels (no HuggingFace token needed):

00:00:01
  Selamat datang di rapat hari ini.

00:00:05
  Terima kasih. Mari kita mulai.

What's new in v1.1.0

4–8× faster transcription via faster-whisper (CTranslate2 backend)
Batched inference — multiple audio chunks transcribed in parallel (--batch-size, default 16)
VAD filtering — silence is skipped before transcription, reducing hallucination
Auto compute type — float16 on modern GPU, int8 on CPU (auto-detected, overridable)
New models — large-v3, turbo (large-v3-turbo), distil-large-v3
New flags — --batch-size, --compute-type, --device
GPU memory freed between transcription and diarization (avoids OOM on 8 GB VRAM)
openai-whisper fallback — automatically used if faster-whisper is not installed

Requirements

Python 3.9 or higher
ffmpeg
~4 GB free disk space (for model weights)
Internet connection (first run only — fully offline after)
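You can confirm these requirements before the first run with a small Python check. This is a sketch and is not part of wishcribe. It assumes the model weights are cached under your home directory, which is the usual Hugging Face default. If your cache lives elsewhere, pass cache_dir.

```python
import os
import shutil
import sys


def check_requirements(min_python=(3, 9), min_free_gb=4.0, cache_dir=None):
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # Python 3.9 or higher
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")

    # ffmpeg must be callable from PATH
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    # Free space on the drive that will hold the model cache.
    # ASSUMPTION: the cache lives under the home directory (the usual
    # Hugging Face default); pass cache_dir if yours is elsewhere.
    cache_dir = cache_dir or os.path.expanduser("~")
    free_gb = shutil.disk_usage(cache_dir).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free in {cache_dir}, ~{min_free_gb:g} GB needed")

    return problems
```

Run it with print(check_requirements() or "All requirements met.").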

Installing Python

Windows

Go to https://www.python.org/downloads/windows/
Click "Download Python 3.x.x" (latest version)
Run the installer
⚠️ Important: Check "Add Python to PATH" before clicking Install
Open Command Prompt and verify:
```
python --version
pip --version
```

Tip: Use Command Prompt or PowerShell to run wishcribe commands.
To open: press Win + R, type cmd, press Enter.

macOS

python3 --version          # check if already installed
brew install python        # install via Homebrew if not

If you don't have Homebrew: https://brew.sh

Ubuntu / Debian

sudo apt update
sudo apt install python3 python3-pip python3-venv

Installing ffmpeg

ffmpeg is required to extract audio from video files.

Windows

Go to https://ffmpeg.org/download.html → Windows → BtbN builds
Download ffmpeg-master-latest-win64-gpl.zip
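Extract the zip, then rename the extracted folder to C:\ffmpeg (so that ffmpeg.exe is at C:\ffmpeg\bin\ffmpeg.exe)
Add C:\ffmpeg\bin to your system PATH
Open a new Command Prompt and verify: ffmpeg -version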

macOS

brew install ffmpeg

Ubuntu / Debian

sudo apt install ffmpeg
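This README does not document the exact ffmpeg command that wishcribe runs internally. The hedged sketch below is useful for checking that ffmpeg can decode a particular file before you transcribe it. It builds a command that converts any supported input to 16 kHz mono WAV, the sample rate Whisper models operate at. extract_audio_cmd is a hypothetical helper.

```python
import shlex
from pathlib import Path


def extract_audio_cmd(media, out_wav=None):
    """Build an ffmpeg argument list that writes a 16 kHz mono WAV copy of `media`."""
    media = Path(media)
    # Default output: <stem>_16k.wav beside the input (never the input itself)
    out_wav = Path(out_wav) if out_wav else media.with_name(media.stem + "_16k.wav")
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", str(media),  # input video or audio file
        "-vn",             # drop any video stream
        "-ac", "1",        # downmix to one channel (mono)
        "-ar", "16000",    # resample to 16 kHz, the rate Whisper models work at
        str(out_wav),
    ]


# Print a copy-pasteable shell command for inspection
print(shlex.join(extract_audio_cmd("meeting.mp4")))
```

To execute the command rather than print it, pass the list to subprocess.run(cmd, check=True). That call raises CalledProcessError if ffmpeg cannot read the file.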

Installation

pip install wishcribe

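Windows: if pip is not found, try pip3 install wishcribe or python -m pip install wishcribe.
Ubuntu 23.04+ / Debian 12+: a system-wide pip install is blocked with an "externally-managed-environment" error; install into a virtual environment instead (see "Using a virtual environment" below).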

Two modes

Mode	Command	HuggingFace token?	Output
With speaker labels	(default)	✅ Required	`[SPEAKER_00]`, `[SPEAKER_01]` …
Transcription only	`--no-diarize`	❌ Not needed	Timestamps only

Use --no-diarize if you want a transcript without identifying who speaks when.

HuggingFace setup (required for speaker labels)

Skip this section if using --no-diarize only.

wishcribe uses pyannote/speaker-diarization-community-1 for speaker detection. You need to accept the license once before downloading:

Sign up: https://huggingface.co/join
Accept license: https://huggingface.co/pyannote/speaker-diarization-community-1
Create a Read token: https://huggingface.co/settings/tokens

⚠️ The license must be accepted before running wishcribe download. Without it the download fails with a 401 error.

Quick start

With speaker labels (full mode)

# Step 1 — download all models once (~4 GB total, then fully offline)
wishcribe download --hf-token hf_xxx

# Step 2 — transcribe
wishcribe --video meeting.mp4 --bahasa id --speakers 2

Without speaker labels (no token needed)

wishcribe --video meeting.mp4 --bahasa id --no-diarize

Low GPU memory (8 GB VRAM or less)

wishcribe --video meeting.mp4 --batch-size 4 --compute-type int8
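The hardware advice in this section and the CPU-only section can be written as a small decision function. detect_vram_gb, pick_hardware_settings and to_cli_flags are hypothetical helpers. The 8 GB threshold comes from the heading above. GPU detection assumes PyTorch is importable, which it should be because pyannote.audio depends on it.

```python
def detect_vram_gb():
    """Total memory of CUDA device 0 in GiB, or None if no GPU is usable."""
    try:
        import torch  # normally installed alongside pyannote.audio
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


def pick_hardware_settings(vram_gb):
    """Map GPU memory (None = no GPU) to the settings this README suggests."""
    if vram_gb is None:
        return {"device": "cpu"}  # compute type is then auto-detected (int8)
    if vram_gb <= 8:
        return {"device": "cuda", "batch_size": 4, "compute_type": "int8"}
    return {"device": "cuda", "batch_size": 16, "compute_type": "float16"}


def to_cli_flags(settings):
    """Turn {"batch_size": 4} into ["--batch-size", "4"]."""
    flags = []
    for key, value in settings.items():
        flags += ["--" + key.replace("_", "-"), str(value)]
    return flags
```

The returned dict uses the same keys as the transcribe() keyword arguments shown under "Python API". You can therefore pass it to the CLI as " ".join(to_cli_flags(settings)) or to Python as transcribe("meeting.mp4", **settings).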

CPU-only

wishcribe --video meeting.mp4 --device cpu

Avoid typing --hf-token every time

Set your token as an environment variable once:

macOS / Linux

# Add to ~/.zshrc (zsh) or ~/.bashrc / ~/.bash_profile (bash)
export WISHCRIBE_HF_TOKEN="hf_xxx"

source ~/.zshrc    # reload the file you edited

Windows

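# Current session only (Command Prompt)
set WISHCRIBE_HF_TOKEN=hf_xxx

# Current session only (PowerShell)
$env:WISHCRIBE_HF_TOKEN = "hf_xxx"

# Permanently: Win + S → "Environment Variables" → New → WISHCRIBE_HF_TOKEN
# or run: setx WISHCRIBE_HF_TOKEN hf_xxx   (takes effect in new terminals)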

After setting it, --hf-token is no longer needed:

wishcribe --video meeting.mp4 --bahasa id --speakers 2

🔒 The token is stored in your local shell configuration or system settings, not in your project files, so it does not end up in a Git commit alongside your code.
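If you script wishcribe from Python, you can reproduce the same token lookup with a couple of hypothetical helpers. The precedence rule, where an explicit token wins over the environment variable, is an assumed sensible behaviour and is not documented wishcribe behaviour. The hf_ prefix check reflects the format of current HuggingFace access tokens.

```python
import os


def resolve_hf_token(cli_token=None, environ=None):
    """Pick the HuggingFace token: an explicit value first, then WISHCRIBE_HF_TOKEN.

    ASSUMPTION: this precedence mirrors what you would want, not documented behaviour.
    """
    environ = os.environ if environ is None else environ
    token = (cli_token or environ.get("WISHCRIBE_HF_TOKEN") or "").strip()
    return token or None


def looks_like_hf_token(token):
    """Cheap sanity check: HuggingFace user access tokens start with "hf_"."""
    return bool(token) and token.startswith("hf_")
```

A caller can then pass diarize=token is not None to transcribe(). When no token is configured, this falls back to timestamps-only output.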

CLI reference

`wishcribe download`

Pre-download and cache all model weights (run once, then fully offline).

wishcribe download --hf-token hf_xxx                   # default large-v2 + diarization
wishcribe download --hf-token hf_xxx --model turbo     # download turbo instead
wishcribe download --hf-token hf_xxx --force           # delete cache and re-download fresh
wishcribe download --model-path /path/to/local-model   # use a local pyannote folder

`wishcribe --video` (transcribe)

wishcribe --video meeting.mp4 --bahasa id --speakers 2        # full mode
wishcribe --video meeting.mp4 --bahasa id --no-diarize        # no speaker labels
wishcribe --video meeting.mp4 --model turbo                   # faster model
wishcribe --video meeting.mp4 --batch-size 4                  # lower GPU memory
wishcribe --video meeting.mp4 --compute-type int8             # quantized, less VRAM
wishcribe --video meeting.mp4 --device cpu                    # CPU only
wishcribe --video meeting.mp4 --use-api --api-key sk-xxx      # OpenAI cloud API
wishcribe --video meeting.mp4 --output ./results --json       # custom folder + JSON
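When you drive the CLI from another program, building the argument list in code avoids shell-quoting problems with file names that contain spaces. build_cli_args is a hypothetical helper that emits only the flags documented above. Skipping --speakers when diarization is off is an assumption, since the speaker count matters only for speaker labels.

```python
import shlex


def build_cli_args(video, language=None, speakers=None, diarize=True,
                   model=None, output=None, save_json=False):
    """Assemble a wishcribe argument list from the flags documented above."""
    args = ["wishcribe", "--video", str(video)]
    if language:
        args += ["--bahasa", language]
    if not diarize:
        args.append("--no-diarize")
    elif speakers:
        # ASSUMPTION: a speaker count only matters when diarization runs
        args += ["--speakers", str(speakers)]
    if model:
        args += ["--model", model]
    if output:
        args += ["--output", str(output)]
    if save_json:
        args.append("--json")
    return args


# A file name with spaces stays a single argument; shlex.join shows the quoting a shell needs
print(shlex.join(build_cli_args("team meeting.mp4", language="id", speakers=2)))
```

Pass the list itself to subprocess.run(args, check=True). Because no shell is involved, no quoting is needed.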

All options

Argument	Description	Default
`--video`	Path to video or audio file (required)	—
`--hf-token`	HuggingFace token (or set `WISHCRIBE_HF_TOKEN` env var)	—
`--no-diarize`	Skip speaker diarization — no token needed	`False`
`--model`	`tiny` / `base` / `small` / `medium` / `large` / `large-v2` / `large-v3` / `turbo` / `distil-large-v3`	`large-v2`
`--bahasa`	Language code e.g. `id`, `en`	auto-detect
`--speakers`	Number of speakers (improves accuracy when known)	auto
`--batch-size`	Transcription batch size — higher = faster on GPU	`16`
`--compute-type`	`float16` (fast, GPU) / `int8` (low memory; auto default on CPU) / `float32` (full precision)	auto
`--device`	`cuda` or `cpu`	auto
`--model-path`	Path to local pyannote model folder	—
`--output`	Output folder	same as input
`--use-api`	Use OpenAI Whisper API (no local GPU)	`False`
`--api-key`	OpenAI API key (required with `--use-api`)	—
`--json`	Also save `.json` output	`False`
`--no-txt`	Skip `.txt` output	`False`
`--no-srt`	Skip `.srt` output	`False`
`--quiet`	Suppress progress output	`False`
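The CLI flags above correspond to the keyword arguments in the Python API section. The sketch below illustrates that correspondence with a hypothetical parser, which is not wishcribe's own. It uses argparse to re-declare the documented options and translate them into transcribe() keyword arguments. It leaves out --hf-token, --model-path and --quiet because this README shows no Python equivalent for them.

```python
import argparse


def make_parser():
    """Re-declare the documented CLI options (a sketch, not wishcribe's own parser)."""
    p = argparse.ArgumentParser(prog="wishcribe-sketch")
    p.add_argument("--video", required=True)
    p.add_argument("--no-diarize", action="store_true")
    p.add_argument("--model", default="large-v2")
    p.add_argument("--bahasa")
    p.add_argument("--speakers", type=int)
    p.add_argument("--batch-size", type=int, default=16)
    p.add_argument("--compute-type", choices=["float16", "int8", "float32"])
    p.add_argument("--device", choices=["cuda", "cpu"])
    p.add_argument("--output")
    p.add_argument("--use-api", action="store_true")
    p.add_argument("--api-key")
    p.add_argument("--json", action="store_true")
    p.add_argument("--no-txt", action="store_true")
    p.add_argument("--no-srt", action="store_true")
    return p


def to_transcribe_kwargs(ns):
    """Translate parsed flags into the keyword arguments shown under "Python API"."""
    kwargs = {
        "diarize": not ns.no_diarize,
        "model": ns.model,
        "batch_size": ns.batch_size,
        "use_api": ns.use_api,
        "save_txt": not ns.no_txt,
        "save_srt": not ns.no_srt,
        "save_json": ns.json,
    }
    # Unset options mean "auto" / default in the table, so they are omitted entirely
    optional = {
        "language": ns.bahasa,
        "num_speakers": ns.speakers,
        "compute_type": ns.compute_type,
        "device": ns.device,
        "output_dir": ns.output,
        "api_key": ns.api_key,
    }
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return ns.video, kwargs
```

Usage: video, kwargs = to_transcribe_kwargs(make_parser().parse_args()), followed by transcribe(video, **kwargs).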

Python API

from wishcribe import download, transcribe

# ── One-time setup ─────────────────────────────────────────────
download(hf_token="hf_xxx")
# download(hf_token="hf_xxx", model="turbo")    # different model
# download(hf_token="hf_xxx", force=True)       # re-download fresh

# ── With speaker labels (default) ─────────────────────────────
segments = transcribe(
    "meeting.mp4",
    language="id",
    num_speakers=2,       # optional but improves accuracy
    output_dir="./out",
)

# ── Without speaker labels ─────────────────────────────────────
segments = transcribe("meeting.mp4", diarize=False, language="id")

# ── Speed / hardware controls ──────────────────────────────────
segments = transcribe(
    "meeting.mp4",
    model="turbo",          # large-v3-turbo: fast + accurate
    batch_size=16,          # default — lower to 4 if OOM
    compute_type="float16", # auto-detected; "int8" for CPU/low-mem
    device="cuda",          # auto-detected
)

# ── OpenAI cloud API ───────────────────────────────────────────
segments = transcribe("meeting.mp4", use_api=True, api_key="sk-xxx")

# ── Output control ─────────────────────────────────────────────
segments = transcribe(
    "meeting.mp4",
    save_txt=True,   # _transcript.txt  (default on)
    save_srt=True,   # _transcript.srt  (default on)
    save_json=True,  # _transcript.json (default off)
)

# ── Iterate results ────────────────────────────────────────────
for seg in segments:
    print(f"[{seg.speaker}] {seg.start:.1f}s — {seg.text}")

# Each Segment: .start  .end  .speaker  .text
# seg.to_dict() → {start, end, duration, speaker, text}
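The .txt output groups segments by speaker turn, as described under "Output files". You can rebuild a comparable grouping from seg.to_dict() results. This is an illustrative sketch, not wishcribe's implementation, and the 1.5-second merge gap is an arbitrary choice. Segments with an empty speaker value are rendered with a timestamp only, matching the speaker-less example at the top.

```python
def fmt_hms(seconds):
    """Format seconds as HH:MM:SS, like the timestamps in the sample output."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


def group_turns(segments, max_gap=1.5):
    """Merge consecutive segments from the same speaker into one turn.

    `segments` are dicts shaped like seg.to_dict(): start, end, speaker, text.
    max_gap is an arbitrary illustrative threshold in seconds.
    """
    turns = []
    for seg in segments:
        last = turns[-1] if turns else None
        same_speaker = last is not None and last["speaker"] == seg["speaker"]
        if same_speaker and seg["start"] - last["end"] <= max_gap:
            last["end"] = seg["end"]
            last["text"] += " " + seg["text"].strip()
        else:
            # Copy the fields so the caller's dicts are never mutated
            turns.append({"speaker": seg["speaker"], "start": seg["start"],
                          "end": seg["end"], "text": seg["text"].strip()})
    return turns


def render(turns):
    """Plain-text layout: a header line, then the text indented by two spaces."""
    blocks = []
    for t in turns:
        stamp = fmt_hms(t["start"])
        header = f"[{t['speaker']}] {stamp}" if t["speaker"] else stamp
        blocks.append(f"{header}\n  {t['text']}")
    return "\n\n".join(blocks)
```

Usage: print(render(group_turns([s.to_dict() for s in segments]))).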

Whisper model guide

Model	Size	Speed	Accuracy	Notes
`tiny`	75 MB	⚡⚡⚡⚡	Fair	Fast testing / drafts
`base`	139 MB	⚡⚡⚡	Good
`small`	461 MB	⚡⚡	Better
`medium`	1.4 GB	⚡	Very good	Recommended for CPU
`large-v2`	2.9 GB	—	Best ⭐	Default — highest accuracy
`large-v3`	3.1 GB	—	Best	Newest large model
`turbo`	1.6 GB	⚡⚡	Very good	Best speed/accuracy ratio
`distil-large-v3`	1.5 GB	⚡⚡	Very good	Distilled, near large-v2; English-only

Recommendation: use large-v2 (default) for best accuracy, or turbo for a fast/accurate balance.
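The recommendations in this table can be folded into a small selection helper. suggest_model is hypothetical. It encodes the table's notes plus one fact the table does not state: Distil-Whisper checkpoints such as distil-large-v3 are trained for English only. For other languages, turbo is therefore the faster option.

```python
def suggest_model(device, language=None, prefer_speed=False):
    """Pick a --model value from the table's notes (illustrative only)."""
    if device == "cpu":
        return "medium"  # "Recommended for CPU"
    if prefer_speed:
        # distil-large-v3 only transcribes English; turbo is multilingual
        return "distil-large-v3" if language == "en" else "turbo"
    return "large-v2"  # default, highest accuracy
```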

Output files

File	Description
`<name>_transcript.txt`	Human-readable, grouped by speaker turn
`<name>_transcript.srt`	SRT subtitles — importable into video editors
`<name>_transcript.json`	JSON array with `start`, `end`, `duration`, `speaker`, `text` (opt-in with `--json`)
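The JSON file is a plain array of segment objects, so it is easy to post-process. The sketch below totals speaking time per speaker. It assumes the file is UTF-8 encoded and that duration is in seconds; the README does not state either explicitly. load_segments and talk_time are hypothetical helpers.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_segments(json_path):
    """Read a <name>_transcript.json file (a JSON array of segment objects)."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))


def talk_time(segments):
    """Total seconds per speaker, largest first, summed from each segment's duration."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["duration"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Usage: talk_time(load_segments("meeting_transcript.json")).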

Supported formats

Video: .mp4, .mkv, .mov, .avi, .webm, .ts, .wmv, .flv
Audio: .mp3, .wav, .m4a, .flac, .ogg, .aac, .opus, .wma
Languages: 90+ (Whisper auto-detects if --bahasa is not set)
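To batch-process a folder, you can turn the extension lists above into a filter. is_supported and find_media are hypothetical helpers. The matching is case-insensitive, which is an assumption about how wishcribe treats extensions such as .MP4.

```python
from pathlib import Path

# Extensions listed under "Supported formats"
VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".avi", ".webm", ".ts", ".wmv", ".flv"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac", ".opus", ".wma"}
SUPPORTED = VIDEO_EXTS | AUDIO_EXTS


def is_supported(path):
    """Case-insensitive extension check (.MP4 and .mp4 both match)."""
    return Path(path).suffix.lower() in SUPPORTED


def find_media(folder, recursive=False):
    """Sorted list of supported files in `folder`.

    Note: .ts also matches TypeScript sources, so point this at a media folder.
    """
    pattern = "**/*" if recursive else "*"
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file() and is_supported(p))
```

You can then pass each returned path to transcribe() or to wishcribe --video.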

Using a virtual environment (recommended)

# macOS / Linux
python3 -m venv wishcribe-env
source wishcribe-env/bin/activate
pip install wishcribe

# Windows
python -m venv wishcribe-env
wishcribe-env\Scripts\activate
pip install wishcribe

Activate at the start of each terminal session:

source wishcribe-env/bin/activate    # macOS / Linux
wishcribe-env\Scripts\activate        # Windows
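To confirm that the environment is actually active (for example, before running pip install), a one-function check is enough. It relies on standard venv behaviour: sys.prefix points to the environment folder, while sys.base_prefix keeps pointing at the base Python install.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    # venv sets sys.prefix to the environment folder; sys.base_prefix is the base install
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

The shell one-liner python -c "import sys; print(sys.prefix != sys.base_prefix)" gives the same answer.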

Troubleshooting

401 Client Error / Access to model is restricted
Accept the license at https://huggingface.co/pyannote/speaker-diarization-community-1 and verify your token is a valid Read token.

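wishcribe: command not found
The folder where pip installs command-line scripts is probably not on your PATH. Reinstall or upgrade, or run wishcribe as a Python module instead: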

pip install --upgrade wishcribe
# Windows fallback:
python -m wishcribe --video meeting.mp4

ffmpeg not found
Install ffmpeg for your OS (see above).

Out of GPU memory (CUDA OOM)
Lower batch size or use int8 quantization:

wishcribe --video meeting.mp4 --batch-size 4 --compute-type int8
# or CPU only:
wishcribe --video meeting.mp4 --device cpu
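If you would rather not choose settings by hand, you can automate the fallback order above from Python. run_with_fallback is a hypothetical wrapper. It assumes an out-of-memory failure surfaces as a RuntimeError whose message contains "out of memory". PyTorch's torch.cuda.OutOfMemoryError subclasses RuntimeError and its message matches. The CTranslate2 error text is an assumption worth checking against your installed versions.

```python
def run_with_fallback(run, attempts=None):
    """Call run(**settings), stepping to lighter settings after each out-of-memory error.

    ASSUMPTION: OOM failures surface as RuntimeError whose message contains
    "out of memory"; any other error is re-raised immediately.
    """
    attempts = attempts or [
        {"batch_size": 16},                         # the documented default
        {"batch_size": 4, "compute_type": "int8"},  # low-memory GPU settings
        {"device": "cpu"},                          # last resort: CPU only
    ]
    last_error = None
    for settings in attempts:
        try:
            return run(**settings)
        except RuntimeError as exc:
            if "out of memory" not in str(exc).lower():
                raise
            last_error = exc
    raise last_error
```

Usage: run_with_fallback(lambda **s: transcribe("meeting.mp4", language="id", **s)).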

Dependency conflicts (e.g. with TensorFlow)
Use a virtual environment to isolate wishcribe cleanly.

Want to skip HuggingFace entirely?
Use --no-diarize — no token required, works right after pip install wishcribe.

License

MIT — free to use, modify, and distribute.

Project details

Release history

1.4.1 (Mar 25, 2026)
1.4.0 (Mar 15, 2026)
1.3.2 (Mar 15, 2026)
1.3.1 (Mar 15, 2026)
1.3.0 (Mar 15, 2026)
1.2.1 (Mar 13, 2026)
1.2.0 (Mar 13, 2026)
1.1.0 (Mar 12, 2026), this version
1.0.12 (Mar 10, 2026)
1.0.11 (Mar 10, 2026)
1.0.10 (Mar 10, 2026)
1.0.8 (Mar 10, 2026)
1.0.7 (Mar 9, 2026)
1.0.6 (Mar 9, 2026)
1.0.5 (Mar 9, 2026)
1.0.4 (Mar 9, 2026)
1.0.3 (Mar 9, 2026)
1.0.2 (Mar 9, 2026)
1.0.1 (Mar 9, 2026)
1.0.0 (Mar 8, 2026)

Download files

Source Distribution

wishcribe-1.1.0.tar.gz (27.0 kB)

Uploaded Mar 12, 2026 Source

Built Distribution

wishcribe-1.1.0-py3-none-any.whl (25.1 kB)

Uploaded Mar 12, 2026 Python 3

File details

Details for the file wishcribe-1.1.0.tar.gz.

File metadata

Download URL: wishcribe-1.1.0.tar.gz
Upload date: Mar 12, 2026
Size: 27.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for wishcribe-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3ca1ccbe789c8a5614bcc1968a002a4576b9ff620fb094b0ab57fae6032e3587`
MD5	`200822b026d8ee49f3a96b9a9de7f1e2`
BLAKE2b-256	`4c9caa6866cf672c0d327c487b65bdeb75444a544afea8455d6ee4997b2342c1`

File details

Details for the file wishcribe-1.1.0-py3-none-any.whl.

File metadata

Download URL: wishcribe-1.1.0-py3-none-any.whl
Upload date: Mar 12, 2026
Size: 25.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for wishcribe-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`263ae57a9a46a7e52d33035d348b40ff839db8ad57d32aae6c23dbac18b2dea8`
MD5	`ca2c1366dd800267150de7b67da0281d`
BLAKE2b-256	`5307ef688fb6196d2dec326191d0544ea45069ff96420bd945eef8c5faea97fb`
