
wishcribe ✍️

Multi-speaker audio/video transcription — Whisper large + pyannote.audio, fully offline after first run.

Example output:

[SPEAKER_00] 00:00:01
  Selamat datang di rapat hari ini.

[SPEAKER_01] 00:00:05
  Terima kasih. Mari kita mulai.

[SPEAKER_00] 00:00:10
  Baik, topik pertama adalah anggaran kuartal ini.
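The speaker-labelled format above is easy to post-process. A minimal parsing sketch in Python (the exact .txt layout may differ slightly from this sample; the regex and helper name are illustrative):

```python
import re

# Matches header lines like "[SPEAKER_00] 00:00:01"; the indented
# line(s) that follow are that speaker's utterance.
HEADER = re.compile(r"\[(SPEAKER_\d+)\] (\d{2}):(\d{2}):(\d{2})")

def parse_transcript(text):
    """Return (speaker, start_seconds, utterance) tuples."""
    entries = []
    speaker, start = None, None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            speaker = m.group(1)
            h, mnt, s = (int(x) for x in m.groups()[1:])
            start = h * 3600 + mnt * 60 + s
        elif line.strip() and speaker is not None:
            entries.append((speaker, start, line.strip()))
    return entries

sample = """[SPEAKER_00] 00:00:01
  Selamat datang di rapat hari ini.

[SPEAKER_01] 00:00:05
  Terima kasih. Mari kita mulai.
"""
print(parse_transcript(sample))
```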

Installation

pip install wishcribe

ffmpeg is also required (one-time system install):

brew install ffmpeg        # macOS
sudo apt install ffmpeg    # Ubuntu/Debian

Quick start

Step 1 — download all models (run once)

wishcribe download --hf-token hf_xxx

This downloads and caches:

  • Whisper large (~2.9 GB) → ~/.cache/whisper/large.pt
  • pyannote diarization (~1 GB) → ~/.cache/huggingface/hub/...

Output:

📦  WISHCRIBE — MODEL DOWNLOADER
══════════════════════════════════════════
  Whisper model : large
  Diarization   : HuggingFace download (token provided)
══════════════════════════════════════════

📥 Downloading Whisper 'large' model (2.9 GB)...
✅ Whisper 'large' downloaded and cached  (2.9 GB)

📥 Downloading pyannote diarization model (~1 GB)...
✅ Diarization model downloaded and cached

🎉 All models cached! wishcribe now works fully offline.
   Run transcription with:
   wishcribe --video meeting.mp4

Step 2 — transcribe (fully offline, forever)

wishcribe --video meeting.mp4

That's it. No token, no internet, no extra flags.


Usage — CLI

Download command

# Download default model (large)
wishcribe download --hf-token hf_xxx

# Download a specific model size
wishcribe download --hf-token hf_xxx --model medium

# Use a local pyannote model folder (no HuggingFace needed)
wishcribe download --model-path /path/to/pyannote-model

Run / transcribe command

# Basic (Whisper large by default)
wishcribe --video meeting.mp4
wishcribe run --video meeting.mp4    # same thing

# With language + speaker count
wishcribe --video meeting.mp4 --bahasa id --speakers 3

# Override Whisper model
wishcribe --video meeting.mp4 --model medium
wishcribe --video meeting.mp4 --model small

# Use OpenAI API for transcription (diarization still offline)
wishcribe --video meeting.mp4 --use-api --api-key sk-xxx

# Custom output folder + save JSON
wishcribe --video meeting.mp4 --output ./results --json

All run options

Argument      Description                              Default
--video       Path to video or audio file              (required)
--hf-token    HuggingFace token (first-time only)
--model-path  Path to local pyannote model folder
--model       tiny / base / small / medium / large     large
--bahasa      Language code, e.g. id, en               auto-detect
--speakers    Number of speakers                       auto
--output      Output folder                            same as input
--use-api     Use OpenAI Whisper API                   False
--api-key     OpenAI API key (with --use-api)
--json        Also save .json                          False
--no-txt      Skip .txt output                         False
--no-srt      Skip .srt output                         False

Usage — Python

from wishcribe import download, transcribe

# Step 1 — download models once
download(hf_token="hf_xxx")

# Step 2 — transcribe offline
segments = transcribe("meeting.mp4")

# With options
segments = transcribe(
    "meeting.mp4",
    model="large",     # default — best accuracy
    language="id",
    num_speakers=3,
    output_dir="./out",
)

for seg in segments:
    print(f"[{seg.speaker}] {seg.start:.1f}s  {seg.text}")
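The returned segments can also be serialized by hand, for example into SRT. A sketch, assuming each segment additionally exposes an `end` attribute (not shown in the loop above; the `Segment` dataclass here is a stand-in for whatever wishcribe actually returns):

```python
from dataclasses import dataclass

@dataclass
class Segment:  # illustrative stand-in for wishcribe's segment type
    speaker: str
    start: float
    end: float
    text: str

def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamps SRT requires."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render numbered SRT blocks with speaker labels."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"[{seg.speaker}] {seg.text}\n"
        )
    return "\n".join(blocks)

print(to_srt([Segment("SPEAKER_00", 1.0, 4.5, "Selamat datang.")]))
```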

How offline mode works

Cache location                                   What's stored
~/.cache/whisper/large.pt                        Whisper large model weights (2.9 GB)
~/.cache/huggingface/hub/models--pyannote--...   Diarization model (~1 GB)

Once cached, both load instantly from disk — no internet ever needed.
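A quick way to confirm both caches exist before going offline. The paths come from the table above; the `models--pyannote--*` glob pattern for the HuggingFace folder is an assumption, and the helper is illustrative:

```python
from pathlib import Path

def cached_models(home=None):
    """Report which of the two model caches above are present on disk."""
    home = Path(home) if home is not None else Path.home()
    whisper_ok = (home / ".cache/whisper/large.pt").exists()
    # Path.glob on a missing directory simply yields nothing, so this is safe.
    pyannote_ok = any((home / ".cache/huggingface/hub").glob("models--pyannote--*"))
    return {"whisper": whisper_ok, "pyannote": pyannote_ok}

print(cached_models())
```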


Whisper model guide

Model    Size     Speed       Accuracy
tiny     75 MB    Very fast   Fair
base     139 MB   Fast        Good
small    461 MB   Moderate    Better
medium   1.4 GB   Slow        Very good
large    2.9 GB   Slowest     Best ⭐ (default)
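The table translates directly into a size-based picker: choose the largest (most accurate) model that fits your disk or memory budget. Sizes are copied from above; the helper itself is illustrative, not part of wishcribe:

```python
# Approximate download sizes from the table above, in MB,
# ordered smallest to largest (dicts preserve insertion order).
MODEL_SIZES_MB = {"tiny": 75, "base": 139, "small": 461,
                  "medium": 1400, "large": 2900}

def best_model_for(budget_mb):
    """Largest model whose download size fits within budget_mb."""
    fitting = [m for m, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    return fitting[-1] if fitting else None

print(best_model_for(500))
```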

HuggingFace setup (for download command)

  1. Sign up at https://huggingface.co
  2. Accept the license: https://huggingface.co/pyannote/speaker-diarization-3.1
  3. Create a Read token: https://huggingface.co/settings/tokens

Only needed once for wishcribe download.


Output files

File                  Description
<n>_transcript.txt    Plain text grouped by speaker
<n>_transcript.srt    SRT subtitles with speaker labels
<n>_transcript.json   Raw JSON array (opt-in)
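The JSON output lends itself to quick analysis. A sketch assuming the array holds objects with speaker/start/end/text keys — the actual schema may differ, so treat the field names as assumptions:

```python
import json

# Hypothetical contents of a <n>_transcript.json file.
sample = '''[
  {"speaker": "SPEAKER_00", "start": 1.0, "end": 4.5,
   "text": "Selamat datang di rapat hari ini."},
  {"speaker": "SPEAKER_01", "start": 5.0, "end": 7.0,
   "text": "Terima kasih. Mari kita mulai."}
]'''

def speaking_time(segments):
    """Total seconds of speech attributed to each speaker."""
    totals = {}
    for seg in segments:
        totals[seg["speaker"]] = (
            totals.get(seg["speaker"], 0.0) + seg["end"] - seg["start"]
        )
    return totals

print(speaking_time(json.loads(sample)))
```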

Publishing

make build      # build dist/
make publish    # upload to PyPI → pip install wishcribe

License

MIT
