
"Fast, cross-platform CLI and GUI for batch transcription, translation, speaker annotation and subtitle generation using OpenAIโ€™s Whisper on CPU, Nvidia GPU and Apple MLX."

Project description

whisply

Transcribe, translate, annotate and subtitle audio and video files with OpenAI's Whisper ... fast!

whisply combines faster-whisper and mlx-whisper to offer an easy-to-use solution for batch processing files on Windows, Linux and Mac. It also enables word-level speaker annotation by integrating whisperX and pyannote.

Table of contents

Features

  • 🚴‍♂️ Performance: whisply selects the fastest Whisper implementation based on your hardware:

    • CPU/GPU (Nvidia CUDA): faster-whisper or whisperX
    • MLX (Apple M1-M5): mlx-whisper
  • ⏩ large-v3-turbo Ready: Support for whisper-large-v3-turbo on all devices. Note: Subtitling and annotations on CPU/GPU use whisperX for accurate timestamps, but whisper-large-v3-turbo isn't currently available for whisperX.

  • ✅ Auto Device Selection: whisply automatically chooses faster-whisper (CPU), insanely-fast-whisper (MPS) or whisper-MLX (Apple M1-M5) for transcription and translation unless a specific --device option is passed.

  • 🗣️ Word-level Annotations: Enabling --subtitle or --annotate uses whisperX or insanely-fast-whisper for word segmentation and speaker annotations. whisply approximates missing timestamps for numeric words.

  • 💬 Customizable Subtitles: Specify the number of words per subtitle block (e.g., "5") to generate .srt, .vtt and .webvtt files with fixed word counts and timestamps.

  • 📦 Batch Processing: Handle single files, folders, URLs, or lists via .list documents. See the Batch processing section for details.

  • 👩‍💻 CLI / App: whisply can be run directly from the CLI or as an app with a graphical user interface (GUI).

  • ⚙️ Export Formats:

    • Structured: .json, .rttm
    • Unstructured: .txt, .txt (annotated)
    • Markup: .html (compatible with noScribe's editor)
    • Subtitles: .srt, .webvtt, .vtt
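The fixed-word-count subtitle feature can be pictured with a short sketch. This is an illustration of the idea only, not whisply's implementation; the word/timestamp structure is an assumption modelled on whisperX-style output:

```python
# Sketch: group word-level timestamps into fixed-size subtitle blocks (SRT).
# Not whisply's actual code; the input structure is assumed.

def fmt_srt(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words: list[dict], block_size: int = 5) -> str:
    """words: [{'word': str, 'start': float, 'end': float}, ...]"""
    blocks = []
    for i in range(0, len(words), block_size):
        chunk = words[i:i + block_size]
        text = " ".join(w["word"] for w in chunk)
        blocks.append(
            f"{len(blocks) + 1}\n"
            f"{fmt_srt(chunk[0]['start'])} --> {fmt_srt(chunk[-1]['end'])}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)

words = [{"word": w, "start": i * 0.4, "end": i * 0.4 + 0.35}
         for i, w in enumerate("this is a tiny subtitle demo".split())]
print(words_to_srt(words, block_size=5))
```

With `block_size=5`, the six words above become two numbered SRT blocks whose timestamps span their first and last word.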

Requirements

  • FFmpeg
  • Python >= 3.10, < 3.14
  • GPU processing requires:
    • Nvidia GPU (CUDA: cuBLAS and cuDNN for CUDA 12)
    • Apple Silicon (Mac M1-M5)
  • Speaker annotation requires a HuggingFace Access Token

Installation

Install ffmpeg

# --- macOS ---
brew install ffmpeg

# --- Linux ---
sudo apt-get update
sudo apt-get install ffmpeg

# --- Windows ---
winget install Gyan.FFmpeg

For more information you can visit the FFmpeg website.
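Before running whisply you can confirm FFmpeg is actually discoverable on your PATH. This generic check is not part of whisply, just a quick sanity test:

```python
# Sketch: check whether the ffmpeg binary is discoverable on PATH.
import shutil

def have_ffmpeg() -> bool:
    return shutil.which("ffmpeg") is not None

if have_ffmpeg():
    print("ffmpeg found:", shutil.which("ffmpeg"))
else:
    print("ffmpeg not found - install it before running whisply")
```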

Installation with pip

pip install whisply installs CPU + annotation dependencies (torch, torchaudio, pyannote) out of the box. Add one of the extras below if you want GPU/MPS/MLX acceleration or whisperX-based pipelines.

  1. Create a Python virtual environment
python3 -m venv venv
  2. Activate the environment
# --- Linux & macOS ---
source venv/bin/activate

# --- Windows ---
venv\Scripts\activate
  3. Install whisply
pip install whisply
  4. (Optional) Install extras if you need them
pip install "whisply[app]"  # For running the whisply browser app
pip install "whisply[mlx]"  # For running whisply-MLX on Apple M1-M5

Installation from source

  1. Clone this repository
git clone https://github.com/tsmdt/whisply.git
  2. Change to the project folder
cd whisply
  3. Create a Python virtual environment
python3 -m venv venv
  4. Activate the Python virtual environment
# --- Linux & macOS ---
source venv/bin/activate

# --- Windows ---
venv\Scripts\activate
  5. Install whisply
pip install .
  6. (Optional) Install whisply extras
pip install -e ".[mlx,app]"

Nvidia GPU fix (November 2025)

Could not load library libcudnn_ops.so.9
If you use whisply with a Nvidia GPU and encounter this error:

Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}

Use the following steps to fix the issue:

  1. In your activated Python environment, run pip list and check that torch==2.8.0 and torchaudio==2.8.0 are installed.
  2. If yes, run pip install ctranslate2==4.6.0.
  3. Export the following environment variable to your shell:
export LD_LIBRARY_PATH="$(
python - <<'PY'
import importlib.util, pathlib

spec = importlib.util.find_spec("nvidia.cudnn")
if not spec or not spec.submodule_search_locations:
    raise SystemExit("Could not locate nvidia.cudnn package")

pkg_dir = pathlib.Path(spec.submodule_search_locations[0])
lib_dir = pkg_dir / "lib"
print(lib_dir)
PY
):${LD_LIBRARY_PATH}"
  4. To make the change permanent, run this bash command while your Python environment is activated:
printf '\n# --- add cuDNN wheel dir ---\nexport LD_LIBRARY_PATH="$(python - <<'"'"'PY'"'"'\nimport importlib.util, pathlib\nspec = importlib.util.find_spec("nvidia.cudnn")\npkg_dir = pathlib.Path(spec.submodule_search_locations[0])\nprint(pkg_dir / "lib")\nPY\n):${LD_LIBRARY_PATH}"\n' >> "$VIRTUAL_ENV/bin/activate"

Finally, deactivate the environment and reactivate it to apply the changes.

Find additional information at faster-whisper's GitHub page.

Usage

CLI

Three CLI commands are available:

  1. whisply run: Running a transcription task
  2. whisply app: Starting the whisply browser app
  3. whisply list: Listing available models
$ whisply run

 Usage: whisply run [OPTIONS]

 Transcribe files with whisply

╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --files              -f               TEXT                                     Path to file, folder, URL or .list to process.               │
│ --output_dir         -o               DIRECTORY                                Output folder [default: transcriptions]                      │
│ --device             -d               [auto|cpu|gpu|mlx]                       CPU, GPU (NVIDIA), MLX (Mac M1-M5) [default: auto]           │
│ --model              -m               TEXT                                     Whisper model (run "whisply list" to see options)            │
│                                                                                [default: large-v3-turbo]                                    │
│ --language           -l               TEXT                                     Language of your file(s) ("en", "de") (Default: auto-detect) │
│ --annotate           -a                                                        Enable speaker annotation (Default: False)                   │
│ --num_speakers       -num             INTEGER                                  Number of speakers to annotate (Default: auto-detect)        │
│ --hf_token           -hf              TEXT                                     HuggingFace Access token required for speaker annotation     │
│ --subtitle           -s                                                        Create subtitles (Default: False)                            │
│ --subtitle_length    -sub_length      INTEGER                                  Subtitle segment length in words [default: 5]                │
│ --translate          -t                                                        Translate transcription to English (Default: False)          │
│ --export             -e               [all|json|txt|rttm|vtt|webvtt|srt|html]  Choose the export format [default: all]                      │
│ --del_originals      -del                                                      Delete input files after file conversion. (Default: False)   │
│ --download_language  -dl              TEXT                                     Specify a language code ("en", "de" ...) to transcribe a     │
│                                                                                specific audio track of a URL. (Default: auto-detect)        │
│ --config             -c               PATH                                     Path to configuration file                                   │
│ --post_correction    -post            PATH                                     Path to YAML file for post-correction                        │
│ --verbose            -v                                                        Print text chunks during transcription (Default: False)      │
│ --help                                                                         Show this message and exit.                                  │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

App

Instead of running whisply from the CLI, you can start the web app:

$ whisply app

Open the local URL in your browser after starting the app (note: the URL may differ between systems):

* Running on local URL: http://127.0.0.1:7860

Speaker annotation and diarization

Requirements

To annotate speakers with --annotate, you must provide a valid HuggingFace access token via the --hf_token option. You must also accept the terms and conditions for both version 3.0 and version 3.1 of the pyannote segmentation model.

For detailed instructions, refer to the Requirements section on the pyannote model page on HuggingFace and make sure that you complete steps "2. Accept pyannote/segmentation-3.0 user conditions", "3. Accept pyannote/speaker-diarization-3.1 user conditions" and "4. Create access token at hf.co/settings/tokens".

How speaker annotation works

whisply uses whisperX for speaker diarization and annotation. Unlike the standard Whisper implementation, which returns chunk-level timestamps, whisperX returns word-level timestamps and annotates speakers word by word, producing much more precise annotations.

Out of the box, whisperX will not provide timestamps for words consisting only of numbers (e.g. "1.5" or "2024"); whisply fixes these cases through timestamp approximation. Other known limitations of whisperX include:

  • inaccurate speaker diarization when multiple speakers talk at the same time
  • word-level timestamps and annotations rely on language-specific alignment models; out of the box, whisperX supports these languages: en, fr, de, es, it, ja, zh, nl, uk, pt.

Refer to the whisperX GitHub page for more information.
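The timestamp approximation mentioned above can be pictured roughly like this. This is a simplified sketch using the neighbouring words' timestamps; whisply's actual heuristic may differ:

```python
# Sketch: fill in missing word timestamps from the nearest neighbours
# that do have timestamps. Not whisply's actual implementation.
def approximate_timestamps(words: list[dict]) -> list[dict]:
    """words: [{'word': str, 'start': float|None, 'end': float|None}, ...]"""
    out = [dict(w) for w in words]
    for i, w in enumerate(out):
        if w["start"] is not None:
            continue
        # previous known end time, next known start time
        prev_end = next((out[j]["end"] for j in range(i - 1, -1, -1)
                         if out[j]["end"] is not None), 0.0)
        nxt_start = next((out[j]["start"] for j in range(i + 1, len(out))
                          if out[j]["start"] is not None), prev_end + 0.5)
        w["start"], w["end"] = prev_end, nxt_start
    return out

words = [
    {"word": "in", "start": 0.0, "end": 0.2},
    {"word": "2024", "start": None, "end": None},  # no timestamp from whisperX
    {"word": "we", "start": 1.0, "end": 1.2},
]
print(approximate_timestamps(words)[1])
# -> {'word': '2024', 'start': 0.2, 'end': 1.0}
```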

Post correction

The --post_correction option allows you to correct recurring transcription errors in your files. It takes a .yaml file as its argument, with the following structure:

# Single word corrections
Gardamer: Gadamer

# Pattern-based corrections
patterns:
  - pattern: 'Klaus-(Cira|Cyra|Tira)-Stiftung'
    replacement: 'Klaus Tschira Stiftung'
  • Single word corrections: matches single words → wrong word: correct word
  • Pattern-based corrections: matches regex patterns → (Cira|Cyra|Tira) matches Klaus-Cira-Stiftung, Klaus-Cyra-Stiftung and Klaus-Tira-Stiftung and replaces each with Klaus Tschira Stiftung

Post correction will be applied to all export file formats you choose.
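Applying such a corrections file can be sketched as follows. This is a minimal illustration, not whisply's own loader; YAML parsing is replaced by a plain dict so the example stays self-contained:

```python
import re

# Corrections as they would be parsed from the .yaml example above.
corrections = {
    "Gardamer": "Gadamer",
    "patterns": [
        {"pattern": r"Klaus-(Cira|Cyra|Tira)-Stiftung",
         "replacement": "Klaus Tschira Stiftung"},
    ],
}

def post_correct(text: str, corrections: dict) -> str:
    # Regex patterns first, then whole-word replacements.
    for pat in corrections.get("patterns", []):
        text = re.sub(pat["pattern"], pat["replacement"], text)
    for wrong, right in corrections.items():
        if wrong == "patterns":
            continue
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text

print(post_correct("Gardamer visited the Klaus-Cyra-Stiftung.", corrections))
# -> "Gadamer visited the Klaus Tschira Stiftung."
```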

Batch processing

Instead of providing a single file, folder or URL with the --files option, you can pass a .list file containing a mix of files, folders and URLs for processing.

Example:

$ cat my_files.list

video_01.mp4
video_02.mp4
./my_files/
https://youtu.be/KtOayYXEsN4?si=-0MS6KXbEWXA7dqo
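A .list file like the one above can be expanded into concrete inputs roughly like this. This is a generic sketch, not whisply's code:

```python
# Sketch: expand a .list of files, folders and URLs into individual inputs.
from pathlib import Path

def expand_list(list_path: str) -> list[str]:
    inputs = []
    for line in Path(list_path).read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue                      # skip blank lines
        if entry.startswith(("http://", "https://")):
            inputs.append(entry)          # URL: keep as-is
        elif Path(entry).is_dir():
            inputs.extend(str(p) for p in sorted(Path(entry).iterdir())
                          if p.is_file())  # folder: add the files it contains
        else:
            inputs.append(entry)          # single file
    return inputs
```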

Using config files for batch processing

You can provide a .json config file with the --config option, which makes batch processing easy. An example config looks like this:

{
    "files": "./files/my_files.list",          # Path to your files
    "output_dir": "./transcriptions",          # Output folder where transcriptions are saved
    "device": "auto",                          # AUTO, CPU, GPU, MLX or MPS
    "model": "large-v3-turbo",                 # Whisper model to use
    "language": null,                          # Null for auto-detection or language codes ("en", "de", ...)
    "download_language": null,                 # If transcribing a video with multiple audio tracks from a URL you can choose a specific audio track by passing a language code ("de", "fr", "en" ...)
    "annotate": false,                         # Annotate speakers 
    "num_speakers": null,                      # Number of speakers of the input file (null: auto-detection)
    "hf_token": "HuggingFace Access Token",    # Your HuggingFace Access Token (needed for annotations)
    "subtitle": false,                         # Subtitle file(s)
    "subtitle_length": 10,                     # Length of each subtitle block in number of words
    "translate": false,                        # Translate to English
    "export": "txt",                           # Export .txts only
    "verbose": false,                          # Print transcription segments while processing 
    "delete_originals": false,                 # Delete original input files after file conversion
    "post_correction": "my_corrections.yaml"   # Apply post correction with specified patterns in .yaml
}
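Note that the # comments above are for explanation only; strict JSON does not allow comments, so a real config file must omit them. Loading such a config and letting it override built-in defaults can be sketched as follows (the default values shown are assumptions for illustration):

```python
import json

# Hypothetical defaults for illustration; whisply's actual defaults may differ.
DEFAULTS = {"device": "auto", "model": "large-v3-turbo",
            "export": "all", "subtitle_length": 5}

def load_config(path: str) -> dict:
    """Merge a JSON config over the defaults; null values fall back."""
    with open(path) as f:
        user = json.load(f)
    merged = dict(DEFAULTS)
    merged.update({k: v for k, v in user.items() if v is not None})
    return merged
```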

Citation

Schmidt, T., & Shigapov, R. whisply (v0.14.0) [Computer software]. https://github.com/tsmdt/whisply

Download files

Download the file for your platform.

Source Distribution

whisply-0.14.0.tar.gz (43.5 kB)

Uploaded Source

Built Distribution


whisply-0.14.0-py3-none-any.whl (42.1 kB)

Uploaded Python 3

File details

Details for the file whisply-0.14.0.tar.gz.

File metadata

  • Download URL: whisply-0.14.0.tar.gz
  • Size: 43.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for whisply-0.14.0.tar.gz
Algorithm Hash digest
SHA256 950739db30b84f160eddbd91c138309066655edc573499333a2a7209b1da4b37
MD5 479e0ebe2d41100bdcdca0eac45fbba4
BLAKE2b-256 d77e011395230860b485155f630ff5a653bb92b8c6b453abfa309bf8787951f0


File details

Details for the file whisply-0.14.0-py3-none-any.whl.

File metadata

  • Download URL: whisply-0.14.0-py3-none-any.whl
  • Size: 42.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for whisply-0.14.0-py3-none-any.whl
Algorithm Hash digest
SHA256 738a88d8c44764dd3b3f98e82378a9736d579997fc9ffbb0ca16d43841398439
MD5 80354e44fa2bfa8815637e3c1c35f193
BLAKE2b-256 aa7d85a6a363c8173020bee3e9c5e1ac04e53abeaa16e5f0d67cc0548e2a373f

