
"Fast, cross-platform CLI and GUI for batch transcription, translation, speaker annotation and subtitle generation using OpenAIโ€™s Whisper on CPU, Nvidia GPU and Apple MLX."

Project description

whisply

Transcribe, translate, annotate and subtitle audio and video files with OpenAI's Whisper ... fast!

whisply combines faster-whisper and mlx-whisper to offer an easy-to-use solution for batch processing files on Windows, Linux and Mac. It also enables word-level speaker annotation by integrating whisperX and pyannote.

Table of contents

Features

  • 🚴‍♂️ Performance: whisply selects the fastest Whisper implementation based on your hardware:

    • CPU/GPU (Nvidia CUDA): faster-whisper or whisperX
    • MLX (Apple M1-M5): mlx-whisper
  • ⏩ large-v3-turbo Ready: Support for whisper-large-v3-turbo on all devices. Note: Subtitling and annotations on CPU/GPU use whisperX for accurate timestamps, but whisper-large-v3-turbo isn't currently available for whisperX.

  • ✅ Auto Device Selection: whisply automatically chooses faster-whisper (CPU), insanely-fast-whisper (MPS) or whisper-MLX (Apple M1-M5) for transcription and translation unless a specific --device option is passed.

  • 🗣️ Word-level Annotations: Enabling --subtitle or --annotate uses whisperX or insanely-fast-whisper for word segmentation and speaker annotations. whisply approximates missing timestamps for numeric words.

  • 💬 Customizable Subtitles: Specify the number of words per subtitle block (e.g., "5") to generate .srt, .vtt and .webvtt files with fixed word counts and timestamps.

  • 📦 Batch Processing: Handle single files, folders, URLs, or lists via .list documents. See the Batch processing section for details.

  • 👩‍💻 CLI / App: whisply can be run directly from the CLI or as an app with a graphical user interface (GUI).

  • ⚙️ Export Formats:

    • Structured: .json, .rttm
    • Unstructured: .txt, .txt (annotated)
    • Markup: .html (compatible with noScribe's editor)
    • Subtitles: .srt, .webvtt, .vtt
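The fixed-word-count subtitle feature can be pictured with a short sketch. This is an illustration of the idea only, not whisply's implementation; the word/timestamp structure is an assumption modelled on whisperX-style output:

```python
# Sketch: group word-level timestamps into fixed-size subtitle blocks (SRT).
# Not whisply's actual code; the input structure is assumed.

def fmt_srt(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words: list[dict], block_size: int = 5) -> str:
    """words: [{'word': str, 'start': float, 'end': float}, ...]"""
    blocks = []
    for i in range(0, len(words), block_size):
        chunk = words[i:i + block_size]
        text = " ".join(w["word"] for w in chunk)
        blocks.append(
            f"{len(blocks) + 1}\n"
            f"{fmt_srt(chunk[0]['start'])} --> {fmt_srt(chunk[-1]['end'])}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)

words = [{"word": w, "start": i * 0.4, "end": i * 0.4 + 0.35}
         for i, w in enumerate("this is a tiny subtitle demo".split())]
print(words_to_srt(words, block_size=5))
```

With `block_size=5`, the six words above become two numbered SRT blocks whose timestamps span their first and last word.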

Requirements

  • FFmpeg
  • Python >= 3.10, < 3.14
  • GPU processing requires:
    • Nvidia GPU (CUDA: cuBLAS and cuDNN for CUDA 12)
    • Apple Silicon (Mac M1-M5)
  • Speaker annotation requires a HuggingFace Access Token

Installation

Install ffmpeg

# --- macOS ---
brew install ffmpeg

# --- Linux ---
sudo apt-get update
sudo apt-get install ffmpeg

# --- Windows ---
winget install Gyan.FFmpeg

For more information you can visit the FFmpeg website.
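Before running whisply you can confirm FFmpeg is actually discoverable on your PATH. This generic check is not part of whisply, just a quick sanity test:

```python
# Sketch: check whether the ffmpeg binary is discoverable on PATH.
import shutil

def have_ffmpeg() -> bool:
    return shutil.which("ffmpeg") is not None

if have_ffmpeg():
    print("ffmpeg found:", shutil.which("ffmpeg"))
else:
    print("ffmpeg not found - install it before running whisply")
```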

Installation with pip

pip install whisply installs CPU + annotation dependencies (torch, torchaudio, pyannote) out of the box. Add one of the extras below if you want GPU/MPS/MLX acceleration or whisperX-based pipelines.

  1. Create a Python virtual environment
python3 -m venv venv
  2. Activate the environment
# --- Linux & macOS ---
source venv/bin/activate

# --- Windows ---
venv\Scripts\activate
  3. Install whisply
pip install whisply
  4. (Optional) Install extras if you need them
pip install "whisply[app]"  # For running the whisply browser app
pip install "whisply[mlx]"  # For running whisply-MLX on Apple M1-M5

Installation from source

  1. Clone this repository
git clone https://github.com/tsmdt/whisply.git
  2. Change to the project folder
cd whisply
  3. Create a Python virtual environment
python3 -m venv venv
  4. Activate the Python virtual environment
# --- Linux & macOS ---
source venv/bin/activate

# --- Windows ---
venv\Scripts\activate
  5. Install whisply
pip install .
  6. (Optional) Install whisply extras
pip install -e ".[mlx,app]"

Nvidia GPU fix (November 2025)

Could not load library libcudnn_ops.so.9
If you use whisply with a Nvidia GPU and encounter this error:

Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}

Use the following steps to fix the issue:

  1. In your activated Python environment, run pip list and check that torch==2.8.0 and torchaudio==2.8.0 are installed.
  2. If yes, run pip install ctranslate2==4.6.0.
  3. Export the following environment variable to your shell:
export LD_LIBRARY_PATH="$(
python - <<'PY'
import importlib.util, pathlib

spec = importlib.util.find_spec("nvidia.cudnn")
if not spec or not spec.submodule_search_locations:
    raise SystemExit("Could not locate nvidia.cudnn package")

pkg_dir = pathlib.Path(spec.submodule_search_locations[0])
lib_dir = pkg_dir / "lib"
print(lib_dir)
PY
):${LD_LIBRARY_PATH}"
  4. To make the change permanent, run this bash command while your Python environment is activated:
printf '\n# --- add cuDNN wheel dir ---\nexport LD_LIBRARY_PATH="$(python - <<'"'"'PY'"'"'\nimport importlib.util, pathlib\nspec = importlib.util.find_spec("nvidia.cudnn")\npkg_dir = pathlib.Path(spec.submodule_search_locations[0])\nprint(pkg_dir / "lib")\nPY\n):${LD_LIBRARY_PATH}"\n' >> "$VIRTUAL_ENV/bin/activate"

Finally, deactivate the environment and reactivate it to apply the changes.

Find additional information at faster-whisper's GitHub page.

Usage

CLI

Three CLI commands are available:

  1. whisply run: Running a transcription task
  2. whisply app: Starting the whisply browser app
  3. whisply list: Listing available models
$ whisply run

 Usage: whisply run [OPTIONS]

 Transcribe files with whisply

╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --files              -f               TEXT                                     Path to file, folder, URL or .list to process.               │
│ --output_dir         -o               DIRECTORY                                Output folder [default: transcriptions]                      │
│ --device             -d               [auto|cpu|gpu|mlx]                       CPU, GPU (NVIDIA), MLX (Mac M1-M5) [default: auto]           │
│ --model              -m               TEXT                                     Whisper model (run "whisply list" to see options)            │
│                                                                                [default: large-v3-turbo]                                    │
│ --language           -l               TEXT                                     Language of your file(s) ("en", "de") (Default: auto-detect) │
│ --annotate           -a                                                        Enable speaker annotation (Default: False)                   │
│ --num_speakers       -num             INTEGER                                  Number of speakers to annotate (Default: auto-detect)        │
│ --hf_token           -hf              TEXT                                     HuggingFace Access token required for speaker annotation     │
│ --subtitle           -s                                                        Create subtitles (Default: False)                            │
│ --subtitle_length    -sub_length      INTEGER                                  Subtitle segment length in words [default: 5]                │
│ --translate          -t                                                        Translate transcription to English (Default: False)          │
│ --export             -e               [all|json|txt|rttm|vtt|webvtt|srt|html]  Choose the export format [default: all]                      │
│ --del_originals      -del                                                      Delete input files after file conversion. (Default: False)   │
│ --download_language  -dl              TEXT                                     Specify a language code ("en", "de" ...) to transcribe a     │
│                                                                                specific audio track of a URL. (Default: auto-detect)        │
│ --config             -c               PATH                                     Path to configuration file                                   │
│ --post_correction    -post            PATH                                     Path to YAML file for post-correction                        │
│ --verbose            -v                                                        Print text chunks during transcription (Default: False)      │
│ --help                                                                         Show this message and exit.                                  │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

App

Instead of running whisply from the CLI, you can start the web app:

$ whisply app

Open the local URL in your browser after starting the app (note: the URL may differ between systems):

* Running on local URL: http://127.0.0.1:7860

Speaker annotation and diarization

Requirements

To annotate speakers with --annotate, you must provide a valid HuggingFace access token via the --hf_token option. You must also accept the terms and conditions for both version 3.0 and version 3.1 of the pyannote segmentation model.

For detailed instructions, refer to the Requirements section on the pyannote model page on HuggingFace and make sure that you complete steps "2. Accept pyannote/segmentation-3.0 user conditions", "3. Accept pyannote/speaker-diarization-3.1 user conditions" and "4. Create access token at hf.co/settings/tokens".

How speaker annotation works

whisply uses whisperX for speaker diarization and annotation. Unlike the standard Whisper implementation, which returns chunk-level timestamps, whisperX returns word-level timestamps and annotates speakers word by word, producing much more precise annotations.

Out of the box, whisperX will not provide timestamps for words consisting only of numbers (e.g. "1.5" or "2024"); whisply fixes these cases through timestamp approximation. Other known limitations of whisperX include:

  • inaccurate speaker diarization when multiple speakers talk at the same time
  • word-level timestamps and annotations rely on language-specific alignment models; out of the box, whisperX supports these languages: en, fr, de, es, it, ja, zh, nl, uk, pt.

Refer to the whisperX GitHub page for more information.
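The timestamp approximation mentioned above can be pictured roughly like this. This is a simplified sketch using the neighbouring words' timestamps; whisply's actual heuristic may differ:

```python
# Sketch: fill in missing word timestamps from the nearest neighbours
# that do have timestamps. Not whisply's actual implementation.
def approximate_timestamps(words: list[dict]) -> list[dict]:
    """words: [{'word': str, 'start': float|None, 'end': float|None}, ...]"""
    out = [dict(w) for w in words]
    for i, w in enumerate(out):
        if w["start"] is not None:
            continue
        # previous known end time, next known start time
        prev_end = next((out[j]["end"] for j in range(i - 1, -1, -1)
                         if out[j]["end"] is not None), 0.0)
        nxt_start = next((out[j]["start"] for j in range(i + 1, len(out))
                          if out[j]["start"] is not None), prev_end + 0.5)
        w["start"], w["end"] = prev_end, nxt_start
    return out

words = [
    {"word": "in", "start": 0.0, "end": 0.2},
    {"word": "2024", "start": None, "end": None},  # no timestamp from whisperX
    {"word": "we", "start": 1.0, "end": 1.2},
]
print(approximate_timestamps(words)[1])
# -> {'word': '2024', 'start': 0.2, 'end': 1.0}
```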

Post correction

The --post_correction option allows you to correct recurring transcription errors in your files. It takes a .yaml file as its argument, with the following structure:

# Single word corrections
Gardamer: Gadamer

# Pattern-based corrections
patterns:
  - pattern: 'Klaus-(Cira|Cyra|Tira)-Stiftung'
    replacement: 'Klaus Tschira Stiftung'
  • Single word corrections: matches single words → wrong word: correct word
  • Pattern-based corrections: matches regex patterns → (Cira|Cyra|Tira) matches Klaus-Cira-Stiftung, Klaus-Cyra-Stiftung and Klaus-Tira-Stiftung and replaces each with Klaus Tschira Stiftung

Post correction will be applied to all export file formats you choose.
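Applying such a corrections file can be sketched as follows. This is a minimal illustration, not whisply's own loader; YAML parsing is replaced by a plain dict so the example stays self-contained:

```python
import re

# Corrections as they would be parsed from the .yaml example above.
corrections = {
    "Gardamer": "Gadamer",
    "patterns": [
        {"pattern": r"Klaus-(Cira|Cyra|Tira)-Stiftung",
         "replacement": "Klaus Tschira Stiftung"},
    ],
}

def post_correct(text: str, corrections: dict) -> str:
    # Regex patterns first, then whole-word replacements.
    for pat in corrections.get("patterns", []):
        text = re.sub(pat["pattern"], pat["replacement"], text)
    for wrong, right in corrections.items():
        if wrong == "patterns":
            continue
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text

print(post_correct("Gardamer visited the Klaus-Cyra-Stiftung.", corrections))
# -> "Gadamer visited the Klaus Tschira Stiftung."
```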

Batch processing

Instead of providing a single file, folder or URL with the --files option, you can pass a .list file containing a mix of files, folders and URLs for processing.

Example:

$ cat my_files.list

video_01.mp4
video_02.mp4
./my_files/
https://youtu.be/KtOayYXEsN4?si=-0MS6KXbEWXA7dqo
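A .list file like the one above can be expanded into concrete inputs roughly like this. This is a generic sketch, not whisply's code:

```python
# Sketch: expand a .list of files, folders and URLs into individual inputs.
from pathlib import Path

def expand_list(list_path: str) -> list[str]:
    inputs = []
    for line in Path(list_path).read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue                      # skip blank lines
        if entry.startswith(("http://", "https://")):
            inputs.append(entry)          # URL: keep as-is
        elif Path(entry).is_dir():
            inputs.extend(str(p) for p in sorted(Path(entry).iterdir())
                          if p.is_file())  # folder: add the files it contains
        else:
            inputs.append(entry)          # single file
    return inputs
```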

Using config files for batch processing

You can provide a .json config file with the --config option, which makes batch processing easy. An example config looks like this:

{
    "files": "./files/my_files.list",          # Path to your files
    "output_dir": "./transcriptions",          # Output folder where transcriptions are saved
    "device": "auto",                          # AUTO, CPU, GPU, MLX or MPS
    "model": "large-v3-turbo",                 # Whisper model to use
    "language": null,                          # Null for auto-detection or language codes ("en", "de", ...)
    "download_language": null,                 # If transcribing a video with multiple audio tracks from a URL you can choose a specific audio track by passing a language code ("de", "fr", "en" ...)
    "annotate": false,                         # Annotate speakers 
    "num_speakers": null,                      # Number of speakers of the input file (null: auto-detection)
    "hf_token": "HuggingFace Access Token",    # Your HuggingFace Access Token (needed for annotations)
    "subtitle": false,                         # Subtitle file(s)
    "subtitle_length": 10,                     # Length of each subtitle block in number of words
    "translate": false,                        # Translate to English
    "export": "txt",                           # Export .txts only
    "verbose": false,                          # Print transcription segments while processing 
    "delete_originals": false,                 # Delete original input files after file conversion
    "post_correction": "my_corrections.yaml"   # Apply post correction with specified patterns in .yaml
}
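Note that the # comments above are for explanation only; strict JSON does not allow comments, so a real config file must omit them. Loading such a config and letting it override built-in defaults can be sketched as follows (the default values shown are assumptions for illustration):

```python
import json

# Hypothetical defaults for illustration; whisply's actual defaults may differ.
DEFAULTS = {"device": "auto", "model": "large-v3-turbo",
            "export": "all", "subtitle_length": 5}

def load_config(path: str) -> dict:
    """Merge a JSON config over the defaults; null values fall back."""
    with open(path) as f:
        user = json.load(f)
    merged = dict(DEFAULTS)
    merged.update({k: v for k, v in user.items() if v is not None})
    return merged
```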

Citation

Schmidt, T., & Shigapov, R. whisply (v0.14.0) [Computer software]. https://github.com/tsmdt/whisply

Download files

Download the file for your platform.

Source Distribution

whisply-0.14.0.tar.gz (43.5 kB)

Uploaded Source

Built Distribution


whisply-0.14.0-py3-none-any.whl (42.1 kB)

Uploaded Python 3

File details

Details for the file whisply-0.14.0.tar.gz.

File metadata

  • Download URL: whisply-0.14.0.tar.gz
  • Size: 43.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for whisply-0.14.0.tar.gz
Algorithm Hash digest
SHA256 950739db30b84f160eddbd91c138309066655edc573499333a2a7209b1da4b37
MD5 479e0ebe2d41100bdcdca0eac45fbba4
BLAKE2b-256 d77e011395230860b485155f630ff5a653bb92b8c6b453abfa309bf8787951f0


File details

Details for the file whisply-0.14.0-py3-none-any.whl.

File metadata

  • Download URL: whisply-0.14.0-py3-none-any.whl
  • Size: 42.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for whisply-0.14.0-py3-none-any.whl
Algorithm Hash digest
SHA256 738a88d8c44764dd3b3f98e82378a9736d579997fc9ffbb0ca16d43841398439
MD5 80354e44fa2bfa8815637e3c1c35f193
BLAKE2b-256 aa7d85a6a363c8173020bee3e9c5e1ac04e53abeaa16e5f0d67cc0548e2a373f

