whisply
Transcribe, translate, annotate and subtitle audio and video files with OpenAI's Whisper ... fast!
`whisply` combines `faster-whisper` and `insanely-fast-whisper` to offer an easy-to-use solution for batch processing files. It also enables word-level speaker annotation by integrating `whisperX` and `pyannote`.
Features
- Performance: Depending on your hardware, `whisply` will use the fastest Whisper implementation:
  - CPU: `faster-whisper` or `whisperX`
  - GPU (Nvidia CUDA) and MPS (Metal Performance Shaders, Apple M1-M3): `insanely-fast-whisper` or `whisperX`
- Auto device selection: When performing transcription or translation tasks without speaker annotation or subtitling, `faster-whisper` (CPU) or `insanely-fast-whisper` (MPS, Nvidia GPUs) will be selected automatically based on your hardware if you do not provide a device via the `--device` option.
- Word-level annotations: If you choose to `--subtitle` or `--annotate`, `whisperX` will be used, as it supports word-level segmentation and speaker annotation. Depending on your hardware, `whisperX` can run either on CPU or Nvidia GPU (but not on Apple MPS). Out of the box, `whisperX` will not provide timestamps for words containing only numbers (e.g. "1.5" or "2024"); `whisply` fixes those instances through timestamp approximation.
- Subtitles: Subtitle generation is customizable: you can specify the number of words per subtitle block (e.g., choosing "5" will generate `.srt` and `.webvtt` files in which each subtitle block contains exactly 5 words with the corresponding timestamps; see the example after this list).
- Batch processing: `whisply` can process single files, whole folders, URLs or a combination of all of these by listing paths in a `.list` document. See the Batch processing section for more information.
- Supported output formats: `.json`, `.txt`, `.txt` (annotated), `.srt`, `.webvtt`, `.vtt`, `.rttm`
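For example, the following command (a minimal sketch; `talk.mp4` is a placeholder filename) transcribes a video and writes `.srt` and `.webvtt` subtitles in 5-word blocks:

whisply --files talk.mp4 --subtitle --sub_length 5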
Requirements
- FFmpeg
- Python >= 3.10
- GPU processing requires:
  - Nvidia GPU (CUDA: cuBLAS and cuDNN 8 for CUDA 12)
  - Apple Metal Performance Shaders (MPS) (Mac M1-M3)
- Speaker annotation requires a HuggingFace Access Token
GPU fix for "Could not load library libcudnn_ops_infer.so.8."
If you use whisply on a Linux system with an Nvidia GPU and get this error:
"Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory"
Run the following command in your CLI:
export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
To make the fix permanent, append this line to your Python environment's activation script:
echo "export LD_LIBRARY_PATH=\`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + \":\" + os.path.dirname(nvidia.cudnn.lib.__file__))'\`" >> path/to/your/python/env
For more information please refer to the faster-whisper GitHub page.
Installation
1. Install ffmpeg
--- macOS ---
brew install ffmpeg
--- Linux ---
sudo apt-get update
sudo apt-get install ffmpeg
--- Windows ---
https://ffmpeg.org/download.html
2. Clone this repository and change to project folder
git clone https://github.com/tsmdt/whisply.git
cd whisply
3. Create a Python virtual environment
python3.11 -m venv venv
4. Activate the Python virtual environment
source venv/bin/activate
5. Install whisply with pip
pip install .
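To check that the installation worked, print the help screen (the `--help` option is shown in the Usage section below):

whisply --help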
Usage
Usage: whisply [OPTIONS]
WHISPLY Transcribe, translate, annotate and subtitle audio and video files with OpenAI's Whisper ... fast!

Options:
  --files              -f    TEXT                 Path to file, folder, URL or .list to process. [default: None]
  --output_dir         -o    DIRECTORY            Folder where transcripts should be saved. [default: transcriptions]
  --device             -d    [auto|cpu|gpu|mps]   Select the computation device: CPU, GPU (NVIDIA), or MPS (Mac M1-M3). [default: auto]
  --model              -m    TEXT                 Whisper model to use (List models via --list_models). [default: large-v2]
  --lang               -l    TEXT                 Language of provided file(s) ("en", "de") (Default: auto-detection). [default: None]
  --annotate           -a                         Enable speaker annotation (Saves .rttm).
  --hf_token           -hf   TEXT                 HuggingFace Access token required for speaker annotation. [default: None]
  --translate          -t                         Translate transcription to English.
  --subtitle           -s                         Create subtitles (Saves .srt, .vtt and .webvtt).
  --sub_length               INTEGER              Subtitle segment length in words [default: 5]
  --verbose            -v                         Print text chunks during transcription.
  --config                   PATH                 Path to configuration file. [default: None]
  --list_filetypes                                List supported audio and video file types.
  --list_models                                   List available models.
  --install-completion                            Install completion for the current shell.
  --show-completion                               Show completion for the current shell, to copy it or customize the installation.
  --help                                          Show this message and exit.
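For example, a basic transcription run (a minimal sketch; `interview.mp3` is a placeholder filename) looks like this:

whisply --files interview.mp3 --model large-v2 --lang en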
Speaker annotation and diarization
Requirements
In order to annotate speakers using `--annotate`, you need to provide a valid HuggingFace access token via the `--hf_token` option. Additionally, you must accept the terms and conditions for both version 3.0 and version 3.1 of the `pyannote` segmentation model. For detailed instructions, refer to the Requirements section on the pyannote model page on HuggingFace.
How speaker annotation works
`whisply` uses whisperX for speaker diarization and annotation. Instead of returning chunk-level timestamps like the standard Whisper implementation, `whisperX` returns word-level timestamps and annotates speakers word by word, thus producing much more precise annotations. Out of the box, `whisperX` will not provide timestamps for words containing only numbers (e.g. "1.5" or "2024"); `whisply` fixes those instances through timestamp approximation. Other known limitations of `whisperX` include:
- inaccurate speaker diarization if multiple speakers talk at the same time
- to provide word-level timestamps and annotations, `whisperX` uses language-specific alignment models; out of the box, `whisperX` supports these languages: `en, fr, de, es, it, ja, zh, nl, uk, pt`.
Refer to the whisperX GitHub page for more information.
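Putting it together, a typical annotation run looks like this (a minimal sketch; the filename and token are placeholders):

whisply --files meeting.mp4 --annotate --hf_token YOUR_HF_TOKEN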
Batch processing
Instead of providing a file, folder or URL via the `--files` option, you can pass a `.list` file containing a mix of files, folders and URLs for processing.
Example:
$ cat my_files.list
video_01.mp4
video_02.mp4
./my_files/
https://youtu.be/KtOayYXEsN4?si=-0MS6KXbEWXA7dqo
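The `.list` file is then passed like any other input (a minimal sketch):

whisply --files my_files.list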
Using config files for batch processing
You can provide a `.json` config file via the `--config` option, which makes batch processing easy. An example config looks like this (the inline comments are for illustration only; remove them in an actual `.json` file, as JSON does not allow comments):
{
"files": "./files/my_files.list", # Path to your files
"output_dir": "./transcriptions", # Output folder where transcriptions are saved
"device": "auto", # AUTO, GPU, MPS or CPU
"model": "large-v3-turbo", # Whisper model to use
"lang": null, # Null for auto-detection or language codes ("en", "de", ...)
"annotate": false, # Annotate speakers
"hf_token": "HuggingFace Access Token", # Your HuggingFace Access Token (needed for annotations)
"translate": false, # Translate to English
"subtitle": false, # Subtitle file(s)
"sub_length": 10, # Length of each subtitle block in number of words
"verbose": false # Print transcription segments while processing
}
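Assuming the config is saved as `my_config.json` (a placeholder name), the whole batch can then be processed with a single command:

whisply --config my_config.json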