Faster Whisper transcription with CTranslate2

faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models.

This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.

Benchmark

For reference, here are the time and memory usage required to transcribe 13 minutes of audio with different implementations:

Large-v2 model on GPU

Implementation  Precision  Beam size  Time   Max. GPU memory  Max. CPU memory
openai/whisper  fp16       5          4m30s  11325MB          9439MB
faster-whisper  fp16       5          54s    4755MB           3244MB
faster-whisper  int8       5          59s    3091MB           3117MB

Executed with CUDA 11.7.1 on an NVIDIA Tesla V100S.

Small model on CPU

Implementation  Precision  Beam size  Time    Max. memory
openai/whisper  fp32       5          10m31s  3101MB
whisper.cpp     fp32       5          17m42s  1581MB
whisper.cpp     fp16       5          12m39s  873MB
faster-whisper  fp32       5          2m44s   1675MB
faster-whisper  int8       5          2m04s   995MB

Executed with 8 threads on an Intel(R) Xeon(R) Gold 6226R.
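
The speedups implied by the two tables can be checked with a little arithmetic. This sketch only reuses the times copied from the tables above (nothing is measured); it converts each time to seconds and divides the openai/whisper time by every other entry. The names to_seconds and speedups are hypothetical helpers.

```python
from typing import Dict

# Times copied from the benchmark tables above (13 minutes of audio, beam size 5).
GPU_LARGE_V2 = {
    "openai/whisper fp16": "4m30s",
    "faster-whisper fp16": "54s",
    "faster-whisper int8": "59s",
}
CPU_SMALL = {
    "openai/whisper fp32": "10m31s",
    "whisper.cpp fp32": "17m42s",
    "whisper.cpp fp16": "12m39s",
    "faster-whisper fp32": "2m44s",
    "faster-whisper int8": "2m04s",
}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '4m30s' or '54s' to whole seconds."""
    minutes, _, rest = duration.rpartition("m")
    return int(minutes or 0) * 60 + int(rest.rstrip("s"))


def speedups(table: Dict[str, str], baseline: str) -> Dict[str, float]:
    """Return how many times faster each entry is than `baseline` (2 decimals)."""
    base = to_seconds(table[baseline])
    return {name: round(base / to_seconds(t), 2) for name, t in table.items()}


if __name__ == "__main__":
    print(speedups(GPU_LARGE_V2, "openai/whisper fp16"))
    print(speedups(CPU_SMALL, "openai/whisper fp32"))
```

A ratio below 1.0, as for the whisper.cpp rows, means that entry was slower than the baseline.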

Installation

The module can be installed from PyPI:

pip install faster-whisper

Other installation methods:

# Install the master branch:
pip install --force-reinstall "faster-whisper @ https://github.com/guillaumekln/faster-whisper/archive/refs/heads/master.tar.gz"

# Install a specific commit:
pip install --force-reinstall "faster-whisper @ https://github.com/guillaumekln/faster-whisper/archive/a4f1cc8f11433e454c3934442b5e1a4ed5e865c3.tar.gz"

# Install for development:
git clone https://github.com/guillaumekln/faster-whisper.git
pip install -e faster-whisper/

GPU support

GPU execution requires the NVIDIA libraries cuBLAS 11.x and cuDNN 8.x to be installed on the system. Please refer to the CTranslate2 documentation.
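
When the same script has to run on machines with and without a GPU, the device and compute type can be chosen at startup. This is a sketch that assumes the installed CTranslate2 release exposes ctranslate2.get_cuda_device_count(); it maps the result onto the same settings used in the Usage section (float16 on CUDA, int8 on CPU). A non-zero device count does not prove that cuBLAS and cuDNN are installed, so loading the model can still fail. Both function names are hypothetical.

```python
from typing import Tuple


def pick_device_settings(cuda_device_count: int) -> Tuple[str, str]:
    """Map the number of visible CUDA devices to (device, compute_type)."""
    if cuda_device_count > 0:
        return "cuda", "float16"
    return "cpu", "int8"


def detect_cuda_device_count() -> int:
    """Return the CUDA device count reported by CTranslate2, or 0 if it is missing."""
    try:
        import ctranslate2  # assumed to provide get_cuda_device_count()
    except ImportError:
        return 0
    return ctranslate2.get_cuda_device_count()


if __name__ == "__main__":
    device, compute_type = pick_device_settings(detect_cuda_device_count())
    print(device, compute_type)
    # from faster_whisper import WhisperModel
    # model = WhisperModel("large-v2", device=device, compute_type=compute_type)
```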

Usage

Transcription

from faster_whisper import WhisperModel

model_size = "large-v2"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
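
Because the loop only reads the start, end and text attributes of each segment, it can be wrapped in small functions that are testable without loading a model. In this sketch, Segment is a hypothetical stand-in defined for illustration, not the class faster-whisper returns; any object with those three attributes works.

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """Hypothetical stand-in with the start/end/text attributes the loop reads."""
    start: float
    end: float
    text: str


def format_segments(segments: Iterable) -> str:
    """Render segments as "[start -> end] text" lines, like the loop above."""
    return "\n".join(
        "[%.2fs -> %.2fs] %s" % (s.start, s.end, s.text) for s in segments
    )


def plain_text(segments: Iterable) -> str:
    """Join segment texts into one transcript, trimming surrounding whitespace."""
    return " ".join(s.text.strip() for s in segments)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "Hello there."), Segment(2.5, 4.0, "Goodbye.")]
    print(format_segments(demo))
    print(plain_text(demo))
```

Each helper iterates over its input once. If the segments returned by model.transcribe can only be consumed once (for example, if they are produced by a generator), convert them with list(segments) before passing them to both helpers.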

Word-level timestamps

# Reuses the model created in the Transcription example above.
segments, _ = model.transcribe("audio.mp3", word_timestamps=True)

for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
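
Word timestamps make it possible to look up what was said during a given time window. This is a sketch using a hypothetical Word stand-in that carries only the start, end and word attributes read by the loop above; words_in_window is likewise a hypothetical helper, not part of faster-whisper.

```python
from typing import Iterable, List, NamedTuple


class Word(NamedTuple):
    """Hypothetical stand-in with the start/end/word attributes the loop reads."""
    start: float
    end: float
    word: str


def words_in_window(words: Iterable, t0: float, t1: float) -> List[str]:
    """Return words whose time span overlaps t0..t1 (touching endpoints excluded)."""
    return [w.word for w in words if w.start < t1 and w.end > t0]


if __name__ == "__main__":
    demo = [Word(0.0, 0.4, " Hello"), Word(0.4, 0.9, " big"), Word(1.0, 1.5, " world")]
    print(words_in_window(demo, 0.5, 1.2))
    # With real output, flatten all segments first; the `or []` guards against
    # segment.words being empty or None when word_timestamps is not enabled:
    # all_words = [w for s in segments for w in (s.words or [])]
```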

See more model and transcription options in the WhisperModel class implementation.

Model conversion

When loading a model from its size, such as WhisperModel("large-v2"), the corresponding CTranslate2 model is automatically downloaded from the Hugging Face Hub.

We also provide a script to convert any Whisper model compatible with the Transformers library. It can be one of the original OpenAI models or a user fine-tuned model.

For example, the command below converts the original "large-v2" Whisper model and saves the weights in FP16:

pip install "transformers[torch]>=4.23"

ct2-transformers-converter --model openai/whisper-large-v2 --output_dir whisper-large-v2-ct2 \
    --copy_files tokenizer.json --quantization float16
  • The option --model accepts a model name on the Hub or a path to a model directory.
  • If the option --copy_files tokenizer.json is not used, the tokenizer configuration is automatically downloaded when the model is loaded later.

Models can also be converted from the code. See the conversion API.
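
The conversion command above can be expressed in Python as follows. This is a sketch that assumes ctranslate2.converters.TransformersConverter accepts a copy_files argument and provides a convert(output_dir, quantization=...) method, as in current CTranslate2 releases; check the conversion API for the version you have installed. The helper name convert_whisper is hypothetical.

```python
def convert_whisper(
    model_name_or_path: str,
    output_dir: str,
    quantization: str = "float16",
) -> str:
    """Convert a Transformers Whisper model to the CTranslate2 format.

    Intended to match the ct2-transformers-converter command above.
    Requires `ctranslate2` and `transformers[torch]` to be installed.
    """
    # Imported inside the function so this file loads without the conversion deps.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(
        model_name_or_path,
        copy_files=["tokenizer.json"],  # same role as --copy_files tokenizer.json
    )
    converter.convert(output_dir, quantization=quantization)
    return output_dir


# Example usage (not executed here; it downloads and converts a large model):
# path = convert_whisper("openai/whisper-large-v2", "whisper-large-v2-ct2")
# from faster_whisper import WhisperModel
# model = WhisperModel(path, device="cuda", compute_type="float16")
```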

Comparing performance against other implementations

If you are comparing the performance against other Whisper implementations, you should make sure to run the comparison with similar settings. In particular:

  • Verify that the same transcription options are used, especially the same beam size. For example, in openai/whisper, model.transcribe uses a default beam size of 1, whereas faster-whisper uses a default beam size of 5.
  • When running on CPU, make sure to set the same number of threads. Many frameworks will read the environment variable OMP_NUM_THREADS, which can be set when running your script:
OMP_NUM_THREADS=4 python3 my_script.py
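
When the comparison is driven from Python, a small harness keeps these settings in one place. This sketch sets OMP_NUM_THREADS from inside the script, which only helps if it runs before the inference library is imported (setting it in the shell, as above, avoids that ordering question), and passes the same beam size to each implementation. It assumes openai/whisper's transcribe forwards beam_size to its decoding options. timed and COMMON_OPTIONS are hypothetical names.

```python
import os
import time
from typing import Callable, Tuple, TypeVar

# Use 4 threads unless OMP_NUM_THREADS was already set in the shell. This must
# run before faster_whisper or whisper is imported to have any effect.
os.environ.setdefault("OMP_NUM_THREADS", "4")

# Shared decoding options so every implementation uses the same beam size.
COMMON_OPTIONS = {"beam_size": 5}

T = TypeVar("T")


def timed(run: Callable[[], T]) -> Tuple[float, T]:
    """Call `run` once and return (elapsed wall-clock seconds, its result)."""
    start = time.perf_counter()
    result = run()
    return time.perf_counter() - start, result


# Example usage (not executed here):
# from faster_whisper import WhisperModel
# model = WhisperModel("small", device="cpu", compute_type="int8")
# # list(...) makes sure every segment is decoded inside the timed call,
# # in case segments are produced lazily.
# elapsed, segments = timed(
#     lambda: list(model.transcribe("audio.mp3", **COMMON_OPTIONS)[0]))
#
# import whisper
# ref = whisper.load_model("small")
# elapsed_ref, result = timed(lambda: ref.transcribe("audio.mp3", **COMMON_OPTIONS))
```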

Download files

Source Distribution

faster-whisper-0.3.0.tar.gz (18.6 kB)

Uploaded Source

Built Distribution

faster_whisper-0.3.0-py3-none-any.whl (17.5 kB)

Uploaded Python 3

File details

Details for the file faster-whisper-0.3.0.tar.gz.

File metadata

  • Download URL: faster-whisper-0.3.0.tar.gz
  • Upload date:
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.2

File hashes

Hashes for faster-whisper-0.3.0.tar.gz
Algorithm Hash digest
SHA256 bb5d688a370300ff09bb409e57457d0826383977406ee392ed6638474d75af53
MD5 07a26d6bdb48172c71a7224b0eecb243
BLAKE2b-256 4a92a3e75c399c0ba7f3a3c4d554744bacae45390fde65726ad84adee0f16a43

File details

Details for the file faster_whisper-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: faster_whisper-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 17.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.2

File hashes

Hashes for faster_whisper-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 76c4ac83c65ec214fbd8f8c837a3de666c8585d6892c3468e88ed34374b01641
MD5 849420029696ffccabea3a7e8fb45d11
BLAKE2b-256 34cae9d4de171b0ff00be8c3d0ec6274d2b8d0f03eb943cd8228b4819156da39