Faster Whisper transcription with CTranslate2

This repository reimplements OpenAI's Whisper transcription model using CTranslate2, a fast inference engine for Transformer models.

This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.

Benchmark

For reference, here are the time and memory required to transcribe 13 minutes of audio using different implementations:

Large-v2 model on GPU

Implementation	Precision	Beam size	Time	Max. GPU memory	Max. CPU memory
openai/whisper	fp16	5	4m30s	11325MB	9439MB
faster-whisper	fp16	5	54s	4755MB	3244MB
faster-whisper	int8	5	59s	3091MB	3117MB

Executed with CUDA 11.7.1 on an NVIDIA Tesla V100S.

Small model on CPU

Implementation	Precision	Beam size	Time	Max. memory
openai/whisper	fp32	5	10m31s	3101MB
whisper.cpp	fp32	5	17m42s	1581MB
whisper.cpp	fp16	5	12m39s	873MB
faster-whisper	fp32	5	2m44s	1675MB
faster-whisper	int8	5	2m04s	995MB

Executed with 8 threads on an Intel(R) Xeon(R) Gold 6226R.
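
As a rough reading of the tables above, the speedup of each faster-whisper run over the openai/whisper baseline can be derived directly from the reported times. The sketch below only recomputes ratios from the numbers listed here; the helper name `to_seconds` is made up for illustration and is not part of the package.

```python
import re

def to_seconds(duration: str) -> int:
    """Convert a duration such as '4m30s' or '54s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds)

# Times copied from the benchmark tables above.
gpu_baseline = to_seconds("4m30s")   # openai/whisper, fp16, GPU
cpu_baseline = to_seconds("10m31s")  # openai/whisper, fp32, CPU

speedups = {
    "gpu fp16": gpu_baseline / to_seconds("54s"),
    "gpu int8": gpu_baseline / to_seconds("59s"),
    "cpu fp32": cpu_baseline / to_seconds("2m44s"),
    "cpu int8": cpu_baseline / to_seconds("2m04s"),
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x faster than openai/whisper")
```

On these particular runs the ratios fall between roughly 3.8x and 5.1x; other hardware, audio, or settings may give different numbers.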

Installation

pip install -e .[conversion]

Model conversion requires the transformers and torch modules, which are installed by the [conversion] requirement. Once a model is converted, these modules are no longer needed and the installation can be simplified to:

pip install -e .

It is also possible to install the module without cloning the Git repository:

# Install the master branch:
pip install "faster-whisper @ https://github.com/guillaumekln/faster-whisper/archive/refs/heads/master.tar.gz"

# Install a specific commit:
pip install "faster-whisper @ https://github.com/guillaumekln/faster-whisper/archive/a4f1cc8f11433e454c3934442b5e1a4ed5e865c3.tar.gz"

GPU support

GPU execution requires the NVIDIA libraries cuBLAS 11.x and cuDNN 8.x to be installed on the system. Please refer to the CTranslate2 documentation.
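
A script that should run on machines with or without a GPU can choose the device at startup. The following is a minimal sketch, assuming that `ctranslate2.get_cuda_device_count()` is a good enough signal; the helper name `pick_device_and_compute_type` is hypothetical, and a visible CUDA device does not by itself prove that cuBLAS and cuDNN are installed.

```python
def pick_device_and_compute_type():
    """Return ("cuda", "float16") when a CUDA device is visible, else ("cpu", "int8")."""
    try:
        import ctranslate2  # the inference engine this package is built on
    except ImportError:
        return "cpu", "int8"
    try:
        has_gpu = ctranslate2.get_cuda_device_count() > 0
    except Exception:
        # Assumption: treat any failure to query CUDA as "no GPU available".
        has_gpu = False
    return ("cuda", "float16") if has_gpu else ("cpu", "int8")

device, compute_type = pick_device_and_compute_type()
print(device, compute_type)
```

The returned pair can then be passed as the `device` and `compute_type` arguments of `WhisperModel`.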

Usage

Model conversion

A Whisper model must first be converted into the CTranslate2 format. We provide a script to download and convert models from the Hugging Face model repository.

For example, the command below converts the "large-v2" Whisper model and saves the weights in FP16:

ct2-transformers-converter --model openai/whisper-large-v2 --output_dir whisper-large-v2-ct2 \
    --copy_files tokenizer.json --quantization float16

If the option --copy_files tokenizer.json is not used, the tokenizer configuration is automatically downloaded when the model is loaded later.

Models can also be converted from the code. See the conversion API.
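
As an illustration of converting from code, the sketch below mirrors the command-line example using `TransformersConverter` from `ctranslate2.converters`. The wrapper function `convert_whisper` and its defaults are assumptions made for this example; running it downloads the model and requires transformers and torch.

```python
def convert_whisper(model_name="openai/whisper-large-v2",
                    output_dir="whisper-large-v2-ct2",
                    quantization="float16"):
    """Convert a Hugging Face Whisper checkpoint to the CTranslate2 format."""
    # Imported lazily: conversion needs transformers and torch at this point.
    from ctranslate2.converters import TransformersConverter

    # copy_files plays the same role as --copy_files on the command line.
    converter = TransformersConverter(model_name, copy_files=["tokenizer.json"])
    # convert() returns the path of the output directory.
    return converter.convert(output_dir, quantization=quantization)

# Example call (downloads the model, so it is left commented out):
# convert_whisper()
```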

Transcription

from faster_whisper import WhisperModel

model_path = "whisper-large-v2-ct2/"

# Run on GPU with FP16
model = WhisperModel(model_path, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_path, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
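
The segment fields used above (`start`, `end`, `text`) are enough to produce subtitles. Below is a minimal sketch that renders segments as SRT text; `format_timestamp`, `segments_to_srt`, and `FakeSegment` are hypothetical names introduced here, and the stand-in data lets the example run without a model.

```python
from collections import namedtuple

def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 65.25 -> '00:01:05,250'."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def segments_to_srt(segments) -> str:
    """Build SRT text from objects exposing .start, .end and .text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(segment.start)} --> {format_timestamp(segment.end)}\n"
            f"{segment.text.strip()}\n"
        )
    return "\n".join(blocks)

# Stand-in data so the sketch runs without a model.
FakeSegment = namedtuple("FakeSegment", "start end text")
demo = [FakeSegment(0.0, 2.5, " Hello there."), FakeSegment(2.5, 65.25, " Second line.")]
print(segments_to_srt(demo))
```

With a real model, the `segments` value returned by `model.transcribe` can be passed to `segments_to_srt` in place of `demo`.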

Word-level timestamps

segments, _ = model.transcribe("audio.mp3", word_timestamps=True)

for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))

See more model and transcription options in the WhisperModel class implementation.
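
Word-level timestamps make it possible to locate where a given word is spoken. The sketch below searches the `words` of each segment for a target word; `find_word` and the stand-in tuples are hypothetical, and the punctuation stripping assumes that word text may carry surrounding spaces or punctuation.

```python
from collections import namedtuple

def find_word(segments, target: str):
    """Return (start, end) pairs for every word matching target, ignoring case and punctuation."""
    wanted = target.lower()
    hits = []
    for segment in segments:
        for word in segment.words:
            # Assumption: word text may carry surrounding spaces or punctuation.
            cleaned = word.word.strip().strip(".,!?;:\"'").lower()
            if cleaned == wanted:
                hits.append((word.start, word.end))
    return hits

# Stand-in data so the sketch runs without a model.
FakeWord = namedtuple("FakeWord", "start end word")
FakeSegment = namedtuple("FakeSegment", "words")
demo = [FakeSegment([FakeWord(0.0, 0.4, " Hello"), FakeWord(0.4, 0.9, " world."),
                     FakeWord(1.2, 1.6, " Hello!")])]
print(find_word(demo, "hello"))
```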

Comparing performance against other implementations

If you are comparing the performance against other Whisper implementations, you should make sure to run the comparison with similar settings. In particular:

Verify that the same transcription options are used, especially the same beam size. For example, in openai/whisper, model.transcribe uses a default beam size of 1, whereas this implementation defaults to a beam size of 5.
When running on CPU, make sure to set the same number of threads. Many frameworks read the environment variable OMP_NUM_THREADS, which can be set when running your script:

OMP_NUM_THREADS=4 python3 my_script.py
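
The same constraints can be applied from inside a benchmark script. This is a sketch under stated assumptions: the `SETTINGS` dictionary and the `timed` helper are hypothetical, the workload is a stand-in, and the environment variable is set before any inference library is imported because such libraries typically read it once at initialization.

```python
import os
import time

# Set the thread count before importing any inference library, since the
# variable is typically read once when the library initializes.
os.environ["OMP_NUM_THREADS"] = "4"

# Hypothetical shared settings, passed unchanged to every implementation.
SETTINGS = {"beam_size": 5}

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; replace with the transcription call under test.
result, elapsed = timed(sum, range(1_000_000))
print(f"threads={os.environ['OMP_NUM_THREADS']} beam_size={SETTINGS['beam_size']} "
      f"elapsed={elapsed:.3f}s")
```

If an implementation returns its segments lazily, iterate over them inside the timed function so that the measurement covers the full transcription.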

Release history

Version	Release date
1.2.1	Oct 31, 2025
1.2.0	Aug 6, 2025
1.1.1	Jan 1, 2025
1.1.0	Nov 21, 2024
1.0.3	Jul 1, 2024
1.0.2	May 6, 2024
1.0.1	Mar 1, 2024
1.0.0	Feb 22, 2024
0.10.1	Feb 22, 2024
0.10.0	Nov 26, 2023
0.9.0	Sep 18, 2023
0.8.0	Sep 4, 2023
0.7.1	Jul 24, 2023
0.7.0	Jul 18, 2023
0.6.0	May 24, 2023
0.5.1	Apr 26, 2023
0.5.0	Apr 25, 2023
0.4.1	Apr 4, 2023
0.4.0	Apr 3, 2023
0.3.0	Mar 24, 2023
0.2.0	Mar 22, 2023 (this version)

Download files

Source Distribution

faster-whisper-0.2.0.tar.gz (17.2 kB)

Uploaded Mar 22, 2023 Source

Built Distribution

faster_whisper-0.2.0-py3-none-any.whl (16.4 kB)

Uploaded Mar 22, 2023 Python 3

File details

Details for the file faster-whisper-0.2.0.tar.gz.

File metadata

Download URL: faster-whisper-0.2.0.tar.gz
Upload date: Mar 22, 2023
Size: 17.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for faster-whisper-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`609ce86b521c762cb204552b6de4121ec6c1c297e7d46a90db29280ac1dd4cd5`
MD5	`654786fa060023c0621c592f3ed94f58`
BLAKE2b-256	`2eda1876106e4b0d9705e4864870eb351d7949d09d273c6f6582ba15c7157e5b`

File details

Details for the file faster_whisper-0.2.0-py3-none-any.whl.

File metadata

Download URL: faster_whisper-0.2.0-py3-none-any.whl
Upload date: Mar 22, 2023
Size: 16.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for faster_whisper-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`380b2e17c30f60cdacf0a816cf9265ffb56fffdd099a13c442a6341efabe75e6`
MD5	`249b535159ce4d3c1a4951ca9b735607`
BLAKE2b-256	`2de45118b4cff04b993c3ce12f5f3cd82e7135ddd9c3fd8fe77f6b1d287c18ce`
