A very fast whisper CLI

Insanely Fast Whisper

Powered by 🤗 Transformers & Optimum

TL;DR - Transcribe 300 minutes (5 hours) of audio in less than 10 minutes - with OpenAI's Whisper Large v2. Blazingly fast transcription is now a reality!⚡️

Basically all you need to do is this:

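import torch
from transformers import pipeline

# Load Whisper Large v2 in half precision on the first CUDA GPU.
pipe = pipeline("automatic-speech-recognition",
                "openai/whisper-large-v2",
                torch_dtype=torch.float16,
                device="cuda:0")

# Convert the model to BetterTransformer (requires the optimum package).
pipe.model = pipe.model.to_bettertransformer()

# Transcribe in 30-second chunks, 24 chunks per batch.
outputs = pipe("<FILE_NAME>",
               chunk_length_s=30,
               batch_size=24,
               return_timestamps=True)

outputs["text"]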
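Since the call passes return_timestamps=True, the result should also carry per-segment timestamps next to the full text. The sketch below assumes the output is a dict with a "chunks" list whose items have a "timestamp" (start, end) pair and a "text" string; the helper names and the sample data are hypothetical, and the sample is hand-written rather than real model output.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS; None becomes '?'."""
    if seconds is None:
        return "?"
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_chunks(outputs):
    """Turn timestamped pipeline output into '[start -> end] text' lines."""
    lines = []
    for chunk in outputs.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text}")
    return lines


# Hand-written sample in the assumed output shape (not real model output).
sample = {
    "text": " First segment. Second segment.",
    "chunks": [
        {"timestamp": (0.0, 2.5), "text": " First segment."},
        {"timestamp": (2.5, 3725.0), "text": " Second segment."},
    ],
}

for line in format_chunks(sample):
    print(line)
```

With a real run, passing outputs instead of sample should give one line per transcribed segment.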

Not convinced? Here are some benchmarks we ran on a free Google Colab T4 GPU! 👇

Optimisation type | Time to transcribe (150 min of audio)
Transformers (fp32) | ~31 min (31 min 1 sec)
Transformers (fp32 + batching [8]) | ~13 min (13 min 19 sec)
Transformers (fp16 + batching [16]) | ~6 min (6 min 13 sec)
Transformers (fp16 + batching [16] + bettertransformer) | ~5.7 min (5 min 42 sec)
Transformers (fp16 + batching [24] + bettertransformer) | ~5 min (5 min 2 sec)
Transformers (distil-whisper) (fp16 + batching [24] + bettertransformer) | ~3 min (3 min 16 sec)
Faster Whisper (fp16 + beam_size [1]) | ~9.4 min (9 min 23 sec)
Faster Whisper (8-bit + beam_size [1]) | ~8 min (8 min 15 sec)
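To make the table easier to compare, here is a small sketch that converts the reported times into a speedup over the fp32 baseline and a "minutes of audio per minute of compute" figure. The times are copied from the table; the function and variable names are made up for illustration.

```python
# Wall-clock times from the benchmark table, as (minutes, seconds).
BENCHMARKS = {
    "Transformers (fp32)": (31, 1),
    "Transformers (fp32 + batching [8])": (13, 19),
    "Transformers (fp16 + batching [16])": (6, 13),
    "Transformers (fp16 + batching [16] + bettertransformer)": (5, 42),
    "Transformers (fp16 + batching [24] + bettertransformer)": (5, 2),
    "Transformers (distil-whisper) (fp16 + batching [24] + bettertransformer)": (3, 16),
    "Faster Whisper (fp16 + beam_size [1])": (9, 23),
    "Faster Whisper (8-bit + beam_size [1])": (8, 15),
}
AUDIO_MINUTES = 150  # length of the benchmark audio, per the table header


def summarise(benchmarks, baseline_key, audio_minutes):
    """Return (name, speedup vs baseline, audio seconds per compute second)."""
    base_min, base_sec = benchmarks[baseline_key]
    baseline_s = base_min * 60 + base_sec
    rows = []
    for name, (minutes, seconds) in benchmarks.items():
        elapsed_s = minutes * 60 + seconds
        rows.append((name, baseline_s / elapsed_s, audio_minutes * 60 / elapsed_s))
    return rows


for name, speedup, realtime in summarise(BENCHMARKS, "Transformers (fp32)", AUDIO_MINUTES):
    print(f"{name}: {speedup:.2f}x vs fp32, {realtime:.1f}x real time")
```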

🆕 You can now access blazingly fast transcriptions via your terminal! ⚡️

We've added a CLI to enable fast transcriptions. Here's how you can use it:

Transcribe your audio

pipx run insanely-fast-whisper --file_name <filename or URL>

Note: The CLI is opinionated and currently only works for Nvidia GPUs. Make sure to check out the defaults and the list of options you can play around with to maximise your transcription throughput. Run python transcribe.py --help to get all the CLI arguments.
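If you would rather drive the CLI from a Python script, a minimal sketch is to build the same command shown above and hand it to subprocess. Only the --file_name flag from this README is used; the helper names are hypothetical.

```python
import shlex
import subprocess


def build_command(file_name):
    """Build the pipx invocation shown in this README for one file or URL."""
    return ["pipx", "run", "insanely-fast-whisper", "--file_name", file_name]


def transcribe_with_cli(file_name):
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(file_name), check=True)


# Print the shell-quoted command without running it.
print(shlex.join(build_command("my audio.flac")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.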

How does this all work?

Here, we'll dive into optimisations that can make Whisper faster, for fun and profit! Our goal is to transcribe a 2-3 hour audio file in the shortest time possible. We'll start with the most basic usage and work our way up to a fast setup!

The only fitting test audio for our benchmark is Lex Fridman interviewing Sam Altman. We'll use the audio of that podcast episode, which I uploaded to a wee dataset on the Hugging Face Hub.

Installation

pip install -q --upgrade torch torchvision torchaudio
pip install -q git+https://github.com/huggingface/transformers
pip install -q accelerate optimum
pip install -q ipython-autotime

Let's download the audio file corresponding to the podcast.

wget https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/sam_altman_lex_podcast_367.flac
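If wget is not available, the same file can be fetched from Python. This is a sketch using only the standard library; the URL pattern is inferred from the wget command above, and both helper names are hypothetical.

```python
import urllib.request


def dataset_file_url(repo_id, filename, revision="main"):
    """Build a download URL following the pattern of the wget command above."""
    return f"https://huggingface.co/datasets/{repo_id}/resolve/{revision}/{filename}"


def download(url, destination):
    """Download url to destination and return the local path."""
    path, _headers = urllib.request.urlretrieve(url, destination)
    return path


AUDIO_FILE = "sam_altman_lex_podcast_367.flac"
AUDIO_URL = dataset_file_url("reach-vb/random-audios", AUDIO_FILE)
print(AUDIO_URL)
# download(AUDIO_URL, AUDIO_FILE)  # uncomment to actually fetch the file
```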

Base Case

import torch
from transformers import pipeline

# Baseline: default fp32 weights, no batching.
pipe = pipeline("automatic-speech-recognition",
                "openai/whisper-large-v2",
                device="cuda:0")
outputs = pipe("sam_altman_lex_podcast_367.flac",
               chunk_length_s=30,
               return_timestamps=True)

# Preview the first 200 characters of the transcript.
outputs["text"][:200]

Sample output:

We have been a misunderstood and badly mocked org for a long time. When we started, we announced the org at the end of 2015 and said we were going to work on AGI, people thought we were batshit insan

Time to transcribe the entire podcast: 31min 1s
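The timings in this walkthrough come from a notebook. Outside a notebook, one way to get comparable wall-clock numbers is a small context manager like the sketch below; timed and timings are hypothetical names, and the sleep stands in for the transcription call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Measure wall-clock time of the enclosed block and print it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        minutes, seconds = divmod(elapsed, 60)
        print(f"{label}: {int(minutes)}min {int(seconds)}s")
        if results is not None:
            results[label] = elapsed  # keep the raw seconds for later comparison


timings = {}
with timed("demo", timings):
    time.sleep(0.01)  # stand-in for: outputs = pipe(...)
```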

Batching

# Reuse the fp32 pipeline from the base case, now decoding 8 chunks per batch.
outputs = pipe("sam_altman_lex_podcast_367.flac",
               chunk_length_s=30,
               batch_size=8,
               return_timestamps=True)

outputs["text"][:200]

Time to transcribe the entire podcast: 13min 19s
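To see why batching helps, it is useful to estimate how many 30-second chunks a long file turns into and how many batches that makes. The sketch below assumes overlapping windows with an overlap of chunk_length_s / 6 on each side, which is my understanding of the pipeline's default and should be treated as an assumption; the function name and the 150-minute duration are illustrative.

```python
import math


def count_chunks(duration_s, chunk_length_s=30.0, stride_length_s=None):
    """Count overlapping windows needed to cover duration_s seconds of audio."""
    if stride_length_s is None:
        # Assumed default: overlap of one sixth of the chunk on each side.
        stride_length_s = chunk_length_s / 6
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("stride is too large for the chunk length")
    count = 0
    start = 0.0
    while start < duration_s:
        count += 1
        if start + chunk_length_s >= duration_s:
            break  # this window already reaches the end of the audio
        start += step
    return count


chunks = count_chunks(150 * 60)  # a hypothetical 150-minute file
for batch_size in (1, 8, 16, 24):
    print(f"batch_size={batch_size}: {math.ceil(chunks / batch_size)} forward passes for {chunks} chunks")
```

Under these assumptions a 150-minute file yields 450 chunks, so a batch size of 8 needs 57 batched forward passes instead of 450 sequential ones.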

Half-Precision

# Reload the model with fp16 weights and raise the batch size to 16.
pipe = pipeline("automatic-speech-recognition",
                "openai/whisper-large-v2",
                torch_dtype=torch.float16,
                device="cuda:0")
outputs = pipe("sam_altman_lex_podcast_367.flac",
               chunk_length_s=30,
               batch_size=16,
               return_timestamps=True)

outputs["text"][:200]

Time to transcribe the entire podcast: 6min 13s
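Part of the gain from half precision is memory: each weight takes 2 bytes instead of 4, which leaves more room for larger batches. The back-of-the-envelope sketch below assumes roughly 1.55 billion parameters for Whisper Large and counts weights only, ignoring activations and other runtime overhead.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}
WHISPER_LARGE_PARAMS = 1_550_000_000  # approximate parameter count (assumption)


def weights_gib(num_params, dtype):
    """Estimate the memory taken by model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weights_gib(WHISPER_LARGE_PARAMS, dtype):.2f} GiB of weights")
```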

BetterTransformer w/ Optimum

pipe = pipeline("automatic-speech-recognition",
                "openai/whisper-large-v2",
                torch_dtype=torch.float16,
                device="cuda:0")

# Convert the model to its BetterTransformer implementation (requires optimum).
pipe.model = pipe.model.to_bettertransformer()
outputs = pipe("sam_altman_lex_podcast_367.flac",
               chunk_length_s=30,
               batch_size=24,
               return_timestamps=True)

outputs["text"][:200]

Time to transcribe the entire podcast: 5min 2s
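The conversion step depends on the optimum package and on the installed transformers version supporting it. A defensive sketch is to fall back to the unconverted model when the conversion is unavailable; the helper name is hypothetical, and catching ImportError and AttributeError is an assumption about how a missing dependency or method would surface.

```python
def maybe_to_bettertransformer(model):
    """Try the BetterTransformer conversion; return (model, converted_flag)."""
    try:
        return model.to_bettertransformer(), True
    except (ImportError, AttributeError) as error:
        # Assumed failure modes: optimum missing, or the method not available.
        print(f"BetterTransformer unavailable, keeping the original model: {error}")
        return model, False


# Intended usage with the pipeline from this section:
# pipe.model, converted = maybe_to_bettertransformer(pipe.model)
```

Transcription still works on the fallback path, just without the extra speedup measured above.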

Roadmap

  • Add benchmarks for Whisper.cpp
  • Add benchmarks for 4-bit inference
  • Add a light CLI script
  • Deployment script with Inference API

Community showcase

@ochen1 created a brilliant MVP for a CLI here: https://github.com/ochen1/insanely-fast-whisper-cli (Try it out now!)
