
Insanely Fast Whisper

An opinionated CLI to transcribe Audio files w/ Whisper on-device! Powered by 🤗 Transformers, Optimum & flash-attn

TL;DR - Transcribe 150 minutes (2.5 hours) of audio in less than 98 seconds - with OpenAI's Whisper Large v3. Blazingly fast transcription is now a reality!⚡️

Not convinced? Here are some benchmarks we ran on a Nvidia A100 - 80GB 👇

| Optimisation type | Time to Transcribe (150 mins of Audio) |
| --- | --- |
| large-v3 (Transformers) (fp32) | ~31 min (31 min 1 sec) |
| large-v3 (Transformers) (fp16 + batching [24] + bettertransformer) | ~5 min (5 min 2 sec) |
| large-v3 (Transformers) (fp16 + batching [24] + Flash Attention 2) | ~2 min (1 min 38 sec) |
| distil-large-v2 (Transformers) (fp16 + batching [24] + bettertransformer) | ~3 min (3 min 16 sec) |
| distil-large-v2 (Transformers) (fp16 + batching [24] + Flash Attention 2) | ~1 min (1 min 18 sec) |
| large-v2 (Faster Whisper) (fp16 + beam_size [1]) | ~9 min (9 min 23 sec) |
| large-v2 (Faster Whisper) (8-bit + beam_size [1]) | ~8 min (8 min 15 sec) |
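
For scale, those timings work out to the following real-time factors. A quick sketch in plain Python, with the run times copied from the rows above:

```python
# Timings from the benchmark table (150 minutes of audio on an A100-80GB).
AUDIO_SECONDS = 150 * 60  # 9,000 seconds of audio

runs = {
    "large-v3 (fp32)": 31 * 60 + 1,
    "large-v3 (fp16 + batching + BetterTransformer)": 5 * 60 + 2,
    "large-v3 (fp16 + batching + Flash Attention 2)": 1 * 60 + 38,
    "distil-large-v2 (fp16 + batching + Flash Attention 2)": 1 * 60 + 18,
}

for name, seconds in runs.items():
    # Real-time factor: seconds of audio transcribed per second of compute.
    print(f"{name}: {AUDIO_SECONDS / seconds:.0f}x real time")
```

The fastest configuration above transcribes roughly 115 seconds of audio per second of compute.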

P.S. We also ran the benchmarks on a Google Colab T4 GPU instance!

🆕 Blazingly fast transcriptions via your terminal! ⚡️

We've added a CLI to enable fast transcriptions. Here's how you can use it:

Install insanely-fast-whisper with pipx (pip install pipx or brew install pipx):

pipx install insanely-fast-whisper

Run inference from any path on your computer:

insanely-fast-whisper --file-name <filename or URL>

🔥 You can run Whisper-large-v3 w/ Flash Attention 2 from this CLI too:

insanely-fast-whisper --file-name <filename or URL> --flash True 

🌟 You can run distil-whisper directly from this CLI too:

insanely-fast-whisper --model-name distil-whisper/large-v2 --file-name <filename or URL> 

Don't want to install insanely-fast-whisper? Just use pipx run:

pipx run insanely-fast-whisper --file-name <filename or URL>
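
If you'd rather drive the CLI from a script, the invocations above can be assembled programmatically. A hedged sketch: `build_command` is a made-up helper, but the flags are the ones this CLI documents:

```python
def build_command(file_name, model_name=None, flash=False, batch_size=None):
    """Assemble an argument list for the insanely-fast-whisper CLI,
    suitable for subprocess.run(). Only the documented flags are used."""
    cmd = ["pipx", "run", "insanely-fast-whisper", "--file-name", file_name]
    if model_name:
        cmd += ["--model-name", model_name]
    if flash:
        cmd += ["--flash", "True"]
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    return cmd

# Example: subprocess.run(build_command("audio.mp3", flash=True), check=True)
```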

[!NOTE] The CLI is highly opinionated and only works on NVIDIA GPUs & Mac. Make sure to check out the defaults and the list of options you can play around with to maximise your transcription throughput. Run insanely-fast-whisper --help or pipx run insanely-fast-whisper --help to get all the CLI arguments along with their defaults.

CLI Options

The insanely-fast-whisper repo provides all-round support for running Whisper in various settings. Note that, as of 26 November, insanely-fast-whisper works on both CUDA and mps (Mac) enabled devices.

  -h, --help            show this help message and exit
  --file-name FILE_NAME
                        Path or URL to the audio file to be transcribed.
  --device-id DEVICE_ID
                        Device ID for your GPU. Just pass the device number when using CUDA, or "mps" for Macs with Apple Silicon. (default: "0")
  --transcript-path TRANSCRIPT_PATH
                        Path to save the transcription output. (default: output.json)
  --model-name MODEL_NAME
                        Name of the pretrained model/ checkpoint to perform ASR. (default: openai/whisper-large-v3)
  --task {transcribe,translate}
                        Task to perform: transcribe or translate to another language. (default: transcribe)
  --language LANGUAGE   
                        Language of the input audio. (default: "None" (Whisper auto-detects the language))
  --batch-size BATCH_SIZE
                        Number of parallel batches you want to compute. Reduce if you face OOMs. (default: 24)
  --flash FLASH         
                        Use Flash Attention 2. Read the FAQs to see how to install FA2 correctly. (default: False)
  --timestamp {chunk,word}
                        Whisper supports both chunked as well as word level timestamps. (default: chunk)
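
Under the hood these options feed the Transformers ASR pipeline. As one illustrative example of the wiring (an assumption about how the CLI plausibly maps the flag, not a quote of its source), --timestamp corresponds to the pipeline's return_timestamps argument:

```python
def timestamp_kwarg(choice):
    """Map the --timestamp choice onto the Transformers pipeline's
    `return_timestamps` argument: chunk-level timestamps use True,
    word-level timestamps use the string "word". (Hypothetical helper.)"""
    if choice == "chunk":
        return True
    if choice == "word":
        return "word"
    raise ValueError(f"unknown timestamp granularity: {choice!r}")
```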

Frequently Asked Questions

How to correctly install flash-attn to make it work with insanely-fast-whisper?

Make sure to install it via pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation. Massive kudos to @li-yifei for helping with this.

How to solve an AssertionError: Torch not compiled with CUDA enabled error on Windows?

The root cause of this problem is still unknown; however, you can resolve it by manually installing torch in the virtualenv, e.g. python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121. Thanks to @pto2k for debugging this.

How to avoid Out-Of-Memory (OOM) exceptions on Mac?

The mps backend isn't as optimised as CUDA and is therefore much more memory hungry. Typically you can run with --batch-size 4 without any issues (this should use roughly 12 GB of GPU VRAM). Don't forget to set --device-id mps.
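
If you still hit OOMs, one common pattern is to retry with a halved batch size. This is an illustrative sketch, not part of the CLI (transcribe_with_backoff and fake_transcribe are made-up names):

```python
def transcribe_with_backoff(transcribe, batch_size=24, min_batch_size=1):
    """Call `transcribe` (any callable taking batch_size), halving the
    batch size whenever the backend raises an out-of-memory error."""
    while batch_size >= min_batch_size:
        try:
            return transcribe(batch_size=batch_size)
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # unrelated failure: don't mask it
            batch_size //= 2
    raise RuntimeError("still out of memory at the minimum batch size")

# Simulate a backend that only fits batches of 4 or smaller:
def fake_transcribe(batch_size):
    if batch_size > 4:
        raise RuntimeError("MPS backend out of memory")
    return {"text": "ok", "batch_size": batch_size}

result = transcribe_with_backoff(fake_transcribe)
# Batch size falls 24 -> 12 -> 6 -> 3 before the call succeeds.
```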

How to use Whisper without a CLI?

All you need is the snippet below:

import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # or any other Whisper checkpoint
    torch_dtype=torch.float16,
    device="cuda",  # or "mps" for Mac devices
    model_kwargs={"use_flash_attention_2": True},  # set to False for older GPUs
)

# Only needed when `use_flash_attention_2` is set to False:
# pipe.model = pipe.model.to_bettertransformer()

outputs = pipe(
    "<FILE_NAME>",
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)

outputs
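
The outputs dict holds the full transcription text plus, because return_timestamps=True, a list of timestamped chunks. A small sketch for persisting it the way the CLI's --transcript-path option does (save_transcript is a hypothetical helper; the CLI's exact JSON layout may differ):

```python
import json

def save_transcript(outputs, path="output.json"):
    """Write the pipeline result to disk: full text plus timestamped
    chunks. (Illustrative only -- not the CLI's own serializer.)"""
    payload = {"text": outputs["text"], "chunks": outputs.get("chunks", [])}
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(payload, fp, ensure_ascii=False, indent=2)

# Works with any dict of the same shape, e.g.:
sample = {
    "text": "hello world",
    "chunks": [{"timestamp": [0.0, 1.2], "text": "hello world"}],
}
save_transcript(sample, "output.json")
```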

Acknowledgements

  1. OpenAI Whisper team for open-sourcing such a brilliant checkpoint.
  2. Hugging Face Transformers team, specifically Arthur, Patrick, Sanchit & Yoach (alphabetical order) for continuing to maintain Whisper in Transformers.
  3. Hugging Face Optimum team for making the BetterTransformer API so easily accessible.
  4. Patrick Arminio for helping me tremendously to put together this CLI.

Community showcase

  1. @ochen1 created a brilliant MVP for a CLI here: https://github.com/ochen1/insanely-fast-whisper-cli (Try it out now!)
  2. @arihanv created an app (Shush) using NextJS (frontend) & Modal (backend): https://github.com/arihanv/Shush (Check it out!)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

insanely_fast_whisper-0.0.9.tar.gz (10.5 kB)

Uploaded Source

Built Distribution

insanely_fast_whisper-0.0.9-py3-none-any.whl (12.3 kB)

Uploaded Python 3

File details

Details for the file insanely_fast_whisper-0.0.9.tar.gz.

File metadata

  • Download URL: insanely_fast_whisper-0.0.9.tar.gz
  • Upload date:
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.10.3 CPython/3.8.10

File hashes

Hashes for insanely_fast_whisper-0.0.9.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | d7eb2be949ce75a6f31448cb3e5db35b5a2012ac9a2037232fa7c1cf9d7753fa |
| MD5 | 3063bccfc2ee71c2184df702e89019df |
| BLAKE2b-256 | f2434db4a313d0e242f91765b2c9a12d0834aa996141d05c36622a1bbd10a20c |

See more details on using hashes here.
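
If you download the archive manually, you can check it against the SHA256 digest above with a few lines of Python (sha256_of is a hypothetical helper; hashlib is the standard library):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 of a file, streamed in 1 MiB chunks so large
    archives never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the published digest, e.g.:
# assert sha256_of("insanely_fast_whisper-0.0.9.tar.gz") == "d7eb2be949ce75a6f31448cb3e5db35b5a2012ac9a2037232fa7c1cf9d7753fa"
```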

File details

Details for the file insanely_fast_whisper-0.0.9-py3-none-any.whl.

File metadata

File hashes

Hashes for insanely_fast_whisper-0.0.9-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 89dbe3743cbb9151f4a4c4f34ecb93b196853cda99959957cf539d8eeccec025 |
| MD5 | f55834a6a0b96cc3f4602568a060eebe |
| BLAKE2b-256 | 4fd4e99b1a696a7584ad5c3e98eade2f48deafb93f70d917fd461dd988fa8d16 |

