
Let Whisper models transcribe in realtime.


Whisper Realtime Transcriber

Overview

This repository contains the source code of a realtime transcriber for various Whisper models published on Hugging Face.

Prerequisites

Before you begin, make sure you meet the following prerequisites:

  • Python 3.10.12 installed on your machine.
  • Microphone connected to your machine.

Installation

  1. Install torch with CUDA support (optional)
  • Follow the instructions here and install version >=2.4.0
  2. Install the package:
    pip install --upgrade whisper-realtime-transcriber
    

Usage

After completing the installation, you can now use the transcriber:

  • Necessary imports
import asyncio

from whisper_realtime_transcriber.InputStreamGenerator import InputStreamGenerator
from whisper_realtime_transcriber.WhisperModel import WhisperModel
from whisper_realtime_transcriber.RealtimeTranscriber import RealtimeTranscriber
  • Standard way - the model and generator are initialized by the RealtimeTranscriber, and all output is printed directly to the console.
transcriber = RealtimeTranscriber()

asyncio.run(transcriber.execute_event_loop())
  • Loading a different model than the ones provided.
inputstream_generator = InputStreamGenerator()
asr_model = WhisperModel(inputstream_generator, model_id="openai/whisper-tiny")

transcriber = RealtimeTranscriber(asr_model=asr_model)

asyncio.run(transcriber.execute_event_loop())
  • Executing a custom function inside the RealtimeTranscriber.
def print_transcription(some_transcription):
  print(some_transcription)

# Specifying a function sets continuous to False - this allows one to
# execute a custom function inside the coroutine that does something with the transcriptions.
# After the function has finished its work, the coroutine restarts.
transcriber = RealtimeTranscriber(func=print_transcription)
  
asyncio.run(transcriber.execute_event_loop())
  • Loading the InputStreamGenerator and/or Whisper Model with custom values.
inputstream_generator = InputStreamGenerator(samplerate=8000, blocksize=2000, min_chunks=2)
asr_model = WhisperModel(inputstream_generator, model_size="large-v3", device="cuda")

transcriber = RealtimeTranscriber(inputstream_generator, asr_model)

asyncio.run(transcriber.execute_event_loop())

Feel free to reach out if you encounter any issues or have questions!

How it works

  • The transcriber consists of two modules: an Inputstream Generator and a Whisper Model.
  • The implementation of the Inputstream Generator is based on this implementation.
  • The Inputstream Generator reads the microphone input and passes it to the Whisper Model, which then generates the transcription.
  • This happens in an async event loop, so the Whisper Model can continuously generate transcriptions from the audio input produced and processed by the Inputstream Generator.
  • On a machine with a 12GB Nvidia RTX 3060, the distilled large-v3 model runs at a realtime factor of about 0.4, meaning 10s of audio input are transcribed in about 4s - the longer the input, the higher the realtime factor.
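The generator/model split described above can be sketched as a standard asyncio producer/consumer pair. This is a minimal illustration only - the function names and the queue-based handoff are assumptions for the sketch, not the library's actual internals:

```python
import asyncio

async def inputstream_generator(queue, chunks):
    # Stands in for the microphone reader: pushes audio chunks into a queue.
    for chunk in chunks:
        await queue.put(chunk)
        await asyncio.sleep(0)  # yield control so the consumer can run
    await queue.put(None)  # sentinel: no more audio

async def whisper_model(queue, results):
    # Stands in for the Whisper model: consumes chunks as they arrive,
    # so transcription overlaps with audio capture instead of waiting for it.
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        results.append(f"transcription of {chunk}")

async def run_transcriber(chunks):
    # Both coroutines share one event loop, mirroring the generator/model
    # split described above.
    queue = asyncio.Queue()
    results = []
    await asyncio.gather(
        inputstream_generator(queue, chunks),
        whisper_model(queue, results),
    )
    return results

print(asyncio.run(run_transcriber(["chunk-0", "chunk-1"])))
```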

ToDos

  • Add functionality to transcribe from audio files.
  • Find a way to eliminate the hallucinations of the Whisper models when no voice is active.
