Let whisper models transcribe in realtime.
Whisper Realtime Transcriber
Overview
This repository contains the source code of a realtime transcriber that works with the various Whisper models published on Hugging Face.
Prerequisites
Before you begin, make sure you meet the following prerequisites:
- Python 3.10.12 installed on your machine.
- Microphone connected to your machine.
Installation
- Install torch with CUDA support (optional)
- Follow the official PyTorch installation instructions and install version 2.4.0 or later.
- Install the package:
pip install --upgrade whisper-realtime-transcriber
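To confirm the environment before the first run, a small check like the one below can help. It is only a sketch using the standard library: the helper names `parse_version` and `check_environment` are made up for this example and are not part of the package. It reports the running Python version and whether an installed torch meets the 2.4.0 minimum mentioned above.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Extract (major, minor, patch) from a version string such as '2.4.0+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def check_environment(min_torch=(2, 4, 0)):
    """Report the Python version and whether torch meets the minimum version."""
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        # torch is optional for this check, so a missing install is reported, not raised.
        report["torch"] = None
        report["torch_ok"] = False
        return report
    report["torch"] = torch_version
    report["torch_ok"] = parse_version(torch_version) >= min_torch
    return report


if __name__ == "__main__":
    print(check_environment())
```

The check reads package metadata instead of importing torch, so it stays fast and does not initialise CUDA.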
Usage
After completing the installation, you can now use the transcriber:
# Necessary imports
import asyncio
from whisper_realtime_transcriber.InputStreamGenerator import InputStreamGenerator
from whisper_realtime_transcriber.WhisperModel import WhisperModel
from whisper_realtime_transcriber.RealtimeTranscriber import RealtimeTranscriber
# Standard way - all outputs get printed directly to the console.
inputstream_generator = InputStreamGenerator()
asr_model = WhisperModel(inputstream_generator)
transcriber = RealtimeTranscriber(inputstream_generator, asr_model)
asyncio.run(transcriber.execute_event_loop())
# Loading a custom model id
# When specifying model_id, model_size becomes obsolete
inputstream_generator = InputStreamGenerator()
asr_model = WhisperModel(inputstream_generator, model_id="openai/whisper-tiny")
# ... #
# Executing a custom function inside the RealtimeTranscriber.
def print_transcription(some_transcription):
    print(some_transcription)
inputstream_generator = InputStreamGenerator()
asr_model = WhisperModel(inputstream_generator)
# Passing a function via func and setting continuous=False lets you run
# your own code on the transcriptions during the event loop.
transcriber = RealtimeTranscriber(inputstream_generator, asr_model, continuous=False, func=print_transcription)
asyncio.run(transcriber.execute_event_loop())
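As a slightly richer illustration of the `func` parameter, the sketch below builds a callback that appends each transcription to a text file. It assumes, based on the `print_transcription` example above, that the callback receives the transcription as a single argument that can be converted to a string. The helper name `make_file_logger` and the file name are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def make_file_logger(path):
    """Return a callback that appends each transcription to a text file."""
    log_path = Path(path)

    def log_transcription(transcription):
        # Skip empty results so silence does not produce blank lines.
        text = str(transcription).strip()
        if not text:
            return
        stamp = datetime.now().isoformat(timespec="seconds")
        with log_path.open("a", encoding="utf-8") as handle:
            handle.write(f"{stamp}\t{text}\n")

    return log_transcription


# Hypothetical wiring, mirroring the usage example above:
# transcriber = RealtimeTranscriber(
#     inputstream_generator, asr_model,
#     continuous=False, func=make_file_logger("transcript.log"),
# )
# asyncio.run(transcriber.execute_event_loop())
```

Returning a closure keeps the callback signature to one argument while still letting you choose the output path.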
Feel free to reach out if you encounter any issues or have questions!
How it works
- The transcriber consists of two modules: an Inputstream Generator and a Whisper Model.
- The implementation of the Inputstream Generator is based on this implementation.
- The Inputstream Generator reads the microphone input and passes it to the Whisper Model. The Whisper Model then generates the transcription.
- Both run in an async event loop, so the Whisper Model can continuously generate transcriptions from the audio input that the Inputstream Generator captures and processes.
- On a machine with a 12 GB Nvidia RTX 3060, the distilled large-v3 model runs at a realtime factor of about 0.4, meaning 10 s of audio input is transcribed in about 4 s. The longer the input, the larger the realtime factor.
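The producer and consumer arrangement described above can be sketched with plain asyncio. This is not the package's actual code: the queue, the function names, and the fake "audio" strings are all assumptions made for illustration. One coroutine stands in for the Inputstream Generator, another for the Whisper Model, and a small helper computes the realtime factor as processing time divided by audio duration.

```python
import asyncio


def realtime_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 keeps up with live input."""
    return processing_seconds / audio_seconds


async def produce_chunks(queue, chunks):
    """Stand-in for the input stream: put audio chunks on the queue."""
    for chunk in chunks:
        await queue.put(chunk)
        await asyncio.sleep(0)  # yield control so the consumer can run
    await queue.put(None)  # sentinel: no more audio


async def transcribe_chunks(queue, transcribe):
    """Stand-in for the model: turn each queued chunk into text."""
    results = []
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        results.append(transcribe(chunk))
    return results


async def run_pipeline(chunks, transcribe):
    """Run producer and consumer concurrently in one event loop."""
    queue = asyncio.Queue(maxsize=4)
    _, results = await asyncio.gather(
        produce_chunks(queue, chunks),
        transcribe_chunks(queue, transcribe),
    )
    return results


if __name__ == "__main__":
    fake_audio = ["chunk-1", "chunk-2", "chunk-3"]
    print(asyncio.run(run_pipeline(fake_audio, str.upper)))
    print(realtime_factor(4.0, 10.0))
```

The bounded queue applies back-pressure: if transcription falls behind, the producer waits instead of buffering audio without limit.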
ToDos
- Add functionality to transcribe from audio files.
- Find a way to suppress the hallucinations that Whisper models produce when no voice is active.
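One common way to approach the second item is to skip chunks that are too quiet to contain speech before they reach the model. The sketch below is a crude energy gate and is not something the package currently does. It assumes the audio is available as float samples in the range -1.0 to 1.0, and the threshold of 0.01 is an arbitrary placeholder that would need tuning per microphone.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in samples) / len(samples))


def is_probably_speech(samples, threshold=0.01):
    """Crude gate: treat a chunk as speech only if its RMS exceeds the threshold."""
    return rms(samples) > threshold
```

An energy gate cannot tell speech from other loud sounds, so a dedicated voice activity detector would likely be more robust.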