Faster Whisper transcription with CTranslate2 with Live Capabilities for Edge Devices

ScaledgeWhisper - Live Transcription & Translation for Edge Devices

ScaledgeWhisper is a Python package designed to provide live transcription and translation capabilities using Whisper models, optimized for edge devices like Raspberry Pi. Built on top of Faster Whisper, this package allows users to transcribe or translate audio in real-time, with support for multiple languages and customizable settings.

For more details on the Faster Whisper implementation and setup, please refer to the Faster Whisper README.md.

Features

Live Transcription & Translation: Supports both live audio transcription and translation (from any language to English) in real-time.
Optimized for Edge Devices: Designed to run efficiently on devices like Raspberry Pi, using small-sized Whisper models.
Language Detection & Support: Automatic language detection for transcription and support for English and Auto language modes for live tasks.
Customizability: Offers multiple configuration options such as saving recordings, transcription, and customizing file names.
Cross-Platform: Works across multiple platforms with automatic device selection (CPU, CUDA).
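
The automatic device selection mentioned above can be pictured with a short sketch. This is not the package's actual logic: `pick_device` is a hypothetical helper, and it assumes PyTorch's `torch.cuda.is_available()` is the signal used to choose between CUDA and CPU.

```python
def pick_device(preferred: str = "auto") -> str:
    """Return "cuda" or "cpu" (hypothetical helper, not part of ScaledgeWhisper)."""
    if preferred != "auto":
        # An explicit choice is passed through unchanged.
        return preferred
    try:
        import torch  # torch is listed as a prerequisite of the package
    except ImportError:
        # Without torch we cannot probe for a GPU, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

On a Raspberry Pi this would be expected to resolve to `cpu`, since no CUDA device is present.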

Installation

Prerequisites

Python 3.9+
torch installed (with proper support for your device)
ctranslate2 and other dependencies installed
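
Before installing, it can help to confirm that the prerequisites are in place. The following is a minimal sketch using only the standard library; the `missing_prerequisites` helper is hypothetical, and the module names `torch` and `ctranslate2` are assumed to be the import names of the dependencies listed above.

```python
import importlib.util
import sys

# Import names assumed for the prerequisites listed above.
REQUIRED_MODULES = ("torch", "ctranslate2")


def missing_prerequisites(modules=REQUIRED_MODULES, min_python=(3, 9)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module '{name}' is not installed")
    return problems


for problem in missing_prerequisites():
    print("Missing:", problem)
```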

You can install the package via pip:

pip install ScaledgeWhisper

Alternatively, you can install it from source:

git clone https://github.com/ScaledgeTechnology/ScaledgeWhisper.git
cd ScaledgeWhisper
pip install -e .

Usage

You can use ScaledgeWhisper via the command line interface (CLI). Here’s a breakdown of how to use it.

Command-line Options

usage: scaledgewhisper [-h] [--list_models] [--available_languages] [--model]
                       [--live] [--live_language] [--language] [--info]
                       [--save_recording] [--save_transcription]
                       [--save_location] [--recording_name]
                       [--transcription_name] [--full_prediction]
                       [--chunk_size] [--num_threads]
                       task [paths ...]

task (required): Choose between transcribe or translate to specify the task.
paths (optional): Paths to one or more audio files. Required only for non-live tasks.
--available_languages: Returns a list of all the available language codes along with their names.
--list_models: Returns a list of all the available Whisper models to choose from if needed.
--info: Prints information about the given audio files (requires input audio file paths).
--live: Enable live transcription or translation (requires a microphone input).
--live_language: Set the language (English, Auto) for live transcription or translation (default is English).
--language: Set the language for non-live tasks (default is autodetect).
--model: Specify which Whisper model to use (default, edge, model name or custom model path).
--save_recording: Whether to save the audio recording after the live task.
--save_transcription: Whether to save the transcription after the live task.
--save_location: Directory in which to save the files when --save_recording or --save_transcription is enabled (default is the current working directory).
--recording_name: Custom name for the saved audio recording file (default is full_recording.wav).
--transcription_name: Custom name for the saved transcription file (default is transcriptions.json).
--full_prediction: Run transcription or translation on the entire recording once the live task ends.
--chunk_size: Size of audio chunks (in samples per second) to process. Higher values improve accuracy but increase latency. Default is 32000.
--num_threads: Number of threads for parallel processing. Minimum is 2. Default is 4.
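
To get a feel for the latency trade-off behind --chunk_size, the sketch below converts a chunk size into seconds of audio. It rests on two assumptions that the option list does not confirm: that the value counts samples, and that audio is captured at 16 kHz, the rate Whisper models expect. `chunk_seconds` is a hypothetical helper.

```python
def chunk_seconds(chunk_size: int, sample_rate: int = 16000) -> float:
    """Seconds of audio in one chunk (assumes chunk_size is a sample count)."""
    if chunk_size <= 0 or sample_rate <= 0:
        raise ValueError("chunk_size and sample_rate must be positive")
    return chunk_size / sample_rate


for size in (16000, 32000, 64000):
    print(f"--chunk_size {size}: {chunk_seconds(size):.1f} s of audio per chunk")
```

Under these assumptions the default of 32000 corresponds to about two seconds of audio per chunk, so each piece of text can only appear once that much audio has been captured.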

Example Usage

Live Transcription:

To transcribe audio from the microphone in real-time:

scaledgewhisper transcribe --live --model edge --live_language english

Live Translation:

To translate audio from any language to English in real-time:
```
scaledgewhisper translate --live --model edge --live_language english
```

Non-Live Transcription:

For transcribing a pre-recorded audio file:

scaledgewhisper transcribe /path/to/audio/file_1.wav /path/to/audio/file_2.wav --model default --language en

Non-Live Translation:

To translate a pre-recorded audio file into English:

scaledgewhisper translate /path/to/audio/file_1.wav /path/to/audio/file_2.wav --model edge --language multi
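
For batches of files, the non-live commands can also be driven from Python. The sketch below only assembles flags documented in the option list (`task`, paths, `--model`, `--language`); the `build_command` and `run` helpers are hypothetical, and whether the CLI writes its results to standard output is an assumption.

```python
import subprocess
from pathlib import Path


def build_command(task, audio_dir, model="edge", language=None, pattern="*.wav"):
    """Build a scaledgewhisper argument list for every matching file in audio_dir."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    paths = sorted(str(p) for p in Path(audio_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern} in {audio_dir}")
    command = ["scaledgewhisper", task, *paths, "--model", model]
    if language is not None:
        command += ["--language", language]
    return command


def run(command):
    """Run the CLI and return whatever it printed to stdout (raises on a non-zero exit)."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

A call such as `run(build_command("translate", "recordings"))` would then process every `.wav` file in a `recordings` directory in a single invocation.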

Listing Available Models:

To list all available Whisper models:
```
scaledgewhisper --list_models
```
Listing Available Languages:

To list all available languages for live and non-live tasks:
```
scaledgewhisper --available_languages
```

Saving Recording:

To save the final recording for live tasks:

scaledgewhisper transcribe --live --save_recording --recording_name /path/for/your/recording.wav

Saving Transcription:

To save the final transcription for live tasks (by default it is written to transcriptions.json in the current working directory):
```
scaledgewhisper transcribe --live --save_transcription --transcription_name /cwd/your/transcription.json
```
Getting Full Prediction:

To get full prediction on your recorded audio:
```
scaledgewhisper transcribe --live --full_prediction
```
Getting info on audio files:

To get information on audio files, such as the detected language and its probability:
```
scaledgewhisper /path/to/audio/file_1.wav /path/to/audio/file_2.wav --info
```
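
Because the package is built on Faster Whisper, the kind of information `--info` reports can be approximated with Faster Whisper directly. The sketch below is an assumption about what sits underneath, not ScaledgeWhisper's own code. It relies on `faster_whisper.WhisperModel.transcribe`, which returns a segment generator together with an info object carrying `language`, `language_probability` and `duration`. The `summarize` and `describe_audio` helpers and the choice of the `tiny` model are illustrative.

```python
def summarize(path, language, probability, duration):
    """Format one line of audio information."""
    return f"{path}: language={language} (p={probability:.2f}), duration={duration:.1f}s"


def describe_audio(path, model_name="tiny"):
    """Detect the language of an audio file with Faster Whisper (assumed backend)."""
    # Imported lazily: requires faster-whisper and downloads the model on first use.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    # The info object is available immediately; segments are decoded only when iterated.
    _segments, info = model.transcribe(path)
    return summarize(path, info.language, info.language_probability, info.duration)
```

Calling `describe_audio("/path/to/audio/file_1.wav")` returns a one-line summary without decoding the full transcript.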

RealTime Class

The core class for live transcription and translation is RealTime. It handles both real-time transcription and translation, making use of the keyboard library to start and stop recording using hotkeys.

Example Code:

from ScaledgeWhisper import RealTime

# Initialize RealTime with edge model and auto device selection
rstt = RealTime(model_size_or_path="edge", device="auto")

# Start live transcription
rstt.transcribe(
    task="transcribe",  # or "translate"
    language="English",  # Set language for transcription
    save_recording=True,
    save_transcription=True,
    save_location=None,   # saved_data by default
    recording_name="live_recording.wav",
    transcription_name="live_transcription.json",
    return_output=False
)
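
When the same settings are reused across sessions, the keyword arguments can be assembled once and passed along. This is a minimal sketch that uses only the parameter names shown in the example above; `live_options` and `run_live` are hypothetical helpers, and the `stem` argument for naming the output files is an illustration rather than a package feature.

```python
def live_options(task="transcribe", language="English", save_dir=None, stem="session"):
    """Build keyword arguments for RealTime.transcribe (names taken from the example)."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    return {
        "task": task,
        "language": language,
        "save_recording": True,
        "save_transcription": True,
        "save_location": save_dir,
        "recording_name": f"{stem}.wav",
        "transcription_name": f"{stem}.json",
        "return_output": False,
    }


def run_live(**overrides):
    """Start a live session; needs ScaledgeWhisper installed and a working microphone."""
    from ScaledgeWhisper import RealTime  # imported lazily so the helper above stays usable

    rstt = RealTime(model_size_or_path="edge", device="auto")
    return rstt.transcribe(**live_options(**overrides))
```

For example, `run_live(task="translate", stem="meeting")` would save `meeting.wav` and `meeting.json` alongside the live output.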

Development

Running Tests

ScaledgeWhisper comes with a suite of unit tests to verify its functionality. You can run the tests using pytest:

pytest tests/

Contributing

Feel free to open issues or submit pull requests for bug fixes or new features. To contribute, please fork the repository and submit a pull request.

Make sure the existing tests are still passing (and consider adding new tests as well!):

pytest tests/

Reformat and validate the code with the following tools:

black .
isort .
flake8 .

License

This package is open-source and available under the MIT License.

Notes

Ensure that the audio input and desired task settings align with the expected functionality for the best results.
For live tasks, make sure you have a microphone set up and accessible.

Release history

1.1.4: Apr 23, 2025
1.1.3: Apr 23, 2025
1.1.2 (this version): Mar 4, 2025

Download files

Source Distribution

scaledgewhisper-1.1.2.tar.gz (7.6 MB), uploaded Mar 4, 2025, Source

Built Distribution

scaledgewhisper-1.1.2-py3-none-any.whl (7.6 MB), uploaded Mar 4, 2025, Python 3

File details

Details for the file scaledgewhisper-1.1.2.tar.gz.

File metadata

Download URL: scaledgewhisper-1.1.2.tar.gz
Upload date: Mar 4, 2025
Size: 7.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for scaledgewhisper-1.1.2.tar.gz
Algorithm	Hash digest
SHA256	`9828590911e3bfab350793b97efecf2d3868a8f858fa4b942f0cf3c5934a0996`
MD5	`79de2e348992d014231fb7485592f0ea`
BLAKE2b-256	`7c4f6c469c36c7773cc50473f58a85bc3b82aec96768b98bcdbf5c5eb01e6f71`

File details

Details for the file scaledgewhisper-1.1.2-py3-none-any.whl.

File metadata

Download URL: scaledgewhisper-1.1.2-py3-none-any.whl
Upload date: Mar 4, 2025
Size: 7.6 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for scaledgewhisper-1.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`563db78d90a7cebdf9d0a922f8607810290de3cecc5d06b5b1c11c20c77ad9dd`
MD5	`71543980074b35b7dc59bfcc09f131e4`
BLAKE2b-256	`e5fa10d2ba845521d4826452cec02e2ad0fbac42332b5c6b7ec65ae872a4cf48`
