Whisper with speaker diarization
Whisper-Run
Whisper-Run is a pip-installable CLI tool for processing audio files with Whisper models and speaker diarization. It lets you pick a model, process an audio file, and save the results in JSON format.
It runs OpenAI's Whisper models through faster-whisper, which is built on the CTranslate2 library, together with pyannote's speaker-diarization-3.1. Check their documentation if needed.
Before You Start
You must confirm the licensing permissions of these two models:
Get your auth token from Hugging Face. You can put the token in your env file or pass it to the CLI as --hf_auth_token.
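The two ways of supplying the token can be sketched as a small lookup helper. This is only an illustration, not Whisper-Run's own code: the function name resolve_hf_token is hypothetical, and the precedence (an explicit value wins over the HF_AUTH_TOKEN environment variable) is an assumption.

```python
import os
from typing import Optional


def resolve_hf_token(cli_value: Optional[str] = None,
                     env_var: str = "HF_AUTH_TOKEN") -> str:
    """Return the explicitly passed token, else the one from the environment.

    Assumption: an explicit value takes precedence over the environment.
    """
    token = cli_value or os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"No Hugging Face token found: pass --hf_auth_token or set {env_var}."
        )
    return token
```

Failing early with a clear message is usually more helpful than letting the model download fail later with an authorization error.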
Installation
To install Whisper-Run, run the following command:
pip install whisper-run
Usage
You can call Whisper-Run from the command line using the following syntax:
whisper_run --file_path=<file_path>
Example
To process an audio file using the CPU and a specific file path:
whisper_run --device=cpu --file_path=test.wav
When you run the command, you'll be prompted to select a model for audio processing:
[?] Select a model for audio processing:
 > distil-large-v3
   distil-large-v2
   large-v3
   large-v2
   large
   medium
   small
   base
   tiny
Flags
- --device: Specify the device to use for processing (e.g., cpu or cuda).
- --file_path: Specify the path to the audio file you want to process.
- --hf_auth_token: Optional. Pass the Hugging Face auth token, or set the HF_AUTH_TOKEN environment variable instead.
- --save: Optional. If set, the results will be saved to a JSON file.
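If you want to drive the CLI from another Python script, the flags above can be assembled into a command list. The sketch below is a hypothetical wrapper, not part of Whisper-Run; it assumes --save is a bare switch that takes no value, and build_command and run_cli are illustrative names.

```python
import subprocess
from typing import List, Optional


def build_command(file_path: str,
                  device: str = "cpu",
                  hf_auth_token: Optional[str] = None,
                  save: bool = False) -> List[str]:
    """Assemble a whisper_run invocation from the documented flags."""
    cmd = ["whisper_run", f"--device={device}", f"--file_path={file_path}"]
    if hf_auth_token:
        cmd.append(f"--hf_auth_token={hf_auth_token}")
    if save:
        # Assumption: --save is a switch without a value.
        cmd.append("--save")
    return cmd


def run_cli(file_path: str, **kwargs) -> int:
    """Run the CLI; stdin and stdout are inherited so the model prompt still works."""
    return subprocess.run(build_command(file_path, **kwargs)).returncode
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.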
Output
Results are printed to the terminal as a JSON object. If the --save flag is set, the results are also saved in the results directory as a JSON file. You can change the output format in the audio_processor.py file.
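Saved results can be read back with the standard json module. The sketch below makes no assumption about the structure of the JSON or about how the files are named; it simply loads every .json file it finds in a directory. The default directory name "results" is taken from the description above, and load_results is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_results(results_dir: str = "results") -> Dict[str, Any]:
    """Load every JSON file in results_dir, keyed by file name."""
    directory = Path(results_dir)
    if not directory.is_dir():
        return {}
    loaded: Dict[str, Any] = {}
    for path in sorted(directory.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            loaded[path.name] = json.load(fh)
    return loaded
```

Inspecting one loaded object interactively is the quickest way to learn the actual schema before writing code that depends on it.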
Programmatic Usage
You can also use Whisper-Run from your own Python scripts. The following basic example shows how to call the library:
Example Script
from whisper_run import AudioProcessor


def main():
    # Positional arguments: path to the audio file, then the device.
    processor = AudioProcessor("test.wav", "cpu",
                               model_name="large-v3",
                               hf_auth_token="your_hf_token",
                               save=True)  # also write the results to a JSON file
    processor.process()


if __name__ == "__main__":
    main()
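The same call can be wrapped in a loop to handle a folder of recordings. The following is a minimal sketch under several assumptions: only .wav files are collected (the README does not list supported formats), the device is picked by checking whether PyTorch reports a CUDA device, and the helper names pick_device, find_audio, and process_folder are hypothetical. The AudioProcessor call mirrors the example script above.

```python
from pathlib import Path
from typing import List, Sequence


def pick_device() -> str:
    """Return "cuda" when PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def find_audio(folder: str, suffixes: Sequence[str] = (".wav",)) -> List[Path]:
    """List files in folder whose extension matches, in sorted order."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in suffixes)


def process_folder(folder: str, hf_auth_token: str,
                   model_name: str = "large-v3") -> None:
    """Process each audio file in folder, one after another."""
    # Imported here so the helpers above work without whisper-run installed.
    from whisper_run import AudioProcessor

    device = pick_device()
    for path in find_audio(folder):
        AudioProcessor(str(path), device,
                       model_name=model_name,
                       hf_auth_token=hf_auth_token,
                       save=True).process()
```

Each file gets its own AudioProcessor here because the example script passes the file path to the constructor; whether a single instance can be reused across files is not documented.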
Contributing
Contributions are welcome! Please open an issue or submit a pull request on GitHub.
License
This project is licensed under the Apache 2.0 License.
File details
Details for the file whisper-run-1.0.0.tar.gz.
File metadata
- Download URL: whisper-run-1.0.0.tar.gz
- Upload date:
- Size: 603.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 594da217d8f7fdad89122fab6fb4f7f1a0efd34ca141f3a504d2c435b4f6c3ad
MD5 | 0f8ac2387701fd3fdee1f421c730c152
BLAKE2b-256 | 86de59daef9cbddf79f66f8063fd938d7c29bc52e5e13d51fc779ab12514d4f0
File details
Details for the file whisper_run-1.0.0-py3-none-any.whl.
File metadata
- Download URL: whisper_run-1.0.0-py3-none-any.whl
- Upload date:
- Size: 11.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3c7982b25ce480033912a7b3073f6fd312f7e11963c0c2142f0dbae8d24ecfb8
MD5 | 5acd1160a4b334ddfd2f7fc4a8421a52
BLAKE2b-256 | 48242442a8b0027adf6ccb08e4ab3e428bef36ec562d94cda0173ebf4264941d
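If you download one of these files manually, you can compare it against the SHA256 digest listed for it. The sketch below uses only the standard hashlib module; the function names and the local file path in the usage note are illustrative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_digest("whisper-run-1.0.0.tar.gz", "594da217d8f7fdad89122fab6fb4f7f1a0efd34ca141f3a504d2c435b4f6c3ad") should return True for an intact copy of the source distribution, assuming the file sits in the current directory.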