Whisper with speaker diarization

Whisper-Run

Whisper-Run is a pip-installable CLI tool for processing audio files with Whisper models and speaker diarization. It lets you choose the model used for processing and save the results as JSON.

It runs OpenAI's Whisper models through faster-whisper, which is built on the CTranslate2 library, and uses pyannote's speaker-diarization-3.1 for diarization. Check their documentation if needed.

Before You Start

You must confirm the licensing permissions of these two models:

Then get an auth token from Hugging Face. You can put the token in your env file as HF_AUTH_TOKEN or pass it to the CLI with --hf_auth_token.
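As a minimal sketch of the token lookup described here, the helper below prefers an explicitly passed token and falls back to the HF_AUTH_TOKEN environment variable. The name resolve_hf_token is hypothetical and is not part of Whisper-Run; it only illustrates the order of precedence.

```python
import os
from typing import Optional


def resolve_hf_token(cli_token: Optional[str] = None) -> str:
    """Return the Hugging Face token, preferring an explicitly passed value.

    Hypothetical helper for illustration; not part of Whisper-Run.
    """
    # An explicit value (e.g. from --hf_auth_token) wins over the environment.
    token = cli_token or os.environ.get("HF_AUTH_TOKEN")
    if not token:
        raise RuntimeError(
            "No Hugging Face token found: pass --hf_auth_token "
            "or set HF_AUTH_TOKEN."
        )
    return token
```

Failing early with a clear message avoids a less obvious authentication error later, when the diarization model is downloaded.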

Installation

To install Whisper-Run, run the following command:

pip install whisper-run

Usage

You can call Whisper-Run from the command line using the following syntax:

whisper_run --file_path=<file_path>

Example

To process the file test.wav on the CPU:

whisper_run --device=cpu --file_path=test.wav

When you run the command, you'll be prompted to select a model for audio processing:

[?] Select a model for audio processing:
 > distil-large-v3
   distil-large-v2
   large-v3
   large-v2
   large
   medium
   small
   base
   tiny

Flags

  • --device: Specify the device to use for processing (e.g., cpu or cuda).
  • --file_path: Specify the path to the audio file you want to process.
  • --hf_auth_token: Optional. Pass the Hugging Face Auth Token or set the HF_AUTH_TOKEN environment variable.
  • --save: Optional. If set, the results will be saved to a JSON file.
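The flags above can also be assembled from Python, for example before launching the CLI from a script. The sketch below only builds the argument list. The helper name build_command is hypothetical, and it assumes --save is a bare boolean flag, which this README implies but does not state outright.

```python
from typing import List, Optional


def build_command(file_path: str, device: str,
                  hf_auth_token: Optional[str] = None,
                  save: bool = False) -> List[str]:
    """Build a whisper_run argument list from the documented flags.

    Hypothetical helper for illustration; not part of Whisper-Run.
    """
    cmd = ["whisper_run", f"--device={device}", f"--file_path={file_path}"]
    if hf_auth_token:
        cmd.append(f"--hf_auth_token={hf_auth_token}")
    if save:
        # Assumption: --save takes no value and is enabled by its presence.
        cmd.append("--save")
    return cmd
```

Passing the resulting list to subprocess.run(cmd, check=True) would start the tool. The interactive model prompt shown in the example above would still appear, so this is not a fully unattended run.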

Output

Results are printed to the terminal as a JSON object. If the --save flag is set, the results are also saved in the results directory as a JSON file. You can change the output format in the audio_processor.py file.
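Saved results can be read back with the standard library. The sketch below assumes that --save writes files with a .json extension into a directory named results under the current working directory. The file naming and the JSON schema are not documented here, so the hypothetical load_results helper simply parses whatever it finds.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_results(results_dir: str = "results") -> Dict[str, Any]:
    """Map each JSON file name in results_dir to its parsed content.

    Hypothetical helper; the directory name and extension are assumptions.
    """
    loaded: Dict[str, Any] = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[path.name] = json.load(handle)
    return loaded
```

Inspecting one loaded entry is a reasonable first step before writing code that depends on particular keys.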

Programmatic Usage

You can also use Whisper-Run from your own Python scripts. The following script shows basic usage:

Example Script

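from whisper_run import AudioProcessor

def main():
    # Positional arguments: path to the audio file, then the device ("cpu" or "cuda").
    processor = AudioProcessor("test.wav", "cpu",
                               model_name="large-v3",           # skips the interactive model prompt
                               hf_auth_token="your_hf_token",   # replace with your Hugging Face token
                               save=True)                       # also write the results to a JSON file
    # Run the processing with the selected model.
    processor.process()

if __name__ == "__main__":
    main()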
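The same pattern extends to several files. The sketch below loops over the .wav files in a directory and calls process() on each one. The AudioProcessor arguments mirror the example script above; the names default_factory and process_directory are hypothetical, and the return value of process() is not used because it is not documented here.

```python
from pathlib import Path
from typing import Callable, List


def default_factory(path: str):
    """Create a processor with the same arguments as the example script."""
    # Imported lazily so this module can be loaded without whisper-run installed.
    from whisper_run import AudioProcessor
    return AudioProcessor(path, "cpu",
                          model_name="large-v3",
                          hf_auth_token="your_hf_token",
                          save=True)


def process_directory(directory: str,
                      make_processor: Callable = default_factory) -> List[str]:
    """Process every .wav file in directory and return the processed file names."""
    processed: List[str] = []
    for path in sorted(Path(directory).glob("*.wav")):
        make_processor(str(path)).process()
        processed.append(path.name)
    return processed
```

Passing the factory as a parameter keeps the loop independent of how the processor is configured, so the device or model can be changed in one place.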

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

License

This project is licensed under the Apache 2.0 License.


