WhisperPlus: A Python library for speech-to-text transcription, text summarization, speaker diarization, and chatting with video content, built on top of Whisper.
🛠️ Installation
pip install whisperplus
🤗 Model Hub
You can find the models on Hugging Face Spaces or on the Hugging Face Model Hub.
🎙️ Usage
To use the whisperplus library, follow the steps below for different tasks:
🎵 YouTube URL to Audio
from whisperplus import SpeechToTextPipeline, download_and_convert_to_mp3
# Define the URL of the YouTube video that you want to convert to text.
url = "https://www.youtube.com/watch?v=di3rHkEZuUw"
# Download the audio from the video and convert it to mp3.
audio_path = download_and_convert_to_mp3(url)
# Initialize the speech-to-text pipeline with the specified model.
pipeline = SpeechToTextPipeline(model_id="openai/whisper-large-v3")
# Run the pipeline on the audio file.
transcript = pipeline(
    audio_path=audio_path, model_id="openai/whisper-large-v3", language="english"
)
# Print the transcript of the audio.
print(transcript)
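If you want to reuse the transcript later (for example, the RAG chatbot section below reads a transcript from a plain-text file), you can simply write it to disk. A minimal sketch, assuming the pipeline output is a plain string (the placeholder value here stands in for real pipeline output):

```python
# Placeholder for the string returned by the speech-to-text pipeline.
transcript = "This is an example transcript."

# Save the transcript so later steps (summarization, RAG chat) can reload it.
with open("transcript.txt", "w", encoding="utf-8") as f:
    f.write(transcript)
```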
Summarization
from whisperplus.pipelines.summarization import TextSummarizationPipeline
summarizer = TextSummarizationPipeline(model_id="facebook/bart-large-cnn")
summary = summarizer.summarize(transcript)
print(summary[0]["summary_text"])
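Summarization models such as `facebook/bart-large-cnn` have a bounded input length, so very long transcripts are often split into overlapping chunks and summarized piece by piece. The helper below is a hypothetical sketch of that chunking step (it is not part of whisperplus):

```python
def chunk_text(text, max_words=350, overlap=50):
    """Split text into overlapping word-based chunks.

    Hypothetical helper for pre-processing long transcripts before
    summarization; word counts only approximate model token limits.
    """
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk could then be passed to `summarizer.summarize(...)` individually, and the partial summaries concatenated or summarized again.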
Speaker Diarization
from whisperplus import (
    ASRDiarizationPipeline,
    download_and_convert_to_mp3,
    format_speech_to_dialogue,
)
audio_path = download_and_convert_to_mp3("https://www.youtube.com/watch?v=mRB14sFHw2E")
device = "cuda"  # or "cpu" / "mps"
pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization",
    use_auth_token=False,
    chunk_length_s=30,
    device=device,
)
output_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)
dialogue = format_speech_to_dialogue(output_text)
print(dialogue)
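Conceptually, turning diarized output into a dialogue means merging consecutive segments from the same speaker into single lines. The sketch below illustrates that idea with a hypothetical segment schema (`speaker`/`text` dicts); it is not the library's actual `format_speech_to_dialogue` implementation, whose output format may differ:

```python
def to_dialogue(segments):
    """Merge consecutive same-speaker segments into 'SPEAKER: text' lines.

    Hypothetical illustration of dialogue formatting; `segments` is assumed
    to be a list of {"speaker": str, "text": str} dicts.
    """
    lines = []
    for seg in segments:
        speaker, text = seg["speaker"], seg["text"].strip()
        if lines and lines[-1][0] == speaker:
            # Same speaker as previous segment: append to the current line.
            lines[-1] = (speaker, lines[-1][1] + " " + text)
        else:
            lines.append((speaker, text))
    return "\n".join(f"{s}: {t}" for s, t in lines)
```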
Chat with Video - RAG
pip install -r dev-requirements.txt
from whisperplus.pipelines.chatbot import ChatWithVideo
input_file = "transcript.txt"
llm_model_name = "TheBloke/Mistral-7B-v0.1-GGUF"
llm_model_file = "mistral-7b-v0.1.Q4_K_M.gguf"
llm_model_type = "mistral"
embedding_model_name = "sentence-transformers/all-MiniLM-L6-v2"
chat = ChatWithVideo(
    input_file, llm_model_name, llm_model_file, llm_model_type, embedding_model_name
)
query = "What is this video about?"
response = chat.run_query(query)
print(response)
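`ChatWithVideo` follows the standard RAG pattern: the transcript is split into chunks, the chunks most relevant to the query are retrieved (here via sentence-transformers embeddings), and the LLM answers using only those chunks. A minimal bag-of-words sketch of the retrieval step, using plain cosine similarity instead of the library's actual embedding model:

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    num = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    den = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return num / den if den else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k transcript chunks most similar to the query."""
    return sorted(chunks, key=lambda c: cosine(query, c), reverse=True)[:k]
```

In the real pipeline, the retrieved chunks would be concatenated into the LLM prompt alongside the user's question.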
Contributing
pip install -r dev-requirements.txt
pre-commit install
pre-commit run --all-files
📜 License
This project is licensed under the terms of the Apache License 2.0.
🤗 Acknowledgments
This project is based on the HuggingFace Transformers library.
🤗 Citation
@misc{radford2022whisper,
doi = {10.48550/ARXIV.2212.04356},
url = {https://arxiv.org/abs/2212.04356},
author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
title = {Robust Speech Recognition via Large-Scale Weak Supervision},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}
Source Distributions
No source distribution files are available for this release.
Built Distribution
Hashes for whisperplus-0.2.0-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 35f7a6c1d4d60a425332c3cc4161e505bdc220294ecb8c72f7883f3ea29ad7a6
MD5 | 8be66498d6fe1974f9da6be1223f93eb
BLAKE2b-256 | 54b31fd320af68496259a52212f619dbb01277a530405b954caef9d15ba47cb6