WhisperPlus: A Python library for speech-to-text transcription, summarization, speaker diarization, and chat-with-video pipelines.
🛠️ Installation
pip install whisperplus
🤗 Model Hub
You can find the models on Hugging Face Spaces or on the Hugging Face Model Hub.
🎙️ Usage
To use the whisperplus library, follow the steps below for different tasks:
🎵 YouTube URL to Audio
from whisperplus import SpeechToTextPipeline, download_and_convert_to_mp3

# Define the URL of the YouTube video that you want to convert to text.
url = "https://www.youtube.com/watch?v=di3rHkEZuUw"

# Download the audio track and convert it to MP3.
audio_path = download_and_convert_to_mp3(url)

# Initialize the speech-to-text pipeline with the specified model.
pipeline = SpeechToTextPipeline(model_id="openai/whisper-large-v3")

# Run the pipeline on the audio file.
transcript = pipeline(
    audio_path=audio_path, model_id="openai/whisper-large-v3", language="english"
)

# Print the transcript of the audio.
print(transcript)
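The summarization and chat-with-video sections below operate on the transcript text, so it can be useful to persist it to disk instead of re-running the ASR pipeline. A minimal sketch, assuming `transcript` is a plain string (a stand-in value is used here so the snippet runs on its own):

```python
from pathlib import Path

# Stand-in for the pipeline output; in practice use the `transcript`
# returned by SpeechToTextPipeline above.
transcript = "Example transcript text produced by the pipeline."

# Save the transcript for later steps (summarization, chat-with-video).
out_path = Path("transcript.txt")
out_path.write_text(transcript, encoding="utf-8")

# Reload it later:
loaded = out_path.read_text(encoding="utf-8")
print(loaded)
```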
Summarization
from whisperplus.pipelines.summarization import TextSummarizationPipeline
summarizer = TextSummarizationPipeline(model_id="facebook/bart-large-cnn")
summary = summarizer.summarize(transcript)
print(summary[0]["summary_text"])
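`facebook/bart-large-cnn` accepts a limited input length (roughly 1024 tokens), so very long transcripts may need to be split before summarization. A minimal word-based chunker sketch; the `chunk_words` helper and the 400-word limit are illustrative assumptions, not part of whisperplus:

```python
def chunk_words(text: str, max_words: int = 400) -> list[str]:
    """Split text into chunks of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]

# Summarize each chunk separately, then join the partial summaries:
# summaries = [summarizer.summarize(c)[0]["summary_text"]
#              for c in chunk_words(transcript)]
chunks = chunk_words("word " * 1000)
print(len(chunks))  # 1000 words split into 3 chunks of <= 400 words
```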
Speaker Diarization
from whisperplus import (
    ASRDiarizationPipeline,
    download_and_convert_to_mp3,
    format_speech_to_dialogue,
)
audio_path = download_and_convert_to_mp3("https://www.youtube.com/watch?v=mRB14sFHw2E")
device = "cuda"  # or "cpu" / "mps"

pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization",
    use_auth_token=False,
    chunk_length_s=30,
    device=device,
)
output_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)
dialogue = format_speech_to_dialogue(output_text)
print(dialogue)
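To show what the dialogue formatting step is doing conceptually, here is an illustrative sketch that merges consecutive segments from the same speaker into dialogue lines. This is not whisperplus's actual `format_speech_to_dialogue` implementation, and the chunk dict keys (`"speaker"`, `"text"`) are assumptions for this sketch:

```python
def to_dialogue(chunks: list[dict]) -> str:
    """Merge consecutive chunks from the same speaker into dialogue lines."""
    lines: list[str] = []
    for chunk in chunks:
        speaker, text = chunk["speaker"], chunk["text"].strip()
        if lines and lines[-1].startswith(f"{speaker}:"):
            # Same speaker as the previous chunk: extend that line.
            lines[-1] += " " + text
        else:
            # New speaker: start a new dialogue line.
            lines.append(f"{speaker}: {text}")
    return "\n".join(lines)

example = [
    {"speaker": "SPEAKER_00", "text": "Hello there."},
    {"speaker": "SPEAKER_00", "text": "How are you?"},
    {"speaker": "SPEAKER_01", "text": "Fine, thanks."},
]
print(to_dialogue(example))
```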
Chat with Video - RAG
pip install -r dev-requirements.txt
from whisperplus.pipelines.chatbot import ChatWithVideo
input_file = "transcript.txt"
llm_model_name = "TheBloke/Mistral-7B-v0.1-GGUF"
llm_model_file = "mistral-7b-v0.1.Q4_K_M.gguf"
llm_model_type = "mistral"
embedding_model_name = "sentence-transformers/all-MiniLM-L6-v2"
chat = ChatWithVideo(
    input_file, llm_model_name, llm_model_file, llm_model_type, embedding_model_name
)
query = "What is this video about?"
response = chat.run_query(query)
print(response)
Contributing
pip install -r dev-requirements.txt
pre-commit install
pre-commit run --all-files
📜 License
This project is licensed under the terms of the Apache License 2.0.
🤗 Acknowledgments
This project is based on the Hugging Face Transformers library.
🤗 Citation
@misc{radford2022whisper,
  doi       = {10.48550/ARXIV.2212.04356},
  url       = {https://arxiv.org/abs/2212.04356},
  author    = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title     = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year      = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}