A Python package for speech transcription and speaker diarization with speaker matching functionality.
Reason this release was yanked:
in development
Speech Text Pipeline
speech_text_pipeline is a Python package for processing audio files with automatic speech recognition (ASR), speaker diarization, and speaker matching. It supports two cases: transcription with diarization only, and transcription with diarization plus speaker identification, where one speaker's identity is known and provided via an additional audio sample.
Installation
Prerequisites
Install the following dependencies before installing speech_text_pipeline:
- datasets
- omegaconf
- pyannote.audio
- hydra-core
- git+https://github.com/openai/whisper.git
- git+https://github.com/NVIDIA/NeMo.git
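Before installing the main package, it can help to confirm that the prerequisites above are importable. The following is a minimal sketch, not part of speech_text_pipeline: the helper names are hypothetical, and the import names `hydra`, `whisper`, and `nemo` are assumed to be the modules provided by hydra-core, the whisper repository, and the NeMo repository.

```python
import importlib.util

# Maps each prerequisite (as listed above) to the module name it is
# assumed to provide. These import names are assumptions of this sketch.
PREREQUISITES = {
    "datasets": "datasets",
    "omegaconf": "omegaconf",
    "pyannote.audio": "pyannote.audio",
    "hydra-core": "hydra",
    "whisper": "whisper",
    "NeMo": "nemo",
}


def is_importable(module_name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A dotted name whose parent package is missing raises instead of
        # returning None, so treat that as "not importable".
        return False


def missing_prerequisites(requirements=None):
    """Return the names of prerequisites whose module cannot be found."""
    requirements = PREREQUISITES if requirements is None else requirements
    return [name for name, module in requirements.items()
            if not is_importable(module)]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

An empty result only means the modules were found; it does not verify their versions.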
Main Package
Once the prerequisite packages are installed, you can install speech_text_pipeline using pip:
pip install speech_text_pipeline
Usage
HF_TOKEN for Speaker Matching
Before using the package, you need access to the 🤗 Hugging Face pyannote/embedding model, which provides the speaker matching functionality. Follow these steps to get access to the model:
- Log in to your 🤗 Hugging Face account and visit the pyannote/embedding model page.
- Request access to the model (if you have not done so already).
- After access is granted, generate a Hugging Face access token (HF_TOKEN) from the Access Tokens tab in your account settings.
- Use the generated token in either of two ways:
  - CLI login:
    huggingface-cli login
    Then enter your HF_TOKEN when prompted.
  - In code: pass your HF_TOKEN directly to the transcribe function as a parameter:
    import speech_text_pipeline as stp
    result = stp.transcribe(audio="path_to_audio_file.wav", speaker_audio="path_to_known_speaker_audio.wav", HF_TOKEN="Your HF_TOKEN")
Note: The Hugging Face token is only required for the speaker matching functionality.
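To avoid hard-coding the token in source files, one option is to read it from an environment variable. The sketch below is an illustration only: `resolve_hf_token` is a hypothetical helper, and the use of an `HF_TOKEN` environment variable is a convention chosen for this example, not something the package is documented to read on its own.

```python
import os


def resolve_hf_token(explicit_token=None, environ=None):
    """Return a token passed explicitly, else the HF_TOKEN environment variable.

    Raises RuntimeError when neither source provides a non-empty token.
    """
    environ = os.environ if environ is None else environ
    token = explicit_token or environ.get("HF_TOKEN")
    if not token or not token.strip():
        raise RuntimeError(
            "No Hugging Face token found; pass one explicitly or set HF_TOKEN."
        )
    return token.strip()


def transcribe_with_known_speaker(audio, speaker_audio, token=None):
    """Call stp.transcribe with the keyword arguments shown in this README."""
    import speech_text_pipeline as stp  # imported lazily; requires the package

    return stp.transcribe(
        audio=audio,
        speaker_audio=speaker_audio,
        HF_TOKEN=resolve_hf_token(token),
    )
```

With this in place, the token can be set once in the shell environment and kept out of version control.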
Pipeline (anonymous speakers)
This mode generates a transcript with speaker diarization, assigning anonymous labels to speakers (e.g., "Speaker 1", "Speaker 2").
import speech_text_pipeline as stp
audio_url = "path_to_audio_file.wav"
result = stp.transcribe(audio=audio_url)
# Get diarized transcript with anonymous speakers
print(result)
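When several recordings need the anonymous-speaker pipeline, the single call above can be wrapped in a loop. This is a minimal sketch under stated assumptions: `transcribe_directory` is a hypothetical helper, the `*.wav` pattern is an arbitrary choice, and the only package call used is `stp.transcribe(audio=...)` as shown in this README. The structure of each result is not documented here, so results are stored as returned.

```python
from pathlib import Path


def transcribe_directory(audio_dir, transcribe_fn=None, pattern="*.wav"):
    """Transcribe every matching file in a directory.

    Returns a dict mapping file name to whatever transcribe_fn returns.
    transcribe_fn defaults to speech_text_pipeline.transcribe.
    """
    if transcribe_fn is None:
        import speech_text_pipeline as stp  # imported lazily; requires the package

        transcribe_fn = stp.transcribe

    results = {}
    # Sort for a stable processing order across runs.
    for audio_path in sorted(Path(audio_dir).glob(pattern)):
        results[audio_path.name] = transcribe_fn(audio=str(audio_path))
    return results
```

Passing `transcribe_fn` explicitly makes it possible to try the loop with a stand-in function before running the full models.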
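Pipeline (named speakers)
This mode generates a transcript with speaker diarization in which one speaker is identified by matching against a provided audio sample of the known speaker.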
import speech_text_pipeline as stp
audio_url = "path_to_audio_file.wav"
agent_audio_url = "path_to_agent_audio.wav" # Sample of the known speaker
result_with_speaker = stp.transcribe(audio=audio_url,
                                     speaker_audio=agent_audio_url,
                                     HF_TOKEN="Your HF_TOKEN")  # Pass your generated Hugging Face token
# Get diarized transcript with named speaker
print(result_with_speaker)
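The two modes differ only in the arguments passed to the transcribe function, so a small helper can select between them. This is a sketch, not part of the package: `build_transcribe_kwargs` is a hypothetical name, and it uses only the `audio`, `speaker_audio`, and `HF_TOKEN` parameters shown in this README. The token is left out when not supplied, on the assumption that a prior CLI login covers that case.

```python
def build_transcribe_kwargs(audio, speaker_audio=None, hf_token=None):
    """Build keyword arguments for stp.transcribe.

    Without speaker_audio, the result selects the anonymous-speaker mode.
    With speaker_audio, the known-speaker sample is included, plus the
    token when one is given.
    """
    kwargs = {"audio": audio}
    if speaker_audio is not None:
        kwargs["speaker_audio"] = speaker_audio
        # The token only matters for speaker matching, so it is attached
        # only in this branch.
        if hf_token is not None:
            kwargs["HF_TOKEN"] = hf_token
    return kwargs


# Example usage (requires the package and its models):
# import speech_text_pipeline as stp
# result = stp.transcribe(**build_transcribe_kwargs("path_to_audio_file.wav"))
```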
File details
Details for the file speech_text_pipeline-0.1.3.tar.gz.
File metadata
- Download URL: speech_text_pipeline-0.1.3.tar.gz
- Upload date:
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e2839aad37e526ee21ca31eabef267cd789deae9d0d82e39e7438c079247e9b0 |
| MD5 | b645efe2120d687246b9227a116dd121 |
| BLAKE2b-256 | 75caa0dd55a274621bfabe6da118dfaa677db1b27770a3888d1accef5444644d |
File details
Details for the file speech_text_pipeline-0.1.3-py3-none-any.whl.
File metadata
- Download URL: speech_text_pipeline-0.1.3-py3-none-any.whl
- Upload date:
- Size: 11.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9cd475dda07b77757a5d4f727c590d78f4527c224a057bbf98ea78629dc2cd7d |
| MD5 | 5339c2bfb36b9f7b9384f829eb5e4763 |
| BLAKE2b-256 | 042c4eb39fb2efe4fd48732d8962bc811cd729193ced5472da6c36bf5c6bf588 |