A Python package for speech transcription and speaker diarization with speaker matching functionality.

Speech Text Pipeline

speech_text_pipeline is a Python package for processing audio files with automatic speech recognition (ASR), speaker diarization, and speaker matching. It supports two modes: transcription with diarization, and transcription with diarization plus speaker identification, where one speaker's identity is known and provided via an additional audio sample.

Installation

Prerequisites

Install the following dependencies before installing speech_text_pipeline:

datasets
omegaconf
pyannote.audio
hydra-core
git+https://github.com/openai/whisper.git
git+https://github.com/NVIDIA/NeMo.git

Main Package

Once the prerequisite packages are installed, you can install speech_text_pipeline using pip:

pip install speech_text_pipeline
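
Because the prerequisites are installed separately, it can help to confirm they are importable before running the pipeline. The following is a minimal sketch, not part of the package: the helper missing_modules is hypothetical, and the import names in PREREQUISITE_MODULES (for example hydra for hydra-core, whisper and nemo for the two git installs) are assumptions based on how those projects are commonly imported.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names for the prerequisites listed above; adjust if they
# differ in your environment.
PREREQUISITE_MODULES = [
    "datasets", "omegaconf", "pyannote.audio", "hydra", "whisper", "nemo",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "pyannote") is absent
            # or a module carries a malformed spec.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(PREREQUISITE_MODULES)
    print("Missing prerequisites:", ", ".join(absent) if absent else "none")
```

The check only locates modules and does not import the heavy libraries themselves, so it stays fast even when everything is installed.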

Usage

HF_TOKEN for Speaker Matching

Before using the speaker matching functionality, you need access to the 🤗 Hugging Face pyannote/embedding model. To get access:

Log in to your 🤗 Hugging Face account and visit the pyannote/embedding model page.
Request access to the model (if you have not done so already).
Once access is granted, generate a Hugging Face access token (HF_TOKEN) from the Access Tokens tab in your account settings.

After generating the token, you can supply it in either of two ways:

CLI login:

huggingface-cli login

Then, input your HF_TOKEN when prompted.

In Code:

Pass your HF_TOKEN directly to the transcribe function as a parameter:

import speech_text_pipeline as stp

result = stp.transcribe(audio="path_to_audio_file.wav", 
                        speaker_audio="path_to_known_speaker_audio.wav", 
                        HF_TOKEN="Your HF_TOKEN")

Note: The Hugging Face token is only required for the speaker matching functionality.
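
Hard-coding a token in source files makes it easy to leak. As a hedged alternative, the sketch below reads the token from an environment variable and returns None when it is unset. The helper read_hf_token and the choice of the variable name HF_TOKEN are assumptions for illustration; the package itself does not document this behaviour.

```python
import os
from typing import Optional


def read_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in env_var, or None if it is unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


# Possible usage (requires the package and real audio files):
# import speech_text_pipeline as stp
# result = stp.transcribe(audio="path_to_audio_file.wav",
#                         speaker_audio="path_to_known_speaker_audio.wav",
#                         HF_TOKEN=read_hf_token())
```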

Pipeline (anonymous speakers)

This mode generates a transcript with speaker diarization, assigning anonymous labels to speakers (e.g., "Speaker 1", "Speaker 2").

import speech_text_pipeline as stp

audio_url = "path_to_audio_file.wav"

result = stp.transcribe(audio=audio_url)

# Get diarized transcript with anonymous speakers

print(result)

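Pipeline (named speakers)

This mode generates a diarized transcript in which the known speaker is identified by matching against the provided audio sample.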

import speech_text_pipeline as stp

audio_url = "path_to_audio_file.wav"

agent_audio_url = "path_to_agent_audio.wav" # Sample of the known speaker

result_with_speaker = stp.transcribe(audio=audio_url,
                                     speaker_audio=agent_audio_url,
                                     HF_TOKEN="Your HF_TOKEN") # Passing your generated Hugging Face token

# Get diarized transcript with named speaker

print(result_with_speaker)
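
The two modes differ only in which arguments are passed to transcribe. The sketch below shows one way to select between them in calling code. The helper build_transcribe_kwargs is hypothetical; it uses only the parameter names shown above (audio, speaker_audio, HF_TOKEN) and assumes HF_TOKEN may be omitted after a prior huggingface-cli login.

```python
from typing import Dict, Optional


def build_transcribe_kwargs(audio: str,
                            speaker_audio: Optional[str] = None,
                            hf_token: Optional[str] = None) -> Dict[str, str]:
    """Assemble keyword arguments for stp.transcribe (hypothetical helper)."""
    kwargs = {"audio": audio}
    if speaker_audio is not None:
        # Named-speaker mode: include the sample of the known speaker.
        kwargs["speaker_audio"] = speaker_audio
        if hf_token:
            # Left out when relying on a prior `huggingface-cli login`.
            kwargs["HF_TOKEN"] = hf_token
    return kwargs


# Possible usage (requires the package and real audio files):
# import speech_text_pipeline as stp
# result = stp.transcribe(**build_transcribe_kwargs("path_to_audio_file.wav"))
```

The token is dropped in anonymous mode because, per the note above, it is only needed for speaker matching.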

Release history

0.1.3 (this version, yanked), Oct 10, 2024
Reason this release was yanked: in development

0.1.2 (yanked), Oct 9, 2024
Reason this release was yanked: in development

Download files

Source Distribution

speech_text_pipeline-0.1.3.tar.gz (12.5 kB)

Uploaded Oct 10, 2024 Source

Built Distribution

speech_text_pipeline-0.1.3-py3-none-any.whl (11.3 kB)

Uploaded Oct 10, 2024 Python 3

File details

Details for the file speech_text_pipeline-0.1.3.tar.gz.

File metadata

Download URL: speech_text_pipeline-0.1.3.tar.gz
Upload date: Oct 10, 2024
Size: 12.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.8.10

File hashes

Hashes for speech_text_pipeline-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`e2839aad37e526ee21ca31eabef267cd789deae9d0d82e39e7438c079247e9b0`
MD5	`b645efe2120d687246b9227a116dd121`
BLAKE2b-256	`75caa0dd55a274621bfabe6da118dfaa677db1b27770a3888d1accef5444644d`

File details

Details for the file speech_text_pipeline-0.1.3-py3-none-any.whl.

File metadata

Download URL: speech_text_pipeline-0.1.3-py3-none-any.whl
Upload date: Oct 10, 2024
Size: 11.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.8.10

File hashes

Hashes for speech_text_pipeline-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9cd475dda07b77757a5d4f727c590d78f4527c224a057bbf98ea78629dc2cd7d`
MD5	`5339c2bfb36b9f7b9384f829eb5e4763`
BLAKE2b-256	`042c4eb39fb2efe4fd48732d8962bc811cd729193ced5472da6c36bf5c6bf588`
