
A Python package for speech transcription and speaker diarization with speaker matching functionality.

Reason this release was yanked:

In development

Project description

Speech Text Pipeline

speech_text_pipeline is a Python package for processing audio files with automatic speech recognition (ASR), speaker diarization, and speaker matching. It handles both plain transcription with diarization and transcription with diarization plus speaker identification, where one speaker's identity is known and supplied via an additional audio sample.
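The package's internals are not documented on this page, but pipelines built on pyannote embeddings typically identify the known speaker by comparing their embedding against each diarized speaker's embedding with cosine similarity. A minimal sketch of that general technique (toy 3-dimensional vectors and an illustrative threshold, not the package's actual values):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings; real pyannote embeddings have many more dimensions.
known_speaker = np.array([0.9, 0.1, 0.2])
diarized = {
    "Speaker 1": np.array([0.1, 0.95, 0.1]),
    "Speaker 2": np.array([0.88, 0.12, 0.25]),
}

THRESHOLD = 0.8  # illustrative cutoff
for label, emb in diarized.items():
    sim = cosine_similarity(known_speaker, emb)
    verdict = "matches" if sim >= THRESHOLD else "does not match"
    print(f"{label}: similarity={sim:.3f} -> {verdict} the known speaker")
```

Here "Speaker 2" would be relabeled with the known speaker's name, while "Speaker 1" keeps its anonymous label.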

Installation

Prerequisites

Install the following dependencies before installing speech_text_pipeline:

Main Package

Once the prerequisite packages are installed, you can install speech_text_pipeline using pip:

pip install speech_text_pipeline

Usage

HF_TOKEN for Speaker Matching

Before using the package, you need access to the 🤗 Hugging Face pyannote/embedding model for the speaker matching functionality. Follow these steps to get access to the model:

  1. Log in to your 🤗 Hugging Face account and visit the pyannote/embedding model page.

  2. Request access to the model (if you haven't already).

  3. Once access is granted, generate a Hugging Face access token (HF_TOKEN) from the Access Tokens tab in your account settings.

  4. After generating the token, use it in either of two ways:

    • CLI login:
    huggingface-cli login
    

    Then enter your HF_TOKEN when prompted.

    • In Code:

    Pass your HF_TOKEN directly to the transcribe function as a parameter:

    import speech_text_pipeline as stp
    
    result = stp.transcribe(audio="path_to_audio_file.wav", 
                            speaker_audio="path_to_known_speaker_audio.wav", 
                            HF_TOKEN="Your HF_TOKEN")
    

Note: The Hugging Face token is only required for the speaker matching functionality.
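Before calling transcribe, a cheap guard can catch obviously malformed tokens early: current Hugging Face user access tokens start with an "hf_" prefix. This helper is purely illustrative and not part of the package:

```python
def looks_like_hf_token(token: str) -> bool:
    # Hugging Face user access tokens currently use the "hf_" prefix.
    # This is a quick format check, not real validation against the API.
    return isinstance(token, str) and token.startswith("hf_") and len(token) > 3

print(looks_like_hf_token("hf_exampletoken123"))  # True
print(looks_like_hf_token("not-a-token"))         # False
```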

Pipeline (anonymous speakers)

This mode generates a transcript with speaker diarization, assigning anonymous labels to speakers (e.g., "Speaker 1", "Speaker 2").

import speech_text_pipeline as stp

audio_url = "path_to_audio_file.wav"

result = stp.transcribe(audio=audio_url)

# Get diarized transcript with anonymous speakers

print(result)
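This page does not document the exact return type of transcribe, so the hypothetical helper below uses a plain string as a stand-in for `result` to show how you might persist the transcript to disk:

```python
def save_transcript(result, path="transcript.txt"):
    # str() is used as a safe fallback in case `result` is not already a string.
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(result))
    return path

# Stand-in for the value returned by stp.transcribe(...)
demo = "Speaker 1: Hello.\nSpeaker 2: Hi there."
out = save_transcript(demo)
with open(out, encoding="utf-8") as f:
    print(f.read())
```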

Pipeline (named speakers)

import speech_text_pipeline as stp

audio_url = "path_to_audio_file.wav"

agent_audio_url = "path_to_agent_audio.wav" # Sample of the known speaker

result_with_speaker = stp.transcribe(audio=audio_url, 
                                    speaker_audio=agent_audio_url, 
                                    HF_TOKEN="Your HF_TOKEN") # Passing your generated Hugging Face token

# Get diarized transcript with the named speaker

print(result_with_speaker)
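The examples above pass WAV file paths, and the page does not state the package's audio requirements; if an input fails, a quick standard-library check that the file is a readable WAV can help. A sketch using Python's wave module (the generated test tone stands in for a real recording):

```python
import math
import struct
import wave

def describe_wav(path):
    """Return basic properties of a WAV file: channels, sample rate, duration."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_rate": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }

# Build a one-second 16 kHz mono 440 Hz test tone as a stand-in recording.
with wave.open("sample.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 16-bit samples
    w.setframerate(16000)
    tone = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * 440 * n / 16000)))
        for n in range(16000)
    )
    w.writeframes(tone)

print(describe_wav("sample.wav"))
```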



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

speech_text_pipeline-0.1.2.tar.gz (12.4 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

speech_text_pipeline-0.1.2-py3-none-any.whl (11.2 kB)

Uploaded Python 3

File details

Details for the file speech_text_pipeline-0.1.2.tar.gz.

File metadata

  • Download URL: speech_text_pipeline-0.1.2.tar.gz
  • Upload date:
  • Size: 12.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.10

File hashes

Hashes for speech_text_pipeline-0.1.2.tar.gz

  • SHA256: f0f51690d125adde86097e4fc15649e6d1a9d6e0371195d4611dcd1728209a6b
  • MD5: aeea87c968e5ef6836af8b19e27e01ef
  • BLAKE2b-256: f073b1619b16d2f8b4e8c444045e215aaabe9cc58ee3b06c5b21d6036d536279

See more details on using hashes here.

File details

Details for the file speech_text_pipeline-0.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for speech_text_pipeline-0.1.2-py3-none-any.whl

  • SHA256: c50553fd2041753314488dea7506424af607638a464ab4d0e7bac4bcdc552769
  • MD5: 6419892901fda4e0c3d2e09b237390ec
  • BLAKE2b-256: 3eabbd2434b7dff0a0402c411e9ebf97b2b47e619974612542c9540426d73fcd

