
A Python package for speech transcription and speaker diarization with speaker matching functionality.

Reason this release was yanked:

in development

Project description

Speech Text Pipeline

speech_text_pipeline is a Python package for processing audio files with automatic speech recognition (ASR), speaker diarization, and speaker matching. It handles two cases: plain transcription with diarization, and transcription with diarization plus speaker identification, where one speaker’s identity is known and supplied via an additional audio sample.
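Speaker matching of this kind is typically done by comparing speaker embeddings with cosine similarity. The sketch below illustrates the idea with toy vectors; the embeddings, labels, and simple argmax are illustrative assumptions, not the package's actual implementation:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for pyannote speaker embeddings
known_speaker = [0.9, 0.1, 0.2]
segments = {
    "Speaker 1": [0.88, 0.12, 0.21],  # close to the known voice
    "Speaker 2": [0.10, 0.90, 0.30],  # a different voice
}

# The diarized speaker whose embedding is closest to the known sample
best_match = max(segments, key=lambda s: cosine_similarity(known_speaker, segments[s]))
print(best_match)  # → Speaker 1
```

In practice the embeddings would come from the pyannote/embedding model, and the closest diarized speaker would be relabeled with the known identity.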

Installation

Prerequisites

Install the following dependencies before installing speech_text_pipeline:

Main Package

Once the prerequisite packages are installed, you can install speech_text_pipeline using pip:

pip install speech_text_pipeline

Usage

HF_TOKEN for Speaker Matching

Before using the package, you need access to the 🤗 Hugging Face pyannote/embedding model for the speaker matching functionality. Follow these steps to get access:

  1. Log in to your 🤗 Hugging Face account and visit the pyannote/embedding model page.

  2. Request access to the model (if you have not already).

  3. Once access is granted, generate a Hugging Face access token (HF_TOKEN) from the Access Tokens tab in your account settings.

  4. Use the generated token in either of two ways:

    • CLI login:
    huggingface-cli login
    

    Then enter your HF_TOKEN when prompted.

    • In Code:

    Pass your HF_TOKEN directly to the transcribe function as a parameter:

    import speech_text_pipeline as stp
    
    result = stp.transcribe(audio="path_to_audio_file.wav", 
                            speaker_audio="path_to_known_speaker_audio.wav", 
                            HF_TOKEN="Your HF_TOKEN")
    

Note: The Hugging Face token is only required for the speaker matching functionality.
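A common pattern for keeping the token out of source code (a general practice, not something the package requires) is to read it from an environment variable and pass it through:

```python
import os

# Read the token from the environment instead of hardcoding it
HF_TOKEN = os.environ.get("HF_TOKEN", "")
token_available = bool(HF_TOKEN)

# If set, pass it along to the pipeline, e.g.:
# result = stp.transcribe(audio="audio.wav",
#                         speaker_audio="speaker.wav",
#                         HF_TOKEN=HF_TOKEN)
print("HF_TOKEN set:", token_available)
```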

Pipeline (anonymous speakers)

This mode generates a transcript with speaker diarization, assigning anonymous labels to speakers (e.g., "Speaker 1", "Speaker 2").

import speech_text_pipeline as stp

audio_url = "path_to_audio_file.wav"

result = stp.transcribe(audio=audio_url)

# Get diarized transcript with anonymous speakers

print(result)
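The exact return format is not documented here. Assuming the result renders as plain text with one "Speaker N: utterance" line per segment (an assumption for illustration only), a small helper could split it into (speaker, text) pairs for downstream processing:

```python
def parse_transcript(transcript: str):
    # Split "Speaker N: text" lines into (speaker, text) tuples
    segments = []
    for line in transcript.splitlines():
        if ": " in line:
            speaker, text = line.split(": ", 1)
            segments.append((speaker, text))
    return segments

# Hypothetical transcript in the assumed format
sample = "Speaker 1: Hello, how can I help you?\nSpeaker 2: Hi, I have a billing question."
print(parse_transcript(sample))
```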

Pipeline (named speakers)

import speech_text_pipeline as stp

audio_url = "path_to_audio_file.wav"

agent_audio_url = "path_to_agent_audio.wav" # Sample of the known speaker

result_with_speaker = stp.transcribe(audio=audio_url, 
                                    speaker_audio=agent_audio_url, 
                                    HF_TOKEN="Your HF_TOKEN") # Passing your generated Hugging Face token

# Get diarized transcript with named speaker

print(result_with_speaker)
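If the matched speaker comes back under an anonymous label (again assuming a plain-text "Speaker N: utterance" format, which is an illustrative assumption), a trivial post-processing step could map that label to a real name:

```python
def relabel(transcript: str, mapping: dict) -> str:
    # Replace anonymous speaker labels with known names
    for anon, name in mapping.items():
        transcript = transcript.replace(anon, name)
    return transcript

# Hypothetical transcript where Speaker 1 was matched to the known agent
sample = "Speaker 1: Hello, how can I help you?\nSpeaker 2: Hi there."
print(relabel(sample, {"Speaker 1": "Agent"}))
```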

Project details


Download files

Download the file for your platform.

Source Distribution

speech_text_pipeline-0.1.3.tar.gz (12.5 kB view details)

Uploaded Source

Built Distribution


speech_text_pipeline-0.1.3-py3-none-any.whl (11.3 kB view details)

Uploaded Python 3

File details

Details for the file speech_text_pipeline-0.1.3.tar.gz.

File metadata

  • Download URL: speech_text_pipeline-0.1.3.tar.gz
  • Upload date:
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.10

File hashes

Hashes for speech_text_pipeline-0.1.3.tar.gz:

  • SHA256: e2839aad37e526ee21ca31eabef267cd789deae9d0d82e39e7438c079247e9b0
  • MD5: b645efe2120d687246b9227a116dd121
  • BLAKE2b-256: 75caa0dd55a274621bfabe6da118dfaa677db1b27770a3888d1accef5444644d


File details

Details for the file speech_text_pipeline-0.1.3-py3-none-any.whl.

File metadata

File hashes

Hashes for speech_text_pipeline-0.1.3-py3-none-any.whl:

  • SHA256: 9cd475dda07b77757a5d4f727c590d78f4527c224a057bbf98ea78629dc2cd7d
  • MD5: 5339c2bfb36b9f7b9384f829eb5e4763
  • BLAKE2b-256: 042c4eb39fb2efe4fd48732d8962bc811cd729193ced5472da6c36bf5c6bf588

