speechlib is a library that performs speaker diarization, transcription, and speaker recognition on an audio file to create transcripts with actual speaker names.
Project description
This library performs speaker diarization, speaker recognition, and transcription on a single WAV file to produce a transcript with actual speaker names. It also returns an array containing the result information.
Transcriptor takes 4 arguments: the file to transcribe, log_folder, the language used for transcription, and voices_folder.
voices_folder should contain one subfolder per speaker, named after the speaker and holding that speaker's voice samples. These samples are used for speaker recognition to identify each speaker.
If voices_folder is not provided, the speaker tags will be arbitrary.
log_folder is where the final transcript is stored as a text file.
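The folder layout described above can be checked before a run. The following is a minimal sketch that uses only the standard library and is not part of speechlib: the helper name list_voice_samples is hypothetical, and it only lists what is on disk, without checking the audio format that speechlib expects.

```python
from pathlib import Path


def list_voice_samples(voices_folder):
    """Map each speaker subfolder name to the sorted names of its sample files.

    Hypothetical helper, not a speechlib function. It assumes the layout
    described in the text: one subfolder per speaker, named after the speaker.
    """
    root = Path(voices_folder)
    samples = {}
    # Only directories count as speakers; loose files in the root are ignored.
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(f.name for f in speaker_dir.iterdir() if f.is_file())
        samples[speaker_dir.name] = files
    return samples
```

Calling list_voice_samples("voices") returns a dictionary keyed by speaker name. A speaker mapped to an empty list has no voice samples and is worth fixing before transcription.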
Example:
from speechlib import Transcriptor
file = "obama.wav"
voice_folder = "voices"
language = "english"
log_folder = "logs"
transcriptor = Transcriptor(file, log_folder, language, voice_folder)
res = transcriptor.transcribe()
print(res)
--> [[start, end, text, speaker], [start, end, text, speaker], ...]
start: start time of the speech segment
end: end time of the speech segment
text: text transcribed between start and end
speaker: speaker of the text
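The returned segments can be turned into a readable transcript with a few lines of code. The sketch below is an illustration, not part of speechlib: the function names are hypothetical, and it assumes that start and end are numeric values in seconds, which the description above does not state.

```python
def format_timestamp(seconds):
    """Render a time in seconds as MM:SS.s (assumes a numeric value)."""
    # Work in whole tenths of a second so that rounding never yields "60.0".
    tenths = round(float(seconds) * 10)
    minutes, remainder = divmod(tenths, 600)
    return f"{minutes:02d}:{remainder / 10:04.1f}"


def format_transcript(segments):
    """Turn [start, end, text, speaker] segments into one line per segment."""
    lines = []
    for start, end, text, speaker in segments:
        span = f"{format_timestamp(start)} - {format_timestamp(end)}"
        lines.append(f"[{span}] {speaker}: {text.strip()}")
    return "\n".join(lines)
```

With the res value from the example, print(format_transcript(res)) would print one line per segment, provided the time assumption holds. If the times turn out to be strings or use another unit, only format_timestamp needs to change.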
This library uses the following Hugging Face models:
https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb
https://huggingface.co/Ransaka/whisper-tiny-sinhala-20k-8k-steps-v2
https://huggingface.co/openai/whisper-medium
https://huggingface.co/pyannote/speaker-diarization
Hashes for speechlib-1.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | dd8a5a014409c0166abe0f7cdebdd3624958ec0de41684eea991204b4bae4546
MD5 | 6cf20d49f88a5d852bfe3ff5b99c582b
BLAKE2b-256 | 34dd11768db0135fdf104afd8dd071075684a80b10f9bb25e2552b7b51dbf233