TMH Speech package

TMH Speech

TMH Speech is a library that gives access to open source models for transcription.

Read the docs

https://tmh-docs.readthedocs.io/en/latest/docs.html#getting-started

Getting started

To get started, install tmh and pyannote-audio. The development version of pyannote-audio is installed from GitHub because tmh uses newer packages.

pip install tmh
pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
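
After installing, it can help to confirm that both packages are visible to the Python interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of tmh, and the import names `tmh` and `pyannote.audio` are assumed to match the packages installed above.

```python
import importlib.util


def missing_packages(names):
    """Return the names in `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            # find_spec returns None when a top-level module is not found
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when the parent of a dotted name is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the two packages installed above
    print("missing:", missing_packages(["tmh", "pyannote.audio"]))
```

An empty list means both packages were found.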

Example usage

Transcription

from tmh.transcribe import transcribe_from_audio_path
file_path = "./sv.wav"
transcription = "Nu prövar vi att spela in ljud på svenska sex laxar i en laxask de finns en stor banan"
print("creating transcription")
asr_transcription = transcribe_from_audio_path(file_path)
print("output")
print(asr_transcription)
print("the transcription is", transcription)
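
The example above prints the model output next to a reference transcription. One common way to quantify how close the two are is the word error rate (WER). The sketch below is not part of tmh: `word_error_rate` is a hypothetical helper, and it assumes both inputs are plain strings, so the value returned by `transcribe_from_audio_path` may need converting first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

With the variables from the example above, `word_error_rate(transcription, str(asr_transcription))` would give a score where 0.0 means the word sequences match exactly.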

Transcribe with VAD

from tmh.transcribe_with_vad import transcribe_from_audio_path_split_on_speech
file_path = "./sv.wav"
print("creating transcription")
asr_transcription_with_vad = transcribe_from_audio_path_split_on_speech(file_path)
print("transcription")
print(asr_transcription_with_vad)

Overlap detection

from tmh.overlap import overlap_detection

file_path = "./sv.wav"
overlap = overlap_detection(file_path)
print(overlap)

Language classification

from tmh.transcribe import classify_language
file_path = "./sv.wav"
print("classifying language")
language = classify_language(file_path)
print("the language is", language)

Classify emotion

from tmh.transcribe import classify_emotion
file_path = "./sv.wav"
print("classifying emotion")
emotion = classify_emotion(file_path)
print("the emotion is", emotion)

Speaker embeddings

Speaker embeddings are computed with the SpeechBrain x-vector model: https://huggingface.co/speechbrain/spkrec-xvect-voxceleb

Extract speaker embedding

from tmh.transcribe import extract_speaker_embedding
file_path = "./sv.wav"
print("extracting speaker embedding")
embeddings = extract_speaker_embedding(file_path)
print("the speaker embedding is", embeddings)
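
Speaker embeddings are typically compared with cosine similarity, where values near 1 suggest the same speaker. The sketch below is not part of tmh: `cosine_similarity` is a hypothetical helper, and it assumes each embedding has already been flattened to a one-dimensional sequence of floats. The exact type returned by `extract_speaker_embedding` is not documented here, so a conversion step may be needed.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

Any decision threshold on the score depends on the model and the recordings, so it is best chosen on your own data.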

Voice activity detection

from tmh.vad import extract_silences
file_path = "./sv.wav"
print("extracting silences")
silences = extract_silences(file_path)
print("the silences are", silences)

Speech Generation

Tacotron 2

Install these packages before running Tacotron 2:

pip install numpy scipy librosa unidecode inflect
apt-get update
apt-get install -y libsndfile1

Text generation

You can use the text generation API to generate text with any pretrained model from Hugging Face.

Example Swedish

from tmh.text import generate_text

output = generate_text(model='birgermoell/swedish-gpt', prompt="AI har möjligheten att", min_length=150)
print(output)

Example GPT-Neo

from tmh.text import generate_text

output = generate_text(model='EleutherAI/gpt-neo-2.7B', prompt="EleutherAI has", min_length=150)
print(output)

Codex

Generate code from a prompt and save it to a file:

from tmh.code import generate_from_prompt, write_to_file
response = generate_from_prompt('''
A pytorch neural network model for MNIST
'''
)
write_to_file(response, "generated.py")

Build instructions

Change the version number, then build and upload the package:

python3 -m build
twine upload --skip-existing dist/*

GitHub

https://github.com/BirgerMoell/tmh


