An open-source offline speech-to-text package for Bangla language.

Bangla Speech to Text

BanglaSpeech2Text is an open-source offline speech-to-text package for the Bangla language. It is fine-tuned from the latest Whisper speech-to-text model for optimal performance. Transcribe speech to text, convert voice to text, and perform speech recognition in Python with ease, even without an internet connection.

Models

Model      Size          Best WER
'tiny'     100-200 MB    60
'base'     200-300 MB    46
'small'    1 GB          18
'large'    3-4 GB        11

NOTE: Larger models are more accurate but slower at inference. More models are available on the HuggingFace Model Hub.
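The trade-off between size and accuracy can be sketched as a small lookup. The figures below are copied from the table above, using the upper bound of each size range and treating 1 GB as roughly 1000 MB. The `pick_model` helper is hypothetical and not part of the package; it only illustrates choosing the lowest-WER model that fits a disk budget.

```python
# Figures copied from the model table; sizes are the upper bound of each range, in MB.
# Treating 1 GB as 1000 MB is an approximation made for this sketch.
MODELS = {
    "tiny": {"max_size_mb": 200, "best_wer": 60},
    "base": {"max_size_mb": 300, "best_wer": 46},
    "small": {"max_size_mb": 1000, "best_wer": 18},
    "large": {"max_size_mb": 4000, "best_wer": 11},
}


def pick_model(size_budget_mb):
    """Return the name of the lowest-WER model that fits the budget, or None."""
    candidates = [
        name for name, info in MODELS.items()
        if info["max_size_mb"] <= size_budget_mb
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda name: MODELS[name]["best_wer"])
```

The returned name could then be passed as the model argument, for example Speech2Text(model=pick_model(500)).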

Pre-requisites

  • Python 3.6+

Test it in Google Colab

  • Open In Colab

Installation

You can install the library using pip:

pip install banglaspeech2text

Usage

Model Initialization

To use the library, initialize the Speech2Text class with the desired model. By default it uses the "base" model, but you can choose from several pre-trained models: "tiny", "base", "small", "medium", or "large". Here's an example:

from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")

# You can also omit the model name (the default model is "base")
stt = Speech2Text()

Transcribing Audio Files

You can transcribe an audio file by calling the transcribe method and passing the path to the audio file. It will return the transcribed text as a string. Here's an example:

transcription = stt.transcribe("audio.wav")
print(transcription)
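To transcribe several files in one go, a loop over a folder is usually enough. The following is a minimal sketch: `transcribe_folder` is a hypothetical helper, not part of the package, and it takes the transcription function as an argument so that it works with stt.transcribe or any other callable that maps a file path to text.

```python
from pathlib import Path


def transcribe_folder(transcribe, folder, pattern="*.wav"):
    """Run `transcribe` on every matching file and return {file name: text}."""
    results = {}
    # Sort so the files are processed in a predictable order.
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = transcribe(str(path))
    return results


# Example (assumes `stt` was created as shown earlier and "recordings" exists):
# texts = transcribe_folder(stt.transcribe, "recordings")
```

Passing the function in keeps the helper independent of the model, so the same loop can be reused with a differently configured Speech2Text instance.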

For longer audio files

Different models have different maximum audio lengths, so longer files need to be processed in pieces. Use the generate_text or recognize method for this. generate_text returns a generator object, and recognize accepts split=True to split the audio on silence first. Here's an example:

for text in stt.generate_text("audio.wav"):
    print(text)

# or
text = stt.recognize("audio.wav", split=True) # it will use split_on_silence from pydub to split the audio
print(text)

# or
# you can pass min_silence_length and silence_threshold to split_on_silence
text = stt.recognize("audio.wav", split=True, min_silence_length=1000, silence_threshold=-16)
print(text)
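To build intuition for how min_silence_length and silence_threshold interact, here is a simplified pure-Python sketch of silence-based splitting. It is not pydub's implementation. It assumes a hypothetical input of one loudness value (in dBFS) per millisecond and returns the ranges that would be kept as separate segments.

```python
def split_on_silence_ranges(levels_dbfs, min_silence_length=1000, silence_threshold=-16):
    """Return (start_ms, end_ms) ranges of non-silent audio.

    levels_dbfs holds one loudness value (in dBFS) per millisecond.
    A split happens only where the level stays below silence_threshold
    for at least min_silence_length consecutive milliseconds.
    """
    # 1. Find runs of silence that are long enough to split on.
    silent_runs = []
    run_start = None
    for i, level in enumerate(levels_dbfs):
        if level < silence_threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_silence_length:
                silent_runs.append((run_start, i))
            run_start = None
    if run_start is not None and len(levels_dbfs) - run_start >= min_silence_length:
        silent_runs.append((run_start, len(levels_dbfs)))

    # 2. Everything between those runs is a segment to transcribe.
    segments = []
    cursor = 0
    for start, end in silent_runs:
        if start > cursor:
            segments.append((cursor, start))
        cursor = end
    if cursor < len(levels_dbfs):
        segments.append((cursor, len(levels_dbfs)))
    return segments
```

In this sketch, a larger min_silence_length produces fewer, longer segments because short pauses are ignored, and a lower (more negative) silence_threshold means only quieter passages count as silence.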

Use with SpeechRecognition

You can use the SpeechRecognition package to capture audio from a microphone and transcribe it. Here's an example:

import speech_recognition as sr
from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)
    output = stt.recognize(audio)

print(output)

Use GPU

You can use GPU for faster inference. Here's an example:

stt = Speech2Text(model="base", use_gpu=True)

Advanced GPU Usage

For more advanced GPU usage, you can use the device or device_map parameter. Here's an example:

stt = Speech2Text(model="base", device="cuda:0")
stt = Speech2Text(model="base", device_map="auto")

NOTE: Read more about PyTorch devices.
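When the same script has to run on machines with and without a GPU, the device string can be chosen at runtime. The sketch below uses torch.cuda.is_available() from PyTorch; `pick_device` is a hypothetical helper, and passing "cpu" as the device value to Speech2Text is an assumption based on standard PyTorch device strings rather than something this README confirms.

```python
def pick_device(index=0):
    """Return "cuda:<index>" if a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so fall back to the CPU.
        return "cpu"
    return f"cuda:{index}" if torch.cuda.is_available() else "cpu"


# Example (assumption: Speech2Text accepts "cpu" as a device string):
# stt = Speech2Text(model="base", device=pick_device())
```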

Instantly Check with Gradio

You can quickly try the model in a browser with Gradio. Here's an example:

from banglaspeech2text import Speech2Text, available_models
import gradio as gr

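stt = Speech2Text(model="base", use_gpu=True)

# With share=True, Gradio also prints a public URL that you can open on a mobile device
gr.Interface(
    fn=stt.transcribe,
    # Note: newer Gradio releases (4.x) expect sources=["microphone"] instead of source="microphone"
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text",
).launch(share=True)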

Some more usage examples

Use huggingface model

stt = Speech2Text(model="openai/whisper-tiny")

Change Model Save location

stt = Speech2Text(model="base", cache_path="path/to/save/model")

See current model info

stt = Speech2Text(model="base")

print(stt.model_name) # the name of the model
print(stt.model_size) # the size of the model
print(stt.model_license) # the license of the model
print(stt.model_description) # the description of the model(in .md format)
print(stt.model_url) # the url of the model
print(stt.model_wer) # word error rate of the model

CLI

You can use the library from the command line. Here's an example:

bnstt 'file.wav'

You can also use it with a microphone:

bnstt --mic

Other options:

usage: bnstt
       [-h]
       [-gpu]
       [-c CACHE]
       [-o OUTPUT]
       [-m MODEL]
       [-s]
       [-sm MIN_SILENCE_LENGTH]
       [-st SILENCE_THRESH]
       [-sp PADDING]
       [--list]
       [--info]
       [INPUT ...]

Bangla Speech to Text

positional arguments:
  INPUT
    inputfile(s) or list of files

options:
  -h, --help
    show this help message and exit
  -gpu
    use gpu
  -c CACHE, --cache CACHE
    cache directory
  -o OUTPUT, --output OUTPUT
    output directory
  -m MODEL, --model MODEL
    model name
  -s, --split
    split audio file using pydub split_on_silence
  -sm MIN_SILENCE_LENGTH, --min_silence_length MIN_SILENCE_LENGTH
    Minimum length of silence to split on (in ms)
  -st SILENCE_THRESH, --silence_thresh SILENCE_THRESH
    dBFS below reference to be considered silence
  -sp PADDING, --padding PADDING
    Padding to add to beginning and end of each split (in ms)
  --list
    list of available models
  --info
    show model info
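The options above can also be combined from a Python script. The following sketch builds a bnstt argument list using only flags shown in the help text; `build_bnstt_command` is a hypothetical helper, not part of the package, and actually running the command assumes bnstt is installed and on the PATH.

```python
def build_bnstt_command(files, model=None, split=False, output=None, use_gpu=False):
    """Build the argument list for a bnstt call from the documented flags."""
    cmd = ["bnstt"]
    if use_gpu:
        cmd.append("-gpu")
    if model:
        cmd += ["-m", model]
    if split:
        cmd.append("-s")
    if output:
        cmd += ["-o", output]
    # Input files are positional arguments and go last.
    cmd += list(files)
    return cmd


# Example (requires bnstt to be installed):
# import subprocess
# subprocess.run(build_bnstt_command(["file.wav"], model="small", split=True), check=True)
```

Building the command as a list avoids shell quoting problems with file names that contain spaces.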

Download files

Source Distribution: BanglaSpeech2Text-1.0.1.tar.gz (17.2 kB)
Built Distribution: BanglaSpeech2Text-1.0.1-py3-none-any.whl (16.2 kB, Python 3)
