Skip to main content

An open-source offline speech-to-text package for Bangla language. Fine-tuned on the latest whisper speech to text model for optimal performance.

Project description

Bangla Speech to Text

BanglaSpeech2Text: An open-source offline speech-to-text package for Bangla language. Fine-tuned on the latest whisper speech to text model for optimal performance. Transcribe speech to text, convert voice to text and perform speech recognition in python with ease, even without internet connection.

Installation

pip install banglaspeech2text

Models

Model Size Best(WER)
'tiny' 100-200 MB N/A
'base' 200-300 MB 46
'small' 900-1000 MB 18

NOTE: Bigger model have better accuracy but slower inference speed. Smaller wer is better.You can view the models from here

Usage

Download a model

from banglaspeech2text import Model, available_models

# Download a model
models = available_models()
print(models) # see the available models by diffrent people and diffrent sizes

model = models[0] # select a model
model.download() # download the model

Use with file

from banglaspeech2text import Model, available_models

# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model) # load the model
model.load()

# Use with file
file_name = 'test.wav'
output = model.recognize(file_name)

print(output) # output will be a dict containing text
print(output['text'])

Use with SpeechRecognition

import speech_recognition as sr
from banglaspeech2text import Model, available_models

# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model) # load the model
model.load()


r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = model.recognize(audio)

print(output) # output will be a dict containing text
print(output['text'])

Use GPU

import speech_recognition as sr
from banglaspeech2text import Model, available_models

# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model,device="gpu") # load the model
model.load()


r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = model.recognize(audio)

print(output) # output will be a dict containing text
print(output['text'])

NOTE: This package uses torch as backend. So, you can use any device supported by torch. For more information, see here. But you need to setup torch for gpu first from here.

Some Methods

from banglaspeech2text import Model, available_models

models = available_models()
print(models[0]) # get first model
print(models['base']) # get base models
print(models['whisper_base_bn_sifat']) # get model by name

# set download path
model = Model(model,download_path=r"F:\Code\Python\BanglaSpeech2Text\models") # default is home directory
model.load()

# directly load a model
model = Model('base')
model.load()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

BanglaSpeech2Text-0.0.4.tar.gz (8.1 kB view hashes)

Uploaded Source

Built Distribution

BanglaSpeech2Text-0.0.4-py3-none-any.whl (8.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page