Bangla Speech to Text

BanglaSpeech2Text is an open-source, offline speech-to-text package for the Bangla language. Its models are fine-tuned from the latest Whisper speech-to-text model for optimal performance. Transcribe speech to text, convert voice to text, and perform speech recognition in Python with ease, even without an internet connection.

Installation

pip install banglaspeech2text

Models

Model	Size	Best WER
'tiny'	100-200 MB	N/A
'base'	200-300 MB	46
'small'	2-3 GB	18

NOTE: Bigger models are more accurate but slower at inference. A lower WER (word error rate) is better. You can view the models here.
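To make the WER column concrete, here is a minimal sketch of how word error rate is commonly computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` is hypothetical and not part of this package, and treating the table values as percentages is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of three gives a WER of one third.
    score = word_error_rate("আমি ভাত খাই", "আমি ভাত খাব")
    print(f"WER: {score:.2%}")
```

Under that reading, a score of 18 would mean roughly 18 word-level errors per 100 reference words.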

Pre-requisites

Python 3.6+
Git
Git LFS
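
The prerequisites above can be checked from Python before installing. This is a rough sketch using only the standard library. The helper name `missing_prerequisites` is hypothetical and not part of this package, and it assumes `git lfs version` exits with a non-zero status when Git LFS is absent.

```python
import shutil
import subprocess
import sys


def missing_prerequisites(min_python=(3, 6)):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    if shutil.which("git") is None:
        problems.append("git was not found on PATH")
    else:
        # Assumption: git reports an error (non-zero exit) for the unknown
        # "lfs" subcommand when Git LFS is not installed.
        result = subprocess.run(
            ["git", "lfs", "version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            problems.append("git lfs is not installed")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("Missing:", problem)
```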

Download Git

Windows

Download Git from here
Download Git LFS from here

Note: Make sure the Git LFS option is checked during installation. If it was not, you can install Git LFS from here.

Linux

Git
Git LFS

Ubuntu 16.04

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs

Ubuntu 18.04 and above

sudo apt-get install git-lfs

Mac

Git
Git LFS

brew install git-lfs

Download Git with banglaspeech2text

from banglaspeech2text.utils.install_packages import install_git_windows, install_git_linux

# for windows
install_git_windows()

# for linux
install_git_linux()

Usage

Download a model

from banglaspeech2text import Model, available_models

# List the available models
models = available_models()
print(models) # models from different contributors, in different sizes

model = models[0] # select a model
model.download() # download the model

Use with file

from banglaspeech2text import Model, available_models

# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model) # load the model
model.load()

# Use with file
file_name = 'test.wav'
output = model.recognize(file_name)

print(output) # output will be a dict containing text
print(output['text'])
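
To transcribe several files in one go, the same call can be wrapped in a small loop. The sketch below is not part of this package: `transcribe_folder` is a hypothetical helper that accepts any callable behaving like `model.recognize` in the example above, that is, one that takes a file path and returns a dict with a 'text' key.

```python
from pathlib import Path


def transcribe_folder(recognize, folder, pattern="*.wav"):
    """Map each matching file name in `folder` to its transcript.

    `recognize` is assumed to take a file path string and return a dict
    containing a 'text' key, as in the single-file example.
    """
    transcripts = {}
    for path in sorted(Path(folder).glob(pattern)):
        output = recognize(str(path))
        transcripts[path.name] = output["text"]
    return transcripts


# Hypothetical usage with a loaded model:
# texts = transcribe_folder(model.recognize, "recordings")
# for name, text in texts.items():
#     print(name, text)
```

Passing the recognizer in as an argument keeps the helper independent of how the model was loaded.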

Use with SpeechRecognition

import speech_recognition as sr
from banglaspeech2text import Model, available_models

# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model) # load the model
model.load()


r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = model.recognize(audio)

print(output) # output will be a dict containing text
print(output['text'])

Use GPU

import speech_recognition as sr
from banglaspeech2text import Model, available_models

# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model,device="gpu") # load the model
model.load()


r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = model.recognize(audio)

print(output) # output will be a dict containing text
print(output['text'])

NOTE: This package uses torch as its backend, so you can use any device supported by torch. For more information, see here. You need to set up torch for GPU use first, from here.
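
A script that should run on machines with and without a GPU can pick the device at runtime. The following is a sketch only: `pick_device` is a hypothetical helper, `torch.cuda.is_available()` is the standard torch check for a usable CUDA device, and passing "cpu" as the fallback value is an assumption based on the note that any torch-supported device is accepted.

```python
import importlib.util


def pick_device():
    """Return "gpu" if torch is installed and sees a CUDA device, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch  # imported lazily so the check also works without torch

    return "gpu" if torch.cuda.is_available() else "cpu"


# Hypothetical usage, following the GPU example above:
# model = Model(models[0], device=pick_device())
# model.load()
```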

Some Methods

from banglaspeech2text import Model, available_models

models = available_models()
print(models[0]) # get first model
print(models['base']) # get base models
print(models['whisper_base_bn_sifat']) # get model by name

# set the download path
model = Model(models[0], download_path=r"F:\Code\Python\BanglaSpeech2Text\models") # default is the home directory
model.load()

# directly load a model
model = Model('base')
model.load()
