Bangla Speech to Text
BanglaSpeech2Text: an open-source, offline speech-to-text package for the Bangla language, built on Whisper speech-to-text models fine-tuned for Bangla. Use it to transcribe speech, convert voice to text, and perform speech recognition in Python, even without an internet connection once a model has been downloaded.
Installation
pip install banglaspeech2text
Models
Model | Approx. size | Best WER |
---|---|---|
'tiny' | 100-200 MB | N/A |
'base' | 200-300 MB | 46 |
'small' | 2-3 GB | 18 |
'large' | 5-6 GB | 11 |
NOTE: Bigger models are more accurate but slower at inference. A lower WER (word error rate) is better. You can view the models here. Sizes are estimates; the actual size may vary.
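WER counts the word substitutions, deletions and insertions needed to turn a model's transcript into the reference transcript, divided by the number of reference words. The sketch below is a plain-Python illustration of that standard definition; it is not the evaluation code behind the table, `word_error_rate` is a hypothetical name, and the table values are presumably this ratio expressed as a percentage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substituted word out of four reference words -> 0.25 (25% WER)
print(word_error_rate("আমি ভাত খেতে ভালোবাসি", "আমি মাছ খেতে ভালোবাসি"))
```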
Prerequisites
- Python 3.6+
- Git
- Git LFS
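You can check these prerequisites from Python before downloading a model. This is a minimal sketch using only the standard library; `check_prerequisites` is a hypothetical helper, not part of banglaspeech2text, and it assumes that `git lfs version` exits successfully only when Git LFS is installed.

```python
import shutil
import subprocess
import sys


def check_prerequisites() -> dict:
    """Report whether Python 3.6+, Git and Git LFS appear to be available."""
    status = {"python_ok": sys.version_info >= (3, 6)}
    status["git"] = shutil.which("git") is not None
    status["git_lfs"] = False
    if status["git"]:
        # `git lfs version` returns a non-zero exit code when Git LFS is missing
        result = subprocess.run(
            ["git", "lfs", "version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        status["git_lfs"] = result.returncode == 0
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```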
Download Git
Windows
Note: Make sure the Git LFS option is checked during installation. If it is not, you can install Git LFS separately from here.
Linux
- Git
- Git LFS
Ubuntu 16.04
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
Ubuntu 18.04 and above
sudo apt-get install git-lfs
Mac
- Git
- Git LFS
brew install git-lfs
Install Git with banglaspeech2text (Windows and Linux)
from banglaspeech2text.utils.install_packages import install_git_windows, install_git_linux
# for windows
install_git_windows()
# for linux
install_git_linux()
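If the same script has to run on different machines, you can pick the helper at runtime. A minimal sketch, assuming only the two helpers shown above exist (`install_git_windows` and `install_git_linux`); `installer_for` and `install_git_for_this_os` are hypothetical names, and macOS falls back to the Homebrew instructions because no helper is shown for it.

```python
import platform
from typing import Optional

# Hypothetical mapping from platform.system() to the helper names shown above
_INSTALLERS = {
    "Windows": "install_git_windows",
    "Linux": "install_git_linux",
}


def installer_for(system_name: str) -> Optional[str]:
    """Return the name of the banglaspeech2text helper for this OS, if any."""
    return _INSTALLERS.get(system_name)


def install_git_for_this_os() -> None:
    name = installer_for(platform.system())
    if name is None:
        print("No helper for this OS; on macOS install Git and run: brew install git-lfs")
        return
    # Import lazily so this file can be loaded without the package installed
    from banglaspeech2text.utils import install_packages

    getattr(install_packages, name)()
```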
Usage
Download a model
from banglaspeech2text import Model, available_models
# Download a model
models = available_models()
print(models) # list the available models (from different authors, in different sizes)
model = models[0] # select a model
model.download() # download the model
Use with file
from banglaspeech2text import Model, available_models
# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model) # load the model
model.load()
# Use with file
file_name = 'test.wav'
output = model.recognize(file_name)
print(output) # output will be a dict containing text
print(output['text'])
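Whisper models operate on 16 kHz audio, so it can help to inspect a WAV file before transcribing it. Whether banglaspeech2text resamples input internally is not stated here, so treat this as a diagnostic only. A minimal sketch using the standard `wave` module (uncompressed PCM WAV only); `describe_wav` is a hypothetical helper.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }
```

For example, `describe_wav('test.wav')` shows whether the file from the example above is mono and 16 kHz before it is passed to `model.recognize`.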
Use with SpeechRecognition
import speech_recognition as sr
from banglaspeech2text import Model, available_models
# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model) # load the model
model.load()
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
output = model.recognize(audio)
print(output) # output will be a dict containing text
print(output['text'])
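Keeping a copy of the recording makes it easier to check a transcript afterwards. A minimal sketch: `save_audio_as_wav` is a hypothetical helper that relies only on SpeechRecognition's `AudioData.get_wav_data(convert_rate=...)`, which returns the bytes of a complete WAV file; the 16 kHz default is an assumption chosen because Whisper models operate on 16 kHz audio.

```python
def save_audio_as_wav(audio, path: str, sample_rate: int = 16000) -> str:
    """Write a speech_recognition AudioData object to a WAV file and return the path."""
    # get_wav_data resamples to sample_rate and returns WAV-encoded bytes
    wav_bytes = audio.get_wav_data(convert_rate=sample_rate)
    with open(path, "wb") as f:
        f.write(wav_bytes)
    return path
```

Since `model.recognize` also accepts a file name, the saved file can be transcribed directly: `output = model.recognize(save_audio_as_wav(audio, "mic.wav"))`.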
Use GPU
import speech_recognition as sr
from banglaspeech2text import Model, available_models
# Load a model
models = available_models()
model = models[0] # select a model
model = Model(model, device="gpu") # load the model on the GPU
model.load()
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
output = model.recognize(audio)
print(output) # output will be a dict containing text
print(output['text'])
NOTE: This package uses PyTorch as its backend, so you can use any device supported by PyTorch. For more information, see here. To use a GPU, you first need to set up a GPU-enabled PyTorch build from here.
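To fall back to the CPU on machines without a GPU, you can ask PyTorch which device is available. A minimal sketch: `pick_device` is a hypothetical helper, and passing its PyTorch device string (`"cuda"` or `"cpu"`) to `Model(..., device=...)` is an assumption based on the note above; the example above uses `"gpu"`, and whether the package treats `"gpu"` and `"cuda"` the same is not documented here.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is no GPU support to detect
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Using device: {pick_device()}")
```

Usage, under the same assumption: `model = Model(models[0], device=pick_device())`.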
Some Methods
from banglaspeech2text import Model, available_models
models = available_models()
print(models[0]) # get the first model
print(models['base']) # get the 'base'-size models
print(models['whisper_base_bn_sifat']) # get a model by name
# set the download path (the default is the home directory)
model = Model(models[0], download_path=r"F:\Code\Python\BanglaSpeech2Text\models")
model.load()
# directly load a model
model = Model('base')
model.load()
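To transcribe a folder of recordings with a loaded model, you can loop over the files. A minimal sketch: `transcribe_files` is a hypothetical helper, and it assumes only what the examples above show, namely that `model.recognize` accepts a file path and returns a dict with a `'text'` key.

```python
from pathlib import Path
from typing import Dict, Iterable, Union


def transcribe_files(model, paths: Iterable[Union[str, Path]]) -> Dict[str, str]:
    """Transcribe each existing file with model.recognize and collect the text."""
    results: Dict[str, str] = {}
    for path in paths:
        if not Path(path).is_file():
            # Skip missing files instead of letting one bad path stop the batch
            print(f"Skipping missing file: {path}")
            continue
        output = model.recognize(str(path))
        results[str(path)] = output["text"]
    return results
```

For example: `transcripts = transcribe_files(model, sorted(Path("recordings").glob("*.wav")))`.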
Download files
Hashes for BanglaSpeech2Text-0.0.15-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 16e408e8bbeaa9dda67cfc13db1268c1546b0e430b0122e55dbc5e784898f67b |
MD5 | 04d4d6eba9e24845b025c69e87046f9e |
BLAKE2b-256 | 1bf8e80d9b48f70bbd9776588d7ef0423b814e2006e67e60e6f90d104f35fcc6 |