An open-source offline speech-to-text package for Bangla language.
Bangla Speech to Text
BanglaSpeech2Text: An open-source, offline speech-to-text package for the Bangla language. Its models are fine-tuned from Whisper, OpenAI's speech-to-text model, for optimal performance. Transcribe speech to text, convert voice to text, and perform speech recognition in Python with ease, even without an internet connection.
Models
Model | Size | Best WER |
---|---|---|
'tiny' | 100-200 MB | 60 |
'base' | 200-300 MB | 46 |
'small' | 1 GB | 18 |
'large' | 3-4 GB | 11 |
NOTE: Bigger models have better accuracy (lower WER) but slower inference. More models are available on the HuggingFace Model Hub.
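WER (word error rate) is the word-level edit distance between the model output and a reference transcript, divided by the number of reference words, so lower is better. The values in the table look like percentages, although the source does not say so. As a minimal sketch of the metric (the function name `word_error_rate` is illustrative and not part of banglaspeech2text):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Multiply the result by 100 to compare it with percentage-style figures.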
Pre-requisites
- Python 3.6+
Test it in Google Colab
Installation
You can install the library using pip:
pip install banglaspeech2text
Usage
Model Initialization
To use the library, initialize the Speech2Text class with the desired model. By default, it uses the "base" model, but you can choose from several pre-trained models: "tiny", "base", "small", "medium", or "large". Here's an example:
from banglaspeech2text import Speech2Text
stt = Speech2Text(model="base")
# You can use it without specifying a model name (the default model is "base")
stt = Speech2Text()
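If disk space is the constraint, the choice of model can be derived from the Models table. The sketch below is only an illustration: the `MODELS` dictionary and `pick_model` helper are hypothetical names, not part of banglaspeech2text. The sizes are the upper bounds from the table (with 1 GB taken as 1000 MB), and "medium" is left out because the table gives no figures for it.

```python
# Upper size bounds (MB) and best WER values copied from the Models table.
MODELS = {
    "tiny": {"max_size_mb": 200, "wer": 60},
    "base": {"max_size_mb": 300, "wer": 46},
    "small": {"max_size_mb": 1000, "wer": 18},
    "large": {"max_size_mb": 4000, "wer": 11},
}


def pick_model(size_budget_mb: int) -> str:
    """Return the lowest-WER model whose size fits within the budget."""
    candidates = [
        name for name, info in MODELS.items()
        if info["max_size_mb"] <= size_budget_mb
    ]
    if not candidates:
        raise ValueError(f"no model fits in {size_budget_mb} MB")
    return min(candidates, key=lambda name: MODELS[name]["wer"])
```

The result can then be passed on, for example `Speech2Text(model=pick_model(500))`.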
Transcribing Audio Files
You can transcribe an audio file by calling the transcribe method and passing the path to the audio file. It will return the transcribed text as a string. Here's an example:
transcription = stt.transcribe("audio.wav")
print(transcription)
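To transcribe several files at once, the same transcribe call can be wrapped in a loop. This is a minimal sketch: `transcribe_folder` is a hypothetical helper, not part of the library, and it only assumes that the object passed in has the transcribe method shown above.

```python
from pathlib import Path


def transcribe_folder(stt, folder, pattern="*.wav"):
    """Return {file name: transcription} for every matching file in folder."""
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        # transcribe() receives a string path, as in the single-file example.
        results[path.name] = stt.transcribe(str(path))
    return results
```

For example, `transcribe_folder(stt, "recordings")` would return one entry per .wav file in a folder named recordings.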
Use with SpeechRecognition
You can use the SpeechRecognition package to capture audio from a microphone and transcribe it. Here's an example:
import speech_recognition as sr
from banglaspeech2text import Speech2Text
stt = Speech2Text(model="base")
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = stt.recognize(audio)
print(output)
Use GPU
You can use GPU for faster inference. Here's an example:
stt = Speech2Text(model="base", use_gpu=True)
Advanced GPU Usage
For more advanced GPU usage, you can use the device or device_map parameter. Here's an example:
# Pin the model to a specific GPU
stt = Speech2Text(model="base", device="cuda:0")
# Or let the device placement be chosen automatically
stt = Speech2Text(model="base", device_map="auto")
NOTE: Read more about PyTorch Device
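When the same script has to run on machines with and without a GPU, the device string can be chosen at runtime. The following is a sketch under assumptions: `pick_device` is a hypothetical helper, and whether Speech2Text accepts device="cpu" is assumed here rather than confirmed by this page. `torch.cuda.is_available()` is the standard PyTorch check.

```python
def pick_device() -> str:
    """Return "cuda:0" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no way to query CUDA, so fall back to CPU.
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"
```

The result could then be passed as `Speech2Text(model="base", device=pick_device())`.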
Instantly Check with gradio
You can instantly check the model with gradio. Here's an example:
from banglaspeech2text import Speech2Text, available_models
import gradio as gr
stt = Speech2Text(model="base", use_gpu=True)
# You can also open the URL and check it on a mobile device
# Note: source="microphone" is the Gradio 3.x argument;
# Gradio 4 and later use sources=["microphone"] instead
gr.Interface(
    fn=stt.transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text",
).launch(share=True)
Some more usage examples
Use a model from the HuggingFace Model Hub
stt = Speech2Text(model="openai/whisper-tiny")
Change the model save location
stt = Speech2Text(model="base", cache_path="path/to/save/model")
See current model info
stt = Speech2Text(model="base")
print(stt.model_name)  # the name of the model
print(stt.model_size)  # the size of the model
print(stt.model_license)  # the license of the model
print(stt.model_description)  # the description of the model (in .md format)
print(stt.model_url)  # the URL of the model