An open-source offline speech-to-text package for the Bangla language.

BanglaSpeech2Text (Bangla Speech to Text)

BanglaSpeech2Text: An open-source, offline speech-to-text package for the Bangla language, fine-tuned from OpenAI's latest Whisper speech-to-text model for optimal performance. Transcribe speech to text, convert voice to text, and perform speech recognition in Python with ease, even without an internet connection.

Models

Model	Size	Best WER (%)
`tiny`	100-200 MB	74
`base`	200-300 MB	46
`small`	1 GB	18
`large`	3-4 GB	11

NOTE: Bigger models are more accurate but slower at inference. More models are available on the Hugging Face Model Hub.
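The table reports word error rate (WER). To illustrate what that number measures, the sketch below computes WER as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, expressed as a percentage. It assumes whitespace tokenization and that the table values are percentages; this is not necessarily how the figures in the table were computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER (%) between a reference transcript and a hypothesis."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return 100.0 * prev[len(hyp)] / len(ref)


# One substituted word out of four reference words gives 25.0
print(word_error_rate("a b c d", "a x c d"))
```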

Pre-requisites

Python 3.7 or higher

Test it in Google Colab

Installation

You can install the library using pip:

pip install banglaspeech2text

Usage

Model Initialization

To use the library, initialize the Speech2Text class with the desired model. By default, it uses the "base" model, but you can choose from the pre-trained models "tiny", "base", "small", or "large". Here's an example:

from banglaspeech2text import Speech2Text

stt = Speech2Text("base")

# You can also omit the model name to use the default model
stt = Speech2Text()
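If you are unsure which model name to pass, a small helper can choose the most accurate model that fits a download budget. `pick_model` below is hypothetical (not part of BanglaSpeech2Text); its sizes and WER values are copied from the Models table, using the upper end of each size range and assuming 1 GB = 1000 MB.

```python
# (approximate size in MB, best WER) per model, taken from the Models table.
# Sizes use the upper end of each range and assume 1 GB = 1000 MB.
MODEL_TABLE = {
    "tiny": (200, 74),
    "base": (300, 46),
    "small": (1000, 18),
    "large": (4000, 11),
}


def pick_model(max_size_mb: float) -> str:
    """Return the lowest-WER model whose approximate size fits within max_size_mb."""
    candidates = [
        (wer, name)
        for name, (size_mb, wer) in MODEL_TABLE.items()
        if size_mb <= max_size_mb
    ]
    if not candidates:
        raise ValueError(f"no model fits in {max_size_mb} MB")
    # min() compares WER first, so the most accurate fitting model wins.
    return min(candidates)[1]


# Usage with the library: stt = Speech2Text(pick_model(500))
print(pick_model(500))
```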

Transcribing Audio Files

You can transcribe an audio file by calling the recognize method and passing the path to the audio file. It will return the transcribed text as a string. Here's an example:

transcription = stt.recognize("audio.wav")
print(transcription)
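Because recognize takes a path and returns a string, transcribing a batch of files is a plain loop. The sketch below writes one UTF-8 .txt file per input; `transcribe_to_files` is a hypothetical helper that accepts any callable so it can be tested without loading a model. In practice you would pass stt.recognize. Note that two inputs with the same file stem would overwrite each other's output.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def transcribe_to_files(
    recognize: Callable[[str], str],
    audio_paths: Iterable[str],
    out_dir: str,
) -> List[Path]:
    """Transcribe each audio file and save its text as <stem>.txt in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for audio in audio_paths:
        text = recognize(str(audio))
        target = out / (Path(audio).stem + ".txt")
        # Write as UTF-8 explicitly so Bangla text is saved correctly on every platform.
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


# Usage with the library:
# transcribe_to_files(stt.recognize, ["audio.wav", "audio.mp3"], "transcripts")
```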

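Get Transcriptions with Timestamps as They Are Processed

Pass return_segments=True to get the transcription as segments, each with a start time, an end time, and its text: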

segments = stt.recognize("audio.wav", return_segments=True)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
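Each segment exposes start, end (in seconds, judging by the format string above) and text, which is enough to produce subtitles. The sketch below renders any iterable of such objects as SubRip (.srt) text; `segments_to_srt` is a hypothetical helper, not a library function. If the returned segments can only be iterated once, call recognize again or convert them to a list before printing.

```python
from typing import Iterable


def _srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable) -> str:
    """Render objects with .start, .end and .text attributes as SubRip text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{_srt_timestamp(seg.start)} --> {_srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


# Usage with the library:
# segments = stt.recognize("audio.wav", return_segments=True)
# Path("audio.srt").write_text(segments_to_srt(segments), encoding="utf-8")
```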

Multiple Audio Formats

BanglaSpeech2Text supports the following audio formats for input:

File Formats: mp3, mp4, mpeg, mpga, m4a, wav, webm, and more.
Bytes: Raw audio data in byte format.
NumPy Array: A NumPy array representing audio data, preferably obtained using librosa.load.
AudioData: Audio data obtained from the speech_recognition library.
AudioSegment: Audio segment objects from the pydub library.
BytesIO: Audio data provided through BytesIO objects from the io module.
Path: A pathlib.Path object pointing to an audio file.

You don't need extra code to convert audio to a specific format; BanglaSpeech2Text handles the conversion automatically:

transcription = stt.recognize("audio.mp3")
print(transcription)
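One of the accepted inputs is a BytesIO object. If your audio arrives as raw 16-bit PCM samples (for example, from a network stream), one way to get it into that form is to wrap the samples in a WAV container with the standard library's wave module. The helper name below is hypothetical, and passing its result to stt.recognize relies on the BytesIO support listed above; 16 kHz mono is used as the default because Whisper models operate on 16 kHz audio.

```python
import io
import wave


def pcm16_to_wav_bytesio(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> io.BytesIO:
    """Wrap raw little-endian 16-bit PCM samples in an in-memory WAV file."""
    if len(pcm) % (2 * channels):
        raise ValueError("PCM length must be a whole number of 16-bit frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    buf.seek(0)  # rewind so the consumer starts reading at the WAV header
    return buf


# Usage with the library (raw_pcm is your own audio data):
# text = stt.recognize(pcm16_to_wav_bytesio(raw_pcm))
```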

Use with SpeechRecognition

You can use the SpeechRecognition package to capture audio from a microphone and transcribe it. Here's an example:

import speech_recognition as sr
from banglaspeech2text import Speech2Text

stt = Speech2Text()

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)
    output = stt.recognize(audio)

print(output)

Try It Instantly with Gradio

You can try the model instantly in the browser with Gradio. Here's an example:

from banglaspeech2text import Speech2Text
import gradio as gr

stt = Speech2Text()

# share=True also creates a public URL, so you can test it from a mobile device
gr.Interface(
    fn=stt.recognize,
    # Gradio 4+ takes a list of sources; older releases used source="microphone"
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs="text",
).launch(share=True)

Some more usage examples

Use a Hugging Face model

stt = Speech2Text("openai/whisper-tiny")

See current model info

stt = Speech2Text("base")

print(stt.model_metadata) # Model metadata (name, size, wer, license, etc.)
print(stt.model_metadata.wer) # Word Error Rate (not available for all models)

CLI

You can use the library from the command line. Here's an example:

bnstt 'file.wav'

You can also use it with a microphone:

bnstt --mic

Other options:

usage: bnstt [-h] [-gpu] [-c CACHE] [-o OUTPUT] [-m MODEL] [-s]
             [-sm MIN_SILENCE_LENGTH] [-st SILENCE_THRESH] [-sp PADDING]
             [--list] [--info]
             [INPUT ...]

Bangla Speech to Text

positional arguments:
  INPUT                 inputfile(s) or list of files

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        output directory
  -m MODEL, --model MODEL
                        model name
  --list                list of available models
  --info                show model info
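The -m and -o options combine with one or more INPUT files. To drive the CLI from a Python script, a hypothetical helper like the one below builds the argument list from only the flags documented in the help text and hands it to subprocess.run. It assumes the bnstt command is on your PATH.

```python
import subprocess
from typing import List, Optional, Sequence


def build_bnstt_command(
    inputs: Sequence[str],
    model: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Build an argv list for bnstt using only the documented -m and -o flags."""
    if not inputs:
        raise ValueError("at least one input file is required")
    cmd = ["bnstt"]
    if model is not None:
        cmd += ["-m", model]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    cmd += list(inputs)
    return cmd


def run_bnstt(
    inputs: Sequence[str],
    model: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> None:
    # check=True raises CalledProcessError if bnstt exits with a non-zero status.
    subprocess.run(build_bnstt_command(inputs, model, output_dir), check=True)


# Usage: run_bnstt(["file1.wav", "file2.wav"], model="small", output_dir="out")
```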

Custom Use Cases and Support

If your business or project has specific speech-to-text requirements that go beyond the capabilities of the provided open-source package, I'm here to help! I understand that each use case is unique, and I'm open to collaborating on custom solutions that meet your needs. Whether you have longer audio files that need accurate transcription, require model fine-tuning, or need assistance in implementing the package effectively, I'm available for support.

Project details

Release history

Version	Release date
1.1.0 (this version)	Mar 1, 2025
1.0.9	Nov 8, 2024
1.0.8	Sep 27, 2023
1.0.7	Aug 27, 2023
1.0.6	Aug 25, 2023
1.0.3	Aug 15, 2023
1.0.2	Aug 7, 2023
1.0.1	Aug 7, 2023
0.0.19	Jun 7, 2023
0.0.18	Jun 6, 2023
0.0.17	Jun 6, 2023
0.0.16	Jan 16, 2023
0.0.15	Jan 13, 2023
0.0.14	Jan 13, 2023
0.0.13	Jan 13, 2023
0.0.12	Jan 13, 2023
0.0.11	Jan 13, 2023
0.0.10	Jan 13, 2023
0.0.9	Jan 13, 2023
0.0.8	Jan 13, 2023
0.0.7	Jan 13, 2023
0.0.6	Jan 13, 2023
0.0.5	Jan 12, 2023
0.0.4	Jan 12, 2023
0.0.3	Jan 12, 2023
0.0.2	Jan 12, 2023
0.0.1	Jan 12, 2023

Download files

Source Distribution

banglaspeech2text-1.1.0.tar.gz (19.7 kB)

Uploaded Mar 1, 2025 Source

Built Distribution

banglaspeech2text-1.1.0-py3-none-any.whl (18.0 kB)

Uploaded Mar 1, 2025 Python 3

File details

Details for the file banglaspeech2text-1.1.0.tar.gz.

File metadata

Download URL: banglaspeech2text-1.1.0.tar.gz
Upload date: Mar 1, 2025
Size: 19.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for banglaspeech2text-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`bda5adf222b19c4abd55a1cde60b8664fe4bf8200f1a0341c9e88ee73f2c969d`
MD5	`c9830ad0ab1b3d9bc58e6522315f2d9b`
BLAKE2b-256	`b4078e2309bdfed77f17596b71461113c01d0d617c58d5fa8e0361ed09336c10`
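To check a downloaded archive against the SHA256 digests listed here, you can hash it with Python's hashlib, reading in chunks so large files never need to fit in memory. The helper names below are illustrative. pip can also enforce hashes itself through its --require-hashes mode, using --hash=sha256:<digest> entries in a requirements file.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Usage:
# verify_sha256(
#     "banglaspeech2text-1.1.0.tar.gz",
#     "bda5adf222b19c4abd55a1cde60b8664fe4bf8200f1a0341c9e88ee73f2c969d",
# )
```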

File details

Details for the file banglaspeech2text-1.1.0-py3-none-any.whl.

File metadata

Download URL: banglaspeech2text-1.1.0-py3-none-any.whl
Upload date: Mar 1, 2025
Size: 18.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for banglaspeech2text-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`52dde46a0c1d9c2a776b53e9cd2b5b7b6a90b08c9b8028a45fac2b15e588108c`
MD5	`606fb2512016891f9e89e1644ebd70a1`
BLAKE2b-256	`e4ebb60d088fdf00e87376dd6d7e2e6fb072a493076294a3b051c1ded28f9cea`
