Bengali Speech Recognition Tool using Hugging Face Inference API
bongovaad (বঙ্গবাদ) 
bongovaad is a Python package for transcribing Bengali audio from YouTube videos. It uses the Hugging Face Inference API with Whisper models for high-quality speech recognition.
Features
- Cloud-based Transcription: Uses Hugging Face Inference API with state-of-the-art Whisper models.
- SRT Subtitle Generation: Automatically creates SRT subtitle files for video players.
- Efficient Audio Processing: Handles audio segmentation for longer videos with progress tracking.
- Concurrent Processing: Uses asynchronous requests for faster transcription of multiple segments.
- Temporary File Management: Uses temporary directories for clean processing.
- Robust Error Handling: Comprehensive error handling and logging.
- Command-line Interface: Easy-to-use CLI with multiple options.
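The audio segmentation mentioned above can be pictured as splitting the total duration into fixed-length windows. The following is a minimal sketch of that idea, not bongovaad's actual implementation; `segment_bounds` is a hypothetical helper, and the default of 8 seconds is taken from the documented `--segment-length` default.

```python
from typing import List, Tuple


def segment_bounds(total_seconds: float, segment_length: float = 8.0) -> List[Tuple[float, float]]:
    """Split [0, total_seconds) into consecutive (start, end) windows.

    The last window is shortened so it never runs past the end of the audio.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")

    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < total_seconds:
        end = min(start + segment_length, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    # A 20-second clip with 8-second segments yields three windows.
    for start, end in segment_bounds(20.0, 8.0):
        print(f"{start:.1f}s -> {end:.1f}s")
```

Each window would then correspond to one chunk of audio sent for transcription, which is why the segment length affects the number of API requests.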
Requirements
- Python 3.8 or higher
- Hugging Face API key
- ffmpeg
Installation
1. Install ffmpeg
Ubuntu/Debian:
sudo apt install ffmpeg -y
macOS (using Homebrew):
brew install ffmpeg
Windows (using Chocolatey):
choco install ffmpeg
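To confirm that ffmpeg is reachable after installation, a quick check from Python may help. This is an illustrative sketch and not part of bongovaad; `ffmpeg_available` is a hypothetical helper name.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found on PATH")
    else:
        print("ffmpeg not found; install it before running bongovaad")
```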
2. Install bongovaad
pip install bongovaad
3. Get a Hugging Face API Key
- Create an account on Hugging Face
- Generate an API key at https://huggingface.co/settings/tokens
- Set it as an environment variable:
export HF_API_KEY="your_api_key_here"
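When calling the package from your own scripts, it can be useful to fail early if no key is available. The sketch below shows one way to do that, assuming an explicitly passed key should take precedence over the `HF_API_KEY` environment variable; `resolve_api_key` is a hypothetical helper, not a bongovaad function.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None, env_var: str = "HF_API_KEY") -> str:
    """Return the explicit key if given, otherwise the value of env_var.

    Raises RuntimeError when neither source provides a non-empty key.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key provided; pass one explicitly or set {env_var}")
    return key
```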
Usage
Basic Usage
bongovaad --url "https://www.youtube.com/watch?v=VIDEO_ID"
Advanced Options
bongovaad --url "https://www.youtube.com/watch?v=VIDEO_ID" \
--segment-length 10 \
--output-format both \
--model-id "openai/whisper-large-v3-turbo" \
--api-key "your_api_key_here" \
--verbose
Command-line Arguments
| Argument | Description | Default |
|---|---|---|
| --url | YouTube URL for audio transcription | (Required) |
| --segment-length | Length of each audio segment in seconds | 8 |
| --output-format | Output format (txt, srt, or both) | both |
| --api-key | Hugging Face API key | HF_API_KEY env var |
| --model-id | Model ID to use for transcription | openai/whisper-large-v3-turbo |
| --verbose | Enable verbose logging | False |
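For readers who want to wrap or script the CLI, the arguments above map naturally onto an `argparse` parser. The following is a sketch that mirrors the documented names and defaults; it is not the package's own parser, and details such as `--segment-length` being an integer are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented bongovaad arguments."""
    parser = argparse.ArgumentParser(prog="bongovaad")
    parser.add_argument("--url", required=True,
                        help="YouTube URL for audio transcription")
    # Assumption: segment length is a whole number of seconds.
    parser.add_argument("--segment-length", type=int, default=8,
                        help="Length of each audio segment in seconds")
    parser.add_argument("--output-format", choices=["txt", "srt", "both"],
                        default="both", help="Output format")
    # Left as None here; the documented fallback is the HF_API_KEY env var.
    parser.add_argument("--api-key", default=None,
                        help="Hugging Face API key")
    parser.add_argument("--model-id", default="openai/whisper-large-v3-turbo",
                        help="Model ID to use for transcription")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose logging")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--url", "https://www.youtube.com/watch?v=VIDEO_ID"]
    )
    print(args)
```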
Output Files
The tool generates two types of output files:
- Text File (VIDEO_ID.txt): Contains the full transcription text.
- SRT File (VIDEO_ID.srt): Contains time-coded subtitles compatible with video players.
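To illustrate what the SRT file contains, the sketch below builds SRT text from `(start, end, text)` cues using the standard `HH:MM:SS,mmm` timestamp format. It is an independent example rather than bongovaad's own writer; `format_timestamp` and `build_srt` are hypothetical names.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Render numbered SRT blocks, separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_srt([(0.0, 8.0, "first segment"), (8.0, 16.0, "second segment")]))
```

With fixed-length segments, each cue's start and end would simply follow the segment boundaries.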
Python API
You can also use bongovaad as a Python library:
import os

from bongovaad import BongoVaadTranscriber

# Get API key from environment variable or set it directly
api_key = os.environ.get("HF_API_KEY", "your_api_key_here")

# Initialize the transcriber
transcriber = BongoVaadTranscriber(
    api_key=api_key,
    model_id="openai/whisper-large-v3-turbo",
)

# Transcribe a YouTube video
output_files = transcriber.transcribe(
    url="https://www.youtube.com/watch?v=VIDEO_ID",
    segment_length_seconds=10,
    output_format="both",
)

# Print output file paths
print(f"Text file: {output_files['txt']}")
print(f"SRT file: {output_files['srt']}")
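Building on the example above, several videos can be processed in a loop while collecting failures instead of stopping at the first one. This sketch relies only on the `transcribe(url=..., segment_length_seconds=..., output_format=...)` call shown above; `transcribe_many` is a hypothetical helper, and because the package's exception types are not documented here, it catches `Exception` broadly.

```python
from typing import Any, Dict, Iterable, List, Tuple


def transcribe_many(
    transcriber: Any,
    urls: Iterable[str],
    segment_length_seconds: int = 8,
    output_format: str = "both",
) -> Tuple[Dict[str, Any], List[Tuple[str, str]]]:
    """Transcribe each URL in turn; return (results by URL, list of failures)."""
    results: Dict[str, Any] = {}
    failures: List[Tuple[str, str]] = []
    for url in urls:
        try:
            results[url] = transcriber.transcribe(
                url=url,
                segment_length_seconds=segment_length_seconds,
                output_format=output_format,
            )
        except Exception as exc:  # assumption: specific exception types are unknown
            failures.append((url, str(exc)))
    return results, failures
```

Any object with a compatible `transcribe` method works here, which also makes the helper easy to test with a stub.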
Performance Considerations
- Processing time depends on video length, internet connection, and Hugging Face API response times.
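- Longer segments mean fewer API requests, which may improve speed, but could reduce accuracy for complex audio.
- The Hugging Face Inference API enforces rate limits, so long videos or short segment lengths, both of which generate many requests, may be throttled.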
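One common way to balance concurrency against rate limits is to cap the number of in-flight requests with a semaphore. The sketch below shows the pattern with `asyncio`; it is not bongovaad's internal code, `transcribe_segments` and `max_concurrency` are hypothetical names, and `fake_transcribe` stands in for a real API call.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def transcribe_segments(
    segments: Sequence[str],
    transcribe_one: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
) -> List[str]:
    """Transcribe all segments with at most max_concurrency requests in flight.

    Results are returned in the same order as the input segments.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(segment: str) -> str:
        async with semaphore:
            return await transcribe_one(segment)

    return list(await asyncio.gather(*(worker(s) for s in segments)))


async def fake_transcribe(segment: str) -> str:
    """Placeholder for a network call; sleeps briefly and returns dummy text."""
    await asyncio.sleep(0.01)
    return f"text for {segment}"


if __name__ == "__main__":
    texts = asyncio.run(
        transcribe_segments(["seg0.wav", "seg1.wav", "seg2.wav"], fake_transcribe)
    )
    print(texts)
```

Lowering `max_concurrency` trades speed for a smaller chance of hitting rate limits.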
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contributing
Contributions are welcome! Here's how you can contribute:
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin feature-name
- Submit a pull request
Acknowledgements
- Hugging Face for providing the Inference API
- OpenAI for developing Whisper, the ASR model used for transcription