Bengali Speech Recognition Tool using Hugging Face Inference API

bongovaad (বঙ্গবাদ)

bongovaad is a Python package for transcribing Bengali audio from YouTube videos. It uses the Hugging Face Inference API with Whisper models for high-quality speech recognition.

Features

  • Cloud-based Transcription: Uses Hugging Face Inference API with state-of-the-art Whisper models.
  • SRT Subtitle Generation: Automatically creates SRT subtitle files for video players.
  • Efficient Audio Processing: Handles audio segmentation for longer videos with progress tracking.
  • Concurrent Processing: Uses asynchronous requests for faster transcription of multiple segments.
  • Temporary File Management: Uses temporary directories for clean processing.
  • Robust Error Handling: Comprehensive error handling and logging.
  • Command-line Interface: Easy-to-use CLI with multiple options.
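The concurrent-processing idea from the feature list can be sketched as follows. This is a minimal illustration, not bongovaad's actual implementation: `transcribe_segments`, `fake_transcribe`, and `max_concurrency` are hypothetical names, and the fake coroutine stands in for a real network request to a speech model.

```python
import asyncio
from typing import Awaitable, Callable, List


async def transcribe_segments(
    segments: List[bytes],
    transcribe_one: Callable[[bytes], Awaitable[str]],
    max_concurrency: int = 4,
) -> List[str]:
    """Transcribe segments concurrently while keeping their original order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(segment: bytes) -> str:
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await transcribe_one(segment)

    # gather() returns results in the order the awaitables were passed in,
    # so transcript lines stay aligned with their audio segments.
    return await asyncio.gather(*(worker(s) for s in segments))


async def fake_transcribe(segment: bytes) -> str:
    # Hypothetical stand-in for a network call to a speech model.
    await asyncio.sleep(0.01)
    return segment.decode("utf-8").upper()


results = asyncio.run(
    transcribe_segments([b"one", b"two", b"three"], fake_transcribe)
)
print(results)
```

Capping concurrency with a semaphore is one common way to keep many small requests from tripping API rate limits while still overlapping network waits.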

Requirements

  • Python 3.8 or higher
  • Hugging Face API key
  • ffmpeg

Installation

1. Install ffmpeg

Ubuntu/Debian:

sudo apt install ffmpeg -y

macOS (using Homebrew):

brew install ffmpeg

Windows (using Chocolatey):

choco install ffmpeg

2. Install bongovaad

pip install bongovaad

3. Get a Hugging Face API Key

  1. Create an account on Hugging Face
  2. Generate an API key (Hugging Face calls it an access token) at https://huggingface.co/settings/tokens
  3. Set it as an environment variable:
export HF_API_KEY="your_api_key_here"

Usage

Basic Usage

bongovaad --url "https://www.youtube.com/watch?v=VIDEO_ID"

Advanced Options

bongovaad --url "https://www.youtube.com/watch?v=VIDEO_ID" \
          --segment-length 10 \
          --output-format both \
          --model-id "openai/whisper-large-v3-turbo" \
          --api-key "your_api_key_here" \
          --verbose

Command-line Arguments

Argument          Description                               Default
--url             YouTube URL for audio transcription       (Required)
--segment-length  Length of each audio segment in seconds   8
--output-format   Output format (txt, srt, or both)         both
--api-key         Hugging Face API key                      HF_API_KEY env var
--model-id        Model ID to use for transcription         openai/whisper-large-v3-turbo
--verbose         Enable verbose logging                    False

Output Files

Depending on --output-format, the tool generates one or both of the following files:

  1. Text File (VIDEO_ID.txt): Contains the full transcription text.
  2. SRT File (VIDEO_ID.srt): Contains time-coded subtitles compatible with video players.
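To make the SRT layout concrete, here is a minimal sketch of how fixed-length segments could be turned into time-coded cues. It is an illustration only: `format_srt_timestamp` and `build_srt` are hypothetical helpers, and the sketch assumes every cue spans exactly one segment, so the final cue may run past the real end of the audio.

```python
from typing import List


def format_srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(texts: List[str], segment_length: float) -> str:
    """Build SRT content with one cue per fixed-length segment."""
    cues = []
    for index, text in enumerate(texts):
        start = index * segment_length
        end = start + segment_length
        # Each cue: a 1-based counter, a time range, then the subtitle text.
        cues.append(
            f"{index + 1}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


print(build_srt(["first segment", "second segment"], segment_length=8))
```

With 8-second segments, the second cue covers 00:00:08,000 to 00:00:16,000.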

Python API

You can also use bongovaad as a Python library:

import os
from bongovaad import BongoVaadTranscriber

# Get API key from environment variable or set it directly
api_key = os.environ.get("HF_API_KEY", "your_api_key_here")

# Initialize the transcriber
transcriber = BongoVaadTranscriber(
    api_key=api_key,
    model_id="openai/whisper-large-v3-turbo"
)

# Transcribe a YouTube video
output_files = transcriber.transcribe(
    url="https://www.youtube.com/watch?v=VIDEO_ID",
    segment_length_seconds=10,
    output_format="both"
)

# Print output file paths
print(f"Text file: {output_files['txt']}")
print(f"SRT file: {output_files['srt']}")
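To process several videos in one run, the `transcribe` call shown above can be wrapped in a small loop. The sketch below is an assumption-laden illustration: `transcribe_many` is a hypothetical helper, it takes any object with a `transcribe(url=..., ...)` method like the one in the example, and it assumes failures surface as exceptions, which this page does not document.

```python
import logging
from typing import Any, Dict, Iterable

logger = logging.getLogger(__name__)


def transcribe_many(transcriber: Any, urls: Iterable[str], **options: Any) -> Dict[str, Any]:
    """Transcribe several URLs, keeping successes and logging failures."""
    results: Dict[str, Any] = {}
    for url in urls:
        try:
            results[url] = transcriber.transcribe(url=url, **options)
        except Exception as exc:
            # Broad on purpose: the exception types raised are an unknown here.
            logger.error("Transcription failed for %s: %s", url, exc)
    return results
```

It could be called as `transcribe_many(transcriber, urls, segment_length_seconds=10, output_format="both")`, after which each value in the returned dict is whatever `transcribe` returned for that URL.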

Performance Considerations

  • Processing time depends on video length, internet connection, and Hugging Face API response times.
  • Longer segment lengths may improve speed but could reduce accuracy for complex audio.
  • The Hugging Face Inference API enforces rate limits, so long videos split into many short segments are more likely to be throttled.
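The trade-off between segment length and request volume is simple arithmetic. The sketch below assumes one API request per audio segment, which this page does not state explicitly; `estimate_request_count` is a hypothetical helper, not part of bongovaad.

```python
import math


def estimate_request_count(duration_seconds: float, segment_length: float = 8) -> int:
    """Estimate how many segments (and, by assumption, API requests) a video needs."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    # A trailing partial segment still needs its own request, hence ceil().
    return math.ceil(duration_seconds / segment_length)


# A 10-minute video at the default 8-second segment length versus 10 seconds.
print(estimate_request_count(600, 8))
print(estimate_request_count(600, 10))
```

Under that assumption, a 10-minute video needs 75 requests at 8 seconds per segment and 60 at 10 seconds.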

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Here's how you can contribute:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin feature-name
  5. Submit a pull request

Acknowledgements

  • Hugging Face for providing the Inference API
  • OpenAI for developing Whisper, the state-of-the-art ASR model
