
WhisperS2T-Reborn ⚡

A Streamlined Speech-to-Text Pipeline for Whisper Models using CTranslate2



WhisperS2T-Reborn is a streamlined fork of the original WhisperS2T project, focused exclusively on the CTranslate2 backend for fast and efficient speech transcription.

What's Different from the Original?

This fork simplifies the original WhisperS2T:

  • Single Backend Focus: drops the TensorRT-LLM, HuggingFace, and OpenAI backends; CTranslate2 is the only backend
  • Curated Model Selection: uses optimized CTranslate2 Whisper models from ctranslate2-4you on HuggingFace
  • Cleaner Codebase: streamlined architecture with fewer dependencies
  • Simplified Setup: straightforward installation with no backend-specific configuration

Features

  • 🚀 Fast Inference: CTranslate2 backend provides excellent speed/accuracy tradeoff
  • 🎙️ Built-in VAD: Integrated Voice Activity Detection using NeMo's Marblenet models
  • 🎧 Flexible Audio Input: Handles both small and large audio files efficiently
  • 🌐 Multi-language Support: Transcription and translation for 99+ languages
  • ⏱️ Word-level Timestamps: Optional word alignment for precise timing
  • 📝 Multiple Export Formats: Export to TXT, JSON, TSV, SRT, and VTT

Supported Models

| Model             | English-only         | Multilingual    |
|-------------------|----------------------|-----------------|
| tiny              | ✅ tiny.en           | ✅ tiny         |
| base              | ✅ base.en           | ✅ base         |
| small             | ✅ small.en          | ✅ small        |
| medium            | ✅ medium.en         | ✅ medium       |
| large-v3          |                      | ✅ large-v3     |
| distil-small.en   | ✅ distil-small.en   |                 |
| distil-medium.en  | ✅ distil-medium.en  |                 |
| distil-large-v3   | ✅ distil-large-v3   |                 |

Installation

Prerequisites

Install FFmpeg for audio processing:

Ubuntu/Debian:

apt-get install -y libsndfile1 ffmpeg

macOS:

brew install ffmpeg

Conda (any platform):

conda install conda-forge::ffmpeg

Install WhisperS2T-Reborn

CPU only:

pip install whisper-s2t-reborn

With GPU support (recommended for faster inference):

pip install whisper-s2t-reborn[gpu]

Note: The [gpu] extra installs NVIDIA CUDA libraries required for GPU acceleration with CTranslate2. Requires an NVIDIA GPU with compatible drivers.

Quick Start

Basic Transcription

import whisper_s2t

# Load model (downloads automatically on first use)
model = whisper_s2t.load_model(model_identifier="large-v3")

# Transcribe with VAD
files = ['audio/sample.wav']
out = model.transcribe_with_vad(files,
                                lang_codes=['en'],
                                tasks=['transcribe'],
                                initial_prompts=[None],
                                batch_size=32)

print(out[0][0])
# {'text': 'Your transcribed text here...',
#  'avg_logprob': -0.25,
#  'no_speech_prob': 0.0001,
#  'start_time': 0.0,
#  'end_time': 24.8}
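Each segment in the returned list is a plain dict, so it can be post-processed directly. As a minimal sketch (pure Python, assuming only the segment layout shown above), here is how the segments could be rendered as SRT-style cues by hand; the helper names are ours, not part of the library:

```python
def fmt_timestamp(seconds):
    """Convert seconds to an SRT-style HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render a list of segment dicts (one file's output) as SRT text."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(f"{i}\n{fmt_timestamp(seg['start_time'])} --> "
                    f"{fmt_timestamp(seg['end_time'])}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

For real exports, prefer the built-in write_outputs (see "Export Transcripts" below); this sketch just illustrates that the output is ordinary data you can reshape freely, e.g. `print(segments_to_srt(out[0]))`.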

With Word Timestamps

model = whisper_s2t.load_model("large-v3", asr_options={'word_timestamps': True})

out = model.transcribe_with_vad(files,
                                lang_codes=['en'],
                                tasks=['transcribe'],
                                initial_prompts=[None],
                                batch_size=32)
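With word_timestamps enabled, each segment additionally carries per-word timing. The exact key names used below ('word_timestamps', 'word', 'start', 'end') are assumptions based on common Whisper tooling, so verify them against your own output; with that caveat, a pure-Python sketch that collects the words spoken inside a given time window:

```python
def words_in_window(segments, t0, t1):
    """Collect words whose timing falls entirely inside [t0, t1] seconds."""
    hits = []
    for seg in segments:
        # Key names assumed; inspect out[0][0] to confirm your schema.
        for w in seg.get('word_timestamps', []):
            if w['start'] >= t0 and w['end'] <= t1:
                hits.append(w['word'])
    return hits
```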

Export Transcripts

from whisper_s2t import write_outputs

# Export to various formats
write_outputs(out, format='srt', save_dir='./output/')
write_outputs(out, format='vtt', save_dir='./output/')
write_outputs(out, format='json', save_dir='./output/')

Translation

# Translate non-English audio to English
out = model.transcribe_with_vad(files,
                                lang_codes=['fr'],  # Source language
                                tasks=['translate'],  # Translate to English
                                initial_prompts=[None],
                                batch_size=32)
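Whether transcribing or translating, each file's result is a list of segment dicts, so a full transcript is just the segment texts joined in order. A small helper (assuming the segment layout shown in Quick Start; the function name is ours):

```python
def join_segments(segments):
    """Concatenate segment texts into one transcript string."""
    return " ".join(seg['text'].strip() for seg in segments)
```

Usage: `transcript = join_segments(out[0])` gives the English translation of the first file as a single string.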

Configuration Options

Model Loading Options

model = whisper_s2t.load_model(
    model_identifier="large-v3",  # Model name or path
    device="cuda",                 # "cuda" or "cpu"
    compute_type="float16",        # "float16", "float32", or "bfloat16"
    asr_options={
        'beam_size': 5,
        'word_timestamps': False,
        'repetition_penalty': 1.01,
    }
)
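float16 is a GPU format; on CPU, CTranslate2 generally expects float32 (or int8 for quantized models). A common pattern is to derive the compute type from the device rather than hard-coding it (the helper name is ours, not part of the API):

```python
def pick_compute_type(device):
    """float16 halves memory use and speeds up CUDA inference;
    CPUs generally need float32 instead."""
    return "float16" if device == "cuda" else "float32"
```

For example, `compute_type=pick_compute_type(device)` keeps a single load_model call portable across machines with and without a GPU.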

Transcription Options

out = model.transcribe_with_vad(
    files,
    lang_codes=['en'],           # Language codes for each file
    tasks=['transcribe'],        # 'transcribe' or 'translate'
    initial_prompts=[None],      # Optional prompts for each file
    batch_size=32                # Batch size for inference
)
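lang_codes, tasks, and initial_prompts are parallel lists with one entry per file in files. When jobs arrive as per-file records, unzipping them keeps the lists aligned; a sketch (the file names here are hypothetical):

```python
# One record per file: (path, source language, task).
jobs = [
    ('audio/interview.wav', 'fr', 'translate'),   # French -> English
    ('audio/meeting.wav',   'en', 'transcribe'),  # English as-is
]

# Unzip the records into the parallel lists the API expects.
files, lang_codes, tasks = (list(col) for col in zip(*jobs))
```

These lists can then be passed straight to transcribe_with_vad, with `initial_prompts=[None] * len(files)` if no prompts are needed.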

Acknowledgements

This fork builds directly on the original WhisperS2T project; credit for the core design goes to the upstream authors and contributors.
