WhisperS2T-Reborn ⚡
A Streamlined Speech-to-Text Pipeline for Whisper Models using CTranslate2
WhisperS2T-Reborn is a streamlined fork of the original WhisperS2T project, focused exclusively on the CTranslate2 backend for fast and efficient speech transcription.
What's Different from the Original?
This fork simplifies the original WhisperS2T by:
- Single Backend Focus: Removed TensorRT-LLM, HuggingFace, and OpenAI backends—CTranslate2 only
- Curated Model Selection: Uses optimized CTranslate2 whisper models from ctranslate2-4you on HuggingFace
- Cleaner Codebase: Streamlined architecture with reduced dependencies
- Simplified Setup: Easier installation without complex backend configurations
Features
- 🚀 Fast Inference: CTranslate2 backend provides excellent speed/accuracy tradeoff
- 🎙️ Built-in VAD: Integrated Voice Activity Detection using NVIDIA NeMo's MarbleNet models
- 🎧 Flexible Audio Input: Handles both small and large audio files efficiently
- 🌐 Multi-language Support: Transcription and translation for 99+ languages
- ⏱️ Word-level Timestamps: Optional word alignment for precise timing
- 📝 Multiple Export Formats: Export to TXT, JSON, TSV, SRT, and VTT
Supported Models
| Model | English-only | Multilingual |
|---|---|---|
| tiny | ✅ tiny.en | ✅ tiny |
| base | ✅ base.en | ✅ base |
| small | ✅ small.en | ✅ small |
| medium | ✅ medium.en | ✅ medium |
| large-v3 | — | ✅ large-v3 |
| distil-small.en | ✅ | — |
| distil-medium.en | ✅ | — |
| distil-large-v3 | — | ✅ |
Installation
Prerequisites
Install FFmpeg for audio processing:
Ubuntu/Debian:

```shell
apt-get install -y libsndfile1 ffmpeg
```

macOS:

```shell
brew install ffmpeg
```

Conda (any platform):

```shell
conda install conda-forge::ffmpeg
```
Install WhisperS2T-Reborn
CPU only:

```shell
pip install whisper-s2t-reborn
```

With GPU support (recommended for faster inference):

```shell
pip install whisper-s2t-reborn[gpu]
```

Note: The `[gpu]` extra installs the NVIDIA CUDA libraries that CTranslate2 needs for GPU acceleration, and requires an NVIDIA GPU with compatible drivers.
Quick Start
Basic Transcription
```python
import whisper_s2t

# Load model (downloads automatically on first use)
model = whisper_s2t.load_model(model_identifier="large-v3")

# Transcribe with VAD
files = ['audio/sample.wav']
out = model.transcribe_with_vad(files,
                                lang_codes=['en'],
                                tasks=['transcribe'],
                                initial_prompts=[None],
                                batch_size=32)

print(out[0][0])
# {'text': 'Your transcribed text here...',
#  'avg_logprob': -0.25,
#  'no_speech_prob': 0.0001,
#  'start_time': 0.0,
#  'end_time': 24.8}
```
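Each element of `out` corresponds to one input file and holds a list of utterance dicts in the shape shown above. A minimal post-processing sketch in pure Python, using mock segments in place of real model output:

```python
# Join the per-utterance segments for one file into a single transcript.
# Assumes the output shape shown above: a list of dicts per file, each
# with 'text', 'start_time', and 'end_time' keys.

def join_segments(file_result):
    """Concatenate segment texts and report the overall speech span."""
    text = " ".join(seg['text'].strip() for seg in file_result)
    start = min(seg['start_time'] for seg in file_result)
    end = max(seg['end_time'] for seg in file_result)
    return {'text': text, 'start_time': start, 'end_time': end}

# Mock segments standing in for out[0]:
segments = [
    {'text': 'Hello world.', 'start_time': 0.0, 'end_time': 2.1},
    {'text': 'Second utterance.', 'start_time': 2.5, 'end_time': 4.0},
]
print(join_segments(segments))
# {'text': 'Hello world. Second utterance.', 'start_time': 0.0, 'end_time': 4.0}
```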
With Word Timestamps
```python
model = whisper_s2t.load_model("large-v3", asr_options={'word_timestamps': True})

out = model.transcribe_with_vad(files,
                                lang_codes=['en'],
                                tasks=['transcribe'],
                                initial_prompts=[None],
                                batch_size=32)
```
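With word timestamps enabled, each utterance dict additionally carries per-word timing. An illustrative sketch, assuming each segment gains a `word_timestamps` list of `{'word', 'start', 'end'}` entries (the key names here are assumptions; check the actual output of your installed version):

```python
# Sketch: pull the words that fall inside a given time window.
# Assumed structure: segment['word_timestamps'] is a list of dicts
# with 'word', 'start', and 'end' keys (illustrative, not guaranteed).

def words_between(segment, t0, t1):
    return [w['word'] for w in segment['word_timestamps']
            if t0 <= w['start'] and w['end'] <= t1]

segment = {'text': 'hello there world',
           'word_timestamps': [
               {'word': 'hello', 'start': 0.0, 'end': 0.4},
               {'word': 'there', 'start': 0.5, 'end': 0.9},
               {'word': 'world', 'start': 1.0, 'end': 1.3},
           ]}
print(words_between(segment, 0.0, 1.0))  # ['hello', 'there']
```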
Export Transcripts
```python
from whisper_s2t import write_outputs

# Export to various formats
write_outputs(out, format='srt', save_dir='./output/')
write_outputs(out, format='vtt', save_dir='./output/')
write_outputs(out, format='json', save_dir='./output/')
```
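For reference, SRT files use `HH:MM:SS,mmm` timestamps. A standalone helper (a sketch, not part of the library API) that produces that format from the second-based times in the output dicts:

```python
# Convert seconds to the HH:MM:SS,mmm timestamp format used by SRT files.

def to_srt_timestamp(seconds: float) -> str:
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)   # milliseconds per hour
    m, rem = divmod(rem, 60_000)     # milliseconds per minute
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

print(to_srt_timestamp(24.8))    # 00:00:24,800
print(to_srt_timestamp(3725.5))  # 01:02:05,500
```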
Translation
```python
# Translate non-English audio to English
out = model.transcribe_with_vad(files,
                                lang_codes=['fr'],     # Source language
                                tasks=['translate'],   # Translate to English
                                initial_prompts=[None],
                                batch_size=32)
```
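The per-file arguments are positional, parallel lists: entry `i` applies to `files[i]`, so a single batch can mix languages and tasks. A sketch with hypothetical file names:

```python
# Entry i of each list applies to files[i] (file names are hypothetical).
files = ['audio/english.wav', 'audio/french.wav']
lang_codes = ['en', 'fr']
tasks = ['transcribe', 'translate']   # translate only the French file
initial_prompts = [None, None]

# All per-file lists must be the same length as files.
assert len(files) == len(lang_codes) == len(tasks) == len(initial_prompts)

for f, lang, task in zip(files, lang_codes, tasks):
    print(f"{f}: {task} ({lang})")
# audio/english.wav: transcribe (en)
# audio/french.wav: translate (fr)
```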
Configuration Options
Model Loading Options
```python
model = whisper_s2t.load_model(
    model_identifier="large-v3",   # Model name or path
    device="cuda",                 # "cuda" or "cpu"
    compute_type="float16",        # "float16", "float32", or "bfloat16"
    asr_options={
        'beam_size': 5,
        'word_timestamps': False,
        'repetition_penalty': 1.01,
    }
)
```
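In CTranslate2, `float16`/`bfloat16` generally require GPU support, so it can help to derive `compute_type` from the target device. A minimal heuristic sketch (a helper of our own, not a library function):

```python
# Sketch: pick a compute_type the target device can actually run.
# Half-precision types need a GPU; float32 is the safe CPU choice.

def pick_compute_type(device: str, prefer: str = "float16") -> str:
    if device == "cuda":
        return prefer
    return "float32"

print(pick_compute_type("cuda"))  # float16
print(pick_compute_type("cpu"))   # float32
```

This could then feed the loader, e.g. `whisper_s2t.load_model("large-v3", device=device, compute_type=pick_compute_type(device))`.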
Transcription Options
```python
out = model.transcribe_with_vad(
    files,
    lang_codes=['en'],        # Language codes for each file
    tasks=['transcribe'],     # 'transcribe' or 'translate'
    initial_prompts=[None],   # Optional prompts for each file
    batch_size=32             # Batch size for inference
)
```
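Since each of these lists needs one entry per input file, a tiny helper (illustrative, not part of the API) can expand a single setting across a whole batch:

```python
# Sketch: repeat one scalar setting once per input file, producing the
# parallel lists that transcribe_with_vad expects.

def per_file(value, files):
    """Return a list with `value` repeated len(files) times."""
    return [value] * len(files)

files = ['a.wav', 'b.wav', 'c.wav']
print(per_file('en', files))   # ['en', 'en', 'en']
print(per_file(None, files))   # [None, None, None]
```

Usage would look like `model.transcribe_with_vad(files, lang_codes=per_file('en', files), tasks=per_file('transcribe', files), initial_prompts=per_file(None, files), batch_size=32)`.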
Acknowledgements
- Original WhisperS2T: This project is a fork of WhisperS2T by Shashi Kant Gupta
- OpenAI Whisper: The foundational Whisper model
- CTranslate2: Fast inference engine for Transformer models
- NVIDIA NeMo: VAD models used in this pipeline
File details
Details for the file whisper_s2t_reborn-1.4.0.tar.gz.
File metadata
- Download URL: whisper_s2t_reborn-1.4.0.tar.gz
- Upload date:
- Size: 1.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 30561b2a9a9c724ea7264b85de5ed434f0daa2217fc3bf8d40418f736c61cb5f |
| MD5 | 9f7aa9c6cade77df4d7eb0a7894e6b8e |
| BLAKE2b-256 | 6c0347dcc66038d570acadb85ed7d7d04a17ba6aac91239548b0cb00d66e8244 |
Provenance
The following attestation bundles were made for whisper_s2t_reborn-1.4.0.tar.gz:

Publisher: publish.yml on BBC-Esq/WhisperS2T-reborn

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whisper_s2t_reborn-1.4.0.tar.gz
- Subject digest: 30561b2a9a9c724ea7264b85de5ed434f0daa2217fc3bf8d40418f736c61cb5f
- Sigstore transparency entry: 854899658
- Sigstore integration time:
- Permalink: BBC-Esq/WhisperS2T-reborn@42efce62bea651caecccb620ab1601ad9f107857
- Branch / Tag: refs/tags/1.4.0
- Owner: https://github.com/BBC-Esq
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@42efce62bea651caecccb620ab1601ad9f107857
- Trigger Event: release
File details
Details for the file whisper_s2t_reborn-1.4.0-py3-none-any.whl.
File metadata
- Download URL: whisper_s2t_reborn-1.4.0-py3-none-any.whl
- Upload date:
- Size: 1.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a9bedf1b17b5fbc50c8dd12cb710ff101f68726425f29f35014f3858d12f7f29 |
| MD5 | 614bc14df3ce45223256f83f67e64e56 |
| BLAKE2b-256 | 800decc1d5aae235b070ef8643b2698e004a130a2acd115015f397236a0a74b2 |
Provenance
The following attestation bundles were made for whisper_s2t_reborn-1.4.0-py3-none-any.whl:

Publisher: publish.yml on BBC-Esq/WhisperS2T-reborn

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whisper_s2t_reborn-1.4.0-py3-none-any.whl
- Subject digest: a9bedf1b17b5fbc50c8dd12cb710ff101f68726425f29f35014f3858d12f7f29
- Sigstore transparency entry: 854899664
- Sigstore integration time:
- Permalink: BBC-Esq/WhisperS2T-reborn@42efce62bea651caecccb620ab1601ad9f107857
- Branch / Tag: refs/tags/1.4.0
- Owner: https://github.com/BBC-Esq
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@42efce62bea651caecccb620ab1601ad9f107857
- Trigger Event: release