
WhisperS2T-Reborn ⚡

A Streamlined Speech-to-Text Pipeline for Whisper Models using CTranslate2



WhisperS2T-Reborn is a streamlined fork of the original WhisperS2T project, focused exclusively on the CTranslate2 backend for fast and efficient speech transcription.

What's Different from the Original?

This fork simplifies the original WhisperS2T:

  • Single Backend Focus: drops the TensorRT-LLM, HuggingFace, and OpenAI backends; CTranslate2 is the only backend
  • Curated Model Selection: uses optimized CTranslate2 Whisper models from ctranslate2-4you on HuggingFace
  • Cleaner Codebase: streamlined architecture with fewer dependencies
  • Simplified Setup: straightforward installation with no backend-specific configuration

Features

  • 🚀 Fast Inference: CTranslate2 backend provides excellent speed/accuracy tradeoff
  • 🎙️ Built-in VAD: Integrated Voice Activity Detection using NeMo's Marblenet models
  • 🎧 Flexible Audio Input: Handles both small and large audio files efficiently
  • 🌐 Multi-language Support: Transcription and translation for 99+ languages
  • ⏱️ Word-level Timestamps: Optional word alignment for precise timing
  • 📝 Multiple Export Formats: Export to TXT, JSON, TSV, SRT, and VTT

Supported Models

| Model             | English-only         | Multilingual    |
|-------------------|----------------------|-----------------|
| tiny              | ✅ tiny.en           | ✅ tiny         |
| base              | ✅ base.en           | ✅ base         |
| small             | ✅ small.en          | ✅ small        |
| medium            | ✅ medium.en         | ✅ medium       |
| large-v3          |                      | ✅ large-v3     |
| distil-small.en   | ✅ distil-small.en   |                 |
| distil-medium.en  | ✅ distil-medium.en  |                 |
| distil-large-v3   | ✅ distil-large-v3   |                 |

Installation

Prerequisites

Install FFmpeg for audio processing:

Ubuntu/Debian:

apt-get install -y libsndfile1 ffmpeg

macOS:

brew install ffmpeg

Conda (any platform):

conda install conda-forge::ffmpeg

Install WhisperS2T-Reborn

CPU only:

pip install whisper-s2t-reborn

With GPU support (recommended for faster inference):

pip install whisper-s2t-reborn[gpu]

Note: The [gpu] extra installs NVIDIA CUDA libraries required for GPU acceleration with CTranslate2. Requires an NVIDIA GPU with compatible drivers.

Quick Start

Basic Transcription

import whisper_s2t

# Load model (downloads automatically on first use)
model = whisper_s2t.load_model(model_identifier="large-v3")

# Transcribe with VAD
files = ['audio/sample.wav']
out = model.transcribe_with_vad(files,
                                lang_codes=['en'],
                                tasks=['transcribe'],
                                initial_prompts=[None],
                                batch_size=32)

print(out[0][0])
# {'text': 'Your transcribed text here...',
#  'avg_logprob': -0.25,
#  'no_speech_prob': 0.0001,
#  'start_time': 0.0,
#  'end_time': 24.8}
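Each segment in the returned list is a plain dict, so it can be post-processed directly. As a minimal sketch (pure Python, assuming only the segment layout shown above), here is how the segments could be rendered as SRT-style cues by hand; the helper names are ours, not part of the library:

```python
def fmt_timestamp(seconds):
    """Convert seconds to an SRT-style HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render a list of segment dicts (one file's output) as SRT text."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(f"{i}\n{fmt_timestamp(seg['start_time'])} --> "
                    f"{fmt_timestamp(seg['end_time'])}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

For real exports, prefer the built-in write_outputs (see "Export Transcripts" below); this sketch just illustrates that the output is ordinary data you can reshape freely, e.g. `print(segments_to_srt(out[0]))`.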

With Word Timestamps

model = whisper_s2t.load_model("large-v3", asr_options={'word_timestamps': True})

out = model.transcribe_with_vad(files,
                                lang_codes=['en'],
                                tasks=['transcribe'],
                                initial_prompts=[None],
                                batch_size=32)
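With word_timestamps enabled, each segment additionally carries per-word timing. The exact key names used below ('word_timestamps', 'word', 'start', 'end') are assumptions based on common Whisper tooling, so verify them against your own output; with that caveat, a pure-Python sketch that collects the words spoken inside a given time window:

```python
def words_in_window(segments, t0, t1):
    """Collect words whose timing falls entirely inside [t0, t1] seconds."""
    hits = []
    for seg in segments:
        # Key names assumed; inspect out[0][0] to confirm your schema.
        for w in seg.get('word_timestamps', []):
            if w['start'] >= t0 and w['end'] <= t1:
                hits.append(w['word'])
    return hits
```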

Export Transcripts

from whisper_s2t import write_outputs

# Export to various formats
write_outputs(out, format='srt', save_dir='./output/')
write_outputs(out, format='vtt', save_dir='./output/')
write_outputs(out, format='json', save_dir='./output/')

Translation

# Translate non-English audio to English
out = model.transcribe_with_vad(files,
                                lang_codes=['fr'],  # Source language
                                tasks=['translate'],  # Translate to English
                                initial_prompts=[None],
                                batch_size=32)
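Whether transcribing or translating, each file's result is a list of segment dicts, so a full transcript is just the segment texts joined in order. A small helper (assuming the segment layout shown in Quick Start; the function name is ours):

```python
def join_segments(segments):
    """Concatenate segment texts into one transcript string."""
    return " ".join(seg['text'].strip() for seg in segments)
```

Usage: `transcript = join_segments(out[0])` gives the English translation of the first file as a single string.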

Configuration Options

Model Loading Options

model = whisper_s2t.load_model(
    model_identifier="large-v3",  # Model name or path
    device="cuda",                 # "cuda" or "cpu"
    compute_type="float16",        # "float16", "float32", or "bfloat16"
    asr_options={
        'beam_size': 5,
        'word_timestamps': False,
        'repetition_penalty': 1.01,
    }
)
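float16 is a GPU format; on CPU, CTranslate2 generally expects float32 (or int8 for quantized models). A common pattern is to derive the compute type from the device rather than hard-coding it (the helper name is ours, not part of the API):

```python
def pick_compute_type(device):
    """float16 halves memory use and speeds up CUDA inference;
    CPUs generally need float32 instead."""
    return "float16" if device == "cuda" else "float32"
```

For example, `compute_type=pick_compute_type(device)` keeps a single load_model call portable across machines with and without a GPU.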

Transcription Options

out = model.transcribe_with_vad(
    files,
    lang_codes=['en'],           # Language codes for each file
    tasks=['transcribe'],        # 'transcribe' or 'translate'
    initial_prompts=[None],      # Optional prompts for each file
    batch_size=32                # Batch size for inference
)
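lang_codes, tasks, and initial_prompts are parallel lists with one entry per file in files. When jobs arrive as per-file records, unzipping them keeps the lists aligned; a sketch (the file names here are hypothetical):

```python
# One record per file: (path, source language, task).
jobs = [
    ('audio/interview.wav', 'fr', 'translate'),   # French -> English
    ('audio/meeting.wav',   'en', 'transcribe'),  # English as-is
]

# Unzip the records into the parallel lists the API expects.
files, lang_codes, tasks = (list(col) for col in zip(*jobs))
```

These lists can then be passed straight to transcribe_with_vad, with `initial_prompts=[None] * len(files)` if no prompts are needed.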

Acknowledgements

This fork builds directly on the original WhisperS2T project; credit for the core design goes to the upstream authors and contributors.
