A library for transcribing audio files using Whisper models
Whisper Transcriber
A Python library for transcribing audio files using Whisper models with intelligent silence detection and segmentation.
Installation
pip install whisper-transcriber
Requirements
- Python 3.7 or higher
- ffmpeg and ffprobe installed on your system
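Before installing, it can help to confirm that both external tools are actually reachable. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and not part of whisper-transcriber.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names of required executables that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Missing required tools:", ", ".join(missing))
else:
    print("ffmpeg and ffprobe found")
```

If either tool is reported missing, install ffmpeg through your system package manager (ffprobe normally ships with it) before running a transcription.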
Features
- Intelligent silence detection for natural segmentation
- Adaptive audio analysis for optimal threshold detection
- Memory-efficient processing of large audio files
- Parallel processing for faster silence detection
- Two-pass transcription for improved segment boundaries
- Enhanced generation parameters for controlling output quality
- High-quality transcription using Whisper models
- Support for various audio formats
- Optional SRT subtitle output
- Control over transcript output (quiet mode, JSON output)
- Verbose/silent operation modes
Usage
Command Line
# Basic usage
whisper-transcribe audio_file.mp3
# Advanced usage
whisper-transcribe audio_file.mp3 -m openai/whisper-small \
--min-segment 5 \
--max-segment 15 \
--silence-duration 0.2 \
--sample-rate 16000 \
--batch-size 8 \
--normalize \
--hf-token YOUR_HF_TOKEN \
--no-timestamps
# Memory-efficient processing with parallel jobs
whisper-transcribe long_audio.mp3 --chunk-size 900 --parallel-jobs 4
# Enhanced generation with two-pass transcription
whisper-transcribe podcast.mp3 --two-pass --temperature 0.1 --num-beams 8
# Run in quiet mode (no transcript printing during processing)
whisper-transcribe audio_file.mp3 --quiet
# Output results as JSON
whisper-transcribe audio_file.mp3 --json
Available Arguments:
- input: Input audio file or directory (required)
- -o, --output: Output file path (optional)
- -m, --model: Whisper model to use (default: openai/whisper-small)
- --hf-token: HuggingFace API token
- --min-segment: Minimum segment length in seconds (default: 5)
- --max-segment: Maximum segment length in seconds (default: 15)
- --silence-duration: Minimum silence duration in seconds (default: 0.2)
- --sample-rate: Audio sample rate (default: 16000)
- --batch-size: Batch size for transcription (default: 8)
- --normalize: Normalize audio volume
- --no-text-normalize: Skip text normalization
- --no-timestamps: Don't print timestamps during processing
- --quiet: Run in quiet mode (suppress transcript printing)
- --json: Output results as JSON instead of text
- --chunk-size: Size of audio chunks in seconds for memory-efficient processing (default: 600)
- --parallel-jobs: Number of parallel jobs for silence detection (default: automatic)
- --two-pass: Use two-pass transcription for improved segment boundaries
- --temperature: Temperature for sampling; higher values make output more random (default: 0.0)
- --top-p: Top-p sampling probability threshold (default: None)
- --num-beams: Number of beams for beam search (default: 5)
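When the command-line tool is driven from a script, assembling the argument list programmatically avoids quoting mistakes. The sketch below uses only flags listed above; the helper `build_command` is a hypothetical convenience, not something the package provides.

```python
import shlex


def build_command(audio_path, model=None, chunk_size=None,
                  two_pass=False, json_output=False):
    """Assemble a whisper-transcribe argument list from a few options."""
    cmd = ["whisper-transcribe", audio_path]
    if model is not None:
        cmd += ["-m", model]
    if chunk_size is not None:
        cmd += ["--chunk-size", str(chunk_size)]
    if two_pass:
        cmd.append("--two-pass")
    if json_output:
        cmd.append("--json")
    return cmd


cmd = build_command(
    "long_audio.mp3",
    model="openai/whisper-small",
    chunk_size=900,
    two_pass=True,
)
# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which skips the shell entirely and so needs no extra quoting.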
Python Library
from whisper_transcriber import WhisperTranscriber
# Initialize the transcriber
transcriber = WhisperTranscriber(model_name="openai/whisper-small", hf_token="YOUR_HF_TOKEN")
# Basic transcription
results = transcriber.transcribe(
    "audio_file.mp3",
    min_segment=5,
    max_segment=15,
    silence_duration=0.2,
    sample_rate=16000,
    batch_size=8,
    normalize=True,
    normalize_text=True,
    print_timestamps=True,
    verbose=True
)
# Advanced transcription with memory optimization and enhanced generation
results = transcriber.transcribe(
    "long_audio.mp3",
    output="transcript.srt",
    min_segment=5,
    max_segment=15,
    silence_duration=0.2,
    sample_rate=16000,
    batch_size=8,
    normalize=True,
    normalize_text=True,
    print_timestamps=True,
    verbose=True,
    # Advanced parameters
    two_pass=True,       # Use two-pass transcription for better segment boundaries
    chunk_size=900,      # Process in 15-minute chunks (memory efficient)
    parallel_jobs=4,     # Use 4 parallel processes for silence detection
    temperature=1,       # 0.0 is deterministic; values above 0.0 add randomness
    top_p=0.95,          # Only used when temperature > 0
    num_beams=5,         # Beams for beam search (higher = better quality but slower)
    language="en"        # Language code for transcription (e.g., "en" for English)
)
# Access the transcription results manually
for i, segment in enumerate(results):
    print(f"\n[{segment['start']} --> {segment['end']}]")
    print(f"Segment {i+1}: {segment['transcript']}")
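The same loop can be adapted to build subtitle text by hand. The sketch below formats a list of segments as SRT, assuming each segment's `start` and `end` are numbers in seconds. The README does not state their type, so check the actual results before relying on this. The helpers `srt_timestamp` and `to_srt` are hypothetical names, not library functions.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments (dicts with start, end, transcript) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['transcript'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Sample data in the assumed shape, standing in for real results.
sample = [
    {"start": 0.0, "end": 2.5, "transcript": " Hello there. "},
    {"start": 2.5, "end": 6.0, "transcript": "Welcome to the show."},
]
print(to_srt(sample))
```

For most cases, passing an `.srt` path as `output` (as in the advanced example) is simpler; a manual formatter is mainly useful when the segments are post-processed first.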
Parameters Explained
- model_name: Which Whisper model to use (e.g., "openai/whisper-tiny", "openai/whisper-small", "openai/whisper-medium", "openai/whisper-large")
- min_segment: Minimum length in seconds for audio segments (shorter segments will be merged)
- max_segment: Maximum length in seconds for audio segments (longer segments will be split)
- silence_duration: How long a silence needs to be (in seconds) to be considered a natural break point
- sample_rate: Audio sample rate in Hz for processing
- batch_size: Number of segments to process at once (higher values use more memory but can be faster with GPU)
- normalize: Whether to normalize audio volume
- normalize_text: Whether to normalize transcription text
- print_timestamps: Whether to include timestamps when printing transcripts
- verbose: Whether to print processing information and transcripts during transcription
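To make the interaction between min_segment and max_segment concrete, here is an illustrative sketch of one way such limits could be enforced on a list of (start, end) spans. It is not the library's actual algorithm, which also uses silence detection. The function `enforce_segment_lengths` and its merge-then-split strategy are assumptions made for illustration.

```python
import math


def enforce_segment_lengths(segments, min_segment=5.0, max_segment=15.0):
    """Merge spans shorter than min_segment, then split spans longer than max_segment.

    segments is a list of (start, end) tuples in seconds, sorted by start time.
    """
    # Pass 1: a span that is too short absorbs the span that follows it.
    merged = []
    for start, end in segments:
        if merged and (merged[-1][1] - merged[-1][0]) < min_segment:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))

    # Pass 2: cut long spans into equal parts no longer than max_segment.
    result = []
    for start, end in merged:
        length = end - start
        parts = max(1, math.ceil(length / max_segment))
        step = length / parts
        for i in range(parts):
            result.append((start + i * step, start + (i + 1) * step))
    return result


print(enforce_segment_lengths([(0, 2), (2, 8), (8, 38)]))
```

In this sketch a short span at the very end of the list stays short, because there is nothing after it to merge with; a real implementation would need a policy for that case.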
Advanced Parameters
- two_pass: Use two-pass transcription to refine segment boundaries based on linguistic analysis
- chunk_size: Size of audio chunks in seconds for memory-efficient processing of large files
- parallel_jobs: Number of parallel jobs for silence detection (None for automatic)
- temperature: Controls randomness in generation (0.0 for deterministic, higher for more variety)
- top_p: Top-p probability threshold for nucleus sampling (between 0 and 1)
- num_beams: Number of beams for beam search during generation (higher values = better quality but slower)
- language: Target language code for transcription (e.g., 'en' for English, 'fr' for French, etc.)
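As a rough guide to what chunk_size means for a long file, the sketch below computes the time ranges that fixed-size chunking would produce for a given duration. It assumes plain back-to-back chunks with no overlap, which may differ from how the library places its chunk boundaries. `chunk_ranges` is a hypothetical helper.

```python
def chunk_ranges(duration, chunk_size=600):
    """Return (start, end) ranges in seconds covering duration in fixed-size chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    start = 0.0
    while start < duration:
        # The final chunk is shortened so it ends exactly at the file's duration.
        end = min(start + chunk_size, duration)
        ranges.append((start, end))
        start = end
    return ranges


# A 2000-second file processed with chunk_size=900 yields three chunks.
for start, end in chunk_ranges(2000, chunk_size=900):
    print(f"{start:7.1f} s -> {end:7.1f} s")
```

Smaller chunks lower peak memory use at the cost of more chunk boundaries to handle, so the default of 600 seconds is a reasonable starting point to tune from.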
License
MIT