A library for transcribing audio files using Whisper models

Whisper Transcriber

A Python library for transcribing audio files using Whisper models with intelligent silence detection and segmentation.

Installation

pip install whisper-transcriber

Requirements

  • Python 3.7 or higher
  • ffmpeg and ffprobe installed on your system
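
Before transcribing, it can help to confirm that both external tools are reachable. The following is a minimal sketch using only the standard library; the helper name `missing_tools` is made up for illustration and is not part of whisper-transcriber.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names of required executables that are not found on PATH."""
    # shutil.which returns None when the executable cannot be located.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are available.")
```

If either tool is reported missing, install ffmpeg through your system's package manager (ffprobe normally ships with it) and run the check again.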

Features

  • Intelligent silence detection for natural segmentation
  • Adaptive audio analysis for optimal threshold detection
  • High-quality transcription using Whisper models
  • Support for various audio formats
  • Optional SRT subtitle output
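
For readers unfamiliar with SRT, each subtitle cue is a running index, a `start --> end` line with `HH:MM:SS,mmm` timestamps, and the text, with cues separated by a blank line. The sketch below illustrates that layout only. The function names and the `(start, end, text)` tuple input are assumptions for illustration, not the library's internal representation.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Build SRT text from an iterable of (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each cue: index line, timing line, text line.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)


print(to_srt([(0, 1.25, "Hello"), (1.25, 3.0, "World")]))
```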

Usage

Command Line

# Basic usage
whisper-transcribe audio_file.mp3

# Advanced usage
whisper-transcribe audio_file.mp3 -m openai/whisper-small \
  --min-segment 5 \
  --max-segment 15 \
  --silence-duration 0.2 \
  --sample-rate 16000 \
  --batch-size 8 \
  --normalize \
  --hf-token YOUR_HF_TOKEN \
  --no-timestamps

Available Arguments:

  • input: Input audio file or directory (required)
  • -o, --output: Output file path (optional)
  • -m, --model: Whisper model to use (default: openai/whisper-small)
  • --hf-token: Hugging Face API token
  • --min-segment: Minimum segment length in seconds (default: 5)
  • --max-segment: Maximum segment length in seconds (default: 15)
  • --silence-duration: Minimum silence duration in seconds (default: 0.2)
  • --sample-rate: Audio sample rate (default: 16000)
  • --batch-size: Batch size for transcription (default: 8)
  • --normalize: Normalize audio volume
  • --no-text-normalize: Skip text normalization
  • --no-timestamps: Don't print timestamps during processing
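
To give a feel for how --min-segment and --max-segment might interact with detected silences, here is a simplified sketch. It is not the algorithm whisper-transcriber uses: the function `plan_segments`, its inputs, and the "prefer the latest acceptable pause" rule are all assumptions chosen for illustration.

```python
def plan_segments(silences, duration, min_segment=5.0, max_segment=15.0):
    """Split [0, duration] at silence points, respecting length bounds where possible.

    silences: sorted times (in seconds) at which a pause was detected.
    Returns a list of (start, end) tuples covering the whole recording.
    The final segment may be shorter than min_segment.
    """
    if not 0 < min_segment <= max_segment:
        raise ValueError("expected 0 < min_segment <= max_segment")

    segments = []
    start = 0.0
    while duration - start > max_segment:
        # Pauses that would yield a segment of acceptable length.
        candidates = [
            t for t in silences
            if start + min_segment <= t <= start + max_segment
        ]
        # Prefer the latest acceptable pause; otherwise force a cut at max length.
        end = candidates[-1] if candidates else start + max_segment
        segments.append((start, end))
        start = end
    if duration > start:
        segments.append((start, duration))
    return segments


print(plan_segments([4.0, 12.0, 20.0, 31.0], duration=40.0))
```

In this sketch the pause at 4.0 seconds is skipped because cutting there would produce a segment shorter than the 5-second minimum, and a recording with no usable pauses falls back to fixed cuts at the maximum length.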

Python Library

from whisper_transcriber import WhisperTranscriber

# Initialize the transcriber
transcriber = WhisperTranscriber(model_name="openai/whisper-small", hf_token="YOUR_HF_TOKEN")

# Transcribe an audio file
results = transcriber.transcribe(
    "audio_file.mp3",
    min_segment=5,
    max_segment=15,
    silence_duration=0.2,
    sample_rate=16000,
    batch_size=8,
    normalize=True,
    normalize_text=True,
    print_timestamps=True
)

# To also save the transcription as an SRT file, pass an output path
results = transcriber.transcribe(
    "audio_file.mp3",
    output="transcript.srt"
)

# Access the transcription results
for i, segment in enumerate(results):
    print(f"Segment {i+1}: {segment['transcript']}")
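
If a single block of text is more convenient than per-segment output, the segments can be joined. This sketch relies only on the `transcript` key used in the loop above. The helper name `full_transcript` and the sample data are illustrative stand-ins, not part of the library.

```python
def full_transcript(segments):
    """Join the 'transcript' field of each segment into one string."""
    return " ".join(segment["transcript"].strip() for segment in segments)


# Stand-in data shaped like the segments iterated above;
# real results would come from transcriber.transcribe().
sample = [{"transcript": "Hello there."}, {"transcript": " How are you?"}]
print(full_transcript(sample))
```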

License

MIT
