Audio Transcribe

A command-line tool for transcribing audio files using OpenAI's Whisper model.

Features

  • Transcribe individual audio files with customizable options
  • Batch process multiple audio files in a directory
  • Support for various audio formats (MP3, WAV, M4A, OGG, FLAC)
  • Multiple Whisper model sizes (tiny, base, small, medium, large)
  • Progress indicators with rich terminal output
  • Optional JSON output with detailed transcription results

Installation

# Clone the repository
git clone https://github.com/samurmaykrr/audio_transcribe.git
cd audio_transcribe

# Install the package
pip install -e .

Usage

Single File Transcription

# Basic usage
audio-transcribe transcribe path/to/audio.mp3

# Specify output directory
audio-transcribe transcribe path/to/audio.mp3 -o output/directory

# Use a different model size
audio-transcribe transcribe path/to/audio.mp3 -m large

# Specify the language (auto-detected if omitted)
audio-transcribe transcribe path/to/audio.mp3 -l en

# Save detailed results in JSON format
audio-transcribe transcribe path/to/audio.mp3 -j

# Translate to English
audio-transcribe transcribe path/to/audio.mp3 -t translate
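
For readers who want to see what these flags plausibly map to, here is a minimal Python sketch built on the openai-whisper library. It is not taken from this project's source: the function names `build_transcribe_options` and `transcribe_file` are hypothetical, and the mapping from `-m`, `-l`, and `-t` to `whisper.load_model` and `model.transcribe` arguments is an assumption.

```python
def build_transcribe_options(language=None, task="transcribe"):
    """Translate CLI-style choices into keyword arguments for Whisper.

    Hypothetical helper: the real tool may build its options differently.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"task must be 'transcribe' or 'translate', got {task!r}")
    options = {"task": task}
    # Leaving the language out lets Whisper detect it from the audio.
    if language is not None:
        options["language"] = language
    return options


def transcribe_file(path, model_size="base", language=None, task="transcribe"):
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so the option helper works without the model installed.
    import whisper

    model = whisper.load_model(model_size)
    result = model.transcribe(str(path), **build_transcribe_options(language, task))
    return result["text"].strip()
```

Under these assumptions, `transcribe_file("path/to/audio.mp3", model_size="large", task="translate")` would correspond roughly to combining `-m large` with `-t translate`.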

Batch Processing

# Process all audio files in a directory
audio-transcribe batch path/to/audio/directory

# Specify output directory
audio-transcribe batch path/to/audio/directory -o output/directory

# Use a different model size
audio-transcribe batch path/to/audio/directory -m large

# Process specific file extensions
audio-transcribe batch path/to/audio/directory -e mp3 wav
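
The file-selection step of batch processing can be sketched in Python as follows. This is an illustration only: `find_audio_files` is a hypothetical name, the default extensions are copied from the supported-format list, and the sketch assumes a non-recursive, case-insensitive match, which may differ from what the tool actually does.

```python
from pathlib import Path

# Extensions taken from the README's supported-format list.
DEFAULT_EXTENSIONS = ("mp3", "wav", "m4a", "ogg", "flac")


def find_audio_files(directory, extensions=DEFAULT_EXTENSIONS):
    """Return audio files directly inside `directory`, sorted by path.

    Assumption: subdirectories are not searched and matching ignores case.
    """
    # Normalize "mp3", ".mp3", and "MP3" to the same ".mp3" suffix.
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    return sorted(
        p
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )
```

Calling `find_audio_files("path/to/audio/directory", ["mp3", "wav"])` mirrors the `-e mp3 wav` example above.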

Configuration Options

Model Sizes

  • tiny: Fastest, lowest accuracy
  • base: Good balance of speed and accuracy
  • small: Better accuracy, slower than base
  • medium: High accuracy, slower processing
  • large: Best accuracy, slowest processing
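
To make the trade-off concrete, the sketch below pairs each size with the approximate parameter count listed in the openai-whisper documentation and validates a user-supplied size. The names `MODEL_PARAMETERS_MILLIONS` and `validate_model_size` are hypothetical and are not part of this tool.

```python
# Approximate parameter counts (in millions) as listed by openai-whisper.
MODEL_PARAMETERS_MILLIONS = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}


def validate_model_size(name):
    """Return the normalized model size, or raise ValueError if unknown."""
    normalized = name.strip().lower()
    if normalized not in MODEL_PARAMETERS_MILLIONS:
        choices = ", ".join(MODEL_PARAMETERS_MILLIONS)
        raise ValueError(f"unknown model size {name!r}; choose one of: {choices}")
    return normalized
```

Larger models have more parameters, which is why accuracy improves while processing slows down as you move down the list.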

Supported Audio Formats

  • MP3
  • WAV
  • M4A
  • OGG
  • FLAC

Output Options

  • Text file output (default)
  • Optional JSON output with timestamps and confidence scores
  • Custom output directory specification
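
A rough Python sketch of how these output options could fit together is shown below. It assumes a result dictionary shaped like the one returned by openai-whisper's `transcribe` (with `text`, `segments`, and `language` keys). The function name `save_outputs` and the `<audio name>.txt` / `<audio name>.json` naming scheme are assumptions, not the tool's documented behavior.

```python
import json
from pathlib import Path


def save_outputs(result, audio_path, output_dir, save_json=False):
    """Write the transcript as text, and optionally the full result as JSON.

    Returns the list of files written. The naming scheme is an assumption.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(audio_path).stem

    # Plain-text transcript: always written.
    text_path = output_dir / f"{stem}.txt"
    text_path.write_text(result["text"].strip() + "\n", encoding="utf-8")
    written = [text_path]

    # Detailed result (segments with timestamps): only on request.
    if save_json:
        json_path = output_dir / f"{stem}.json"
        json_path.write_text(
            json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written.append(json_path)
    return written
```

In this sketch, passing `save_json=True` plays the role of the `-j` flag, and `output_dir` plays the role of `-o`.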

Requirements

  • Python 3.7 or higher
  • Dependencies:
    • openai-whisper
    • typer
    • rich
  • ffmpeg available on the PATH (openai-whisper calls it to decode audio)
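
A quick way to confirm these requirements are met is sketched below in Python. The function name `check_environment` is hypothetical. The import names (`whisper` for openai-whisper, plus `typer` and `rich`) are the packages' usual module names, and the `ffmpeg` check is included on the assumption that openai-whisper needs the ffmpeg executable to decode audio.

```python
import importlib.util
import shutil
import sys

# Import names for the packages listed under Dependencies.
REQUIRED_MODULES = ("whisper", "typer", "rich")


def check_environment():
    """Return a dict mapping each requirement to True (met) or False."""
    report = {"python>=3.7": sys.version_info >= (3, 7)}
    for module in REQUIRED_MODULES:
        # find_spec returns None when a top-level module is not installed.
        report[module] = importlib.util.find_spec(module) is not None
    # Assumption: Whisper shells out to ffmpeg, so it must be on the PATH.
    report["ffmpeg"] = shutil.which("ffmpeg") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```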

License

This project is licensed under the MIT License; see the LICENSE.md file for details.
