A library that works around the 25MB file size limit of OpenAI's Whisper API by chunking long audio files, with LLM-powered text enhancement and timestamp reconstruction for SRT/VTT output.

Maintainers

bokamix

Project description

Long Audio Whisper

A Python library for transcribing long audio files with OpenAI's Whisper API.

long-audio-whisper addresses a key limitation of OpenAI's Whisper API: the 25MB file size limit. It transcribes audio files of any length by splitting them into smaller chunks at detected silence. For text-based outputs (text, json), it post-processes the combined transcription with an LLM to improve punctuation and formatting. For subtitle formats (srt, vtt), it reconstructs word-level timestamps across chunks so that subtitles stay synchronized even for multi-hour recordings.

It supports a wide range of output formats: json, text, srt, verbose_json, and vtt.

Key Features

- Transcribe Audio of Any Length: Process audio files that exceed the 25MB API limit.
- Accurate Timestamp Reconstruction: Generate SRT, VTT, and verbose_json outputs for long audio files by chunking the audio and stitching the results back together with corrected timestamps.
- Multiple Output Formats:
  - text: Plain text, optionally enhanced by an LLM.
  - json: Simple JSON object with the transcription text.
  - srt / vtt: Subtitle files with accurate timing, even for large audio.
  - verbose_json: Detailed JSON with word-level timestamps and segment data.
- LLM-Powered Enhancement: For text and json outputs, use a model like GPT-4o to automatically correct punctuation, capitalization, and formatting.
- Asynchronous: Built with asyncio so that audio chunks are processed concurrently.
- Command-Line Interface: Includes an example script (examples/main.py) that works as a CLI.
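
To illustrate what concurrent chunk processing with asyncio can look like, here is a minimal sketch. It is not the library's internal code: transcribe_chunk is a hypothetical stand-in for one API call, and max_concurrency is an assumed parameter used to cap the number of requests in flight.

```python
import asyncio


async def transcribe_chunk(index: int, chunk: bytes) -> str:
    """Hypothetical stand-in for one transcription request."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"chunk-{index} ({len(chunk)} bytes)"


async def transcribe_all(chunks: list, max_concurrency: int = 4) -> list:
    """Transcribe chunks concurrently and return results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(index: int, chunk: bytes) -> str:
        # The semaphore limits how many requests run at the same time
        async with semaphore:
            return await transcribe_chunk(index, chunk)

    # gather() returns results in the order the awaitables were passed in,
    # so the pieces can be joined without re-sorting
    return await asyncio.gather(*(bounded(i, c) for i, c in enumerate(chunks)))


if __name__ == "__main__":
    print(asyncio.run(transcribe_all([b"aa", b"bbbb", b"c"])))
```

Because gather() preserves input order, the transcribed pieces line up with the original chunk order even when later chunks finish first.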

Installation

You can install the library directly from the source code.

First, clone the repository:

git clone https://github.com/bokamix/long-audio-whisper.git
cd long-audio-whisper

Then, install the package in editable mode:

pip install -e .

Usage

The library includes an example script that can be used as a command-line tool.

Prerequisites

Before you begin, make sure you have your OpenAI API key. You can set it as an environment variable:

export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

Alternatively, you can pass the key directly using the --api-key argument.
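
If you call the library from your own script, the same precedence (explicit key first, environment variable as the fallback) can be sketched as follows. The helper name resolve_api_key is hypothetical and not part of the library.

```python
import os
from typing import Optional


def resolve_api_key(cli_value: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise fall back to OPENAI_API_KEY."""
    key = cli_value or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("No API key: set OPENAI_API_KEY or pass --api-key")
    return key
```

Failing early with a clear message avoids a less obvious authentication error after the audio has already been split.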

Command-Line Example

To transcribe an audio file, navigate to the examples directory and run the main.py script.

cd examples
python main.py /path/to/your/audio.mp3 --format srt

Arguments:

audio_file: (Required) The path to your audio file.
--format: (Optional) The desired output format. Choose from json, text, srt, verbose_json, vtt. Defaults to text.
--api-key: (Optional) Your OpenAI API key. Defaults to the OPENAI_API_KEY environment variable.
--prompt: (Optional) A custom system prompt for the LLM post-processor (used for text and json formats).
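
For reference, an argument parser matching the options listed above could look roughly like this. It is a sketch built from the argument descriptions only; the actual examples/main.py may be structured differently, and build_parser is a hypothetical name.

```python
import argparse
import os

FORMATS = ["json", "text", "srt", "verbose_json", "vtt"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Transcribe a long audio file.")
    parser.add_argument("audio_file", help="Path to the audio file")
    parser.add_argument("--format", choices=FORMATS, default="text",
                        help="Output format (default: text)")
    # The default is read from the environment when the parser is built
    parser.add_argument("--api-key", default=os.environ.get("OPENAI_API_KEY"),
                        help="OpenAI API key (default: OPENAI_API_KEY)")
    parser.add_argument("--prompt", default=None,
                        help="System prompt for the LLM post-processor")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(["talk.mp3", "--format", "srt"])
print(args.audio_file, args.format)
```

Restricting --format with choices makes the parser reject unsupported formats before any API call is made.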

Library Usage

You can also import WhisperProcessor directly into your own Python scripts.

import asyncio
import os

from long_audio_whisper import WhisperProcessor

async def transcribe_audio():
    # Read the key from the environment rather than hard-coding it
    processor = WhisperProcessor(api_key=os.environ["OPENAI_API_KEY"])

    # process() is a coroutine, so it must be awaited
    transcription = await processor.process(
        audio_file_path="/path/to/your/audio.mp3",
        response_format="text",  # or "json", "srt", "vtt", "verbose_json"
    )

    print(transcription)

if __name__ == "__main__":
    asyncio.run(transcribe_audio())

How It Works

The workflow depends on the output format:

For text and json formats:
- The audio is split into chunks based on detected silence.
- Each chunk is transcribed into plain text concurrently.
- The resulting text snippets are combined.
- An LLM (e.g., GPT-4o) post-processes the full text to improve formatting and punctuation.
For srt, vtt, and verbose_json formats:
- For files under 25MB, the transcription is done in a single API call to ensure maximum accuracy.
- For files over 25MB, the audio is split into chunks, but this time, the exact start time of each chunk is recorded.
- Each chunk is transcribed using the verbose_json format to get word-level timestamps.
- The library then reconstructs the full transcription, carefully adjusting all timestamps based on the original start time of each chunk.
- Finally, it formats the reconstructed data into the desired srt, vtt, or verbose_json output.
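
The timestamp adjustment in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes each chunk yields segments with chunk-relative start, end, and text fields (the shape used by Whisper's verbose_json segments), and the names merge_chunks, to_srt, and format_srt_time are hypothetical.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def merge_chunks(chunks):
    """Shift chunk-relative segment times by each chunk's start offset.

    chunks: list of (chunk_start_seconds, segments) pairs in playback order.
    """
    merged = []
    for chunk_start, segments in chunks:
        for seg in segments:
            merged.append({
                "start": seg["start"] + chunk_start,
                "end": seg["end"] + chunk_start,
                "text": seg["text"].strip(),
            })
    return merged


def to_srt(segments) -> str:
    """Render merged segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        timing = f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}"
        blocks.append(f"{i}\n{timing}\n{seg['text']}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    chunks = [
        (0.0, [{"start": 0.0, "end": 2.5, "text": " Hello"}]),
        (600.0, [{"start": 1.25, "end": 3.0, "text": "world"}]),
    ]
    print(to_srt(merge_chunks(chunks)))
```

In this example the second chunk starts 600 seconds into the recording, so its segment at 1.25 seconds becomes 00:10:01,250 in the final subtitle file.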

This dual-path approach favors text quality for the text and json formats and timestamp accuracy for the srt, vtt, and verbose_json formats, regardless of file size.
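
Both paths start by splitting the audio at silence. One way to choose cut points is sketched below. It is a simplified illustration, not the library's algorithm: it assumes silence intervals have already been detected, and it limits chunks by duration (max_chunk, a hypothetical parameter) as a proxy for the 25MB size limit, which only holds at a roughly constant bitrate.

```python
def plan_cuts(duration, silences, max_chunk):
    """Return (start, end) chunk boundaries in seconds.

    duration:  total audio length in seconds
    silences:  list of (start, end) silence intervals in seconds
    max_chunk: maximum chunk length in seconds (must be > 0)
    """
    # Cutting in the middle of a silence avoids splitting a word
    candidates = sorted((start + end) / 2 for start, end in silences)
    chunks = []
    chunk_start = 0.0
    while duration - chunk_start > max_chunk:
        limit = chunk_start + max_chunk
        # Pick the latest silence midpoint that keeps the chunk within the limit
        usable = [c for c in candidates if chunk_start < c <= limit]
        cut = usable[-1] if usable else limit  # hard cut if no silence fits
        chunks.append((chunk_start, cut))
        chunk_start = cut
    chunks.append((chunk_start, duration))
    return chunks


if __name__ == "__main__":
    print(plan_cuts(100.0, [(28, 30), (55, 57), (90, 92)], max_chunk=40))
```

Recording each chunk's start value is what makes the later timestamp adjustment possible for the subtitle formats.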

License

This project is licensed under the MIT License. See the LICENSE file for details.

Release history

This version

1.0.0

Aug 21, 2025

Download files

Source Distribution

long_audio_whisper-1.0.0.tar.gz (9.1 kB)

Uploaded Aug 21, 2025 Source

Built Distribution

long_audio_whisper-1.0.0-py3-none-any.whl (9.2 kB)

Uploaded Aug 21, 2025 Python 3

File details

Details for the file long_audio_whisper-1.0.0.tar.gz.

File metadata

Download URL: long_audio_whisper-1.0.0.tar.gz
Upload date: Aug 21, 2025
Size: 9.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for long_audio_whisper-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`3ccc212705d1785f44954382df2001da0d4e6a000931c0e5d29727938c259f91`
MD5	`72cff7edc686a9f365b18362b4510edd`
BLAKE2b-256	`f369662fbd6af7b2e04e08a416c4790ea09c442c0163d93f8b29d824e7fc07f2`

Provenance

The following attestation bundles were made for long_audio_whisper-1.0.0.tar.gz:

Publisher: workflow.yml on bokamix/long-audio-whisper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: long_audio_whisper-1.0.0.tar.gz
- Subject digest: 3ccc212705d1785f44954382df2001da0d4e6a000931c0e5d29727938c259f91
- Sigstore transparency entry: 416201608
- Sigstore integration time: Aug 21, 2025
Source repository:
- Permalink: bokamix/long-audio-whisper@3d4c4288c31756df3b120b0ce8c737c0d540651a
- Branch / Tag: refs/tags/v1.0.0.0
- Owner: https://github.com/bokamix
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@3d4c4288c31756df3b120b0ce8c737c0d540651a
- Trigger Event: release

File details

Details for the file long_audio_whisper-1.0.0-py3-none-any.whl.

File metadata

Download URL: long_audio_whisper-1.0.0-py3-none-any.whl
Upload date: Aug 21, 2025
Size: 9.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for long_audio_whisper-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4da9337fcfdcabccd1bb7ead9dd59930eea3c2567ec021aa68755fbeafea42e1`
MD5	`80ea17d4f4395b7c504d7be9d9284e67`
BLAKE2b-256	`48dd853704f414ae445add7826996798e7408394a599e0919dcea7c6668cae86`

Provenance

The following attestation bundles were made for long_audio_whisper-1.0.0-py3-none-any.whl:

Publisher: workflow.yml on bokamix/long-audio-whisper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: long_audio_whisper-1.0.0-py3-none-any.whl
- Subject digest: 4da9337fcfdcabccd1bb7ead9dd59930eea3c2567ec021aa68755fbeafea42e1
- Sigstore transparency entry: 416201633
- Sigstore integration time: Aug 21, 2025
Source repository:
- Permalink: bokamix/long-audio-whisper@3d4c4288c31756df3b120b0ce8c737c0d540651a
- Branch / Tag: refs/tags/v1.0.0.0
- Owner: https://github.com/bokamix
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@3d4c4288c31756df3b120b0ce8c737c0d540651a
- Trigger Event: release
