OpenAI Whisper with Apple MPS support

atai-whisper-tool

atai-whisper-tool is a command-line tool that leverages the OpenAI Whisper model with Apple MPS support for efficient audio transcription and translation. It supports multiple output formats and a wide range of languages, making it a versatile tool for speech recognition tasks.

Features

- Automatic Speech Recognition (ASR): Transcribe audio files into text.
- Speech Translation: Translate speech in other languages into English.
- Multiple Output Formats: Save results as plain text, JSON, SRT, VTT, TSV, or all available formats.
- Configurable Transcription Options: Customize parameters like model size, temperature, beam search settings, and more.
- Support for Multiple Languages: Auto-detect language or specify one from over 100 supported languages.
- Apple MPS Support: Optimized for Apple hardware using MPS for faster inference.

Installation

Install ffmpeg:

```
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
```
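
Before running the tool it can help to confirm that ffmpeg is actually reachable. The following is a minimal sketch using only the Python standard library; the function name `ffmpeg_available` is hypothetical and not part of atai-whisper-tool.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found at", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found; install it first (e.g. brew install ffmpeg)")
```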

When working from a source checkout, install the Python dependencies with pip:

```
pip install -r requirements.txt
```

If you are building or installing from the source distribution, make sure MANIFEST.in includes the Whisper asset files so that they are packaged:

```
include atai_whisper_tool/whisper/assets/mel_filters.npz
include atai_whisper_tool/whisper/assets/multilingual.tiktoken
include atai_whisper_tool/whisper/assets/gpt2.tiktoken
```
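
To check that a source tree or unpacked distribution really contains these files, something like the sketch below could be used. The paths come from the MANIFEST.in entries above; the helper name `missing_assets` is hypothetical and not part of the package.

```python
from pathlib import Path

# Asset paths taken from the MANIFEST.in entries above.
REQUIRED_ASSETS = (
    "atai_whisper_tool/whisper/assets/mel_filters.npz",
    "atai_whisper_tool/whisper/assets/multilingual.tiktoken",
    "atai_whisper_tool/whisper/assets/gpt2.tiktoken",
)


def missing_assets(root: str) -> list[str]:
    """Return the required asset paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in REQUIRED_ASSETS if not (base / rel).is_file()]
```

Calling `missing_assets(".")` from the project root returns an empty list when all three files are present.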

Installation from PyPI

The package is published on PyPI, so you can install it with:

```
pip install atai-whisper-tool
```

Usage

After installation, the tool is available as a command-line utility named atai-whisper-tool. Run the help command to see all available options:

```
atai-whisper-tool -h
```

The help output will look similar to:

```
usage: atai-whisper-tool [-h] [--model MODEL] [--output-name OUTPUT_NAME] [--output-dir OUTPUT_DIR] [--output-format {txt,vtt,srt,tsv,json,all}]
                         [--verbose VERBOSE] [--task {transcribe,translate}]
                         [--language {af,am,ar,...,Yiddish,Yoruba}]
                         [--temperature TEMPERATURE] [--best-of BEST_OF] [--patience PATIENCE] [--length-penalty LENGTH_PENALTY]
                         [--suppress-tokens SUPPRESS_TOKENS] [--initial-prompt INITIAL_PROMPT] [--condition-on-previous-text CONDITION_ON_PREVIOUS_TEXT]
                         [--fp16 FP16] [--compression-ratio-threshold COMPRESSION_RATIO_THRESHOLD] [--logprob-threshold LOGPROB_THRESHOLD]
                         [--no-speech-threshold NO_SPEECH_THRESHOLD] [--word-timestamps WORD_TIMESTAMPS] [--prepend-punctuations PREPEND_PUNCTUATIONS]
                         [--append-punctuations APPEND_PUNCTUATIONS] [--highlight-words HIGHLIGHT_WORDS] [--max-line-width MAX_LINE_WIDTH]
                         [--max-line-count MAX_LINE_COUNT] [--max-words-per-line MAX_WORDS_PER_LINE]
                         [--hallucination-silence-threshold HALLUCINATION_SILENCE_THRESHOLD] [--clip-timestamps CLIP_TIMESTAMPS]
                         audio [audio ...]
```

Below is a detailed usage guide for atai-whisper-tool that covers the most common scenarios and explains each of the key options.

Basic Usage

1. Transcribing Audio

Transcription converts spoken words in an audio file into text while preserving the original language.

Example:
```
atai-whisper-tool audio.wav
```
This command uses the default model (mlx-community/whisper-tiny), transcribes audio.wav, and writes the result as a text file (the default format is txt) in the current directory.

2. Translating Audio

Translation not only transcribes the speech but also translates it into English. This is useful when the audio is in a non-English language.

Example:
```
atai-whisper-tool audio.wav --task translate
```
This command transcribes the speech and translates it into English, writing the result in the default txt format unless --output-format is given.
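
When the tool is driven from a script rather than typed by hand, the same commands can be assembled as an argument list. The sketch below is one possible way to do that; `build_command` is a hypothetical helper, not an API of atai-whisper-tool, and it assumes every option is passed as a flag followed by a value, as the help output suggests.

```python
import shlex


def build_command(audio_files, **options):
    """Build an argv list for atai-whisper-tool.

    Keyword names are converted to flags by replacing "_" with "-",
    e.g. output_format="srt" becomes ["--output-format", "srt"].
    Options set to None are skipped.
    """
    argv = ["atai-whisper-tool", *audio_files]
    for name, value in options.items():
        if value is None:
            continue
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


cmd = build_command(["audio.wav"], task="translate", output_format="srt")
print(shlex.join(cmd))
# prints: atai-whisper-tool audio.wav --task translate --output-format srt
# To actually run it: subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell quoting problems with file names that contain spaces.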

Key Options Explained

Model Selection

--model MODEL
- Description: Specify the model directory or Hugging Face repository to use.
- Default: mlx-community/whisper-tiny
- Usage Example:
```
atai-whisper-tool audio.wav --model path/to/your/model
```

Output Configuration

--output-name OUTPUT_NAME
- Description: The base name for the generated output file(s).
--output-dir, -o OUTPUT_DIR
- Description: Directory where the output files will be saved.
--output-format, -f {txt,vtt,srt,tsv,json,all}
- Description: Choose the format for your output file.
- Example: To output as SRT (SubRip subtitle) file:
```
atai-whisper-tool audio.wav --output-format srt
```
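
For scripts that need to know where the results will land, the three output options can be combined into expected file paths. This is a sketch under the assumption that files are named `<output-name>.<format>` inside the output directory, which matches Example 2 later in this guide; `expected_outputs` is a hypothetical helper and does not query the tool itself.

```python
from pathlib import Path

# Concrete formats listed in the --output-format choices; "all" selects every one.
FORMATS = ("txt", "vtt", "srt", "tsv", "json")


def expected_outputs(output_dir, output_name, output_format):
    """Return the paths the tool is assumed to write for the given options."""
    if output_format == "all":
        formats = FORMATS
    elif output_format in FORMATS:
        formats = (output_format,)
    else:
        raise ValueError(f"unknown output format: {output_format!r}")
    return [Path(output_dir) / f"{output_name}.{fmt}" for fmt in formats]
```

For instance, `expected_outputs("./transcripts", "my_transcript", "all")` returns five paths, one per format.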

Task Type

--task {transcribe,translate}
- Description: Choose whether to transcribe the audio (retain the original language) or translate it into English.
- Usage Example (Transcribe):
```
atai-whisper-tool audio.wav --task transcribe
```
- Usage Example (Translate):
```
atai-whisper-tool audio.wav --task translate
```

Language Options

--language {list...}
- Description: Specify the language spoken in the audio. When omitted, the tool attempts to auto-detect the language.
- Example: If you know the audio is in Spanish:
```
atai-whisper-tool audio.wav --language es
```

Verbosity and Debugging

--verbose VERBOSE
- Description: Control whether detailed progress and debugging messages are printed during processing.
- Default: True

Decoding & Sampling Parameters

These options allow fine-tuning of the transcription/translation process:

--temperature TEMPERATURE
- Description: Sampling temperature. A value of 0 means deterministic decoding.
--best-of BEST_OF
- Description: When using non-zero temperature, the number of candidate outputs to consider.
--patience PATIENCE
- Description: Patience value used in beam decoding.
--length-penalty LENGTH_PENALTY
- Description: Penalty coefficient applied to token length when ranking candidate outputs.
--compression-ratio-threshold COMPRESSION_RATIO_THRESHOLD
- Description: If the compression ratio of the decoded text exceeds this value, the output is treated as too repetitive and decoding is considered failed.
--logprob-threshold LOGPROB_THRESHOLD
- Description: If the average log probability of the decoded tokens falls below this value, decoding is considered failed.
--no-speech-threshold NO_SPEECH_THRESHOLD
- Description: No-speech probability threshold used to decide whether a segment is treated as silence.
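
To make the two quality thresholds more concrete, here is a minimal sketch of how such checks can work. It assumes the tool follows upstream OpenAI Whisper, where the compression ratio is the UTF-8 length of the text divided by its zlib-compressed length. The function names are hypothetical, and the default values 2.4 and -1.0 are simply the ones used in Example 4 of this guide.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw to zlib-compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def needs_retry(text, avg_logprob,
                compression_ratio_threshold=2.4, logprob_threshold=-1.0):
    """Return True if the decoded text looks repetitive or low-confidence."""
    too_repetitive = compression_ratio(text) > compression_ratio_threshold
    too_uncertain = avg_logprob < logprob_threshold
    return too_repetitive or too_uncertain
```

A looped phrase such as `"la la la " * 100` compresses very well and therefore exceeds the ratio threshold, while an ordinary sentence does not.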

Advanced Timing & Formatting Options

For subtitle generation or word-level timing:

--word-timestamps WORD_TIMESTAMPS
- Description: If set, extracts detailed word-level timestamps.
--prepend-punctuations and --append-punctuations
- Description: Punctuation symbols to merge with the following word (--prepend-punctuations) or the preceding word (--append-punctuations) when using word timestamps.
--highlight-words HIGHLIGHT_WORDS
- Description: Underlines each word as it is spoken in subtitle outputs (requires word timestamps).
--max-line-width, --max-line-count, and --max-words-per-line
- Description: Control how text is split into lines in subtitle files.
--clip-timestamps CLIP_TIMESTAMPS
- Description: Process only specified clips from the audio by providing start and end timestamps (in seconds).
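
Timestamps in seconds, as used by the options above, map onto the `HH:MM:SS,mmm` notation of SRT subtitle files. The sketch below shows that conversion and the layout of a single SRT cue; it illustrates the file format only and is not the tool's own writer. Both function names are hypothetical.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def format_srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Render one numbered SRT cue: index, time range, then the text."""
    time_range = f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}"
    return f"{index}\n{time_range}\n{text}\n"


print(format_srt_cue(1, 0.0, 2.5, "Hello"))
```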

Common Usage Examples

Example 1: Basic Transcription with Default Settings

```
atai-whisper-tool audio.wav
```

Outcome: Transcribes audio.wav using the default model and outputs a text file.

Example 2: Transcription with Custom Output

```
atai-whisper-tool audio.wav --output-name my_transcript --output-dir ./transcripts --output-format json
```

Outcome: Transcribes the audio file and saves the output as my_transcript.json in the ./transcripts directory.

Example 3: Translation of a Non-English Audio File

```
atai-whisper-tool audio.wav --task translate --language fr --output-format srt
```

Outcome: Translates the French audio to English and outputs an SRT subtitle file.

Example 4: Using Advanced Decoding Options

```
atai-whisper-tool audio.wav --temperature 0.2 --best-of 5 --logprob-threshold -1.0 --compression-ratio-threshold 2.4
```

Outcome: Transcribes with a sampling temperature of 0.2, considering 5 candidates, and applies the given log-probability and compression-ratio thresholds.

License

This project is licensed under the MIT License.

Release history

- 0.0.7: Mar 10, 2025
- 0.0.6: Mar 4, 2025
- 0.0.5: Mar 4, 2025
- 0.0.4: Mar 4, 2025
- 0.0.3: Mar 3, 2025
- 0.0.2: Mar 3, 2025 (this version)

Download files

Source Distribution

atai_whisper_tool-0.0.2.tar.gz (783.6 kB), uploaded Mar 3, 2025, Source

Built Distribution

atai_whisper_tool-0.0.2-py3-none-any.whl (785.5 kB), uploaded Mar 3, 2025, Python 3

File details

Details for the file atai_whisper_tool-0.0.2.tar.gz.

File metadata

Download URL: atai_whisper_tool-0.0.2.tar.gz
Upload date: Mar 3, 2025
Size: 783.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for atai_whisper_tool-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`81a7b98b257f93ac5e07f326aac142c7abb01fce631722ed80ce789d5a26cf33`
MD5	`a13308d38ee249d5bece209c80601e95`
BLAKE2b-256	`88a29420258816d8b0930bc036b9a44e2ae49474c13b7306453f67607c5927c1`


File details

Details for the file atai_whisper_tool-0.0.2-py3-none-any.whl.

File metadata

Download URL: atai_whisper_tool-0.0.2-py3-none-any.whl
Upload date: Mar 3, 2025
Size: 785.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for atai_whisper_tool-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2082b0d0860a545c510d89acc442718d4137caa9123e0a276e7c7b6305869a7b`
MD5	`7ee9a850ba87943fdff041796cf345df`
BLAKE2b-256	`5d4137d5e0fb7c8d2bee36fdea824270306b7f85911e5cc93b94f47ca5dd43a5`
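
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. This is a minimal sketch; the helper names are hypothetical, and the file path passed in is whatever location the file was downloaded to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_sha256("atai_whisper_tool-0.0.2-py3-none-any.whl", "2082b0d0860a545c510d89acc442718d4137caa9123e0a276e7c7b6305869a7b")` should return True for an intact copy of the wheel.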

