OpenAI Whisper with Apple MPS support
atai-whisper-tool
⚡️⚡️⚡️ Long Audio Processing ⚡️⚡️⚡️
Parallel support for Whisper delivers at least 20x speed improvements on long audio files! Experience lightning-fast transcription:
atai-whisper-tool output_bushi.wav --speedup
atai-whisper-tool is a command-line tool that leverages the OpenAI Whisper model with Apple MPS support for efficient audio transcription and translation. It supports multiple output formats and a wide range of languages, making it a versatile tool for speech recognition tasks.
Features
- Automatic Speech Recognition (ASR): Transcribe audio files into text.
- Speech Translation: Translate spoken language into another language.
- Multiple Output Formats: Save results as plain text, JSON, SRT, VTT, TSV, or all available formats.
- Configurable Transcription Options: Customize parameters like model size, temperature, beam search settings, and more.
- Support for Multiple Languages: Auto-detect language or specify one from over 100 supported languages.
- Apple MPS Support: Optimized for Apple hardware using MPS for faster inference.
Installation
Install ffmpeg:
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
You can install the dependencies via pip:
pip install -r requirements.txt
Alternatively, if you are installing from the source distribution, make sure its MANIFEST.in includes the required asset files:
include atai_whisper_tool/whisper/assets/mel_filters.npz
include atai_whisper_tool/whisper/assets/multilingual.tiktoken
include atai_whisper_tool/whisper/assets/gpt2.tiktoken
Installation from PyPI
If the package is published on PyPI, you can install it using:
pip install atai-whisper-tool
Usage
After installation, the tool is available as a command-line utility named atai-whisper-tool. Run the help command to see all available options:
atai-whisper-tool -h
The help output will look similar to:
usage: atai-whisper-tool [-h] [--model MODEL] [--output-name OUTPUT_NAME] [--output-dir OUTPUT_DIR] [--output-format {txt,vtt,srt,tsv,json,all}]
[--verbose VERBOSE] [--task {transcribe,translate}]
[--language {af,am,ar,...,Yiddish,Yoruba}]
[--temperature TEMPERATURE] [--best-of BEST_OF] [--patience PATIENCE] [--length-penalty LENGTH_PENALTY]
[--suppress-tokens SUPPRESS_TOKENS] [--initial-prompt INITIAL_PROMPT] [--condition-on-previous-text CONDITION_ON_PREVIOUS_TEXT]
[--fp16 FP16] [--compression-ratio-threshold COMPRESSION_RATIO_THRESHOLD] [--logprob-threshold LOGPROB_THRESHOLD]
[--no-speech-threshold NO_SPEECH_THRESHOLD] [--word-timestamps WORD_TIMESTAMPS] [--prepend-punctuations PREPEND_PUNCTUATIONS]
[--append-punctuations APPEND_PUNCTUATIONS] [--highlight-words HIGHLIGHT_WORDS] [--max-line-width MAX_LINE_WIDTH]
[--max-line-count MAX_LINE_COUNT] [--max-words-per-line MAX_WORDS_PER_LINE]
[--hallucination-silence-threshold HALLUCINATION_SILENCE_THRESHOLD] [--clip-timestamps CLIP_TIMESTAMPS]
audio [audio ...]
Below is a detailed usage guide for atai-whisper-tool that covers the most common scenarios and explains each of the key options.
Basic Usage
1. Transcribing Audio
Transcription converts spoken words in an audio file into text while preserving the original language.
- Example:
atai-whisper-tool audio.wav
This command uses the default model (usually mlx-community/whisper-tiny), transcribes the audio in audio.wav, and outputs the result as a text file (the default format is txt) in the current directory.
2. Translating Audio
Translation not only transcribes the speech but also translates it into English. This is useful when the audio is in a non-English language.
- Example:
atai-whisper-tool audio.wav --task translate
This command will perform both transcription and translation, outputting the result in the chosen format.
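Both basic invocations can also be scripted. The following is a minimal sketch, not part of the tool itself: build_command and run are hypothetical helper names, and the only flags used are ones listed in the help output above. It assumes atai-whisper-tool is on your PATH when run is called.

```python
import shlex
import subprocess
from typing import List, Optional

TOOL = "atai-whisper-tool"  # executable name, as given in this README


def build_command(
    audio_files: List[str],
    task: str = "transcribe",
    output_format: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Build an argv list using only flags shown in the help output."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    if not audio_files:
        raise ValueError("at least one audio file is required")
    cmd = [TOOL, *audio_files, "--task", task]
    if output_format is not None:
        cmd += ["--output-format", output_format]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    return cmd


def run(cmd: List[str]) -> None:
    """Run the tool; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


# Show the command that would be executed, without running it.
print(shlex.join(build_command(["audio.wav"], task="translate")))
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems with file names that contain spaces.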
Key Options Explained
Model Selection
--model MODEL
- Description: Specify the model directory or Hugging Face repository to use.
- Default: mlx-community/whisper-tiny
- Usage Example:
atai-whisper-tool audio.wav --model path/to/your/model
Output Configuration
--output-name OUTPUT_NAME
- Description: The base name for the generated output file(s).
--output-dir, -o OUTPUT_DIR
- Description: Directory where the output files will be saved.
--output-format, -f {txt,vtt,srt,tsv,json,all}
- Description: Choose the format for your output file.
- Example: To output an SRT (SubRip subtitle) file:
atai-whisper-tool audio.wav --output-format srt
Task Type
--task {transcribe,translate}
- Description: Choose whether to transcribe the audio (retain the original language) or translate it into English.
- Usage Example (Transcribe):
atai-whisper-tool audio.wav --task transcribe
- Usage Example (Translate):
atai-whisper-tool audio.wav --task translate
Language Options
--language {list...}
- Description: Specify the language spoken in the audio. If omitted, the tool can auto-detect the language.
- Example: If you know the audio is in Spanish:
atai-whisper-tool audio.wav --language es
Verbosity and Debugging
--verbose VERBOSE
- Description: Control whether detailed progress and debugging messages are printed during processing.
- Default: True
Decoding & Sampling Parameters
These options allow fine-tuning of the transcription/translation process:
--temperature TEMPERATURE
- Description: Sampling temperature. A value of 0 means deterministic decoding.
--best-of BEST_OF
- Description: When using non-zero temperature, the number of candidate outputs to consider.
--patience PATIENCE and --length-penalty LENGTH_PENALTY
- Description: Advanced beam decoding parameters to control output quality.
--compression-ratio-threshold COMPRESSION_RATIO_THRESHOLD
- Description: Threshold for filtering out repetitive outputs.
--logprob-threshold LOGPROB_THRESHOLD
- Description: Threshold for the average log probability to decide if decoding is successful.
--no-speech-threshold NO_SPEECH_THRESHOLD
- Description: Defines a threshold to determine if a segment contains speech.
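To make the three thresholds more concrete, here is a rough sketch of how upstream OpenAI Whisper applies them. It is an assumption that this tool behaves the same way. The function names needs_retry and is_silence are hypothetical, and the default of 0.6 for the no-speech threshold is taken from upstream Whisper, not from this README.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size.

    Highly repetitive text compresses well, so it gets a high ratio.
    """
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def needs_retry(
    text: str,
    avg_logprob: float,
    compression_ratio_threshold: float = 2.4,
    logprob_threshold: float = -1.0,
) -> bool:
    """Treat a decoded segment as failed if it is too repetitive or too uncertain."""
    too_repetitive = compression_ratio(text) > compression_ratio_threshold
    too_uncertain = avg_logprob < logprob_threshold
    return too_repetitive or too_uncertain


def is_silence(
    no_speech_prob: float,
    avg_logprob: float,
    no_speech_threshold: float = 0.6,  # assumed upstream default
    logprob_threshold: float = -1.0,
) -> bool:
    """Skip a segment when the model reports likely silence and low confidence."""
    return no_speech_prob > no_speech_threshold and avg_logprob < logprob_threshold


# A looping output such as "la la la ..." is flagged; a normal sentence is not.
print(needs_retry("la " * 200, avg_logprob=-0.2))
print(needs_retry("The quick brown fox jumps over the lazy dog.", avg_logprob=-0.2))
```

In upstream Whisper, a failed segment is decoded again at a higher temperature, which is why these thresholds interact with --temperature.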
Advanced Timing & Formatting Options
For subtitle generation or word-level timing:
--word-timestamps WORD_TIMESTAMPS
- Description: If set, extracts detailed word-level timestamps.
--prepend-punctuations and --append-punctuations
- Description: Define punctuation handling when using word timestamps.
--highlight-words HIGHLIGHT_WORDS
- Description: Underlines words in subtitle outputs (requires word timestamps).
- Options like --max-line-width, --max-line-count, and --max-words-per-line help format the text for subtitle files.
--clip-timestamps CLIP_TIMESTAMPS
- Description: Process only specified clips from the audio by providing start and end timestamps (in seconds).
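The README does not spell out the value format for --clip-timestamps. Upstream Whisper expects a comma-separated list of start,end,start,end,... values in seconds, and the sketch below assumes this tool does the same. format_clip_timestamps is a hypothetical helper for building that string from (start, end) pairs.

```python
from typing import Iterable, Tuple


def _fmt(seconds: float) -> str:
    """Render seconds with up to millisecond precision and no trailing zeros."""
    return f"{seconds:.3f}".rstrip("0").rstrip(".")


def format_clip_timestamps(clips: Iterable[Tuple[float, float]]) -> str:
    """Join (start, end) pairs into 'start,end,start,end,...'.

    Assumed format, based on upstream Whisper. Clips must be ordered,
    non-overlapping, and have a positive duration.
    """
    parts = []
    previous_end = 0.0
    for start, end in clips:
        if start < previous_end:
            raise ValueError(f"clip starting at {start} overlaps the previous clip")
        if end <= start:
            raise ValueError(f"clip end {end} must be greater than start {start}")
        parts += [_fmt(start), _fmt(end)]
        previous_end = end
    return ",".join(parts)


print(format_clip_timestamps([(0, 30), (60, 90.5)]))
```

Under that assumption, the resulting string would be passed as the value of --clip-timestamps on the command line.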
Common Usage Examples
Example 1: Basic Transcription with Default Settings
atai-whisper-tool audio.wav
- Outcome: Transcribes audio.wav using the default model and outputs a text file.
Example 2: Transcription with Custom Output
atai-whisper-tool audio.wav --output-name my_transcript --output-dir ./transcripts --output-format json
- Outcome: Transcribes the audio file and saves the output as my_transcript.json in the ./transcripts directory.
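Once the JSON file exists, it can be post-processed in Python. The sketch below assumes the output follows the upstream Whisper result layout, with a top-level "segments" list whose items carry "start", "end", and "text". This README does not document the schema, so check your own output first. load_result and segments_from_result are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def load_result(path: str) -> Dict[str, Any]:
    """Read a JSON result file written by the tool."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def segments_from_result(result: Dict[str, Any]) -> List[Tuple[float, float, str]]:
    """Extract (start, end, text) rows; assumes the upstream Whisper schema."""
    rows = []
    for seg in result.get("segments", []):
        rows.append((float(seg["start"]), float(seg["end"]), seg["text"].strip()))
    return rows


# Example with an in-memory result shaped like the assumed schema.
sample = {
    "text": " Hello world. How are you?",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello world."},
        {"start": 1.5, "end": 3.0, "text": " How are you?"},
    ],
}
for start, end, text in segments_from_result(sample):
    print(f"[{start:6.2f} - {end:6.2f}] {text}")
```

For a real run, replace the sample with load_result("./transcripts/my_transcript.json").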
Example 3: Translation of a Non-English Audio File
atai-whisper-tool audio.wav --task translate --language fr --output-format srt
- Outcome: Translates the French audio to English and outputs an SRT subtitle file.
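If you need the subtitle cues as data, a small parser for the standard SRT layout (index line, "HH:MM:SS,mmm --> HH:MM:SS,mmm" line, then text) is enough. This is a minimal sketch and not part of the tool. Cue and parse_srt are hypothetical names, and the parser skips any block that does not match the standard layout.

```python
import re
from typing import List, NamedTuple

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(stamp: str) -> float:
    """Convert 'HH:MM:SS,mmm' to seconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> List[Cue]:
    """Parse SRT text into cues; blocks are separated by blank lines."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # not a standard cue block
        start, end = lines[1].split("-->")
        text = "\n".join(lines[2:]).strip()
        cues.append(Cue(_seconds(start), _seconds(end), text))
    return cues


sample_srt = (
    "1\n00:00:00,000 --> 00:00:02,500\nHello.\n\n"
    "2\n00:00:02,500 --> 00:01:05,000\nHow are you?\n"
)
for cue in parse_srt(sample_srt):
    print(cue)
```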
Example 4: Using Advanced Decoding Options
atai-whisper-tool audio.wav --temperature 0.2 --best-of 5 --logprob-threshold -1.0 --compression-ratio-threshold 2.4
- Outcome: Adjusts the transcription process with custom sampling and decoding parameters for improved quality.
License
This project is licensed under the MIT License.
Note
Most of the code comes from mlx_whisper.