A command-line tool for audio transcription with Whisper and Pyannote.
Audio Scribe
A Command-Line Tool for Audio Transcription and Speaker Diarization Using OpenAI Whisper and Pyannote
Support the Project ☕
If you find Audio Scribe helpful, consider supporting the project with a coffee!
Your contribution helps maintain the project and develop new features.
Overview
Audio Scribe is a command-line tool that transcribes audio files with speaker diarization. Leveraging OpenAI Whisper for transcription and Pyannote Audio for speaker diarization, this solution converts audio into segmented text files, identifying each speaker turn. Key features include:
- Progress Bar & Resource Monitoring: See real-time CPU, memory, and GPU usage with a live progress bar.
- Speaker Diarization: Automatically separates speaker turns using Pyannote’s state-of-the-art models.
- Tab-Completion for File Paths: Easily navigate your file system when prompted for the audio path.
- Secure Token Storage: Encrypts and stores your Hugging Face token for private model downloads.
- Customizable Whisper Models: Defaults to base.en, or specify tiny, small, medium, large, etc.
This repository is licensed under the Apache License 2.0.
Features
- Whisper Transcription: Uses OpenAI Whisper to convert speech to text in multiple languages.
- Pyannote Speaker Diarization: Identifies different speakers and segments your audio output accordingly.
- Progress Bar & Resource Usage: Displays a live progress bar with CPU, memory, and GPU stats through alive-progress, psutil, and GPUtil.
- Tab-Completion: Press Tab to autocomplete file paths on Unix-like systems (and on Windows with pyreadline3).
- Secure Token Storage: Saves your Hugging Face token via cryptography for model downloads (e.g., pyannote/speaker-diarization-3.1).
- Configurable Models: The default is base.en, but you can specify any other Whisper model using --whisper-model.
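As an illustration of the Tab-Completion feature, here is a minimal sketch of path completion built on the standard readline module. It is not Audio Scribe's actual implementation; the function names complete_path and enable_tab_completion are hypothetical, and the "tab: complete" binding assumes GNU readline syntax.

```python
import glob
import os


def complete_path(text, state):
    """Return the state-th filesystem match for the partial path `text`, or None."""
    expanded = os.path.expanduser(text)
    # Escape glob metacharacters in what the user typed, then match any suffix.
    matches = sorted(glob.glob(glob.escape(expanded) + "*"))
    # Append a separator to directories so the next Tab descends into them.
    matches = [m + os.sep if os.path.isdir(m) else m for m in matches]
    return matches[state] if state < len(matches) else None


def enable_tab_completion():
    """Register the completer with readline; return False if readline is unavailable."""
    try:
        import readline  # stdlib on Unix-like systems; pyreadline3 supplies it on Windows
    except ImportError:
        return False
    # Split only on whitespace so that "/" stays part of the word being completed.
    readline.set_completer_delims(" \t\n")
    readline.set_completer(complete_path)
    readline.parse_and_bind("tab: complete")
    return True
```

After calling enable_tab_completion(), a plain input("Path to audio file: ") prompt would offer completions when Tab is pressed.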
Installation
Installing from PyPI
Audio Scribe is available on PyPI. You can install it with:
pip install audio-scribe
After installation, the audio-scribe command should be available in your terminal (depending on how your PATH is configured). If you prefer to run it as a Python module, you can also use:
python -m audio-scribe --audio path/to/yourfile.wav
Installing from GitHub
To install the latest development version directly from GitHub:
git clone https://gitlab.genomicops.cloud/innovation-hub/audio-scribe.git
cd audio-scribe
pip install -r requirements.txt
This approach is particularly useful if you want the newest changes or plan to contribute.
Quick Start
- Obtain a Hugging Face Token
  - Create a token at Hugging Face Settings.
  - Accept the model conditions for pyannote/segmentation-3.0 and pyannote/speaker-diarization-3.1.
- Run the Command-Line Tool
  audio-scribe --audio path/to/audio.wav
  On the first run, you’ll be prompted for your Hugging Face token if you haven’t stored one yet.
- Watch the Progress Bar
  - The tool displays a progress bar for each diarized speaker turn, along with real-time CPU, GPU, and memory usage.
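The token behavior described in the Quick Start (prompt on first run if nothing is stored) together with the --token option (which overrides any saved token) amounts to a simple precedence order. The sketch below shows one way that order could be expressed; resolve_token and its parameters are hypothetical names, not part of Audio Scribe's API.

```python
def resolve_token(cli_token=None, stored_token=None, prompt=input):
    """Pick the Hugging Face token to use, in order of precedence.

    1. A token passed on the command line (overrides anything saved).
    2. A previously stored token.
    3. A value typed in by the user when prompted.
    """
    if cli_token:
        return cli_token
    if stored_token:
        return stored_token
    entered = prompt("Hugging Face token: ").strip()
    if not entered:
        raise ValueError("A Hugging Face token is required to download the Pyannote models.")
    return entered
```

Passing the prompt function as a parameter keeps the logic testable without a terminal.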
Usage
Below is a summary of the main command-line options:
usage: audio-scribe [options]

Audio Transcription (Audio Scribe) Pipeline using Whisper + Pyannote, with optional progress bar.

optional arguments:
  --audio PATH            Path to the audio file to transcribe.
  --token TOKEN           HuggingFace API token. Overrides any saved token.
  --output PATH           Path to the output directory for transcripts and temporary files.
  --delete-token          Delete any stored Hugging Face token and exit.
  --show-warnings         Enable user warnings (e.g., from pyannote.audio). Disabled by default.
  --whisper-model MODEL   Specify the Whisper model to use (default: 'base.en').
Examples:
- Basic Transcription
  audio-scribe --audio meeting.wav
- Specify a Different Whisper Model
  audio-scribe --audio webinar.mp3 --whisper-model small
- Delete a Stored Token
  audio-scribe --delete-token
- Show Internal Warnings
  audio-scribe --audio session.wav --show-warnings
- Tab-Completion
  audio-scribe # When prompted for an audio file path, press Tab to autocomplete
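For readers who want to wrap or script the tool, the options listed in this section can be mirrored with argparse. This is only a sketch reconstructed from the help text, not the project's source; build_parser is a hypothetical name, and the flag types (for example, treating --delete-token and --show-warnings as boolean switches) are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented audio-scribe options."""
    parser = argparse.ArgumentParser(
        prog="audio-scribe",
        description="Audio Transcription (Audio Scribe) Pipeline using "
                    "Whisper + Pyannote, with optional progress bar.",
    )
    parser.add_argument("--audio", metavar="PATH",
                        help="Path to the audio file to transcribe.")
    parser.add_argument("--token", metavar="TOKEN",
                        help="HuggingFace API token. Overrides any saved token.")
    parser.add_argument("--output", metavar="PATH",
                        help="Path to the output directory for transcripts and temporary files.")
    # Assumed to be simple on/off switches, as the help text suggests.
    parser.add_argument("--delete-token", action="store_true",
                        help="Delete any stored Hugging Face token and exit.")
    parser.add_argument("--show-warnings", action="store_true",
                        help="Enable user warnings. Disabled by default.")
    parser.add_argument("--whisper-model", metavar="MODEL", default="base.en",
                        help="Specify the Whisper model to use (default: 'base.en').")
    return parser
```

For example, build_parser().parse_args(["--audio", "webinar.mp3", "--whisper-model", "small"]) yields a namespace whose audio and whisper_model attributes hold those two values.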
Dependencies
Core Libraries
Optional for Extended Functionality
- alive-progress – Real-time progress bar
- psutil – CPU/memory usage
- GPUtil – GPU usage
- pyreadline3 (for Windows tab-completion)
Sample requirements.txt
Below is a typical requirements.txt you can place in your repository:
torch>=1.9
openai-whisper
pyannote.audio
pytorch-lightning
cryptography
keyring
alive-progress
psutil
GPUtil
pyreadline3; sys_platform == "win32"
Note:
- pyreadline3 is appended with a PEP 508 marker (; sys_platform == "win32") so it only installs on Windows.
- For GPU support, ensure you install a compatible PyTorch version with CUDA.
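To make the marker concrete: in PEP 508, sys_platform corresponds to Python's sys.platform, so the condition on the pyreadline3 line is equivalent to the check below. The helper name wants_pyreadline3 is hypothetical and exists only for illustration.

```python
import sys


def wants_pyreadline3(platform=None):
    """Evaluate the marker `sys_platform == "win32"` for the given platform string."""
    platform = sys.platform if platform is None else platform
    return platform == "win32"
```

On Linux (sys.platform is "linux") or macOS ("darwin") the check is false, so pip skips pyreadline3 there.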
Troubleshooting
IndexError: list index out of range
Symptom
You encounter the following error when running audio-scribe or importing pyannote.audio:
IndexError: list index out of range
File ".../pyannote/audio/core/io.py", line 214, in __init__
backend = "soundfile" if "soundfile" in backends else backends[0]
This occurs when pyannote.audio is unable to detect any supported audio backend. Most commonly, the soundfile module is missing or its dependency libsndfile is not properly installed.
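The failure can be reproduced in isolation with the selection expression from the traceback. In the sketch below, pick_backend wraps that expression unchanged, while pick_backend_or_explain is a hypothetical variant showing how an empty list could be reported more clearly; neither function is part of pyannote.audio.

```python
def pick_backend(backends):
    """Apply the same selection expression as the line shown in the traceback."""
    return "soundfile" if "soundfile" in backends else backends[0]


def pick_backend_or_explain(backends):
    """Hypothetical variant that reports an empty backend list explicitly."""
    if not backends:
        raise RuntimeError(
            "No audio backend detected; install libsndfile and the soundfile package."
        )
    return pick_backend(backends)
```

With a non-empty list the expression prefers "soundfile" and otherwise falls back to the first entry; with an empty list, backends[0] raises the IndexError shown above.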
Solution
You have two ways to resolve this issue:
Option 1: System-level Installation (requires sudo access)
Install the system-level audio backend library:
sudo apt-get update
sudo apt-get install libsndfile1
Then reinstall the soundfile Python package inside your environment:
# If using conda
conda activate your-environment-name
pip uninstall soundfile -y
pip install soundfile
# If using pip/virtualenv
source your-venv/bin/activate # or equivalent activation command
pip uninstall soundfile -y
pip install soundfile
Option 2: Conda-only Installation (no sudo required)
Inside your Conda environment:
conda activate your-environment-name
conda install -c conda-forge libsndfile
Then ensure Python uses the correct bindings:
pip uninstall soundfile -y
pip install soundfile
Verification
Test that audio backends are now available:
python -c "import soundfile as sf; print(sf.available_formats())"
Expected output:
{'WAV': 'Microsoft WAV format (little endian)', 'FLAC': 'FLAC format', ...}
Then re-run audio-scribe:
audio-scribe --audio path/to/your/audio.wav
The tool should now initialize without error.
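If you want to script this check, for example in a setup script, the following sketch reports which Python packages cannot be located. The function names are hypothetical. Note that it only confirms the package is installed: unlike the python -c command above, it does not load libsndfile, so a missing system library would not be detected here.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for broken parent packages or modules without a valid spec.
        return False


def missing_modules(names=("soundfile",)):
    """Return the subset of `names` that cannot be found in the current environment."""
    return [name for name in names if not module_available(name)]
```

An empty list from missing_modules() means the soundfile package is visible to the active interpreter.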
Contributing
We welcome contributions to Audio Scribe!
- Fork the repository and clone your fork.
- Create a new branch for your feature or bugfix.
- Implement your changes, ensuring code is well-documented and follows best practices.
- Open a pull request, detailing the changes you’ve made.
Please read any available guidelines or templates in our repository (such as CONTRIBUTING.md or CODE_OF_CONDUCT.md) before submitting.
License
This project is licensed under the Apache License 2.0.
Copyright 2025 Gurasis Osahan
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Thank you for using Audio Scribe!
For questions or feedback, please open a GitHub issue or contact the maintainers.