High-performance audio and video transcription using whisper.cpp with automatic model downloading and CPU parallelism
Whisper Parallel CPU Audio & Video Transcriber
A minimal, robust Python package for whisper.cpp with CPU-optimized threading and integrated model management. It transcribes both audio and video files with high performance, and it targets distributed cloud deployments and transcription workflows.
🚀 Quick Start
Install from PyPI:
pip install whisper-parallel-cpu
Use in Python:
import whisper_parallel_cpu
# Transcribe audio files
text = whisper_parallel_cpu.transcribe("audio.mp3", model="base")
# Transcribe video files
text = whisper_parallel_cpu.transcribe("video.mp4", model="base")
# Or use specific functions
text = whisper_parallel_cpu.transcribe_audio("audio.wav", model="small")
text = whisper_parallel_cpu.transcribe_video("video.mkv", model="medium")
Or use the CLI:
# Transcribe audio
whisper_parallel_cpu transcribe audio.mp3 --model base
# Transcribe video
whisper_parallel_cpu transcribe video.mp4 --model base
✨ Features
- Native C++/pybind11 speed (CPU & GPU acceleration)
- Automatic model download/caching - no manual setup required
- Simple Python & CLI interface - just pip install and go
- Input: Audio (.mp3, .wav, .flac, .aac, .ogg, .m4a) and video (.mp4, .mkv, .avi, .mov) formats
- Output: Transcribed text as a Python string
- Benchmarking: Built-in performance testing and optimization tools
- Cross-platform: Works on macOS, Linux, and Windows
📦 Installation
From PyPI (Recommended)
pip install whisper-parallel-cpu
From Source (Development)
# Clone the repository
git clone https://github.com/krisfur/whisper-parallel-cpu.git
cd whisper-parallel-cpu
# Install in editable mode
pip install -e .
# Test the installation
python test_transcribe.py video.mp4
🧰 Requirements
System Tools
- C++17 compiler (g++, clang++) - automatically handled by pip
- cmake (>=3.15) - automatically handled by pip
- ffmpeg (for audio extraction)
Install ffmpeg
macOS:
brew install ffmpeg
Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg
Windows: Download from ffmpeg.org or use Chocolatey:
choco install ffmpeg
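Since ffmpeg is an external tool rather than a Python dependency, it can be useful to confirm it is reachable before starting a long job. The following is a minimal sketch using only the standard library; tool_on_path is a hypothetical helper written for this example and is not part of whisper_parallel_cpu.

```python
import shutil


def tool_on_path(name: str = "ffmpeg") -> bool:
    """Return True if an executable called `name` is found on PATH."""
    # shutil.which returns the full path to the executable, or None.
    return shutil.which(name) is not None


if not tool_on_path():
    print("ffmpeg was not found on PATH; install it using the commands above.")
```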
🧪 Usage
Python API
Basic Usage (Function-based)
import whisper_parallel_cpu
# Transcribe any audio or video file (auto-detects format)
text = whisper_parallel_cpu.transcribe("audio.mp3", model="base", threads=4)
text = whisper_parallel_cpu.transcribe("video.mp4", model="small")
# Use specific functions for audio or video
text = whisper_parallel_cpu.transcribe_audio("audio.wav", model="base", threads=4)
text = whisper_parallel_cpu.transcribe_video("video.mkv", model="medium", threads=8)
# CPU-only mode (no GPU)
text = whisper_parallel_cpu.transcribe("audio.flac", model="base", use_gpu=False)
Advanced Usage (Model Reuse)
For better performance when transcribing multiple files, use the WhisperModel class to load the model once and reuse it:
from whisper_parallel_cpu import WhisperModel
# Create a model instance (model is loaded on first use)
model = WhisperModel(model="base", use_gpu=False, threads=4)
# Transcribe multiple files using the same loaded model
files = ["audio1.mp3", "audio2.wav", "video1.mp4", "video2.mkv"]
for file_path in files:
    text = model.transcribe(file_path)
    print(f"Transcribed {file_path}: {text[:100]}...")
# Use as context manager
with WhisperModel(model="small", use_gpu=True) as model:
    text1 = model.transcribe("audio1.mp3")
    text2 = model.transcribe("audio2.wav")
    # Model is automatically managed
# Memory management
model.clear_contexts()  # Free memory
print(f"Active contexts: {model.get_context_count()}")
Supported File Formats
Audio Formats:
.mp3, .wav, .flac, .aac, .ogg, .m4a, .wma, .opus, .webm, .3gp, .amr, .au, .ra, .mid, .midi
Video Formats:
.mp4, .avi, .mov, .mkv, .wmv, .flv, .webm, .m4v, .3gp, .ogv, .ts, .mts, .m2ts
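To pre-filter a directory before calling the library, the two extension lists can be turned into a small lookup. This is only a sketch: classify_media is a hypothetical helper, and how whisper_parallel_cpu itself detects file types is not described here. Note that .webm and .3gp appear in both lists; treating them as video is an assumption made for this example.

```python
from pathlib import Path

# Extension sets copied from the supported-format lists.
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".aac", ".ogg", ".m4a", ".wma", ".opus",
              ".webm", ".3gp", ".amr", ".au", ".ra", ".mid", ".midi"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".wmv", ".flv", ".webm", ".m4v",
              ".3gp", ".ogv", ".ts", ".mts", ".m2ts"}


def classify_media(path: str) -> str:
    """Return "video", "audio", or "unsupported" based on the file extension."""
    ext = Path(path).suffix.lower()
    if ext in VIDEO_EXTS:  # assumption: ambiguous containers count as video
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    return "unsupported"
```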
Available Models
The following models are available and will be downloaded automatically:
| Model | Size | Accuracy | Speed | Use Case |
|---|---|---|---|---|
| tiny | 74MB | Good | Fastest | Quick transcriptions |
| base | 141MB | Better | Fast | General purpose |
| small | 444MB | Better | Medium | High accuracy needed |
| medium | 1.4GB | Best | Slow | Maximum accuracy |
| large | 2.9GB | Best | Slowest | Professional use |
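When disk space or download time is limited, the sizes in the table can drive the model choice. The sketch below picks the largest listed model that fits a given budget; largest_model_within is a hypothetical helper, and the sizes are the approximate figures from the table (1.4GB and 2.9GB are rounded to 1400MB and 2900MB).

```python
# Approximate download sizes in MB, taken from the model table.
MODEL_SIZES_MB = {"tiny": 74, "base": 141, "small": 444, "medium": 1400, "large": 2900}


def largest_model_within(budget_mb: float):
    """Return the largest listed model whose size fits the budget, or None."""
    fitting = [(size, name) for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    # max() compares by size first, so it yields the biggest model that fits.
    return max(fitting)[1] if fitting else None
```

The returned name can then be passed as the model argument of transcribe or WhisperModel.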
Command Line Interface
# List available models
whisper_parallel_cpu list
# Download a specific model
whisper_parallel_cpu download base
# Transcribe audio files
whisper_parallel_cpu transcribe audio.mp3 --model base --threads 4
whisper_parallel_cpu transcribe audio.wav --model small
# Transcribe video files
whisper_parallel_cpu transcribe video.mp4 --model base --threads 4
whisper_parallel_cpu transcribe video.mkv --model medium
# Transcribe without GPU (CPU-only)
whisper_parallel_cpu transcribe audio.flac --model small --no-gpu
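If the CLI needs to be driven from a script, the command line can be assembled as an argument list and handed to subprocess. This is a sketch that uses only the flags shown above (--model, --threads, --no-gpu); build_transcribe_command is a hypothetical helper, and the assumption that the transcript is written to standard output is not confirmed by this README.

```python
from typing import List, Optional


def build_transcribe_command(path: str, model: str = "base",
                             threads: Optional[int] = None,
                             use_gpu: bool = True) -> List[str]:
    """Build the argument list for `whisper_parallel_cpu transcribe`."""
    cmd = ["whisper_parallel_cpu", "transcribe", path, "--model", model]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if not use_gpu:
        cmd.append("--no-gpu")
    return cmd


# Example (requires the package to be installed, so it is left commented out):
# import subprocess
# result = subprocess.run(build_transcribe_command("audio.mp3", threads=4),
#                         check=True, capture_output=True, text=True)
# print(result.stdout)  # assumption: the transcript is printed to stdout
```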
Model Management
import whisper_parallel_cpu
# List available models
whisper_parallel_cpu.list_models()
# Download a specific model
whisper_parallel_cpu.download_model("medium")
# Force re-download
whisper_parallel_cpu.download_model("base", force=True)
📊 Benchmarking & Performance
Run Performance Tests
# Test with 5 audio/video copies
python benchmark.py audio.mp3 5
python benchmark.py video.mp4 5
What the Benchmark Tests
- Thread Scaling: Tests different thread counts (1, 2, 4, 8, 16, etc.) for single audio/video transcription
- Sequential Processing: Measures throughput when processing multiple audio/video files one after another
- Parallel Processing: Tests concurrent processing with different numbers of workers
- Optimal Configuration: Provides the best settings for your specific hardware
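The thread-scaling idea can be approximated in a few lines when benchmark.py is not at hand. The sketch below is not the benchmark's implementation; time_thread_counts and fastest are hypothetical helpers that time any callable accepting a thread count.

```python
import time
from typing import Callable, Dict, Iterable


def time_thread_counts(run: Callable[[int], object],
                       thread_counts: Iterable[int]) -> Dict[int, float]:
    """Call run(n) once per thread count and record the elapsed seconds."""
    timings = {}
    for n in thread_counts:
        start = time.perf_counter()
        run(n)
        timings[n] = time.perf_counter() - start
    return timings


def fastest(timings: Dict[int, float]) -> int:
    """Return the thread count with the smallest elapsed time."""
    return min(timings, key=timings.get)


# Example (requires the package and a real file, so it is left commented out):
# import whisper_parallel_cpu
# run = lambda n: whisper_parallel_cpu.transcribe("audio.mp3", model="base", threads=n)
# print(fastest(time_thread_counts(run, [1, 2, 4, 8])))
```

A single run per thread count is noisy; repeating each measurement and taking the median would give a steadier answer.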
Performance Optimization Tips
- Model Reuse: Use the WhisperModel class for multiple transcriptions to avoid reloading the model each time
- GPU Acceleration: The system automatically uses Metal (macOS) or CUDA (Linux/Windows) when available
- Thread Count: Use the benchmark to find optimal thread count for your CPU
- Batch Processing: For multiple audio/video files, use parallel processing with ThreadPoolExecutor
- Model Size: Smaller models (base, small) are faster but less accurate than larger ones (medium, large)
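The batch-processing tip can be sketched with the standard library's ThreadPoolExecutor. transcribe_many is a hypothetical helper that accepts any transcription callable, so it works with whisper_parallel_cpu.transcribe or a WhisperModel method. Whether the calls actually overlap depends on the native binding releasing the GIL, which this README does not state, so measure before relying on it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def transcribe_many(paths: Iterable[str],
                    transcribe_fn: Callable[[str], str],
                    max_workers: int = 2) -> Dict[str, str]:
    """Run transcribe_fn over paths with a thread pool; return {path: text}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its text.
        texts = list(pool.map(transcribe_fn, paths))
    return dict(zip(paths, texts))


# Example (requires the package and real files, so it is left commented out):
# from functools import partial
# import whisper_parallel_cpu
# fn = partial(whisper_parallel_cpu.transcribe, model="base", threads=2)
# results = transcribe_many(["audio1.mp3", "audio2.wav"], fn, max_workers=2)
```

Keeping workers multiplied by threads per call at or below the number of CPU cores is a reasonable starting point.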
Model Reuse Performance
When transcribing multiple files, using the WhisperModel class can provide significant performance improvements:
import time
import whisper_parallel_cpu
from whisper_parallel_cpu import WhisperModel
files = ["audio1.mp3", "audio2.wav", "video1.mp4"]  # example paths
# Method 1: Using WhisperModel (model reuse) - FASTER
model = WhisperModel(model="base")
start = time.time()
for file in files:
    text = model.transcribe(file)
model_time = time.time() - start
# Method 2: Using transcribe function (no reuse) - SLOWER
start = time.time()
for file in files:
    text = whisper_parallel_cpu.transcribe(file, model="base")
function_time = time.time() - start
print(f"Speedup with model reuse: {function_time / model_time:.2f}x")
Typical benefits of model reuse:
- 2-5x faster for multiple files with the same model
- Reduced memory usage through context sharing
- Better for batch processing workflows
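A rough way to see where such a speedup could come from is a simple cost model. The sketch below assumes, purely for illustration, that every call without reuse pays the full model load time and that reuse pays it once; the library's real caching behavior may differ, and estimated_reuse_speedup is a hypothetical helper.

```python
def estimated_reuse_speedup(n_files: int, load_s: float, transcribe_s: float) -> float:
    """Estimate the speedup from loading the model once instead of per file.

    Assumption: without reuse each file costs load_s + transcribe_s;
    with reuse the total cost is load_s + n_files * transcribe_s.
    """
    without_reuse = n_files * (load_s + transcribe_s)
    with_reuse = load_s + n_files * transcribe_s
    return without_reuse / with_reuse


# 10 files, 4 s to load the model, 1 s per transcription: 50 / 14, about 3.57
print(f"{estimated_reuse_speedup(10, 4.0, 1.0):.2f}x")
```

Under this model the gain grows with the ratio of load time to transcription time, so short clips benefit more than long recordings.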
⚙️ API Reference
transcribe(file_path, model, threads, use_gpu)
Transcribes an audio or video file using Whisper. Automatically detects file type.
Parameters:
- file_path (str): Path to the audio or video file
- model (str): Model name (e.g. "base", "tiny", etc.) or path to Whisper model binary (.bin file)
- threads (int): Number of CPU threads to use (default: 4)
- use_gpu (bool): Whether to use GPU acceleration (default: True)
Returns:
str: Transcribed text
transcribe_audio(audio_path, model, threads, use_gpu)
Transcribes an audio file using Whisper.
Parameters:
- audio_path (str): Path to the audio file
- model (str): Model name (e.g. "base", "tiny", etc.) or path to Whisper model binary (.bin file)
- threads (int): Number of CPU threads to use (default: 4)
- use_gpu (bool): Whether to use GPU acceleration (default: True)
Returns:
str: Transcribed text
transcribe_video(video_path, model, threads, use_gpu)
Transcribes a video file using Whisper.
Parameters:
- video_path (str): Path to the video file
- model (str): Model name (e.g. "base", "tiny", etc.) or path to Whisper model binary (.bin file)
- threads (int): Number of CPU threads to use (default: 4)
- use_gpu (bool): Whether to use GPU acceleration (default: True)
Returns:
str: Transcribed text
Example:
import whisper_parallel_cpu
# Basic usage
text = whisper_parallel_cpu.transcribe_video("sample.mp4")
# Advanced usage
text = whisper_parallel_cpu.transcribe_video(
"sample.mp4",
model="medium",
threads=8,
use_gpu=False
)
WhisperModel(model, use_gpu, threads)
A class for efficient model reuse across multiple transcriptions.
Parameters:
- model (str): Model name (e.g. "base", "tiny", etc.) or path to Whisper model binary (.bin file)
- use_gpu (bool): Whether to use GPU acceleration (default: False)
- threads (int): Number of CPU threads to use (default: 4)
Methods:
- transcribe(file_path): Transcribe any audio or video file
- transcribe_audio(audio_path): Transcribe an audio file
- transcribe_video(video_path): Transcribe a video file
- clear_contexts(): Clear all cached contexts to free memory
- get_context_count(): Get number of cached contexts
Example:
from whisper_parallel_cpu import WhisperModel
# Create model instance
model = WhisperModel(model="base", use_gpu=False, threads=4)
# Transcribe multiple files efficiently
files = ["audio1.mp3", "audio2.wav", "video1.mp4"]
for file_path in files:
    text = model.transcribe(file_path)
    print(f"Transcribed: {text[:50]}...")
# Memory management
model.clear_contexts()
clear_contexts()
Clear all cached whisper contexts to free memory.
Example:
import whisper_parallel_cpu
# Clear all cached contexts
whisper_parallel_cpu.clear_contexts()
get_context_count()
Get the number of currently cached whisper contexts.
Returns:
int: Number of cached contexts
Example:
import whisper_parallel_cpu
# Check how many contexts are cached
count = whisper_parallel_cpu.get_context_count()
print(f"Active contexts: {count}")
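In a long-running service it can help to release cached contexts even when a transcription fails partway through a batch. The sketch below wraps a batch in try/finally; transcribe_then_clear is a hypothetical helper that relies only on the transcribe and clear_contexts methods documented above, and it accepts any object that provides them, such as a WhisperModel instance.

```python
from typing import Dict, Iterable


def transcribe_then_clear(model, paths: Iterable[str]) -> Dict[str, str]:
    """Transcribe each path with `model`, then always clear its cached contexts."""
    results = {}
    try:
        for path in paths:
            results[path] = model.transcribe(path)
    finally:
        # Runs on success and on error, so memory is released either way.
        model.clear_contexts()
    return results


# Example (requires the package and real files, so it is left commented out):
# from whisper_parallel_cpu import WhisperModel
# texts = transcribe_then_clear(WhisperModel(model="base"), ["audio1.mp3", "audio2.wav"])
```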
🤝 Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes and test thoroughly
- Commit your changes: git commit -m 'Add feature'
- Push to the branch: git push origin feature-name
- Submit a pull request
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- Built on whisper.cpp by Georgi Gerganov
- Uses pybind11 for Python bindings
- Model management inspired by the original OpenAI Whisper project