Offline audio transcription with speaker diarization, optimized for Apple Silicon
LocalTranscribe
Turn audio into speaker-labeled transcripts. Entirely offline. One command.
Transform recordings into detailed transcripts showing who said what and when—all on your Mac, no cloud services required.
Why LocalTranscribe?
| Feature | LocalTranscribe | Cloud Services |
|---|---|---|
| Privacy | 100% offline | Data uploaded to servers |
| Cost | Free forever | $10-50/month |
| Speaker ID | Automatic | Often extra cost |
| Speed (M1/M2) | Real-time to 2x | Depends on upload |
| Quality | OpenAI Whisper | Varies |
Built for: Researchers, podcasters, journalists, lawyers, content creators—anyone who needs accurate transcripts with speaker labels and complete privacy.
Quick Start
1. Install
# Clone repository
git clone https://github.com/aporb/transcribe-diarization.git
cd transcribe-diarization
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install
pip install -e .
2. Set Up HuggingFace Token (One-Time)
Required for speaker diarization:
- Get token (free): https://huggingface.co/settings/tokens
- Accept model licenses (required for both models):
- Add to project:
echo "HUGGINGFACE_TOKEN=hf_your_token_here" > .env
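To confirm the token file is readable before running a long job, the `.env` file can be parsed from Python. This is a minimal sketch, assuming the file holds a plain `HUGGINGFACE_TOKEN=value` line; `parse_env_token` and `read_env_token` are hypothetical helpers for illustration and are not part of LocalTranscribe. The parser accepts both quoted and unquoted values.

```python
from pathlib import Path
from typing import Optional


def parse_env_token(text: str, key: str = "HUGGINGFACE_TOKEN") -> Optional[str]:
    """Return the value for `key` from .env-style text, or None if absent."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            # Accept both KEY=value and KEY="value".
            return value.strip().strip("'\"")
    return None


def read_env_token(path: str = ".env") -> Optional[str]:
    """Read the token from a .env file; None if the file or key is missing."""
    env_file = Path(path)
    if not env_file.is_file():
        return None
    return parse_env_token(env_file.read_text(encoding="utf-8"))
```

A `None` result from `read_env_token()` suggests the file is missing or the key is misspelled.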
3. Process Audio
localtranscribe process your-audio.mp3
That's it! Results appear in ./output/ with:
- Speaker labels (who spoke)
- Timestamps (when they spoke)
- Full transcript (what they said)
Examples
Basic Usage
# Transcribe with automatic settings
localtranscribe process meeting.mp3
# Know how many speakers? Tell it for better accuracy
localtranscribe process interview.wav --speakers 2
# Use larger model for higher quality
localtranscribe process podcast.m4a --model medium
# Save to custom location
localtranscribe process audio.mp3 --output ./results/
Batch Processing
# Process entire folder
localtranscribe batch ./audio-files/ --workers 2
# With custom settings
localtranscribe batch ./recordings/ --model small --output ./transcripts/
Single-Speaker Content
# Skip speaker detection for faster processing
localtranscribe process lecture.mp3 --skip-diarization
Advanced Options
# --model: tiny|base|small|medium|large; --speakers: number of speakers (if known)
# --language: force language; --format: output formats; --verbose: show detailed progress
localtranscribe process audio.mp3 \
  --model medium \
  --speakers 3 \
  --language en \
  --format txt json srt \
  --output ./results/ \
  --verbose
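When the CLI is driven from a script, the same options can be assembled as an argument list. The sketch below only uses the flags shown above; `build_process_command` is a hypothetical helper, not something LocalTranscribe provides.

```python
from typing import List, Optional, Sequence


def build_process_command(
    audio_path: str,
    model: Optional[str] = None,
    speakers: Optional[int] = None,
    language: Optional[str] = None,
    formats: Sequence[str] = (),
    output_dir: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Build an argv list for `localtranscribe process` from optional settings."""
    cmd = ["localtranscribe", "process", audio_path]
    if model:
        cmd += ["--model", model]
    if speakers is not None:
        cmd += ["--speakers", str(speakers)]
    if language:
        cmd += ["--language", language]
    if formats:
        # --format takes one or more space-separated values.
        cmd += ["--format", *formats]
    if output_dir:
        cmd += ["--output", output_dir]
    if verbose:
        cmd.append("--verbose")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the activated virtual environment.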
Python SDK
Use programmatically in your Python projects:
from localtranscribe import LocalTranscribe
# Initialize
lt = LocalTranscribe(model_size="base", num_speakers=2)
# Process single file
result = lt.process("meeting.mp3")
print(result.transcript)
print(f"Found {result.num_speakers} speakers")
# Access detailed segments
for segment in result.segments:
    print(f"[{segment.speaker}] ({segment.start:.1f}s): {segment.text}")
# Batch processing
results = lt.process_batch("./audio-files/", max_workers=4)
print(f"Processed {results.successful}/{results.total} files")
# Handle errors
for failed in results.get_failed():
    print(f"Failed: {failed.audio_file} - {failed.error}")
Commands
| Command | Description |
|---|---|
| `process` | Transcribe single audio file |
| `batch` | Process multiple files |
| `doctor` | Verify installation and system setup |
| `label` | Replace generic speaker IDs with real names |
| `version` | Show version and system info |
| `config` | Manage configuration |
Get help: localtranscribe --help or localtranscribe <command> --help
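To illustrate what replacing generic speaker IDs involves, here is a minimal sketch that rewrites `SPEAKER_NN` tokens in transcript text. It is not the `label` command's implementation; `relabel_speakers` and the name mapping are assumptions for illustration.

```python
import re
from typing import Dict

# Matches generic IDs such as SPEAKER_00 or SPEAKER_01.
SPEAKER_ID = re.compile(r"\bSPEAKER_\d+\b")


def relabel_speakers(transcript: str, names: Dict[str, str]) -> str:
    """Replace each generic speaker ID with its mapped name, if one is given."""

    def substitute(match: "re.Match[str]") -> str:
        speaker_id = match.group(0)
        # IDs without a mapping are left unchanged.
        return names.get(speaker_id, speaker_id)

    return SPEAKER_ID.sub(substitute, transcript)
```

For example, `relabel_speakers(text, {"SPEAKER_00": "Alice"})` renames one speaker and leaves the others as they were.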
Output Formats
Every run creates multiple files for different use cases:
| Format | File | Best For |
|---|---|---|
| Markdown | `*_combined.md` | Reading, documentation, sharing |
| Plain Text | `*_transcript.txt` | Simple text analysis |
| JSON | `*_transcript.json` | Programming, data processing |
| SRT | `*_transcript.srt` | Video subtitles |
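For readers who post-process the SRT output, the sketch below shows how an SRT cue is laid out: an index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, then the text. Prefixing the text with the speaker label is an assumption here, and `srt_timestamp` and `srt_cue` are hypothetical helpers rather than LocalTranscribe functions.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a non-negative time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_cue(index: int, start: float, end: float, speaker: str, text: str) -> str:
    """Build one SRT cue; the speaker prefix is an illustrative choice."""
    time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
    return f"{index}\n{time_range}\n{speaker}: {text}\n"
```

Cues are separated by a blank line in a complete SRT file, so joining the returned strings with `"\n"` yields a valid sequence.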
Combined transcript includes:
- Speaker labels (SPEAKER_00, SPEAKER_01, etc.)
- Timestamp ranges for each speaker turn
- Full transcript with proper formatting
- Speaker statistics (who spoke most, how long)
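Speaker statistics of this kind can be derived from the speaker turns alone. The following is a minimal sketch, assuming each turn carries a speaker label plus start and end times in seconds; the `Turn` class and both functions are hypothetical and do not reflect LocalTranscribe's internal types.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def speaking_time(turns: List[Turn]) -> Dict[str, float]:
    """Sum the duration of every turn per speaker."""
    totals: Dict[str, float] = defaultdict(float)
    for turn in turns:
        # Guard against malformed turns where end precedes start.
        totals[turn.speaker] += max(0.0, turn.end - turn.start)
    return dict(totals)


def top_speaker(turns: List[Turn]) -> Optional[str]:
    """Return the speaker with the most total speaking time, or None if empty."""
    totals = speaking_time(turns)
    return max(totals, key=totals.get) if totals else None
```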
System Requirements
Recommended:
- Mac with Apple Silicon (M1/M2/M3/M4)
- 16GB RAM
- 10GB free space
- macOS 12.0+
Minimum:
- Any Mac with Python 3.9+
- 8GB RAM
- 5GB free space
Performance (10-minute audio on M2):
- tiny model: ~30 seconds
- base model: ~2 minutes
- small model: ~5 minutes
- medium model: ~10 minutes
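These figures can be turned into a rough estimate for other recording lengths. The sketch below assumes processing time scales linearly with audio length, which is only an approximation, and uses the approximate M2 timings listed above; the function names are hypothetical.

```python
# Approximate processing times for 10 minutes of audio on an M2 (from the list above).
REFERENCE_AUDIO_SECONDS = 10 * 60
REFERENCE_PROCESSING_SECONDS = {"tiny": 30, "base": 120, "small": 300, "medium": 600}


def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


def estimate_processing_seconds(model: str, audio_seconds: float) -> float:
    """Estimate processing time, assuming linear scaling from the reference run."""
    factor = speed_factor(REFERENCE_AUDIO_SECONDS, REFERENCE_PROCESSING_SECONDS[model])
    return audio_seconds / factor
```

Under that assumption, a one-hour recording with the base model would take roughly 12 minutes; actual times depend on hardware and audio content.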
Model Selection Guide
| Model | Speed | Quality | RAM | Best For |
|---|---|---|---|---|
| tiny | Fastest | Basic | 1GB | Quick drafts, testing |
| base | Fast | Good | 1GB | Most use cases |
| small | Moderate | Better | 2GB | Professional work |
| medium | Slow | Best | 5GB | Publication-quality |
| large | Very slow | Best+ | 10GB | Maximum accuracy |
Recommendation: Start with base, upgrade to medium if accuracy matters more than speed.
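The table can also be read as a simple lookup. The sketch below picks the largest model whose listed RAM figure fits a given budget; the RAM values come from the table, while `largest_model_within` is a hypothetical helper and ignores speed.

```python
from typing import List, Optional, Tuple

# (model, approximate RAM in GB), ordered from smallest to largest, per the table above.
MODEL_RAM_GB: List[Tuple[str, int]] = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_within(ram_budget_gb: float) -> Optional[str]:
    """Return the largest model whose listed RAM need fits the budget, or None."""
    choice = None
    for name, ram_gb in MODEL_RAM_GB:
        if ram_gb <= ram_budget_gb:
            choice = name  # later entries are larger models
    return choice
```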
What's New in v2.0
Complete rewrite focused on usability:
Before (v1.x): Three manual steps
cd scripts
python3 diarization.py # Step 1
python3 transcription.py # Step 2
python3 combine.py # Step 3
Now (v2.0): One command
localtranscribe process audio.mp3
Other improvements:
- Professional CLI with helpful error messages
- Python SDK for programmatic use
- Batch processing support
- Health check (`doctor` command)
- Modular architecture
- Beautiful terminal output
Installation Options
Option 1: Development (Recommended)
git clone https://github.com/aporb/transcribe-diarization.git
cd transcribe-diarization
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
Option 2: PyPI (Coming Soon)
# When published
pip install localtranscribe
# With Apple Silicon optimization
pip install "localtranscribe[mlx]"
Troubleshooting
Common Issues
Command not found:
source .venv/bin/activate # Activate virtual environment first
HuggingFace token error:
# Check .env file exists and has correct format
cat .env
# Should show: HUGGINGFACE_TOKEN=hf_...
Slow processing:
localtranscribe process audio.mp3 --model tiny # Use faster model
Run health check:
localtranscribe doctor # Diagnoses setup issues
How It Works
1. Speaker Diarization (pyannote.audio)
   - Analyzes audio waveforms
   - Identifies when different speakers talk
   - Creates speaker timeline
2. Speech-to-Text (Whisper)
   - Converts speech to text
   - Detects language automatically
   - Creates timestamped segments
3. Intelligent Combination
   - Matches speaker labels to transcript segments
   - Aligns timestamps
   - Generates formatted output
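One common way to match speaker labels to transcript segments is to give each segment the speaker whose turn overlaps it the most in time. The sketch below illustrates that idea only; it is an assumption about how such a combination step could work, not LocalTranscribe's actual algorithm, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeakerTurn:
    speaker: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class TextSegment:
    start: float  # seconds
    end: float    # seconds
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(segment: TextSegment, turns: List[SpeakerTurn]) -> Optional[str]:
    """Pick the speaker whose turn shares the most time with the segment."""
    best_speaker: Optional[str] = None
    best_overlap = 0.0
    for turn in turns:
        shared = overlap(segment.start, segment.end, turn.start, turn.end)
        if shared > best_overlap:
            best_speaker, best_overlap = turn.speaker, shared
    # None means no turn overlapped the segment at all.
    return best_speaker
```

A segment that straddles a speaker change is attributed to whoever holds the larger share of it; splitting such segments at the boundary is a possible refinement.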
Technology:
- Whisper - OpenAI's speech recognition
- MLX-Whisper - Apple Silicon optimization
- Pyannote - Speaker diarization
- Typer - Modern CLI
- Rich - Beautiful terminal output
Documentation
📚 SDK Reference - Python API for developers
🐛 Troubleshooting - Common issues & solutions
📝 Changelog - Version history
🚀 PyPI Release Guide - For maintainers
Roadmap
v2.0-beta (Current):
- ✅ Modern CLI
- ✅ Python SDK
- ✅ Batch processing
- ✅ Health checks
v2.1 (Next):
- Interactive speaker labeling (replace SPEAKER_00 with names)
- Progress bars for large files
- Resume interrupted jobs
- Audio quality analysis
v3.0 (Future):
- Real-time transcription
- Web interface
- Docker support
- Cloud sync (optional)
Contributing
Contributions welcome! Please:
- Check existing issues
- Fork the repository
- Create feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push branch (`git push origin feature/amazing-feature`)
- Open Pull Request
License
MIT License - free for personal and commercial use.
Support
Need help?
- Run `localtranscribe doctor` to check your setup
- Check Troubleshooting Guide
- Search existing issues
- Open new issue with `doctor` output and error message
Credits
Built by the LocalTranscribe community.
Special thanks:
- OpenAI - Whisper model
- Apple - MLX framework
- Pyannote team - Speaker diarization models
- HuggingFace - Model hosting
⭐ Star on GitHub • 🐛 Report Bug • 💡 Request Feature
Made with ❤️ for privacy-conscious professionals
Transform audio to text. Know who said what. Keep it private.
File details
Details for the file localtranscribe-2.0.1b0.tar.gz.
File metadata
- Download URL: localtranscribe-2.0.1b0.tar.gz
- Upload date:
- Size: 57.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f430a58e74ff6301c14d88fb9f25eb68fdf9c809de4c5088eb457db2d641dc0 |
| MD5 | cabeeb1c89b8cbdd48257574333ab135 |
| BLAKE2b-256 | 2b45f59d12fa099c931806e42531c9b86519fcfef3573aaef14b9996d30e9321 |
File details
Details for the file localtranscribe-2.0.1b0-py3-none-any.whl.
File metadata
- Download URL: localtranscribe-2.0.1b0-py3-none-any.whl
- Upload date:
- Size: 71.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b312a968a8d08e845db74e6e5f5c1822a358ac4e83876af058f852246c9877a9 |
| MD5 | de75a2b44b8d7ecad58787ee233870d0 |
| BLAKE2b-256 | c07f5388b7f4a1293902fd108f3c96e8a1a271e5256538cf5f50d503a46b01e7 |