LocalTranscribe
Privacy-first audio transcription with speaker diarization. Entirely offline.
Transform recordings into detailed transcripts showing who said what and when—all on your Mac, with complete privacy.
Why LocalTranscribe?
| Feature | LocalTranscribe | Cloud Services |
|---|---|---|
| Privacy | 100% offline processing | Data uploaded to third-party servers |
| Cost | Free forever | $10-50/month subscription |
| Speaker Identification | Automatic speaker detection | Often extra cost or unavailable |
| Speed (Apple Silicon) | Real-time to 2x audio length | Depends on upload/download speed |
| Quality | OpenAI Whisper models | Varies by provider |
| Data Ownership | All files stay on your machine | Depends on provider terms |
Perfect for: Researchers, podcasters, journalists, legal professionals, content creators—anyone who needs accurate transcripts with speaker labels and complete data privacy.
Features
- 🔒 Complete Privacy - All processing happens locally on your machine
- 🎯 Speaker Diarization - Automatic detection of who spoke when
- 📝 High Accuracy - Powered by OpenAI's Whisper models
- ⚡️ Apple Silicon Optimized - Blazing fast on M1/M2/M3/M4 Macs
- 🚀 Simple CLI - One command to transcribe any audio file
- 📦 Python SDK - Integrate transcription into your applications
- 🔄 Batch Processing - Process multiple files simultaneously
- 📊 Multiple Formats - Output as TXT, JSON, SRT, or Markdown
Quick Start
Install from PyPI
Package: pypi.org/project/localtranscribe
pip install localtranscribe
Setup HuggingFace Token (One-Time)
Speaker diarization requires a free HuggingFace account:
- Create account & get token: https://huggingface.co/settings/tokens
- Accept model licenses (click "Agree" on each):
- Configure token:
echo "HUGGINGFACE_TOKEN=hf_your_token_here" > .env
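Before starting a long job, it can help to confirm that the token is actually readable. The sketch below is a hypothetical helper, not part of LocalTranscribe. It assumes the `.env` file holds simple `KEY=value` lines as written by the command above, and that tokens start with `hf_` as in the placeholder.

```python
from pathlib import Path


def read_env_token(env_path=".env", key="HUGGINGFACE_TOKEN"):
    """Return the value for `key` from a simple KEY=value file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes.
            return value.strip().strip("'\"")
    return None


def looks_like_hf_token(token):
    """Cheap sanity check only: non-empty and starts with the hf_ prefix."""
    return bool(token) and token.startswith("hf_") and len(token) > 3
```

This only checks the shape of the token. Whether the token is valid and the model licenses were accepted can only be confirmed against HuggingFace itself.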
Transcribe Audio
localtranscribe process your-audio.mp3
Done! Results appear in ./output/ with speaker labels, timestamps, and full transcript.
Installation
Option 1: Install from PyPI (Recommended)
# Basic installation
pip install localtranscribe
# For Apple Silicon optimization (recommended for M1/M2/M3/M4)
# Quotes keep zsh, the default macOS shell, from expanding the square brackets
pip install "localtranscribe[mlx]"
# For NVIDIA GPU support
pip install "localtranscribe[faster]"
# Install all optional dependencies
pip install "localtranscribe[all]"
Option 2: Install from Source
# Clone repository
git clone https://github.com/aporb/LocalTranscribe.git
cd LocalTranscribe
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install in development mode
pip install -e .
Verify Installation
localtranscribe doctor
This command checks your system configuration and reports any issues.
Usage Examples
Basic Transcription
# Transcribe with automatic settings
localtranscribe process meeting.mp3
# Specify number of speakers for better accuracy
localtranscribe process interview.wav --speakers 2
# Use larger model for higher quality
localtranscribe process podcast.m4a --model medium
# Save to custom location
localtranscribe process audio.mp3 --output ./results/
Batch Processing
# Process entire folder
localtranscribe batch ./audio-files/
# Process with multiple workers
localtranscribe batch ./recordings/ --workers 4
# With custom settings
localtranscribe batch ./files/ --model small --output ./transcripts/
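The `--workers` option implies a worker-pool pattern. The sketch below shows that general pattern with Python's standard library, not LocalTranscribe's actual implementation. `run_batch` and the `transcribe_one` callable are hypothetical names, and a real tool may use processes instead of threads for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(paths, transcribe_one, max_workers=4):
    """Run transcribe_one(path) for each path; collect results and failures."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(transcribe_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as error:  # one bad file should not stop the batch
                failures[path] = error
    return results, failures
```

Collecting failures per file, instead of raising on the first error, lets a long batch finish and report which inputs need a second look.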
Single-Speaker Content
# Skip speaker detection for faster processing
localtranscribe process lecture.mp3 --skip-diarization
Advanced Options
# --model: tiny|base|small|medium|large; --speakers: number of speakers (if known)
# --language: force a specific language; --format: output formats
# --output: output directory; --verbose: show detailed progress
localtranscribe process audio.mp3 \
  --model medium \
  --speakers 3 \
  --language en \
  --format txt json srt \
  --output ./results/ \
  --verbose
Using the Python SDK
from localtranscribe import LocalTranscribe
# Initialize with options
lt = LocalTranscribe(
    model_size="base",
    num_speakers=2,
    output_dir="./transcripts",
)
# Process single file
result = lt.process("meeting.mp3")
# Access results
print(f"Transcript: {result.transcript}")
print(f"Speakers: {result.num_speakers}")
print(f"Duration: {result.duration}s")
# Access detailed segments
for segment in result.segments:
    print(f"[{segment.speaker}] {segment.text}")
# Batch processing
results = lt.process_batch("./audio-files/", max_workers=4)
print(f"Completed: {results.successful}/{results.total}")
Output Formats
LocalTranscribe generates multiple output files for different use cases:
| Format | File | Description |
|---|---|---|
| Markdown | *_combined.md | Formatted transcript with speaker labels and timestamps |
| Plain Text | *_transcript.txt | Simple text output for analysis |
| JSON | *_transcript.json | Structured data for programming |
| SRT | *_transcript.srt | Subtitle format for video |
| Diarization | *_diarization.md | Speaker timeline and statistics |
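To make the SRT row concrete: SRT files consist of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line followed by the text. The sketch below shows how timestamped speaker segments could be rendered that way. The tuple layout and the `[speaker]` prefix are assumptions for illustration, not LocalTranscribe's actual writer.

```python
def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, speaker, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"[{speaker}] {text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```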
Example Output:
# Combined Transcript
**Audio File:** interview.mp3
**Processing Date:** 2025-10-13 22:30:00
## SPEAKER_00
**Time:** [0.0s - 5.2s]
Hello, welcome to the show. Thanks for joining us today.
## SPEAKER_01
**Time:** [5.5s - 12.8s]
Thanks for having me. I'm excited to discuss our new project.
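The combined Markdown layout is regular enough to read back into records for further analysis. The sketch below assumes exactly the layout shown in the example (a `## SPEAKER_NN` heading, a `**Time:**` line, then the text). Real files may differ, for instance after speaker IDs have been replaced with names.

```python
import re

SPEAKER_RE = re.compile(r"^## (SPEAKER_\d+)\s*$")
TIME_RE = re.compile(r"^\*\*Time:\*\* \[([\d.]+)s - ([\d.]+)s\]\s*$")


def parse_combined(markdown_text):
    """Return a list of dicts with speaker, start, end, and text keys."""
    segments = []
    current = None
    for line in markdown_text.splitlines():
        speaker_match = SPEAKER_RE.match(line)
        time_match = TIME_RE.match(line)
        if speaker_match:
            current = {"speaker": speaker_match.group(1), "start": None,
                       "end": None, "text": ""}
            segments.append(current)
        elif current is not None and time_match:
            current["start"] = float(time_match.group(1))
            current["end"] = float(time_match.group(2))
        elif current is not None and line.strip():
            # Any other non-empty line under a speaker heading is transcript text.
            current["text"] = (current["text"] + " " + line.strip()).strip()
    return segments
```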
Commands
| Command | Description | Example |
|---|---|---|
| process | Transcribe single audio file | localtranscribe process audio.mp3 |
| batch | Process multiple files | localtranscribe batch ./folder/ |
| doctor | Verify system setup | localtranscribe doctor |
| label | Replace speaker IDs with names | localtranscribe label output.md |
| version | Show version information | localtranscribe version |
| config | Manage configuration | localtranscribe config show |
Run localtranscribe --help or localtranscribe <command> --help for detailed options.
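As an illustration of what replacing speaker IDs with names involves, here is a minimal sketch of the idea. It is not the `label` command's implementation, whose options and behavior are not described here, and `apply_speaker_names` is a hypothetical function.

```python
import re


def apply_speaker_names(text, names):
    """Replace SPEAKER_NN IDs with display names; unknown IDs are left as-is."""
    def substitute(match):
        return names.get(match.group(0), match.group(0))
    # Matching the whole ID (rather than using str.replace) keeps an entry
    # for SPEAKER_1 from touching SPEAKER_10.
    return re.sub(r"\bSPEAKER_\d+\b", substitute, text)
```

For example, `apply_speaker_names(transcript, {"SPEAKER_00": "Host", "SPEAKER_01": "Guest"})` would rewrite the headings in a combined transcript while leaving any unmapped IDs untouched.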
Model Selection Guide
Choose the right Whisper model for your needs:
| Model | Speed | Quality | RAM | Use Case |
|---|---|---|---|---|
| tiny | Fastest | Basic | 1GB | Quick drafts, testing |
| base | Fast | Good | 1GB | Most use cases |
| small | Moderate | Better | 2GB | Professional work |
| medium | Slow | Excellent | 5GB | Publication quality |
| large | Very slow | Best | 10GB | Maximum accuracy |
Performance on M2 Mac (10-minute audio):
- tiny: ~30 seconds
- base: ~2 minutes ← Recommended starting point
- small: ~5 minutes
- medium: ~10 minutes
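The M2 timings can be turned into a rough estimate for other recording lengths. The sketch below assumes processing time scales linearly with audio duration, which the figures above do not guarantee, so treat the results as ballpark numbers only.

```python
# Seconds needed for 10 minutes (600 s) of audio on an M2 Mac, from the list above.
M2_SECONDS_PER_10_MIN = {"tiny": 30, "base": 120, "small": 300, "medium": 600}


def estimate_seconds(model, audio_seconds):
    """Linearly scale the 10-minute benchmark to another audio length."""
    return M2_SECONDS_PER_10_MIN[model] * audio_seconds / 600


def speed_vs_realtime(model):
    """Seconds of audio processed per second of compute."""
    return 600 / M2_SECONDS_PER_10_MIN[model]
```

Under this assumption, a one-hour recording with the base model takes about 12 minutes (`estimate_seconds("base", 3600)` gives 720 seconds), and the medium model runs at roughly real time.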
System Requirements
Recommended:
- Mac with Apple Silicon (M1/M2/M3/M4)
- 16GB RAM
- 10GB free disk space
- macOS 12.0 or later
Minimum:
- Any Mac with Python 3.9+
- 8GB RAM
- 5GB free disk space
- macOS 11.0 or later
Supported Audio Formats:
- MP3, WAV, M4A, OGG, FLAC, AAC, WMA
- Video files (MP4, MOV, AVI) - audio will be extracted
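When preparing a folder for a batch run, the format list above can double as a filter. The sketch below is a hypothetical helper. How the `batch` command itself selects files is not documented here.

```python
from pathlib import Path

# Extensions taken from the supported formats listed above.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}


def find_supported_files(folder):
    """Return supported media files directly inside `folder`, sorted by path."""
    supported = AUDIO_EXTENSIONS | VIDEO_EXTENSIONS
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in supported
    )
```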
How It Works
LocalTranscribe uses a three-stage pipeline:
1. Speaker Diarization (pyannote.audio)
- Analyzes audio waveform patterns
- Identifies distinct speakers
- Creates precise speaker timeline
- Optimized for 2-10 speakers
2. Speech-to-Text (Whisper)
- Converts speech to text using OpenAI's Whisper
- Automatically detects language
- Handles accents and background noise
- Creates timestamped segments
3. Intelligent Combination
- Aligns speaker labels with transcript
- Matches timestamps accurately
- Formats output for readability
- Generates multiple export formats
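The combination stage can be pictured as an interval-overlap problem: each transcript segment gets the speaker whose turns overlap it the most. The sketch below shows one common way to do this. It is not necessarily what LocalTranscribe implements, and the tuple shapes and the `UNKNOWN` label are assumptions.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript_segments, speaker_turns):
    """Label each (start, end, text) segment with the best-overlapping speaker.

    speaker_turns is a list of (start, end, speaker) tuples. Segments that
    overlap no turn are labeled "UNKNOWN".
    """
    labeled = []
    for seg_start, seg_end, text in transcript_segments:
        totals = {}
        for turn_start, turn_end, speaker in speaker_turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > 0:
                # Sum overlap per speaker across all of that speaker's turns.
                totals[speaker] = totals.get(speaker, 0.0) + shared
        best = max(totals, key=totals.get) if totals else "UNKNOWN"
        labeled.append((seg_start, seg_end, best, text))
    return labeled
```

Summing overlap per speaker means a segment that straddles a turn boundary goes to whoever spoke for most of it, which is a simple tie-breaking rule when the two models disagree slightly on timing.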
Technologies:
- Whisper - State-of-the-art speech recognition
- MLX-Whisper - Apple Silicon optimization
- Pyannote.audio - Speaker diarization
- Typer - Modern CLI framework
- Rich - Beautiful terminal output
Documentation
- 📚 SDK Reference - Python API documentation
- 🐛 Troubleshooting Guide - Common issues and solutions
- 📝 Changelog - Version history and updates
- 🚀 Contributing Guide - How to contribute
Troubleshooting
Common Issues
Command not found after installation:
# Ensure package is installed
pip install --upgrade localtranscribe
# If using virtual environment, activate it first
source .venv/bin/activate
HuggingFace authentication error:
# Verify token is correctly set
cat .env
# Should show: HUGGINGFACE_TOKEN=hf_...
# Make sure you accepted both model licenses
Slow processing:
# Use a faster model
localtranscribe process audio.mp3 --model tiny
# Skip diarization for single speaker
localtranscribe process audio.mp3 --skip-diarization
Run system check:
localtranscribe doctor
This command diagnoses common setup issues and suggests fixes.
What's New
v2.0.2b1 (Current)
- ✅ Updated package description and metadata
- ✅ Enhanced README with PyPI link
- ✅ Professional documentation polish
v2.0.1-beta
- ✅ Published to PyPI - Install with pip install localtranscribe
- ✅ Fixed pyannote.audio 3.x API compatibility
- ✅ Updated documentation for model licenses
v2.0.0-beta
- ✅ Complete rewrite with modern CLI
- ✅ Python SDK for programmatic use
- ✅ Batch processing support
- ✅ System health checks with doctor command
- ✅ Modular architecture
Roadmap
v2.1 (Next Release)
- Interactive speaker labeling (replace SPEAKER_00 with real names)
- Enhanced progress indicators for large files
- Resume interrupted transcription jobs
- Audio quality pre-analysis
v3.0 (Future)
- Real-time transcription support
- Web-based interface
- Docker containerization
- Optional cloud sync for results
Contributing
We welcome contributions! Here's how to get started:
- Check existing issues at github.com/aporb/LocalTranscribe/issues
- Fork the repository and create your feature branch
- Make your changes following the existing code style
- Add tests if applicable
- Submit a pull request with a clear description
Development Setup:
git clone https://github.com/aporb/LocalTranscribe.git
cd LocalTranscribe
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
License
MIT License - Free for personal and commercial use.
See LICENSE for full details.
Support
Need help?
- Run localtranscribe doctor to check your setup
- Check the Troubleshooting Guide
- Search existing issues
- Open a new issue with:
  - Output from localtranscribe doctor
  - Error message or unexpected behavior
  - Your system info (OS, Python version)
Acknowledgments
LocalTranscribe builds on excellent open-source work:
- OpenAI - Whisper speech recognition model
- Apple - MLX framework for Metal acceleration
- Pyannote team - Speaker diarization models
- HuggingFace - Model hosting and distribution
⭐ Star on GitHub • 🐛 Report Bug • 💡 Request Feature
Made for privacy-conscious professionals who value data ownership.
Transform audio to text. Know who said what. Keep it private.
File details
Details for the file localtranscribe-2.0.2b1.tar.gz.
File metadata
- Download URL: localtranscribe-2.0.2b1.tar.gz
- Upload date:
- Size: 59.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52a7d4b1fda2e5049078200e0a812703f47a555bc989251dd2a38bc970b2c091 |
| MD5 | 662705b5f89c75732bf7083e66d29cf1 |
| BLAKE2b-256 | 47b2336fda77e96b05ebde23fe2d7fa79fb1190bf53d12ca210c3f05d1e8f74a |
File details
Details for the file localtranscribe-2.0.2b1-py3-none-any.whl.
File metadata
- Download URL: localtranscribe-2.0.2b1-py3-none-any.whl
- Upload date:
- Size: 72.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5d83c742bd4e7281856f6a2b6fa26a7812340145d528023fec5f084dcb865c81 |
| MD5 | dc9094eb62afbb78d8ffa7e1da13117d |
| BLAKE2b-256 | 931524db06012aaaf47fc8b119b2d1221a7bfa618067c6a12e6ffed41f126b17 |