Skip to main content

Offline audio transcription with speaker diarization, optimized for Apple Silicon

Project description

LocalTranscribe

Turn audio into speaker-labeled transcripts. Entirely offline. One command.

Transform recordings into detailed transcripts showing who said what and when—all on your Mac, no cloud services required.


Why LocalTranscribe?

Feature LocalTranscribe Cloud Services
Privacy 100% offline Data uploaded to servers
Cost Free forever $10-50/month
Speaker ID Automatic Often extra cost
Speed (M1/M2) Real-time to 2x Depends on upload
Quality OpenAI Whisper Varies

Built for: Researchers, podcasters, journalists, lawyers, content creators—anyone who needs accurate transcripts with speaker labels and complete privacy.


Quick Start

1. Install

# Clone repository
git clone https://github.com/aporb/transcribe-diarization.git
cd transcribe-diarization

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install
pip install -e .

2. Setup HuggingFace Token (One-Time)

Required for speaker diarization:

  1. Get token (free): https://huggingface.co/settings/tokens
  2. Accept model licenses (required for both models):
  3. Add to project:
    echo "HUGGINGFACE_TOKEN=hf_your_token_here" > .env
    

3. Process Audio

localtranscribe process your-audio.mp3

That's it! Results appear in ./output/ with:

  • Speaker labels (who spoke)
  • Timestamps (when they spoke)
  • Full transcript (what they said)

Examples

Basic Usage

# Transcribe with automatic settings
localtranscribe process meeting.mp3

# Know how many speakers? Tell it for better accuracy
localtranscribe process interview.wav --speakers 2

# Use larger model for higher quality
localtranscribe process podcast.m4a --model medium

# Save to custom location
localtranscribe process audio.mp3 --output ./results/

Batch Processing

# Process entire folder
localtranscribe batch ./audio-files/ --workers 2

# With custom settings
localtranscribe batch ./recordings/ --model small --output ./transcripts/

Single-Speaker Content

# Skip speaker detection for faster processing
localtranscribe process lecture.mp3 --skip-diarization

Advanced Options

localtranscribe process audio.mp3 \
  --model medium \              # Model size: tiny|base|small|medium|large
  --speakers 3 \                # Number of speakers (if known)
  --language en \               # Force language
  --format txt json srt \       # Output formats
  --output ./results/ \         # Output directory
  --verbose                     # Show detailed progress

Python SDK

Use programmatically in your Python projects:

from localtranscribe import LocalTranscribe

# Initialize
lt = LocalTranscribe(model_size="base", num_speakers=2)

# Process single file
result = lt.process("meeting.mp3")
print(result.transcript)
print(f"Found {result.num_speakers} speakers")

# Access detailed segments
for segment in result.segments:
    print(f"[{segment.speaker}] ({segment.start:.1f}s): {segment.text}")

# Batch processing
results = lt.process_batch("./audio-files/", max_workers=4)
print(f"Processed {results.successful}/{results.total} files")

# Handle errors
for failed in results.get_failed():
    print(f"Failed: {failed.audio_file} - {failed.error}")

→ Full SDK Documentation


Commands

Command Description
process Transcribe single audio file
batch Process multiple files
doctor Verify installation and system setup
label Replace generic speaker IDs with real names
version Show version and system info
config Manage configuration

Get help: localtranscribe --help or localtranscribe <command> --help


Output Formats

Every run creates multiple files for different use cases:

Format File Best For
Markdown *_combined.md Reading, documentation, sharing
Plain Text *_transcript.txt Simple text analysis
JSON *_transcript.json Programming, data processing
SRT *_transcript.srt Video subtitles

Combined transcript includes:

  • Speaker labels (SPEAKER_00, SPEAKER_01, etc.)
  • Timestamp ranges for each speaker turn
  • Full transcript with proper formatting
  • Speaker statistics (who spoke most, how long)

System Requirements

Recommended:

  • Mac with Apple Silicon (M1/M2/M3/M4)
  • 16GB RAM
  • 10GB free space
  • macOS 12.0+

Minimum:

  • Any Mac with Python 3.9+
  • 8GB RAM
  • 5GB free space

Performance (10-minute audio on M2):

  • tiny model: ~30 seconds
  • base model: ~2 minutes
  • small model: ~5 minutes
  • medium model: ~10 minutes

Model Selection Guide

Model Speed Quality RAM Best For
tiny Fastest Basic 1GB Quick drafts, testing
base Fast Good 1GB Most use cases
small Moderate Better 2GB Professional work
medium Slow Best 5GB Publication-quality
large Very slow Best+ 10GB Maximum accuracy

Recommendation: Start with base, upgrade to medium if accuracy matters more than speed.


What's New in v2.0

Complete rewrite focused on usability:

Before (v1.x): Three manual steps

cd scripts
python3 diarization.py      # Step 1
python3 transcription.py    # Step 2
python3 combine.py          # Step 3

Now (v2.0): One command

localtranscribe process audio.mp3

Other improvements:

  • Professional CLI with helpful error messages
  • Python SDK for programmatic use
  • Batch processing support
  • Health check (doctor command)
  • Modular architecture
  • Beautiful terminal output

→ Full Changelog


Installation Options

Option 1: Development (Recommended)

git clone https://github.com/aporb/transcribe-diarization.git
cd transcribe-diarization
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

Option 2: PyPI (Coming Soon)

# When published
pip install localtranscribe

# With Apple Silicon optimization
pip install localtranscribe[mlx]

Troubleshooting

Common Issues

Command not found:

source .venv/bin/activate  # Activate virtual environment first

HuggingFace token error:

# Check .env file exists and has correct format
cat .env
# Should show: HUGGINGFACE_TOKEN="hf_..."

Slow processing:

localtranscribe process audio.mp3 --model tiny  # Use faster model

Run health check:

localtranscribe doctor  # Diagnoses setup issues

→ Full Troubleshooting Guide


How It Works

  1. Speaker Diarization (pyannote.audio)

    • Analyzes audio waveforms
    • Identifies when different speakers talk
    • Creates speaker timeline
  2. Speech-to-Text (Whisper)

    • Converts speech to text
    • Detects language automatically
    • Creates timestamped segments
  3. Intelligent Combination

    • Matches speaker labels to transcript segments
    • Aligns timestamps
    • Generates formatted output

Technology:


Documentation

📚 SDK Reference - Python API for developers
🐛 Troubleshooting - Common issues & solutions
📝 Changelog - Version history
🚀 PyPI Release Guide - For maintainers


Roadmap

v2.0-beta (Current):

  • ✅ Modern CLI
  • ✅ Python SDK
  • ✅ Batch processing
  • ✅ Health checks

v2.1 (Next):

  • Interactive speaker labeling (replace SPEAKER_00 with names)
  • Progress bars for large files
  • Resume interrupted jobs
  • Audio quality analysis

v3.0 (Future):

  • Real-time transcription
  • Web interface
  • Docker support
  • Cloud sync (optional)

Contributing

Contributions welcome! Please:

  1. Check existing issues
  2. Fork the repository
  3. Create feature branch (git checkout -b feature/amazing-feature)
  4. Commit changes (git commit -m 'Add amazing feature')
  5. Push branch (git push origin feature/amazing-feature)
  6. Open Pull Request

License

MIT License - free for personal and commercial use.


Support

Need help?

  1. Run localtranscribe doctor to check your setup
  2. Check Troubleshooting Guide
  3. Search existing issues
  4. Open new issue with doctor output and error message

Credits

Built by the LocalTranscribe community.

Special thanks:

  • OpenAI - Whisper model
  • Apple - MLX framework
  • Pyannote team - Speaker diarization models
  • HuggingFace - Model hosting

⭐ Star on GitHub🐛 Report Bug💡 Request Feature

Made with ❤️ for privacy-conscious professionals

Transform audio to text. Know who said what. Keep it private.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

localtranscribe-2.0.1b0.tar.gz (57.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

localtranscribe-2.0.1b0-py3-none-any.whl (71.1 kB view details)

Uploaded Python 3

File details

Details for the file localtranscribe-2.0.1b0.tar.gz.

File metadata

  • Download URL: localtranscribe-2.0.1b0.tar.gz
  • Upload date:
  • Size: 57.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for localtranscribe-2.0.1b0.tar.gz
Algorithm Hash digest
SHA256 8f430a58e74ff6301c14d88fb9f25eb68fdf9c809de4c5088eb457db2d641dc0
MD5 cabeeb1c89b8cbdd48257574333ab135
BLAKE2b-256 2b45f59d12fa099c931806e42531c9b86519fcfef3573aaef14b9996d30e9321

See more details on using hashes here.

File details

Details for the file localtranscribe-2.0.1b0-py3-none-any.whl.

File metadata

File hashes

Hashes for localtranscribe-2.0.1b0-py3-none-any.whl
Algorithm Hash digest
SHA256 b312a968a8d08e845db74e6e5f5c1822a358ac4e83876af058f852246c9877a9
MD5 de75a2b44b8d7ecad58787ee233870d0
BLAKE2b-256 c07f5388b7f4a1293902fd108f3c96e8a1a271e5256538cf5f50d503a46b01e7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page