Offline audio transcription with speaker diarization, optimized for Apple Silicon

LocalTranscribe

Turn audio into speaker-labeled transcripts. Entirely offline. One command.

Transform recordings into detailed transcripts showing who said what and when—all on your Mac, no cloud services required.

Why LocalTranscribe?

Feature	LocalTranscribe	Cloud Services
Privacy	100% offline	Data uploaded to servers
Cost	Free forever	$10-50/month
Speaker ID	Automatic	Often extra cost
Speed (M1/M2)	Real-time to 2x	Depends on upload
Quality	OpenAI Whisper	Varies

Built for: Researchers, podcasters, journalists, lawyers, content creators—anyone who needs accurate transcripts with speaker labels and complete privacy.

Quick Start

1. Install

# Clone repository
git clone https://github.com/aporb/transcribe-diarization.git
cd transcribe-diarization

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install
pip install -e .

2. Set Up HuggingFace Token (One-Time)

Required for speaker diarization:

Get token (free): https://huggingface.co/settings/tokens
Accept model licenses (required for both models):
- Main: https://huggingface.co/pyannote/speaker-diarization-3.1
- Dependency: https://huggingface.co/pyannote/segmentation-3.0

Add to project:

echo "HUGGINGFACE_TOKEN=hf_your_token_here" > .env
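
Before starting a long run, it can help to confirm that the token file is well-formed. The following is a minimal sketch using only the standard library. The helper name `read_env_token` is hypothetical and is not part of LocalTranscribe, and how the tool itself loads `.env` is not covered here. The `hf_` prefix check simply mirrors the placeholder shown above.

```python
from pathlib import Path


def read_env_token(env_path=".env", key="HUGGINGFACE_TOKEN"):
    """Return the value stored for `key` in a simple KEY=VALUE file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Tolerate optional surrounding quotes around the value.
            return value.strip().strip("'\"")
    return None


if __name__ == "__main__":
    token = read_env_token()
    if token is None:
        print("HUGGINGFACE_TOKEN not found in .env")
    elif not token.startswith("hf_"):
        print("Token found, but it does not start with 'hf_'")
    else:
        print("Token looks well-formed")
```

Run it from the project directory, next to the `.env` file created above.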

3. Process Audio

localtranscribe process your-audio.mp3

That's it! Results appear in ./output/ with:

Speaker labels (who spoke)
Timestamps (when they spoke)
Full transcript (what they said)

Examples

Basic Usage

# Transcribe with automatic settings
localtranscribe process meeting.mp3

# Know how many speakers? Tell it for better accuracy
localtranscribe process interview.wav --speakers 2

# Use larger model for higher quality
localtranscribe process podcast.m4a --model medium

# Save to custom location
localtranscribe process audio.mp3 --output ./results/

Batch Processing

# Process entire folder
localtranscribe batch ./audio-files/ --workers 2

# With custom settings
localtranscribe batch ./recordings/ --model small --output ./transcripts/

Single-Speaker Content

# Skip speaker detection for faster processing
localtranscribe process lecture.mp3 --skip-diarization

Advanced Options

# Options used below:
#   --model     Model size: tiny|base|small|medium|large
#   --speakers  Number of speakers (if known)
#   --language  Force language
#   --format    Output formats
#   --output    Output directory
#   --verbose   Show detailed progress
localtranscribe process audio.mp3 \
  --model medium \
  --speakers 3 \
  --language en \
  --format txt json srt \
  --output ./results/ \
  --verbose

Python SDK

Use programmatically in your Python projects:

from localtranscribe import LocalTranscribe

# Initialize
lt = LocalTranscribe(model_size="base", num_speakers=2)

# Process single file
result = lt.process("meeting.mp3")
print(result.transcript)
print(f"Found {result.num_speakers} speakers")

# Access detailed segments
for segment in result.segments:
    print(f"[{segment.speaker}] ({segment.start:.1f}s): {segment.text}")

# Batch processing
results = lt.process_batch("./audio-files/", max_workers=4)
print(f"Processed {results.successful}/{results.total} files")

# Handle errors
for failed in results.get_failed():
    print(f"Failed: {failed.audio_file} - {failed.error}")

→ Full SDK Documentation

Commands

Command	Description
`process`	Transcribe single audio file
`batch`	Process multiple files
`doctor`	Verify installation and system setup
`label`	Replace generic speaker IDs with real names
`version`	Show version and system info
`config`	Manage configuration

Get help: localtranscribe --help or localtranscribe <command> --help

Output Formats

Every run creates multiple files for different use cases:

Format	File	Best For
Markdown	`*_combined.md`	Reading, documentation, sharing
Plain Text	`*_transcript.txt`	Simple text analysis
JSON	`*_transcript.json`	Programming, data processing
SRT	`*_transcript.srt`	Video subtitles
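
For readers unfamiliar with SRT, each cue is a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, and a blank line. The sketch below shows how segments could be rendered that way. The `Segment` dataclass is a stand-in written for this example, not the SDK's own class, and prefixing the speaker label to each cue is an assumption about layout, not a description of what LocalTranscribe writes.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in for a transcript segment; not the SDK's own class."""
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"[{seg.speaker}] {seg.text}\n"
        )
    return "\n".join(cues)


print(to_srt([Segment("SPEAKER_00", 0.0, 1.25, "Hello."),
              Segment("SPEAKER_01", 1.25, 3.0, "Hi there.")]))
```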

Combined transcript includes:

Speaker labels (SPEAKER_00, SPEAKER_01, etc.)
Timestamp ranges for each speaker turn
Full transcript with proper formatting
Speaker statistics (who spoke most, how long)
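
Speaker statistics of this kind can be derived from the speaker turns alone. The sketch below totals speaking time per speaker. The `(speaker, start, end)` tuple layout and the function name `speaking_time` are assumptions made for illustration and do not reflect the tool's internal data model.

```python
from collections import defaultdict


def speaking_time(turns):
    """Sum speaking time per speaker from (speaker, start, end) tuples in seconds."""
    totals = defaultdict(float)
    for speaker, start, end in turns:
        # Guard against malformed turns where end precedes start.
        totals[speaker] += max(0.0, end - start)
    # Sort so the speaker with the most talk time comes first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


turns = [
    ("SPEAKER_00", 0.0, 12.5),
    ("SPEAKER_01", 12.5, 20.0),
    ("SPEAKER_00", 20.0, 31.0),
]
for speaker, seconds in speaking_time(turns):
    print(f"{speaker}: {seconds:.1f}s")
```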

System Requirements

Recommended:

Mac with Apple Silicon (M1/M2/M3/M4)
16GB RAM
10GB free space
macOS 12.0+

Minimum:

Any Mac with Python 3.9+
8GB RAM
5GB free space

Performance (10-minute audio on M2):

tiny model: ~30 seconds
base model: ~2 minutes
small model: ~5 minutes
medium model: ~10 minutes
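
Dividing audio length by processing time gives a speed factor relative to real time, which makes these figures easier to scale to your own recordings. A short worked example using the approximate numbers listed above (actual times will vary by machine and audio):

```python
AUDIO_SECONDS = 10 * 60  # the 10-minute benchmark clip

# Approximate processing times from the list above, in seconds.
processing_seconds = {
    "tiny": 30,
    "base": 2 * 60,
    "small": 5 * 60,
    "medium": 10 * 60,
}


def speed_factor(audio_seconds, elapsed_seconds):
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / elapsed_seconds


for model, elapsed in processing_seconds.items():
    print(f"{model}: {speed_factor(AUDIO_SECONDS, elapsed):.0f}x real time")
```

By these figures, a one-hour recording with the base model would take roughly 12 minutes.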

Model Selection Guide

Model	Speed	Quality	RAM	Best For
tiny	Fastest	Basic	1GB	Quick drafts, testing
base	Fast	Good	1GB	Most use cases
small	Moderate	Better	2GB	Professional work
medium	Slow	Best	5GB	Publication-quality
large	Very slow	Best+	10GB	Maximum accuracy

Recommendation: Start with base, upgrade to medium if accuracy matters more than speed.
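
If you script your runs, the RAM column can drive the model choice. The helper below is a hypothetical sketch, not part of LocalTranscribe. It picks the largest model whose listed RAM figure fits a budget you set, and it ignores memory used by the rest of the pipeline, so leave yourself headroom.

```python
# RAM figures (GB) copied from the table above, ordered smallest to largest.
MODEL_RAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_for(ram_budget_gb):
    """Return the largest model whose listed RAM need fits the budget, or None."""
    choice = None
    for name, ram_gb in MODEL_RAM_GB:
        if ram_gb <= ram_budget_gb:
            choice = name  # later entries are larger models, so keep overwriting
    return choice


print(largest_model_for(4))
```

The returned name can then be passed to `--model`.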

What's New in v2.0

Complete rewrite focused on usability:

Before (v1.x): Three manual steps

cd scripts
python3 diarization.py      # Step 1
python3 transcription.py    # Step 2
python3 combine.py          # Step 3

Now (v2.0): One command

localtranscribe process audio.mp3

Other improvements:

Professional CLI with helpful error messages
Python SDK for programmatic use
Batch processing support
Health check (doctor command)
Modular architecture
Beautiful terminal output

→ Full Changelog

Installation Options

Option 1: Development (Recommended)

git clone https://github.com/aporb/transcribe-diarization.git
cd transcribe-diarization
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

Option 2: PyPI (Coming Soon)

# When published
pip install localtranscribe

# With Apple Silicon optimization (quotes keep zsh from expanding the brackets)
pip install "localtranscribe[mlx]"

Troubleshooting

Common Issues

Command not found:

source .venv/bin/activate  # Activate virtual environment first

HuggingFace token error:

# Check .env file exists and has correct format
cat .env
# Should show: HUGGINGFACE_TOKEN=hf_...

Slow processing:

localtranscribe process audio.mp3 --model tiny  # Use faster model

Run health check:

localtranscribe doctor  # Diagnoses setup issues

→ Full Troubleshooting Guide

How It Works

1. Speaker Diarization (pyannote.audio)
   - Analyzes audio waveforms
   - Identifies when different speakers talk
   - Creates speaker timeline
2. Speech-to-Text (Whisper)
   - Converts speech to text
   - Detects language automatically
   - Creates timestamped segments
3. Intelligent Combination
   - Matches speaker labels to transcript segments
   - Aligns timestamps
   - Generates formatted output
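
One common way to do the combination step is to give each transcript segment the speaker whose turn overlaps it the most in time. The sketch below illustrates that idea only. It is not LocalTranscribe's actual implementation, and the tuple layouts and the `UNKNOWN` fallback label are assumptions chosen for the example.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript_segments, speaker_turns):
    """Label each transcript segment with the speaker whose turn overlaps it most.

    transcript_segments: list of (start, end, text)
    speaker_turns: list of (start, end, speaker)
    """
    labeled = []
    for seg_start, seg_end, text in transcript_segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in speaker_turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append((best_speaker, seg_start, seg_end, text))
    return labeled


speaker_turns = [(0.0, 5.0, "SPEAKER_00"), (5.0, 9.0, "SPEAKER_01")]
transcript_segments = [(0.5, 4.0, "Hello."), (4.5, 8.0, "Hi there.")]
for speaker, start, end, text in assign_speakers(transcript_segments, speaker_turns):
    print(f"[{speaker}] ({start:.1f}s): {text}")
```

A segment that straddles two turns goes to whichever speaker covers more of it, and a segment with no overlapping turn keeps the fallback label.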

Technology:

Whisper - OpenAI's speech recognition
MLX-Whisper - Apple Silicon optimization
Pyannote - Speaker diarization
Typer - Modern CLI
Rich - Beautiful terminal output

Documentation

📚 SDK Reference - Python API for developers
🐛 Troubleshooting - Common issues & solutions
📝 Changelog - Version history
🚀 PyPI Release Guide - For maintainers

Roadmap

v2.0-beta (Current):

✅ Modern CLI
✅ Python SDK
✅ Batch processing
✅ Health checks

v2.1 (Next):

Interactive speaker labeling (replace SPEAKER_00 with names)
Progress bars for large files
Resume interrupted jobs
Audio quality analysis

v3.0 (Future):

Real-time transcription
Web interface
Docker support
Cloud sync (optional)

Contributing

Contributions welcome! Please:

Check existing issues
Fork the repository
Create feature branch (git checkout -b feature/amazing-feature)
Commit changes (git commit -m 'Add amazing feature')
Push branch (git push origin feature/amazing-feature)
Open Pull Request

License

MIT License - free for personal and commercial use.

Support

Need help?

Run localtranscribe doctor to check your setup
Check Troubleshooting Guide
Search existing issues
Open new issue with doctor output and error message

Credits

Built by the LocalTranscribe community.

Special thanks:

OpenAI - Whisper model
Apple - MLX framework
Pyannote team - Speaker diarization models
HuggingFace - Model hosting

⭐ Star on GitHub • 🐛 Report Bug • 💡 Request Feature

Made with ❤️ for privacy-conscious professionals

Transform audio to text. Know who said what. Keep it private.

Release history

3.1.2 (Oct 31, 2025)
3.1.1 (Oct 31, 2025)
3.1.0 (Oct 31, 2025)
2.0.2b3, pre-release (Oct 14, 2025)
2.0.2b1, pre-release (Oct 14, 2025)
2.0.2b0, pre-release (Oct 14, 2025)
2.0.1b0, pre-release (Oct 14, 2025), this version

Download files

Source Distribution

localtranscribe-2.0.1b0.tar.gz (57.7 kB), uploaded Oct 14, 2025, Source

Built Distribution

localtranscribe-2.0.1b0-py3-none-any.whl (71.1 kB), uploaded Oct 14, 2025, Python 3

File details

Details for the file localtranscribe-2.0.1b0.tar.gz.

File metadata

Download URL: localtranscribe-2.0.1b0.tar.gz
Upload date: Oct 14, 2025
Size: 57.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for localtranscribe-2.0.1b0.tar.gz
Algorithm	Hash digest
SHA256	`8f430a58e74ff6301c14d88fb9f25eb68fdf9c809de4c5088eb457db2d641dc0`
MD5	`cabeeb1c89b8cbdd48257574333ab135`
BLAKE2b-256	`2b45f59d12fa099c931806e42531c9b86519fcfef3573aaef14b9996d30e9321`

File details

Details for the file localtranscribe-2.0.1b0-py3-none-any.whl.

File metadata

Download URL: localtranscribe-2.0.1b0-py3-none-any.whl
Upload date: Oct 14, 2025
Size: 71.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for localtranscribe-2.0.1b0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b312a968a8d08e845db74e6e5f5c1822a358ac4e83876af058f852246c9877a9`
MD5	`de75a2b44b8d7ecad58787ee233870d0`
BLAKE2b-256	`c07f5388b7f4a1293902fd108f3c96e8a1a271e5256538cf5f50d503a46b01e7`
