Privacy-first audio transcription with speaker diarization. Entirely offline.

LocalTranscribe

Transform recordings into detailed transcripts showing who said what and when—all on your Mac, with complete privacy.

Quick Start • Installation • Examples • Documentation

Why LocalTranscribe?

Feature	LocalTranscribe	Cloud Services
Privacy	100% offline processing	Data uploaded to third-party servers
Cost	Free forever	$10-50/month subscription
Speaker Identification	Automatic speaker detection	Often extra cost or unavailable
Speed (Apple Silicon)	Real-time to 2x audio length	Depends on upload/download speed
Quality	OpenAI Whisper models	Varies by provider
Data Ownership	All files stay on your machine	Depends on provider terms

Perfect for: Researchers, podcasters, journalists, legal professionals, content creators—anyone who needs accurate transcripts with speaker labels and complete data privacy.

Features

🔒 Complete Privacy - All processing happens locally on your machine
🎯 Speaker Diarization - Automatic detection of who spoke when
📝 High Accuracy - Powered by OpenAI's Whisper models
⚡️ Apple Silicon Optimized - Blazing fast on M1/M2/M3/M4 Macs
🚀 Simple CLI - One command to transcribe any audio file
📦 Python SDK - Integrate transcription into your applications
🔄 Batch Processing - Process multiple files simultaneously
📊 Multiple Formats - Output as TXT, JSON, SRT, or Markdown

Quick Start

Install from PyPI

Package: pypi.org/project/localtranscribe

pip install localtranscribe

Set Up HuggingFace Token (One-Time)

Speaker diarization requires a free HuggingFace account:

1. Create an account and get a token: https://huggingface.co/settings/tokens
2. Accept the model licenses (click "Agree" on each):
   - https://huggingface.co/pyannote/speaker-diarization-3.1
   - https://huggingface.co/pyannote/segmentation-3.0
3. Configure the token:

echo "HUGGINGFACE_TOKEN=hf_your_token_here" > .env
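To confirm the token line is well-formed before running a transcription, you can parse the `.env` file yourself. This is only a sketch: the helper names `parse_env_token` and `read_env_token` are hypothetical, and the code assumes the simple `KEY=value` layout written by the `echo` command above, not whatever loader LocalTranscribe uses internally.

```python
from pathlib import Path
from typing import Optional


def parse_env_token(text: str, key: str = "HUGGINGFACE_TOKEN") -> Optional[str]:
    """Return the value for `key` from KEY=value text, or None if absent."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip('"').strip("'")
    return None


def read_env_token(path: str = ".env") -> Optional[str]:
    """Read the token from a .env file; None if the file or key is missing."""
    env_file = Path(path)
    if not env_file.is_file():
        return None
    return parse_env_token(env_file.read_text(encoding="utf-8"))


if __name__ == "__main__":
    token = read_env_token()
    if token is None:
        print("No HUGGINGFACE_TOKEN found in .env")
    elif not token.startswith("hf_"):
        print("Token found, but it does not start with 'hf_'")
    else:
        print("Token looks well-formed")
```

Run it from the directory that contains `.env`. It only checks the shape of the line; it does not contact HuggingFace or verify that the licenses were accepted.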

Transcribe Audio

localtranscribe process your-audio.mp3

Done! Results appear in ./output/ with speaker labels, timestamps, and full transcript.

Installation

Option 1: Install from PyPI (Recommended)

# Basic installation
pip install localtranscribe

# For Apple Silicon optimization (recommended for M1/M2/M3/M4)
# Quote the extras so zsh does not treat the brackets as a glob pattern
pip install "localtranscribe[mlx]"

# For NVIDIA GPU support
pip install "localtranscribe[faster]"

# Install all optional dependencies
pip install "localtranscribe[all]"

Option 2: Install from Source

# Clone repository
git clone https://github.com/aporb/LocalTranscribe.git
cd LocalTranscribe

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install in development mode
pip install -e .

Verify Installation

localtranscribe doctor

This command checks your system configuration and reports any issues.

Usage Examples

Basic Transcription

# Transcribe with automatic settings
localtranscribe process meeting.mp3

# Specify number of speakers for better accuracy
localtranscribe process interview.wav --speakers 2

# Use larger model for higher quality
localtranscribe process podcast.m4a --model medium

# Save to custom location
localtranscribe process audio.mp3 --output ./results/
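If you want to drive these commands from a script without the SDK, one option is to build the argument list in Python and hand it to `subprocess`. The sketch below uses only the flags shown above (`--speakers`, `--model`, `--output`); the function names `build_process_command` and `run_process` are hypothetical, and `run_process` assumes `localtranscribe` is on your `PATH`.

```python
import subprocess
from typing import List, Optional


def build_process_command(
    audio_path: str,
    speakers: Optional[int] = None,
    model: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble the argv list for `localtranscribe process`."""
    cmd = ["localtranscribe", "process", audio_path]
    if speakers is not None:
        cmd += ["--speakers", str(speakers)]
    if model is not None:
        cmd += ["--model", model]
    if output_dir is not None:
        cmd += ["--output", output_dir]
    return cmd


def run_process(audio_path: str, **options) -> int:
    """Run the CLI and return its exit code."""
    completed = subprocess.run(build_process_command(audio_path, **options))
    return completed.returncode


if __name__ == "__main__":
    # Only prints the command; call run_process(...) to execute it.
    print(build_process_command("interview.wav", speakers=2, model="medium"))
```

Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.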

Batch Processing

# Process entire folder
localtranscribe batch ./audio-files/

# Process with multiple workers
localtranscribe batch ./recordings/ --workers 4

# With custom settings
localtranscribe batch ./files/ --model small --output ./transcripts/
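As a rough picture of what `--workers` implies, the sketch below runs a worker function over several files concurrently and tallies successes and failures. It is an illustration of the general pattern using the standard library, not LocalTranscribe's own batch implementation; `run_batch` and `fake_transcribe` are hypothetical names, and the fake worker stands in for a real transcription call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_batch(
    files: Iterable[str],
    worker: Callable[[str], object],
    max_workers: int = 4,
) -> Dict[str, object]:
    """Apply `worker` to each file concurrently and tally the outcomes."""
    file_list = list(files)
    failures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, name): name for name in file_list}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as exc:  # one bad file should not stop the rest
                failures[futures[future]] = str(exc)
    return {
        "total": len(file_list),
        "successful": len(file_list) - len(failures),
        "failures": failures,
    }


def fake_transcribe(name: str) -> str:
    """Stand-in worker for the demo; fails on an unexpected extension."""
    if not name.endswith((".mp3", ".wav")):
        raise ValueError("unsupported file")
    return name + " -> done"


if __name__ == "__main__":
    summary = run_batch(["a.mp3", "b.wav", "notes.txt"], fake_transcribe, max_workers=2)
    print(f"Completed: {summary['successful']}/{summary['total']}")
```

More workers are not automatically faster: each concurrent transcription needs its own share of memory, so the RAM figures in the Model Selection Guide are worth checking before raising the count.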

Single-Speaker Content

# Skip speaker detection for faster processing
localtranscribe process lecture.mp3 --skip-diarization

Advanced Options

# Options used below:
#   --model     Model: tiny|base|small|medium|large
#   --speakers  Number of speakers (if known)
#   --language  Force specific language
#   --format    Output formats
#   --output    Output directory
#   --verbose   Show detailed progress
localtranscribe process audio.mp3 \
  --model medium \
  --speakers 3 \
  --language en \
  --format txt json srt \
  --output ./results/ \
  --verbose

Using the Python SDK

from localtranscribe import LocalTranscribe

# Initialize with options
lt = LocalTranscribe(
    model_size="base",
    num_speakers=2,
    output_dir="./transcripts"
)

# Process single file
result = lt.process("meeting.mp3")

# Access results
print(f"Transcript: {result.transcript}")
print(f"Speakers: {result.num_speakers}")
print(f"Duration: {result.duration}s")

# Access detailed segments
for segment in result.segments:
    print(f"[{segment.speaker}] {segment.text}")

# Batch processing
results = lt.process_batch("./audio-files/", max_workers=4)
print(f"Completed: {results.successful}/{results.total}")

→ Full SDK Documentation

Output Formats

LocalTranscribe generates multiple output files for different use cases:

Format	File	Description
Markdown	`*_combined.md`	Formatted transcript with speaker labels and timestamps
Plain Text	`*_transcript.txt`	Simple text output for analysis
JSON	`*_transcript.json`	Structured data for programming
SRT	`*_transcript.srt`	Subtitle format for video
Diarization	`*_diarization.md`	Speaker timeline and statistics
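For readers unfamiliar with SRT, the sketch below shows how timed segments map onto subtitle cues. The `HH:MM:SS,mmm` timestamp layout is standard SRT; the tuple layout `(start, end, speaker, text)` and the choice to prefix each cue with the speaker label are assumptions made for illustration, not a description of the exact file LocalTranscribe writes.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str, str]]) -> str:
    """Render (start, end, speaker, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"[{speaker}] {text}\n"
        )
    return "\n".join(cues)  # a blank line separates consecutive cues


if __name__ == "__main__":
    print(to_srt([
        (0.0, 5.2, "SPEAKER_00", "Hello, welcome to the show."),
        (5.5, 12.8, "SPEAKER_01", "Thanks for having me."),
    ]))
```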

Example Output:

# Combined Transcript

**Audio File:** interview.mp3
**Processing Date:** 2025-10-13 22:30:00

## SPEAKER_00
**Time:** [0.0s - 5.2s]

Hello, welcome to the show. Thanks for joining us today.

## SPEAKER_01
**Time:** [5.5s - 12.8s]

Thanks for having me. I'm excited to discuss our new project.
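If you need to post-process the combined Markdown, a small parser is usually enough. The sketch below assumes the file follows exactly the layout of the example above (a `## SPEAKER_NN` heading, a `**Time:**` line, then the text); `parse_combined` is a hypothetical helper, and real output may contain variations this does not handle.

```python
import re
from typing import Dict, List

HEADER = re.compile(r"^## (SPEAKER_\d+)\s*$")
TIME = re.compile(r"^\*\*Time:\*\* \[([\d.]+)s - ([\d.]+)s\]\s*$")

SAMPLE = """# Combined Transcript

**Audio File:** interview.mp3
**Processing Date:** 2025-10-13 22:30:00

## SPEAKER_00
**Time:** [0.0s - 5.2s]

Hello, welcome to the show. Thanks for joining us today.

## SPEAKER_01
**Time:** [5.5s - 12.8s]

Thanks for having me. I'm excited to discuss our new project.
"""


def parse_combined(markdown: str) -> List[Dict[str, object]]:
    """Extract speaker turns from a combined transcript laid out as in SAMPLE."""
    turns: List[Dict[str, object]] = []
    current = None
    for line in markdown.splitlines():
        header = HEADER.match(line)
        if header:
            current = {"speaker": header.group(1), "start": None, "end": None, "text": ""}
            turns.append(current)
            continue
        if current is None:
            continue  # still in the file-level preamble
        timing = TIME.match(line)
        if timing:
            current["start"] = float(timing.group(1))
            current["end"] = float(timing.group(2))
        elif line.strip():
            current["text"] = (current["text"] + " " + line.strip()).strip()
    return turns


if __name__ == "__main__":
    for turn in parse_combined(SAMPLE):
        print(turn["speaker"], turn["start"], turn["end"], turn["text"])
```

For anything beyond quick scripts, the JSON output is likely the better starting point, since it is already structured.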

Commands

Command	Description	Example
`process`	Transcribe single audio file	`localtranscribe process audio.mp3`
`batch`	Process multiple files	`localtranscribe batch ./folder/`
`doctor`	Verify system setup	`localtranscribe doctor`
`label`	Replace speaker IDs with names	`localtranscribe label output.md`
`version`	Show version information	`localtranscribe version`
`config`	Manage configuration	`localtranscribe config show`

Run localtranscribe --help or localtranscribe <command> --help for detailed options.
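To make the idea behind `label` concrete, the sketch below replaces `SPEAKER_NN` IDs with display names in a block of text. It illustrates the concept only: `apply_speaker_names` is a hypothetical helper, and nothing here describes how the `label` command is actually implemented or what options it accepts.

```python
import re
from typing import Dict


def apply_speaker_names(text: str, names: Dict[str, str]) -> str:
    """Replace SPEAKER_NN IDs with display names; unknown IDs are left alone."""

    def swap(match: re.Match) -> str:
        speaker_id = match.group(0)
        return names.get(speaker_id, speaker_id)

    return re.sub(r"\bSPEAKER_\d+\b", swap, text)


if __name__ == "__main__":
    transcript = "## SPEAKER_00\nHello.\n\n## SPEAKER_01\nThanks for having me."
    print(apply_speaker_names(transcript, {"SPEAKER_00": "Host", "SPEAKER_01": "Guest"}))
```

Matching the whole ID with `\d+` keeps `SPEAKER_1` from being rewritten inside `SPEAKER_10`.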

Model Selection Guide

Choose the right Whisper model for your needs:

Model	Speed	Quality	RAM	Use Case
tiny	Fastest	Basic	1GB	Quick drafts, testing
base	Fast	Good	1GB	Most use cases
small	Moderate	Better	2GB	Professional work
medium	Slow	Excellent	5GB	Publication quality
large	Very slow	Best	10GB	Maximum accuracy
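One way to read the table is as a lookup: given the memory you are willing to devote to the model, pick the largest one that fits. The sketch below encodes the RAM column as-is; `largest_model_for` is a hypothetical helper, and the budget is an input you choose, not something the code measures.

```python
from typing import Optional

# (model, approximate RAM in GB), ordered from smallest to largest,
# copied from the table above.
MODEL_RAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def largest_model_for(ram_budget_gb: float) -> Optional[str]:
    """Return the largest model whose listed RAM fits the budget, else None."""
    choice = None
    for name, ram in MODEL_RAM_GB:
        if ram <= ram_budget_gb:
            choice = name  # later entries are larger, so keep overwriting
    return choice


if __name__ == "__main__":
    for budget in (1, 4, 10):
        print(budget, "GB ->", largest_model_for(budget))
```

Leave headroom for the operating system and for speaker diarization, which the RAM column may not account for.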

Performance on M2 Mac (10-minute audio):

tiny: ~30 seconds
base: ~2 minutes ← Recommended starting point
small: ~5 minutes
medium: ~10 minutes
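To get a ballpark for other recording lengths, the figures above can be scaled linearly. This is a sketch under a loud assumption: processing time grows in proportion to audio length and your machine behaves like the M2 in the list. `estimate_minutes` is a hypothetical helper, and `large` is left out because no figure is given for it.

```python
# Approximate minutes of processing per 10 minutes of audio on an M2 Mac,
# taken from the figures above. "large" is omitted: no figure is listed.
MINUTES_PER_10_MIN_AUDIO = {"tiny": 0.5, "base": 2.0, "small": 5.0, "medium": 10.0}


def estimate_minutes(audio_minutes: float, model: str) -> float:
    """Linearly scale the M2 figures to a different audio length."""
    if model not in MINUTES_PER_10_MIN_AUDIO:
        raise ValueError(f"no timing figure for model {model!r}")
    return audio_minutes * MINUTES_PER_10_MIN_AUDIO[model] / 10.0


if __name__ == "__main__":
    for model in MINUTES_PER_10_MIN_AUDIO:
        print(f"{model}: about {estimate_minutes(60, model):g} minutes for a 60-minute file")
```

Treat the result as an order-of-magnitude guide; model download on first run and diarization time are not modeled.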

System Requirements

Recommended:

Mac with Apple Silicon (M1/M2/M3/M4)
16GB RAM
10GB free disk space
macOS 12.0 or later

Minimum:

Any Mac with Python 3.9+
8GB RAM
5GB free disk space
macOS 11.0 or later
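The minimums above can be checked with a few standard-library calls. The sketch below is not how `localtranscribe doctor` works; `check_minimums` is a hypothetical helper that covers Python version, macOS version, and free disk space. RAM is left out because the standard library has no portable call for it.

```python
import platform
import shutil
import sys
from typing import List, Tuple


def check_minimums(
    python_version: Tuple[int, int],
    macos_version: str,
    free_disk_gb: float,
) -> List[str]:
    """Return human-readable problems; an empty list means the minimums are met."""
    problems = []
    if python_version < (3, 9):
        problems.append("Python 3.9 or later is required")
    if not macos_version:
        problems.append("not running on macOS")
    elif int(macos_version.split(".")[0]) < 11:
        problems.append("macOS 11.0 or later is required")
    if free_disk_gb < 5:
        problems.append("at least 5GB of free disk space is required")
    return problems


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1e9
    # platform.mac_ver()[0] is an empty string on non-macOS systems.
    issues = check_minimums(sys.version_info[:2], platform.mac_ver()[0], free_gb)
    print("OK" if not issues else "\n".join(issues))
```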

Supported Audio Formats:

MP3, WAV, M4A, OGG, FLAC, AAC, WMA
Video files (MP4, MOV, AVI) - audio will be extracted
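When preparing a folder for batch processing, it can help to list which files fall under the formats above. The sketch below matches on file extension only, which is an assumption; it does not inspect file contents, and `is_supported` and `find_supported` are hypothetical helpers rather than part of the package.

```python
from pathlib import Path
from typing import List

# Extensions taken from the supported formats listed above.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}


def is_supported(path: str) -> bool:
    """True if the file extension is one of the listed audio or video formats."""
    suffix = Path(path).suffix.lower()
    return suffix in AUDIO_EXTENSIONS or suffix in VIDEO_EXTENSIONS


def find_supported(folder: str) -> List[Path]:
    """List supported files directly inside `folder`, sorted by name."""
    return sorted(
        entry for entry in Path(folder).iterdir()
        if entry.is_file() and is_supported(entry.name)
    )


if __name__ == "__main__":
    for entry in find_supported("."):
        print(entry)
```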

How It Works

LocalTranscribe uses a three-stage pipeline:

1. Speaker Diarization (pyannote.audio)

Analyzes audio waveform patterns
Identifies distinct speakers
Creates precise speaker timeline
Optimized for 2-10 speakers

2. Speech-to-Text (Whisper)

Converts speech to text using OpenAI's Whisper
Automatically detects language
Handles accents and background noise
Creates timestamped segments

3. Intelligent Combination

Aligns speaker labels with transcript
Matches timestamps accurately
Formats output for readability
Generates multiple export formats
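The combination stage can be pictured as interval matching: each transcript segment takes the speaker whose diarization turn overlaps it the most. The sketch below shows that common approach under stated assumptions; the tuple layouts and the `UNKNOWN` fallback are illustrative choices, and the project's actual alignment logic may differ.

```python
from typing import List, Tuple

Turn = Tuple[float, float, str]     # (start, end, speaker) from diarization
Segment = Tuple[float, float, str]  # (start, end, text) from speech-to-text


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(turns: List[Turn], segments: List[Segment]) -> List[Tuple[str, str]]:
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append((best_speaker, text))
    return labeled


if __name__ == "__main__":
    turns = [(0.0, 5.2, "SPEAKER_00"), (5.5, 12.8, "SPEAKER_01")]
    segments = [(0.0, 5.0, "Hello, welcome to the show."), (5.0, 12.0, "Thanks for having me.")]
    for speaker, text in assign_speakers(turns, segments):
        print(f"[{speaker}] {text}")
```

A segment that straddles two turns goes entirely to the dominant speaker here; splitting such segments at the turn boundary is a possible refinement.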

Technologies:

Whisper - State-of-the-art speech recognition
MLX-Whisper - Apple Silicon optimization
Pyannote.audio - Speaker diarization
Typer - Modern CLI framework
Rich - Beautiful terminal output

Documentation

📚 SDK Reference - Python API documentation
🐛 Troubleshooting Guide - Common issues and solutions
📝 Changelog - Version history and updates
🚀 Contributing Guide - How to contribute

Troubleshooting

Common Issues

Command not found after installation:

# Ensure package is installed
pip install --upgrade localtranscribe

# If using virtual environment, activate it first
source .venv/bin/activate

HuggingFace authentication error:

# Verify token is correctly set
cat .env

# Should show: HUGGINGFACE_TOKEN=hf_...
# Make sure you accepted both model licenses

Slow processing:

# Use a faster model
localtranscribe process audio.mp3 --model tiny

# Skip diarization for single speaker
localtranscribe process audio.mp3 --skip-diarization

Run system check:

localtranscribe doctor

This command diagnoses common setup issues and suggests fixes.

→ Full Troubleshooting Guide

What's New

v2.0.2b1 (Current)

✅ Updated package description and metadata
✅ Enhanced README with PyPI link
✅ Professional documentation polish

v2.0.1-beta

✅ Published to PyPI - Install with pip install localtranscribe
✅ Fixed pyannote.audio 3.x API compatibility
✅ Updated documentation for model licenses

v2.0.0-beta

✅ Complete rewrite with modern CLI
✅ Python SDK for programmatic use
✅ Batch processing support
✅ System health checks with doctor command
✅ Modular architecture

→ Full Changelog

Roadmap

v2.1 (Next Release)

Interactive speaker labeling (replace SPEAKER_00 with real names)
Enhanced progress indicators for large files
Resume interrupted transcription jobs
Audio quality pre-analysis

v3.0 (Future)

Real-time transcription support
Web-based interface
Docker containerization
Optional cloud sync for results

Contributing

We welcome contributions! Here's how to get started:

Check existing issues at github.com/aporb/LocalTranscribe/issues
Fork the repository and create your feature branch
Make your changes following the existing code style
Add tests if applicable
Submit a pull request with a clear description

Development Setup:

git clone https://github.com/aporb/LocalTranscribe.git
cd LocalTranscribe
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

License

MIT License - Free for personal and commercial use.

See LICENSE for full details.

Support

Need help?

Run localtranscribe doctor to check your setup
Check the Troubleshooting Guide
Search existing issues
Open a new issue with:
- Output from localtranscribe doctor
- Error message or unexpected behavior
- Your system info (OS, Python version)
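For the system info line of an issue report, the standard library can gather the basics. This is a small convenience sketch, with `collect_system_info` as a hypothetical helper; it complements the output of `localtranscribe doctor` and does not replace it.

```python
import platform
from typing import Dict


def collect_system_info() -> Dict[str, str]:
    """Gather OS, CPU architecture, and Python version for a bug report."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),  # typically "arm64" on Apple Silicon
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print(f"{key}: {value}")
```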

Acknowledgments

LocalTranscribe builds on excellent open-source work:

OpenAI - Whisper speech recognition model
Apple - MLX framework for Metal acceleration
Pyannote team - Speaker diarization models
HuggingFace - Model hosting and distribution

⭐ Star on GitHub • 🐛 Report Bug • 💡 Request Feature

Made for privacy-conscious professionals who value data ownership.

Transform audio to text. Know who said what. Keep it private.

Release history

3.1.2 (Oct 31, 2025)
3.1.1 (Oct 31, 2025)
3.1.0 (Oct 31, 2025)
2.0.2b3 pre-release (Oct 14, 2025)
2.0.2b1 pre-release (Oct 14, 2025), this version
2.0.2b0 pre-release (Oct 14, 2025)
2.0.1b0 pre-release (Oct 14, 2025)

Download files

Source Distribution

localtranscribe-2.0.2b1.tar.gz (59.1 kB), uploaded Oct 14, 2025, Source

Built Distribution

localtranscribe-2.0.2b1-py3-none-any.whl (72.0 kB), uploaded Oct 14, 2025, Python 3

File details

Details for the file localtranscribe-2.0.2b1.tar.gz.

File metadata

Download URL: localtranscribe-2.0.2b1.tar.gz
Upload date: Oct 14, 2025
Size: 59.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for localtranscribe-2.0.2b1.tar.gz
Algorithm	Hash digest
SHA256	`52a7d4b1fda2e5049078200e0a812703f47a555bc989251dd2a38bc970b2c091`
MD5	`662705b5f89c75732bf7083e66d29cf1`
BLAKE2b-256	`47b2336fda77e96b05ebde23fe2d7fa79fb1190bf53d12ca210c3f05d1e8f74a`
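If you download the archive manually, you can compare its SHA256 digest with the value listed above. The sketch below uses Python's `hashlib`; the helper names are hypothetical, and the local file path is whatever location you saved the download to.

```python
import hashlib
from typing import BinaryIO

# SHA256 digest listed above for localtranscribe-2.0.2b1.tar.gz
EXPECTED_SHA256 = "52a7d4b1fda2e5049078200e0a812703f47a555bc989251dd2a38bc970b2c091"


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: str) -> str:
    """Return the hex SHA256 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle)


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's digest equals the expected value (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("localtranscribe-2.0.2b1.tar.gz")` should return `True` for an intact download of that file. `pip` can also enforce hashes itself through its hash-checking mode.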


File details

Details for the file localtranscribe-2.0.2b1-py3-none-any.whl.

File metadata

Download URL: localtranscribe-2.0.2b1-py3-none-any.whl
Upload date: Oct 14, 2025
Size: 72.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for localtranscribe-2.0.2b1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5d83c742bd4e7281856f6a2b6fa26a7812340145d528023fec5f084dcb865c81`
MD5	`dc9094eb62afbb78d8ffa7e1da13117d`
BLAKE2b-256	`931524db06012aaaf47fc8b119b2d1221a7bfa618067c6a12e6ffed41f126b17`
