Skip to main content

A local/offline-capable voice assistant with speech recognition, LLM processing, and text-to-speech

Project description

💻🎤🔊 localtalk

A privacy-first voice assistant that runs entirely offline on Apple Silicon, perfect for travelers, privacy-conscious users, and anyone who values their data sovereignty. No accounts, no cloud services, no tracking - just powerful AI that respects your privacy.

Currently, this library needs immediate work in the following areas before I can recommend usage.

  • Develop a "System Prompt" with various personas
  • Augment with local system knowledge (date/time, username, etc)

Why This Project Exists

  1. Technology preview - While the tech isn't perfect yet, we can build something functional right now that respects your privacy and runs entirely offline.

  2. As a vibe check on offline-first AI - How realistic is it to avoid cloud services like OpenAI and ElevenLabs? This project explores what's possible with local models and helps identify the gaps.

  3. Future-proofing for real-time local AI - One day soon, these models and consumer computers will be capable of real-time TTS that rivals cloud services. When that day comes, this library will be ready to leverage those improvements immediately.

Why Not Use Apple's Built-in "Say" Command?

We deliberately chose not to use macOS's built-in say command for text-to-speech. While it's readily available and requires no setup, the voice quality is too robotic to meet today's user expectations. After being exposed to natural-sounding AI voices from services like ElevenLabs and OpenAI, users expect conversational AI to sound human-like. The say command's 1990s-era voice synthesis would make the assistant feel outdated and diminish the user experience, so it wasn't worth implementing as an option.

Apple's newer Speech Synthesis API offers much higher quality voices that could be a great fit for this project. However, we're waiting for proper Python library support to integrate it. Once Python bindings become available, we'll add support for these modern Apple voices as another local TTS option.

Built with speech recognition (Whisper), language model processing (Gemma3/MLX), and text-to-speech synthesis (Kokoro/ChatterBox), LocalTalk gives you the convenience of modern AI assistants without sacrificing your privacy or requiring internet connectivity.

Why "LocalTalk"?

The name "LocalTalk" is a playful homage to Apple's classic LocalTalk networking protocol from the 1980s. Just as the original LocalTalk enabled local network communication between Apple devices without needing external infrastructure, our LocalTalk enables local AI conversations without needing external cloud services.

The name works on two levels:

  • Local: Everything runs locally on your Mac - no internet required after initial setup
  • Talk: It's literally a talking app that listens and responds with voice

It's the perfect name for an offline voice assistant that embodies Apple's tradition of making powerful technology accessible and self-contained.

Features

  • 🎤 Speech Recognition: Convert speech to text using OpenAI Whisper
  • 🤖 Native Audio Processing: Gemma3 model with direct audio understanding
  • 🚀 Fast TTS: MLX-Audio Kokoro for near real-time speech synthesis
  • 🔊 Multiple TTS Options: Choose between fast Kokoro or high-quality ChatterBox
  • 💬 Dual Input Modes: Type or speak your queries
  • 🎭 Voice Options: Multiple voice personalities with Kokoro
  • 💾 Fully Offline: No internet connection required after setup
  • 🔒 100% Private: Your conversations never leave your device

Requirements

  • Python 3.11+
  • macOS with Apple Silicon (M1/M2/M3)
  • Microphone for voice input
  • MLX framework (installed automatically)

Platform Support:

  • macOS (Apple Silicon): ✅ Fully supported as first class platform.
  • Linux / CUDA backend: 🚧 Planned (see roadmap below).
  • Windows: 🤷🏼‍♂️ Would consider, but not seriously.

Installation - with uv

Recommended: install the CLI as a uv tool

uv tool install localtalk

# uvx also works, nice demo one-liner
uvx localtalk

Contributor/Developer Setup

  1. Clone the repository:
git clone https://github.com/anthonywu/localtalk
cd localtalk
  1. Create a virtual environment (using uv recommended):
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  1. Install the package:
uv pip install -e .
  1. Download NLTK data (required for sentence tokenization):
python -c "import nltk; nltk.download('punkt')"
  1. MLX-VLM will automatically download models on first run
    • No additional setup required
    • Models are cached locally for offline use

Quick Start (Hello World)

Basic Usage

Run the voice assistant with default settings:

localtalk

This will:

  1. Start with fast Kokoro TTS (MLX-Audio)
  2. Use the mlx-community/gemma-3n-E2B-it-4bit model
  3. Enable dual-modal input (type or speak)
  4. Use base.en Whisper model for speech recognition

Complete Hello World Example

# 1. Run the voice assistant
localtalk

# 2. You'll see: "💬 Type your message or press Enter to record audio:"
# 3. Either:
#    - Type "Hello, how are you?" and press Enter
#    - OR press Enter, speak, then press Enter again
# 4. Listen to the AI's response with fast Kokoro TTS!

Different TTS Backends

# Fast mode (default) - Kokoro TTS with audio output
localtalk

# Different Kokoro voices: American female "nova"
localtalk --kokoro-voice af_nova --kokoro-speed 1.2

# Different Kokoro voices: Engish female "bella"
localtalk --kokoro-voice bf_bella --kokoro-speed 1.2

# High-quality mode - ChatterBox TTS (experimental, slow)
localtalk --use-chatterbox

Configuration Options

Command-Line Arguments

Primary AI Model Options:

  • --model NAME: MLX model from Huggingface Hub (default: mlx-community/gemma-3n-E2B-it-4bit)
  • --whisper-model SIZE: Whisper model size (default: base.en)
  • --temperature FLOAT: Temperature for text generation (default: 0.7)
  • --top-p FLOAT: Top-p sampling parameter (default: 1.0)
  • --max-tokens INT: Maximum tokens to generate (default: 100)

TTS Options:

  • --kokoro-model: Choose Kokoro model (4bit/6bit/8bit/bf16, default: 4bit)
  • --kokoro-voice: Voice personality (af_heart/af_nova/af_bella/bf_emma)
  • --kokoro-speed: Speech speed 0.5-2.0 (default: 1.0)
  • --no-tts: Disable TTS for text-only mode
  • --use-chatterbox: Use experimental ChatterBox TTS (slow but high quality)

ChatterBox Options (requires --use-chatterbox):

  • --exaggeration FLOAT: Emotion intensity (0.0-1.0, default: 0.5)
  • --cfg-weight FLOAT: Pacing control (0.0-1.0, default: 0.5)
  • --tts-quality: Use quality mode instead of fast mode

Other Options:

  • --save-voice: Save generated audio responses
  • --system-prompt: Custom system prompt for the LLM

Example Configurations

Calm, professional assistant (ChatterBox):

localtalk --use-chatterbox --exaggeration 0.3 --cfg-weight 0.7 --temperature 0.5

Expressive, dynamic assistant (ChatterBox):

localtalk --use-chatterbox --exaggeration 0.8 --cfg-weight 0.3 --temperature 0.9

Using a different model:

localtalk --model mlx-community/Llama-3.2-3B-Instruct-4bit --whisper-model small.en

Secrets and API Keys

Good news! This application requires NO API keys or secrets to run.

Everything runs locally on your Mac!

  • Whisper: Runs locally, no API key needed
  • MLX-LM: Runs locally on Apple Silicon, no API key needed
  • ChatterBox: Runs locally, no API key needed

Advanced Usage

Programmatic Usage

You can also use the voice assistant programmatically:

from localtalk import VoiceAssistant, AppConfig

# Create custom configuration
config = AppConfig()
config.mlx_lm.model = "mlx-community/Llama-3.2-3B-Instruct-4bit"
config.chatterbox.exaggeration = 0.7

# Create and run assistant
assistant = VoiceAssistant(config)
assistant.run()

Custom System Prompts

localtalk --system-prompt "You are a pirate. Respond in pirate speak, matey!"

Troubleshooting

Common Issues

  1. "Model not found" error:

    • The model will be automatically downloaded on first use
    • Ensure you have a stable internet connection for the initial download
    • Check that you have sufficient disk space (~4-8GB per model)
  2. "No microphone found" error:

    • Check your system's audio permissions
    • Ensure your microphone is properly connected
    • Try specifying a different audio device
  3. "Out of memory" error:

    • MLX is optimized for Apple Silicon but large models may still require significant RAM
    • Try using a smaller/quantized model
    • Close other applications to free up memory
  4. Poor voice cloning quality:

    • Use a longer, clearer voice sample (10-30 seconds)
    • Ensure the sample has minimal background noise
    • Try adjusting exaggeration and cfg-weight parameters

Development

Running Tests

# Install dev dependencies
uv pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov

Code Style

# Format code
ruff format

# Lint code
ruff check --fix

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Apple MLX team for the efficient ML framework for Apple Silicon
  • MLX-LM community for providing quantized models
  • OpenAI Whisper for speech recognition
  • Resemble AI for ChatterBox TTS

Future Plans & Roadmap

Language Support

Currently, LocalTalk supports English (American and British accents). Chinese language support is coming next, with other major world languages to follow. The underlying models (Whisper, Gemma3, and Kokoro) already have multilingual capabilities - we just need to wire up the language detection and configuration.

Contributors welcome! If you'd like to help add support for your language, please check our Issues page or submit a PR. Language additions mainly involve:

  • Configuring Whisper for the target language
  • Testing Gemma3's response quality in that language
  • Setting up Kokoro TTS with appropriate voice models
  • Adding language-specific prompts and examples

Offline Knowledge Base

We're planning to add support for offline data sources to augment the LLM's knowledge while maintaining complete privacy:

  • Offline Wikipedia: Full-text search and retrieval from Wikipedia dumps
  • Personal Documents: Index and query your own documents, notes, and PDFs
  • Technical Documentation: Offline access to programming docs, manuals, and references
  • Custom Knowledge Bases: Import and index any structured data source

This will enable LocalTalk to provide informed responses about current events, technical topics, and personal information - all while keeping everything local and private on your device. The RAG (Retrieval Augmented Generation) pipeline will seamlessly integrate with the voice interface.

Other Planned Features

  • Real-time streaming: Stream responses as they're generated
  • Multi-turn conversations: Better context management for longer discussions
  • Voice activity detection: Automatic recording start/stop
  • Custom wake words: "Hey LocalTalk" activation
  • Model hot-swapping: Switch between models without restarting
  • Voice profiles: Save and switch between different voice configurations
  • Plugin system: Extend functionality with custom modules
  • Platform support: Linux support (P2), Windows consideration (P3)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

localtalk-0.1.0a2.tar.gz (23.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

localtalk-0.1.0a2-py3-none-any.whl (33.8 kB view details)

Uploaded Python 3

File details

Details for the file localtalk-0.1.0a2.tar.gz.

File metadata

  • Download URL: localtalk-0.1.0a2.tar.gz
  • Upload date:
  • Size: 23.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.8

File hashes

Hashes for localtalk-0.1.0a2.tar.gz
Algorithm Hash digest
SHA256 01a249923ac45d31f0e0e53579707519b96a1e2dcb598d0c4de62af00e80855e
MD5 71d03dd33e84cf3673e26cf458acbc34
BLAKE2b-256 1ac826992405e2b177a38be68a8bc5084ad7da06d090575607bff4d9ca277ff8

See more details on using hashes here.

File details

Details for the file localtalk-0.1.0a2-py3-none-any.whl.

File metadata

File hashes

Hashes for localtalk-0.1.0a2-py3-none-any.whl
Algorithm Hash digest
SHA256 c05d781c88ea6c18eead8d2da45e437da0e241cbb1a01800c49084f0b1bada3a
MD5 1f9afce11e787a692b6ff6d7c74096cd
BLAKE2b-256 2687ee1b0ed3abcf74af6cbddd4318c06ebf56b719ec61e68110bebde5f63429

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page