A local/offline-capable voice assistant with speech recognition, LLM processing, and text-to-speech

These details have not been verified by PyPI


💻🎤🔊 localtalk

A privacy-first voice assistant that runs entirely offline on Apple Silicon, perfect for travelers, privacy-conscious users, and anyone who values their data sovereignty. No accounts, no cloud services, no tracking - just powerful AI that respects your privacy.

Currently, this library needs immediate work in the following areas before I can recommend usage.

Develop a "System Prompt" with various personas
Augment with local system knowledge (date/time, username, etc)

Why This Project Exists

Technology preview - While the tech isn't perfect yet, we can build something functional right now that respects your privacy and runs entirely offline.
Vibe check on offline-first AI - How realistic is it to avoid cloud services like OpenAI and ElevenLabs? This project explores what's possible with local models and helps identify the gaps.
Future-proofing for real-time local AI - One day soon, these models and consumer computers will be capable of real-time TTS that rivals cloud services. When that day comes, this library will be ready to leverage those improvements immediately.

Why Not Use Apple's Built-in "Say" Command?

We deliberately chose not to use macOS's built-in say command for text-to-speech. While it's readily available and requires no setup, the voice quality is too robotic to meet today's user expectations. After being exposed to natural-sounding AI voices from services like ElevenLabs and OpenAI, users expect conversational AI to sound human-like. The say command's 1990s-era voice synthesis would make the assistant feel outdated and diminish the user experience, so it wasn't worth implementing as an option.

Apple's newer Speech Synthesis API offers much higher quality voices that could be a great fit for this project. However, we're waiting for proper Python library support to integrate it. Once Python bindings become available, we'll add support for these modern Apple voices as another local TTS option.

Built with speech recognition (Whisper), language model processing (Gemma3/MLX), and text-to-speech synthesis (Kokoro/ChatterBox), LocalTalk gives you the convenience of modern AI assistants without sacrificing your privacy or requiring internet connectivity.

Why "LocalTalk"?

The name "LocalTalk" is a playful homage to Apple's classic LocalTalk networking protocol from the 1980s. Just as the original LocalTalk enabled local network communication between Apple devices without needing external infrastructure, our LocalTalk enables local AI conversations without needing external cloud services.

The name works on two levels:

Local: Everything runs locally on your Mac - no internet required after initial setup
Talk: It's literally a talking app that listens and responds with voice

It's the perfect name for an offline voice assistant that embodies Apple's tradition of making powerful technology accessible and self-contained.

Features

🎤 Speech Recognition: Convert speech to text using OpenAI Whisper
🤖 Native Audio Processing: Gemma3 model with direct audio understanding
🚀 Fast TTS: MLX-Audio Kokoro for near real-time speech synthesis
🔊 Multiple TTS Options: Choose between fast Kokoro or high-quality ChatterBox
💬 Dual Input Modes: Type or speak your queries
🎭 Voice Options: Multiple voice personalities with Kokoro
💾 Fully Offline: No internet connection required after setup
🔒 100% Private: Your conversations never leave your device

Requirements

Python 3.11+
macOS with Apple Silicon (M1 or later)
Microphone for voice input
MLX framework (installed automatically)

Platform Support:

macOS (Apple Silicon): ✅ Fully supported as a first-class platform.
Linux / CUDA backend: 🚧 Planned (see roadmap below).
Windows: 🤷🏼‍♂️ Would consider, but not seriously.
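If you want to check a machine against the requirements above before installing, something like the following may help. It is a minimal sketch using only the standard library; `check_requirements` is a hypothetical helper, not something localtalk ships.

```python
import platform
import sys


def check_requirements(system=None, machine=None, version=None):
    """Return a list of problems; an empty list means the machine looks OK."""
    system = system or platform.system()
    machine = machine or platform.machine()
    version = version or sys.version_info[:2]

    problems = []
    if tuple(version) < (3, 11):
        problems.append("Python 3.11+ is required")
    if system != "Darwin":
        problems.append("macOS is the only fully supported platform")
    elif machine != "arm64":
        # A native Python on Apple Silicon reports "arm64".
        problems.append("Apple Silicon (arm64) is required")
    return problems


for problem in check_requirements():
    print("Problem:", problem)
```

Note that a Python interpreter running under Rosetta reports `x86_64`, so this check would flag it even on Apple Silicon hardware.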

Installation - with uv

Recommended: install the CLI as a uv tool

uv tool install localtalk

# uvx also works, nice demo one-liner
uvx localtalk

Contributor/Developer Setup

Clone the repository:

git clone https://github.com/anthonywu/localtalk
cd localtalk

Create a virtual environment (using uv recommended):

uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install the package:

uv pip install -e .

Download NLTK data (required for sentence tokenization):

python -c "import nltk; nltk.download('punkt')"

MLX-VLM will automatically download models on first run
- No additional setup required
- Models are cached locally for offline use
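To show what sentence tokenization is for, here is a naive stand-in written with a regular expression. This is not NLTK's punkt tokenizer and not localtalk's code; it assumes (the README does not say so) that the point of the step is to cut a long response into sentence-sized chunks, for example so that each one can be passed to TTS separately.

```python
import re

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter (illustrative stand-in for punkt)."""
    parts = _SENTENCE_BOUNDARY.split(text.strip())
    return [part.strip() for part in parts if part.strip()]


for sentence in split_sentences("Hello there. How are you? I am fine!"):
    print(sentence)
```

A splitter this simple breaks on abbreviations such as "Dr. Smith", which is the kind of case a trained tokenizer like punkt is designed to handle.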

Quick Start (Hello World)

Basic Usage

Run the voice assistant with default settings:

localtalk

This will:

Start with fast Kokoro TTS (MLX-Audio)
Use the mlx-community/gemma-3n-E2B-it-4bit model
Enable dual-modal input (type or speak)
Use base.en Whisper model for speech recognition

Complete Hello World Example

# 1. Run the voice assistant
localtalk

# 2. You'll see: "💬 Type your message or press Enter to record audio:"
# 3. Either:
#    - Type "Hello, how are you?" and press Enter
#    - OR press Enter, speak, then press Enter again
# 4. Listen to the AI's response with fast Kokoro TTS!
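The flow in the steps above (typed text is used directly, an empty line triggers recording) can be sketched as a single turn handler. This is an assumption-laden outline, not localtalk's implementation: `handle_turn` and the four callables passed into it are hypothetical, and the demo uses stubs in place of real recording, Whisper, LLM, and TTS calls.

```python
from typing import Callable


def handle_turn(
    typed: str,
    record_audio: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one turn: typed text wins, empty input falls back to speech."""
    if typed.strip():
        user_text = typed.strip()
    else:
        user_text = transcribe(record_audio())
    reply = generate(user_text)
    speak(reply)
    return reply


# Stub components so the sketch runs without a microphone or any models.
spoken: list[str] = []
reply = handle_turn(
    "",  # empty line, so the "recording" path is taken
    record_audio=lambda: b"fake-audio",
    transcribe=lambda audio: "hello, how are you?",
    generate=lambda text: f"You said: {text}",
    speak=spoken.append,
)
print(reply)
```

Passing the components in as callables keeps the turn logic testable without loading any model.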

Different TTS Backends

# Fast mode (default) - Kokoro TTS with audio output
localtalk

# Different Kokoro voices: American female "nova"
localtalk --kokoro-voice af_nova --kokoro-speed 1.2

# Different Kokoro voices: British female "emma"
localtalk --kokoro-voice bf_emma --kokoro-speed 1.2

# High-quality mode - ChatterBox TTS (experimental, slow)
localtalk --use-chatterbox

Configuration Options

Command-Line Arguments

Primary AI Model Options:

--model NAME: MLX model from Hugging Face Hub (default: mlx-community/gemma-3n-E2B-it-4bit)
--whisper-model SIZE: Whisper model size (default: base.en)
--temperature FLOAT: Temperature for text generation (default: 0.7)
--top-p FLOAT: Top-p sampling parameter (default: 1.0)
--max-tokens INT: Maximum tokens to generate (default: 100)

TTS Options:

--kokoro-model: Choose Kokoro model (4bit/6bit/8bit/bf16, default: 4bit)
--kokoro-voice: Voice personality (af_heart/af_nova/af_bella/bf_emma)
--kokoro-speed: Speech speed 0.5-2.0 (default: 1.0)
--no-tts: Disable TTS for text-only mode
--use-chatterbox: Use experimental ChatterBox TTS (slow but high quality)

ChatterBox Options (requires --use-chatterbox):

--exaggeration FLOAT: Emotion intensity (0.0-1.0, default: 0.5)
--cfg-weight FLOAT: Pacing control (0.0-1.0, default: 0.5)
--tts-quality: Use quality mode instead of fast mode

Other Options:

--save-voice: Save generated audio responses
--system-prompt: Custom system prompt for the LLM
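For readers who want to see how a subset of these flags, defaults, and ranges fit together, the following argparse sketch mirrors the documented values. It is an illustration only: the parser name `localtalk-sketch` and the `bounded_float` helper are made up, and localtalk's real CLI may be built differently.

```python
import argparse


def bounded_float(low: float, high: float):
    """Build an argparse type that rejects values outside [low, high]."""
    def parse(value: str) -> float:
        number = float(value)
        if not low <= number <= high:
            raise argparse.ArgumentTypeError(f"must be between {low} and {high}")
        return number
    return parse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="localtalk-sketch")
    # Defaults below are copied from the option list in this README.
    parser.add_argument("--model", default="mlx-community/gemma-3n-E2B-it-4bit")
    parser.add_argument("--whisper-model", default="base.en")
    parser.add_argument("--temperature", type=float, default=0.7)
    parser.add_argument("--top-p", type=float, default=1.0)
    parser.add_argument("--max-tokens", type=int, default=100)
    parser.add_argument("--kokoro-model", choices=["4bit", "6bit", "8bit", "bf16"], default="4bit")
    parser.add_argument("--kokoro-speed", type=bounded_float(0.5, 2.0), default=1.0)
    parser.add_argument("--no-tts", action="store_true")
    parser.add_argument("--use-chatterbox", action="store_true")
    return parser


args = build_parser().parse_args(["--kokoro-speed", "1.2", "--max-tokens", "200"])
print(args.kokoro_speed, args.max_tokens, args.whisper_model)
```

With this sketch, an out-of-range value such as `--kokoro-speed 3.0` makes argparse exit with a usage error instead of passing the value through.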

Example Configurations

Calm, professional assistant (ChatterBox):

localtalk --use-chatterbox --exaggeration 0.3 --cfg-weight 0.7 --temperature 0.5

Expressive, dynamic assistant (ChatterBox):

localtalk --use-chatterbox --exaggeration 0.8 --cfg-weight 0.3 --temperature 0.9

Using a different model:

localtalk --model mlx-community/Llama-3.2-3B-Instruct-4bit --whisper-model small.en

Secrets and API Keys

Good news! This application requires NO API keys or secrets to run.

Everything runs locally on your Mac!

✅ Whisper: Runs locally, no API key needed
✅ MLX-LM: Runs locally on Apple Silicon, no API key needed
✅ ChatterBox: Runs locally, no API key needed

Advanced Usage

Programmatic Usage

You can also use the voice assistant programmatically:

from localtalk import VoiceAssistant, AppConfig

# Create custom configuration
config = AppConfig()
config.mlx_lm.model = "mlx-community/Llama-3.2-3B-Instruct-4bit"
config.chatterbox.exaggeration = 0.7

# Create and run assistant
assistant = VoiceAssistant(config)
assistant.run()
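The attribute paths used above (`config.mlx_lm.model`, `config.chatterbox.exaggeration`) suggest a nested configuration object. The real `AppConfig` definition is not shown in this README, so the classes below are a hypothetical stand-in (note the `Sketch` prefix) that only illustrates the nested shape, with defaults taken from the command-line documentation.

```python
from dataclasses import dataclass, field


@dataclass
class SketchLMConfig:
    model: str = "mlx-community/gemma-3n-E2B-it-4bit"
    temperature: float = 0.7
    max_tokens: int = 100


@dataclass
class SketchChatterBoxConfig:
    exaggeration: float = 0.5
    cfg_weight: float = 0.5


@dataclass
class SketchAppConfig:
    # default_factory gives every config its own sub-config instances.
    mlx_lm: SketchLMConfig = field(default_factory=SketchLMConfig)
    chatterbox: SketchChatterBoxConfig = field(default_factory=SketchChatterBoxConfig)


config = SketchAppConfig()
config.mlx_lm.model = "mlx-community/Llama-3.2-3B-Instruct-4bit"
config.chatterbox.exaggeration = 0.7
print(config)
```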

Custom System Prompts

localtalk --system-prompt "You are a pirate. Respond in pirate speak, matey!"

Troubleshooting

Common Issues

"Model not found" error:
- The model will be automatically downloaded on first use
- Ensure you have a stable internet connection for the initial download
- Check that you have sufficient disk space (~4-8GB per model)
"No microphone found" error:
- Check your system's audio permissions
- Ensure your microphone is properly connected
- Try specifying a different audio device
"Out of memory" error:
- MLX is optimized for Apple Silicon but large models may still require significant RAM
- Try using a smaller/quantized model
- Close other applications to free up memory
Poor voice cloning quality:
- Use a longer, clearer voice sample (10-30 seconds)
- Ensure the sample has minimal background noise
- Try adjusting exaggeration and cfg-weight parameters

Development

Running Tests

# Install dev dependencies
uv pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov

Code Style

# Format code
ruff format

# Lint code
ruff check --fix

License

MIT License - see LICENSE file for details.

Acknowledgments

Apple MLX team for the efficient ML framework for Apple Silicon
MLX-LM community for providing quantized models
OpenAI Whisper for speech recognition
Resemble AI for ChatterBox TTS

Future Plans & Roadmap

Language Support

Currently, LocalTalk supports English (American and British accents). Chinese language support is coming next, with other major world languages to follow. The underlying models (Whisper, Gemma3, and Kokoro) already have multilingual capabilities - we just need to wire up the language detection and configuration.

Contributors welcome! If you'd like to help add support for your language, please check our Issues page or submit a PR. Language additions mainly involve:

Configuring Whisper for the target language
Testing Gemma3's response quality in that language
Setting up Kokoro TTS with appropriate voice models
Adding language-specific prompts and examples

Offline Knowledge Base

We're planning to add support for offline data sources to augment the LLM's knowledge while maintaining complete privacy:

Offline Wikipedia: Full-text search and retrieval from Wikipedia dumps
Personal Documents: Index and query your own documents, notes, and PDFs
Technical Documentation: Offline access to programming docs, manuals, and references
Custom Knowledge Bases: Import and index any structured data source

This will enable LocalTalk to provide informed responses about current events, technical topics, and personal information - all while keeping everything local and private on your device. The RAG (Retrieval Augmented Generation) pipeline will seamlessly integrate with the voice interface.
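Since this feature is only planned, there is no implementation to show. The sketch below illustrates the general retrieve-then-prompt idea with a deliberately simple keyword-overlap score in place of a real search index or embedding model. All function names and the sample documents are invented for the example.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, doc: str) -> int:
    """Count query tokens that also occur in the document."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(q[token], d[token]) for token in q)


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return up to k documents with a non-zero overlap, best first."""
    ranked = sorted(docs, key=lambda doc: score(query, doc), reverse=True)
    return [doc for doc in ranked[:k] if score(query, doc) > 0]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Prepend the retrieved passages to the user's question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, docs, k))
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}"


docs = [
    "LocalTalk was a networking protocol introduced by Apple in the 1980s.",
    "Whisper is a speech recognition model.",
    "Kokoro is a text-to-speech model.",
]
print(build_prompt("What is Whisper speech recognition?", docs, k=1))
```

Retrieval and prompt assembly both run locally here, which is the property the roadmap item is after.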

Other Planned Features

Real-time streaming: Stream responses as they're generated
Multi-turn conversations: Better context management for longer discussions
Voice activity detection: Automatic recording start/stop
Custom wake words: "Hey LocalTalk" activation
Model hot-swapping: Switch between models without restarting
Voice profiles: Save and switch between different voice configurations
Plugin system: Extend functionality with custom modules
Platform support: Linux support (P2), Windows consideration (P3)
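As one possible starting point for the voice activity detection item, the sketch below uses a simple energy threshold: recording "starts" at the first loud frame and "stops" after a run of quiet frames. This is a common baseline technique, not the approach localtalk has committed to, and the threshold and hangover values are arbitrary assumptions.

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def detect_speech(frames, threshold: float = 0.02, hangover: int = 2):
    """Return (start, end) frame indices of the first speech segment.

    `end` is exclusive. Returns None if no frame reaches the threshold.
    The segment closes after `hangover` consecutive quiet frames.
    """
    start = None
    silent_run = 0
    for index, frame in enumerate(frames):
        if rms(frame) >= threshold:
            if start is None:
                start = index
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover:
                return (start, index - hangover + 1)
    if start is None:
        return None
    return (start, len(frames) - silent_run)


quiet, loud = [0.0] * 160, [0.1] * 160
print(detect_speech([quiet, quiet, loud, loud, quiet, quiet, quiet]))
```

The hangover keeps short pauses between words from ending the recording too early.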


Release history

0.3.0: Dec 31, 2025
0.2.0: Dec 2, 2025
0.1.0a4 (pre-release): Jul 30, 2025
0.1.0a3 (pre-release): Jul 27, 2025
0.1.0a2 (pre-release, this version): Jul 26, 2025
0.1.0a1 (pre-release): Jul 26, 2025

Download files


Source Distribution

localtalk-0.1.0a2.tar.gz (23.7 kB)

Uploaded Jul 26, 2025 Source

Built Distribution


localtalk-0.1.0a2-py3-none-any.whl (33.8 kB)

Uploaded Jul 26, 2025 Python 3

File details

Details for the file localtalk-0.1.0a2.tar.gz.

File metadata

Download URL: localtalk-0.1.0a2.tar.gz
Upload date: Jul 26, 2025
Size: 23.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.8

File hashes

Hashes for localtalk-0.1.0a2.tar.gz
Algorithm	Hash digest
SHA256	`01a249923ac45d31f0e0e53579707519b96a1e2dcb598d0c4de62af00e80855e`
MD5	`71d03dd33e84cf3673e26cf458acbc34`
BLAKE2b-256	`1ac826992405e2b177a38be68a8bc5084ad7da06d090575607bff4d9ca277ff8`


File details

Details for the file localtalk-0.1.0a2-py3-none-any.whl.

File metadata

Download URL: localtalk-0.1.0a2-py3-none-any.whl
Upload date: Jul 26, 2025
Size: 33.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.8

File hashes

Hashes for localtalk-0.1.0a2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c05d781c88ea6c18eead8d2da45e437da0e241cbb1a01800c49084f0b1bada3a`
MD5	`1f9afce11e787a692b6ff6d7c74096cd`
BLAKE2b-256	`2687ee1b0ed3abcf74af6cbddd4318c06ebf56b719ec61e68110bebde5f63429`
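The SHA256 digests listed above can be checked against a downloaded file with Python's standard library. The helpers below are a minimal sketch; `sha256_of` and `matches` are names chosen for this example, and the file path in the usage note is whatever location you saved the download to.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with the published one."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches("localtalk-0.1.0a2.tar.gz", "01a249923ac45d31f0e0e53579707519b96a1e2dcb598d0c4de62af00e80855e")` should return True for an intact copy of the source distribution.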

