Multi-provider TTS tool compatible with macOS say command. Supports Chatterbox TTS for local; ElevenLabs, OpenAI, AWS Polly for cloud APIs
gensay
A multi-provider text-to-speech (TTS) tool that implements the Apple macOS /usr/bin/say command interface while supporting multiple TTS backends including Chatterbox (local AI), OpenAI, ElevenLabs, and Amazon Polly.
Features
- macOS say Compatible: Drop-in replacement for the macOS say command with identical CLI interface
- Multiple TTS Providers: Extensible provider system with support for:
  - macOS native say command (default on macOS)
  - Chatterbox (local AI TTS, default on other platforms)
  - ElevenLabs (cloud API)
  - OpenAI TTS (cloud API)
  - Amazon Polly (cloud API)
  - Mock provider for testing
- Smart Text Chunking: Intelligently splits long text for optimal TTS processing
- Audio Caching: Automatic caching with LRU eviction to speed up repeated synthesis
- Progress Tracking: Built-in progress bars with tqdm and customizable callbacks
- Multiple Audio Formats: Support for AIFF, WAV, M4A, MP3, CAF, FLAC, AAC, OGG
- Background Pre-caching: Queue and cache audio chunks in the background (Chatterbox only)
- Interactive REPL Mode: Start an interactive session with provider initialized once for repeated use
- Named Pipe Listener: Listen on a FIFO for text input from other processes
Table of Contents
- Installation
- Quick Start
- Command Line Usage
- Python API
- Provider Configurations
- Advanced Features
- Development
- License
Installation
It's 2026, use uv
gensay is intended to be used as a CLI tool that is a drop-in replacement for the macOS say CLI.
System Dependencies (ElevenLabs provider only)
PortAudio is required if you plan to use the ElevenLabs provider. The pyaudio dependency needs the PortAudio C library to compile successfully.
Other providers (macOS, OpenAI, Amazon Polly, Chatterbox) do not require PortAudio.
Homebrew (macOS):
brew install portaudio
Nix:
nix-env -iA nixpkgs.portaudio
Install gensay
# Install as a tool
uv tool install gensay
# With extras: ElevenLabs provider (requires PortAudio, see above)
pip install 'gensay[elevenlabs]'
# With extras: Chatterbox provider (local Text-to-Speech model, ~2GB PyTorch dependencies)
uv tool install 'gensay[chatterbox]' \
--with git+https://github.com/anthonywu/chatterbox.git@allow-dep-updates
# Or add to your project
uv add gensay
# From source (with automatic PortAudio path configuration)
git clone https://github.com/anthonywu/gensay
cd gensay
just setup
Optional Dependencies
# Audio format conversion (for non-native formats like MP3, OGG, FLAC)
# Requires ffmpeg installed on system
pip install 'gensay[audio-formats]'
# Install all optional dependencies
pip install 'gensay[all]'
Installation Help:
- PyAudio documentation - For PortAudio/PyAudio installation issues
- ElevenLabs Python library docs - Official ElevenLabs Python documentation
For developer/maintainer installation, just setup automatically configures PortAudio and FFmpeg paths for both Nix and Homebrew.
Developer/Maintainer Build Dependencies
PortAudio Paths (for ElevenLabs)
Homebrew:
export C_INCLUDE_PATH="$(brew --prefix portaudio)/include:$C_INCLUDE_PATH"
export LIBRARY_PATH="$(brew --prefix portaudio)/lib:$LIBRARY_PATH"
Nix:
export C_INCLUDE_PATH="$(nix-build '<nixpkgs>' -A portaudio --no-out-link)/include:$C_INCLUDE_PATH"
export LIBRARY_PATH="$(nix-build '<nixpkgs>' -A portaudio --no-out-link)/lib:$LIBRARY_PATH"
Then install into local venv:
uv sync --all-extras
# temporarily, we have to use a special release of chatterbox library to allow for dependency resolution
uv pip install git+https://github.com/anthonywu/chatterbox.git@allow-dep-updates
FFmpeg Library Path (for Chatterbox on macOS)
Chatterbox uses TorchCodec which requires FFmpeg libraries at runtime. On macOS, set DYLD_LIBRARY_PATH before running gensay:
Homebrew:
export DYLD_LIBRARY_PATH="$(brew --prefix ffmpeg)/lib:$DYLD_LIBRARY_PATH"
gensay --provider chatterbox "Hello"
Nix:
# Find the ffmpeg-lib output in the Nix store
FFMPEG_LIB=$(nix-store -qR "$(which ffmpeg)" | grep 'ffmpeg.*-lib$')
export DYLD_LIBRARY_PATH="$FFMPEG_LIB/lib:$DYLD_LIBRARY_PATH"
gensay --provider chatterbox "Hello"
Note: DYLD_LIBRARY_PATH must be set before the Python process starts; it cannot be set from within Python.
Quick Start
# Basic usage - speaks the text
gensay "Hello, world!"
# Use specific voice
gensay -v Samantha "Hello from Samantha"
# Save to audio file
gensay -o greeting.m4a "Welcome to gensay"
# List available voices (two ways)
gensay -v '?'
gensay --list-voices
Command Line Usage
Basic Options
# Speak text
gensay "Hello, world!"
# Read from file
gensay -f document.txt
# Read from stdin
echo "Hello from pipe" | gensay -f -
# Specify voice
gensay -v Alex "Hello from Alex"
# Adjust speech rate (words per minute)
gensay -r 200 "Speaking faster"
# Save to file
gensay -o output.m4a "Save this speech"
# Specify audio format
gensay -o output.wav --format wav "Different format"
Provider Selection
# Use macOS native say command
gensay --provider macos "Using system TTS"
# List voices for specific provider
gensay --provider macos --list-voices
gensay --provider mock --list-voices
# Use mock provider for testing
gensay --provider mock "Testing without real TTS"
# Use Chatterbox explicitly
gensay --provider chatterbox "Local AI voice"
# Default provider depends on platform
gensay "Hello" # Uses 'macos' on macOS, 'chatterbox' on other platforms
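The platform-dependent default described above could be expressed roughly as follows. This is an illustrative sketch, not gensay's actual selection code; the function name default_provider is hypothetical.

```python
import sys


def default_provider(platform: str = sys.platform) -> str:
    """Return 'macos' on macOS and 'chatterbox' on every other platform."""
    return "macos" if platform == "darwin" else "chatterbox"
```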
Advanced Options
# Show progress bar
gensay --progress "Long text with progress tracking"
# Pre-cache audio chunks in background
gensay --provider chatterbox --cache-ahead "Pre-process this text"
# Adjust chunk size
gensay --chunk-size 1000 "Process in larger chunks"
# Cache management
gensay --cache-stats # Show cache statistics
gensay --clear-cache # Clear all cached audio
gensay --no-cache "Text" # Disable cache for this run
Interactive Modes and Performance Optimization
REPL Mode
Start an interactive session where the provider is initialized once and reused for each prompt. This avoids the overhead of re-initializing the provider.
Tip: For Chatterbox and other local AI models, model loading from disk to memory is expensive (several seconds). Use --repl or --listen mode to load the model once and process many prompts without reloading.
# Start REPL mode (--repl, --interactive, and -i are all equivalent)
gensay --repl
gensay --interactive
gensay -i
# With a specific provider and voice
gensay --provider openai -v nova --repl
# Chatterbox with REPL (recommended - keeps model loaded)
gensay -p chatterbox -i
In REPL mode:
- Type text and press Enter to speak it
- Type exit or quit to exit
- Press Ctrl+C or Ctrl+D to exit
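The behavior listed above can be pictured as a simple loop around an already-initialized provider. The sketch below is an assumption about the general shape, not gensay's implementation; run_repl is a hypothetical name, and matching exit and quit case-insensitively is also an assumption.

```python
from typing import Callable, Iterable


def run_repl(lines: Iterable[str], speak: Callable[[str], None]) -> int:
    """Speak each non-empty line until 'exit' or 'quit'; return the count spoken."""
    spoken = 0
    for raw in lines:
        text = raw.strip()
        if text.lower() in {"exit", "quit"}:
            break
        if not text:
            continue  # ignore blank lines
        speak(text)  # the provider is created once, outside this loop
        spoken += 1
    return spoken
```

Called as run_repl(sys.stdin, provider.speak), the loop ends on its own at end of input (Ctrl+D), while Ctrl+C surfaces as a KeyboardInterrupt for the caller to handle.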
Named Pipe (FIFO) Listener
Listen on a named pipe for text input, allowing other processes to send text to be spoken. Useful for integrating TTS into scripts or other applications.
Tip: Like REPL mode, --listen keeps the provider loaded between requests, which is ideal for Chatterbox and other local models where initialization is slow.
# Start listening on default pipe (/tmp/gensay.pipe)
gensay --listen
# Use a custom pipe path
gensay --listen /tmp/my-tts.pipe
# With a specific provider (Chatterbox benefits most from persistent mode)
gensay --provider chatterbox --listen
gensay --provider polly -v Joanna --listen
From another terminal or script, send text to the pipe:
echo "Hello from another process" > /tmp/gensay.pipe
The listener runs until interrupted with Ctrl+C. The named pipe is created automatically if it doesn't exist.
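The same thing can be done from Python. The sketch below mirrors the echo example and assumes the listener is already running on the default pipe path; send_text is a hypothetical helper, and the FIFO check is an added safeguard so that a regular file is never written at that path by mistake.

```python
import os
import stat


def send_text(text: str, pipe_path: str = "/tmp/gensay.pipe") -> None:
    """Write one line of text to the named pipe, like the echo example."""
    if not stat.S_ISFIFO(os.stat(pipe_path).st_mode):
        raise ValueError(f"{pipe_path} is not a named pipe")
    # Opening a FIFO for writing blocks until a reader has it open.
    with open(pipe_path, "w", encoding="utf-8") as pipe:
        pipe.write(text.rstrip("\n") + "\n")
```

If the pipe does not exist yet, os.stat raises FileNotFoundError, which is a reasonable signal that the listener has not been started.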
Python API
Basic Usage
from gensay import ChatterboxProvider, TTSConfig, AudioFormat
# Create provider
provider = ChatterboxProvider()
# Speak text
provider.speak("Hello from Python")
# Save to file
provider.save_to_file("Save this", "output.m4a")
# List voices
voices = provider.list_voices()
for voice in voices:
    print(f"{voice['id']}: {voice['name']}")
Advanced Configuration
from gensay import ChatterboxProvider, TTSConfig, AudioFormat
# Configure TTS
config = TTSConfig(
    voice="default",
    rate=150,
    format=AudioFormat.M4A,
    cache_enabled=True,
    extra={
        'show_progress': True,
        'chunk_size': 500
    }
)
# Create provider with config
provider = ChatterboxProvider(config)
# Add progress callback
def on_progress(progress: float, message: str):
    print(f"Progress: {progress:.0%} - {message}")
config.progress_callback = on_progress
# Use the configured provider
provider.speak("Text with all options configured")
Text Chunking
from gensay import chunk_text_for_tts, TextChunker
# Simple chunking
chunks = chunk_text_for_tts(long_text, max_chunk_size=500)
# Advanced chunking with custom strategy
chunker = TextChunker(
    max_chunk_size=1000,
    strategy="paragraph",  # or "sentence", "word", "character"
    overlap_size=50
)
chunks = chunker.chunk_text(document)
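To give a feel for what a sentence strategy does, here is a standalone sketch that greedily packs whole sentences into chunks. It is not gensay's TextChunker and ignores overlap; chunk_by_sentence is a hypothetical function, and the sentence-boundary regex is a deliberate simplification.

```python
import re


def chunk_by_sentence(text: str, max_chunk_size: int = 500) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chunk_size characters."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit is cut on raw character count.
        while len(sentence) > max_chunk_size:
            chunks.append(sentence[:max_chunk_size])
            sentence = sentence[max_chunk_size:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Keeping sentences intact where possible tends to give a TTS engine more natural places to pause than cutting at a fixed character offset.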
Provider Configurations
ElevenLabs
- Install the optional dependency (requires PortAudio):
pip install 'gensay[elevenlabs]'
- Get an API key from ElevenLabs
- Set the environment variable:
export ELEVENLABS_API_KEY="your-api-key"
# List ElevenLabs voices
gensay --provider elevenlabs --list-voices
# Use a specific ElevenLabs voice
gensay --provider elevenlabs -v Rachel "Hello from ElevenLabs"
# Save to file with high quality
gensay --provider elevenlabs -o speech.mp3 "High quality AI speech"
OpenAI TTS
- Get an API key from OpenAI Platform
- Set the environment variable:
export OPENAI_API_KEY="sk-..."
# List OpenAI voices
gensay --provider openai --list-voices
# Use a specific voice (alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer)
gensay --provider openai -v nova "Hello from OpenAI"
# Save to file
gensay --provider openai -o speech.mp3 "OpenAI TTS output"
OpenAI offers two models via config.extra['model']:
- tts-1 (default): Faster, lower latency
- tts-1-hd: Higher quality audio
Amazon Polly
- Sign in to AWS Console
- Go to IAM → Users → Create user
- Attach the AmazonPollyReadOnlyAccess policy
- Create access keys under Security credentials → Access keys
- Configure credentials (choose one method):
Option A - Environment variables:
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-west-2"
Option B - AWS CLI v2:
This lets you sign in easily through the AWS Command Line Interface:
export AWS_DEFAULT_REGION=us-west-2
# on your desktop with a browser
aws login --region
# in an env without a browser
aws login --region --remote
# List Polly voices (60+ voices in many languages)
gensay --provider polly --list-voices
# Use a specific voice
gensay --provider polly -v Joanna "Hello from Amazon Polly"
# Save to file
gensay --provider polly -o speech.mp3 "Polly TTS output"
Polly supports multiple engines via config.extra['engine']:
- neural (default): Higher quality, natural-sounding
- standard: Lower cost, available for all voices
Advanced Features
Caching System
The caching system automatically stores generated audio to speed up repeated synthesis:
from gensay import TTSCache
# Create cache instance
cache = TTSCache(
    enabled=True,
    max_size_mb=10000,
    max_items=1000
)
# Get cache statistics
stats = cache.get_stats()
print(f"Cache size: {stats['size_mb']:.2f} MB")
print(f"Cached items: {stats['items']}")
# Clear cache
cache.clear()
Cache Location
Cache files are stored in platform-specific user cache directories:
- macOS: ~/Library/Caches/gensay
- Linux: ~/.cache/gensay
- Windows: %LOCALAPPDATA%\gensay\gensay\Cache
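If a script needs to locate the cache, the paths above can be computed with the standard library alone. The sketch below is an approximation: gensay_cache_dir is a hypothetical helper, and honoring XDG_CACHE_HOME on Linux and falling back to AppData\Local on Windows are assumptions not stated in this document.

```python
import os
import sys
from pathlib import Path


def gensay_cache_dir(platform: str = sys.platform) -> Path:
    """Return the cache directory listed above for the given platform."""
    if platform == "darwin":
        return Path.home() / "Library" / "Caches" / "gensay"
    if platform.startswith("win"):
        # Assumption: fall back to ~/AppData/Local when LOCALAPPDATA is unset.
        local = os.environ.get("LOCALAPPDATA", str(Path.home() / "AppData" / "Local"))
        return Path(local) / "gensay" / "gensay" / "Cache"
    # Linux and other Unix-like systems; honoring XDG_CACHE_HOME is an assumption.
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "gensay"
```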
Managing Cache
# Show cache statistics
gensay --cache-stats
# Clear all cached audio
gensay --clear-cache
# Disable caching for a specific command
gensay --no-cache "Text to synthesize without caching"
Manual Deletion
To manually delete the cache, remove the cache directory:
# macOS/Linux
rm -rf ~/Library/Caches/gensay # macOS
rm -rf ~/.cache/gensay # Linux
# Windows (PowerShell)
Remove-Item -Recurse -Force $env:LOCALAPPDATA\gensay\gensay\Cache
Creating Custom Providers
from gensay.providers import TTSProvider, TTSConfig, AudioFormat
from typing import Optional, Union, Any
from pathlib import Path
class MyCustomProvider(TTSProvider):
    def speak(self, text: str, voice: Optional[str] = None,
              rate: Optional[int] = None) -> None:
        # Your implementation
        self.update_progress(0.5, "Halfway done")
        # ... generate and play audio ...
        self.update_progress(1.0, "Complete")
    def save_to_file(self, text: str, output_path: Union[str, Path],
                     voice: Optional[str] = None, rate: Optional[int] = None,
                     format: Optional[AudioFormat] = None) -> Path:
        # Your implementation
        return Path(output_path)
    def list_voices(self) -> list[dict[str, Any]]:
        return [
            {'id': 'voice1', 'name': 'Voice One', 'language': 'en-US'}
        ]
    def get_supported_formats(self) -> list[AudioFormat]:
        return [AudioFormat.WAV, AudioFormat.MP3]
Async Support
All providers support async operations:
import asyncio
from gensay import ChatterboxProvider
async def main():
    provider = ChatterboxProvider()
    # Async speak
    await provider.speak_async("Async speech")
    # Async save
    await provider.save_to_file_async("Async save", "output.m4a")
asyncio.run(main())
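One common way to offer async methods on top of blocking synthesis is to run the blocking call in a worker thread. Whether gensay does this internally is not stated here, so treat the sketch below as an illustration of the pattern only; slow_speak is a stand-in for a blocking provider call.

```python
import asyncio
import time


def slow_speak(text: str) -> str:
    """Stand-in for a blocking synthesis call."""
    time.sleep(0.1)
    return f"spoke: {text}"


async def speak_async(text: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays responsive."""
    return await asyncio.to_thread(slow_speak, text)


async def main() -> list[str]:
    # The two calls overlap instead of running back to back.
    return await asyncio.gather(speak_async("first"), speak_async("second"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

asyncio.gather returns results in the order the awaitables were passed, regardless of which one finishes first.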
Development
This project uses just for common development tasks. First, install just:
# macOS (using Nix which you already have)
nix-env -iA nixpkgs.just
# Or using Homebrew
brew install just
# Or using cargo
cargo install just
Getting Started
# Setup development environment
just setup
# Run tests
just test
# Run all quality checks
just check
# See all available commands
just
Common Development Commands
Testing
# Run all tests
just test
# Run tests with coverage
just test-cov
# Run specific test
just test-specific tests/test_providers.py::test_mock_provider_speak
# Quick test (mock provider only)
just quick-test
Code Quality
# Run linter
just lint
# Auto-fix linting issues
just lint-fix
# Format code
just format
# Type checking
just typecheck
# Run all checks (lint, format, typecheck)
just check
# Pre-commit checks (format, lint, test)
just pre-commit
Running the CLI
# Run with mock provider
just run-mock "Hello, world!"
just run-mock -v '?'
# Run with macOS provider
just run-macos "Hello from macOS"
# Cache management
just cache-stats
just cache-clear
Development Utilities
# Run example script
just demo
# Clean build artifacts
just clean
# Build package
just build
Manual Setup (without just)
If you prefer not to use just, here are the equivalent commands:
# Setup
uv venv
uv pip install -e ".[dev]"
# Testing
uv run pytest -v
uv run pytest --cov=gensay --cov-report=term-missing
# Linting and formatting
uv run ruff check src tests
uv run ruff format src tests
# Type checking
uvx ty check src
Project Structure
gensay/
├── src/gensay/
│ ├── __init__.py
│ ├── main.py # CLI entry point
│ ├── providers/ # TTS provider implementations
│ │ ├── base.py # Abstract base provider
│ │ ├── chatterbox.py # Chatterbox provider
│ │ ├── macos_say.py # macOS say wrapper
│ │ └── ... # Other providers
│ ├── cache.py # Caching system
│ └── text_chunker.py # Text chunking logic
├── tests/ # Test suite
├── examples/ # Example scripts
├── justfile # Development commands
└── README.md
Code Style Guide
- Python 3.11+ with type hints
- Follow PEP8 and Google Python Style Guide
- Use ruff for linting and formatting
- Keep docstrings concise but informative
- Prefer pathlib.Path over os.path
- Use pytest for testing
License
gensay is distributed under the terms of the MIT license.