A CLI that provides TTS using Amazon Polly
aws-polly-tts-tool
Professional AWS Polly TTS CLI and library for text-to-speech synthesis with agent-friendly design.
Table of Contents
- About
- Why CLI-First?
- Features
- Installation
- Configuration
- Usage
- Library Usage
- Commands
- Known Issues
- Development
- Resources
- License
About
aws-polly-tts-tool is a comprehensive CLI tool and Python library for Amazon Polly text-to-speech synthesis. Built with a CLI-first philosophy, it provides both command-line convenience and programmatic access to AWS Polly's full feature set.
What is Amazon Polly?
Amazon Polly is AWS's fully-managed text-to-speech service that converts text into lifelike speech using deep learning. It offers 60+ voices in 30+ languages with multiple quality tiers.
Why This Tool?
- Agent-Friendly: Designed for Claude Code and AI agents with self-documenting help and structured errors
- Composable: JSON output to stdout, logs to stderr - perfect for Unix piping
- Dual-Mode: Use as CLI or import as Python library
- Production-Ready: Type-safe, tested, linted with comprehensive error handling
- Cost-Transparent: Real-time cost estimates and AWS billing integration
Why CLI-First?
This tool prioritizes CLI design to enable:
- AI Agent Integration: Claude Code and other AI tools can use structured commands and parse outputs
- ReAct Loops: Clear error messages help agents self-correct and retry operations
- Composability: Standard Unix patterns (stdin/stdout/stderr) enable piping and automation
- Building Blocks: Commands serve as reusable components for skills, MCP servers, and scripts
- Predictability: Type-safe implementation ensures consistent behavior in automated workflows
Features
Voice Engines
- Standard - Cost-effective traditional TTS ($4/1M chars)
- Neural - Natural, human-like voices ($16/1M chars)
- Generative - Most advanced, emotionally engaged ($30/1M chars)
- Long-form - Optimized for audiobooks ($100/1M chars)
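The per-engine prices listed above translate directly into a per-request estimate. The following is a minimal sketch of that arithmetic, assuming cost scales linearly with the number of characters. `PRICE_PER_MILLION` and `estimate_cost` are hypothetical names used for illustration; this is not the tool's own `calculate_cost` implementation.

```python
# Illustrative only: USD prices per 1M characters, copied from the engine list above.
PRICE_PER_MILLION = {
    "standard": 4.00,
    "neural": 16.00,
    "generative": 30.00,
    "long-form": 100.00,
}


def estimate_cost(character_count: int, engine: str) -> float:
    """Return the estimated cost in USD for `character_count` characters."""
    if character_count < 0:
        raise ValueError("character_count must be non-negative")
    if engine not in PRICE_PER_MILLION:
        raise ValueError(f"unknown engine: {engine!r}")
    # Linear pricing: characters times the per-million rate, scaled down.
    return character_count * PRICE_PER_MILLION[engine] / 1_000_000
```

For example, `estimate_cost(5000, "neural")` works out to 5000 x 16 / 1,000,000 USD, which is eight cents.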
Voice Selection
- 60+ voices across 30+ languages
- Dynamic fetching from Polly API (always up-to-date)
- Filter by engine, language, gender
- Multiple accents and speaking styles
Output Options
- mp3 - General purpose (default)
- ogg_vorbis - Open format for web
- pcm - Raw audio, lowest latency
Advanced Features
- Full SSML support (prosody, breaks, emphasis, phonemes)
- Dual cost tracking (estimates + AWS Cost Explorer)
- Billing queries with engine breakdown
- AWS environment variable authentication
- Stdin support for piping
Installation
Prerequisites
- Python 3.12+ (3.12 recommended; on Python 3.13+ audio playback fails because of a pydub compatibility issue - see Known Issues)
- uv package manager (recommended)
- AWS credentials configured
- ffmpeg (for audio playback - not required for file output)
Note: For a detailed explanation of how the TTS pipeline works and why these dependencies are needed, see TTS Pipeline Architecture
Install from Source
# Clone repository
git clone https://github.com/dnvriend/aws-polly-tts-tool.git
cd aws-polly-tts-tool
# Install with uv (Python 3.12)
uv tool install . --python 3.12
# Verify installation
aws-polly-tts-tool --version
Install with mise (Development)
cd aws-polly-tts-tool
mise use python@3.12
uv sync
uv tool install .
Configuration
AWS Credentials
Configure AWS credentials using any of these methods:
# Method 1: AWS CLI configuration
aws configure
# Method 2: Environment variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
# Verify credentials
aws-polly-tts-tool info
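When the environment-variable method is used, a script can check up front that the three variables are set before invoking the tool. This is a small sketch; `REQUIRED_VARS` and `missing_aws_env` are hypothetical helpers, not part of the package. A non-empty result is not necessarily an error, because credentials configured through `aws configure` live in files rather than in the environment.

```python
import os
from typing import List, Mapping

# Variable names taken from the export lines above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of the variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.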
IAM Permissions Required
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"polly:DescribeVoices",
"polly:SynthesizeSpeech"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": ["ce:GetCostAndUsage"],
"Resource": "*"
}
]
}
Usage
Basic Synthesis
# Play text with default voice (Joanna, neural engine)
aws-polly-tts-tool synthesize "Hello world"
# Save to file instead of playing
aws-polly-tts-tool synthesize "Hello world" --output speech.mp3
# Read from stdin
echo "Hello world" | aws-polly-tts-tool synthesize --stdin
# Read from file
cat article.txt | aws-polly-tts-tool synthesize --stdin --output article.mp3
Voice Selection
# List all available voices
aws-polly-tts-tool list-voices
# Filter by language
aws-polly-tts-tool list-voices --language en-US
# Filter by engine and gender
aws-polly-tts-tool list-voices --engine neural --gender Female
# Use specific voice
aws-polly-tts-tool synthesize "Hello" --voice Matthew
aws-polly-tts-tool synthesize "Bonjour" --voice Celine # French
Engine Selection
# List all engines with pricing
aws-polly-tts-tool list-engines
# Use standard engine (cheapest)
aws-polly-tts-tool synthesize "Hello" --engine standard
# Use neural engine (recommended)
aws-polly-tts-tool synthesize "Hello" --engine neural
# Use generative engine (highest quality)
aws-polly-tts-tool synthesize "Hello" --engine generative
# Use long-form for audiobooks
aws-polly-tts-tool synthesize "$(cat book.txt)" --engine long-form --output book.mp3
SSML Support
# Basic SSML with pauses
aws-polly-tts-tool synthesize '<speak>Hello <break time="500ms"/> world</speak>' --ssml
# Prosody control (speed, pitch, volume)
aws-polly-tts-tool synthesize '<speak><prosody rate="slow" pitch="low">Deep voice</prosody></speak>' --ssml
# Emphasis
aws-polly-tts-tool synthesize '<speak>I <emphasis level="strong">really</emphasis> like this</speak>' --ssml
# Newscaster style (select voices only)
aws-polly-tts-tool synthesize '<speak><amazon:domain name="news">Breaking news today</amazon:domain></speak>' --ssml --voice Matthew
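When SSML is assembled from user-supplied text, characters such as `&` and `<` need escaping or the markup becomes invalid. The sketch below builds the same kind of strings as the examples above using only the standard library. The helper names (`speak`, `text`, `pause`, `emphasis`) are hypothetical and not part of this package.

```python
from xml.sax.saxutils import escape, quoteattr


def speak(*parts: str) -> str:
    """Wrap already-built SSML fragments in a <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


def text(value: str) -> str:
    """Escape plain text so that &, < and > cannot break the markup."""
    return escape(value)


def pause(milliseconds: int) -> str:
    """Build a <break> element with a duration in milliseconds."""
    return f"<break time={quoteattr(f'{milliseconds}ms')}/>"


def emphasis(value: str, level: str = "strong") -> str:
    """Build an <emphasis> element around escaped text."""
    return f"<emphasis level={quoteattr(level)}>{escape(value)}</emphasis>"
```

The result of `speak(text("Hello "), pause(500), text(" world"))` can be passed to `synthesize` together with `--ssml`.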
Cost Tracking
# Show cost estimate after synthesis
aws-polly-tts-tool synthesize "Hello world" --show-cost
# View pricing for all engines
aws-polly-tts-tool pricing
# Query AWS billing (last 30 days)
aws-polly-tts-tool billing
# Custom date range
aws-polly-tts-tool billing --start-date 2025-01-01 --end-date 2025-01-31
# Last 7 days
aws-polly-tts-tool billing --days 7
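A "last N days" query can be expressed as an explicit start and end date in the YYYY-MM-DD format that `--start-date` and `--end-date` accept. The following sketch shows one way to derive that window; `billing_window` is a hypothetical helper, and how the tool itself computes the range is an assumption here. AWS Cost Explorer treats the end date of a time period as exclusive, so the window below ends on the reference day.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def billing_window(days: int, today: Optional[date] = None) -> Tuple[str, str]:
    """Return (start, end) as YYYY-MM-DD strings covering the last `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    end = today or date.today()
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()
```

Injecting `today` keeps the function deterministic, which is useful when the result feeds an automated report.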
Verbosity and Debugging
Multi-level verbosity for progressive debugging detail:
# Default: No verbose output (errors/warnings only)
aws-polly-tts-tool synthesize "Hello world" --output test.mp3
# -V: INFO level (high-level operations)
aws-polly-tts-tool synthesize "Hello world" -V --output test.mp3
[INFO] Using voice: Joanna (neural engine)
[INFO] Synthesizing audio to file: test.mp3
# -VV: DEBUG level (detailed operations, validation, character counts)
aws-polly-tts-tool synthesize "Hello world" -VV --output test.mp3
[DEBUG] Validating engine: neural
[DEBUG] Validating output format: mp3
[DEBUG] Initializing AWS Polly client
[DEBUG] Resolving voice ID for: Joanna
[INFO] Using voice: Joanna (neural engine)
[INFO] Synthesizing audio to file: test.mp3
[DEBUG] Synthesized 11 characters
# -VVV: TRACE level (full AWS SDK details, API requests/responses)
aws-polly-tts-tool synthesize "Hello world" -VVV --output test.mp3
[DEBUG] Validating engine: neural
[DEBUG] Validating output format: mp3
[DEBUG] Initializing AWS Polly client
[DEBUG] Looking for credentials via: env
[DEBUG] Looking for credentials via: shared-credentials-file
[INFO] Found credentials in shared credentials file: ~/.aws/credentials
[DEBUG] Event creating-client-class.polly: calling handler
[DEBUG] Starting new HTTPS connection (1): polly.eu-central-1.amazonaws.com:443
[DEBUG] https://polly.eu-central-1.amazonaws.com:443 "POST /v1/speech HTTP/1.1" 200 None
[INFO] Using voice: Joanna (neural engine)
[INFO] Synthesizing audio to file: test.mp3
[DEBUG] Synthesized 11 characters
# Works with all commands
aws-polly-tts-tool list-voices -V --engine neural
aws-polly-tts-tool billing -VV --days 7
Verbosity Levels:
- Default: Errors and warnings only - clean output
- -V (INFO): High-level operations (voice selection, file operations)
- -VV (DEBUG): Detailed steps (validation, API calls, character counts)
- -VVV (TRACE): Full AWS SDK internals (credentials, HTTP requests, boto3 events)
Note: All log output goes to stderr, keeping stdout clean for data/piping.
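The verbosity levels and the stderr-only logging described above can be sketched with the standard `logging` module. This is an illustration of the pattern under stated assumptions, not the tool's actual code: `level_for` and `configure_logging` are hypothetical names, and the choice to keep the SDK loggers quiet until -VVV is inferred from the TRACE description.

```python
import logging
import sys


def level_for(verbose_count: int) -> int:
    """Map the number of -V flags to a logging level."""
    if verbose_count <= 0:
        return logging.WARNING  # default: errors and warnings only
    if verbose_count == 1:
        return logging.INFO
    return logging.DEBUG  # -VV and above


def configure_logging(verbose_count: int) -> None:
    """Send all log records to stderr so stdout stays free for data."""
    logging.basicConfig(
        stream=sys.stderr,
        level=level_for(verbose_count),
        format="[%(levelname)s] %(message)s",
        force=True,  # replace handlers from any earlier configuration
    )
    # Assumption: SDK loggers only become chatty at -VVV.
    sdk_level = logging.DEBUG if verbose_count >= 3 else logging.WARNING
    for name in ("boto3", "botocore", "urllib3"):
        logging.getLogger(name).setLevel(sdk_level)
```

Because only stderr receives log records, a command such as `aws-polly-tts-tool list-voices -VV` can still be piped into another program without log lines mixing into the data.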
Shell Completion
Enable tab completion for bash, zsh, or fish shells to autocomplete commands, options, and arguments:
# View installation instructions
aws-polly-tts-tool completion --help
# Bash - add to ~/.bashrc for persistent completion
eval "$(aws-polly-tts-tool completion bash)"
# Zsh - add to ~/.zshrc for persistent completion
eval "$(aws-polly-tts-tool completion zsh)"
# Fish - one-time installation
aws-polly-tts-tool completion fish > ~/.config/fish/completions/aws-polly-tts-tool.fish
# File-based installation (recommended for better performance)
aws-polly-tts-tool completion bash > ~/.aws-polly-tts-tool-complete.bash
echo 'source ~/.aws-polly-tts-tool-complete.bash' >> ~/.bashrc
After installation, restart your shell or source the config file:
source ~/.bashrc # for bash
source ~/.zshrc # for zsh
Shell completion enables:
- Command completion: Type aws-polly-tts-tool <TAB> to see all commands
- Option completion: Type --<TAB> to see available options
- Value completion: Auto-complete for choices like engines (standard, neural, generative)
Library Usage
Import and use as a Python library:
from pathlib import Path
from aws_polly_tts_tool import (
get_polly_client,
synthesize_audio,
save_speech,
VoiceManager,
calculate_cost,
)
# Initialize client
client = get_polly_client(region="us-east-1")
# Synthesize audio
audio_bytes, char_count = synthesize_audio(
client=client,
text="Hello world",
voice_id="Joanna",
output_format="mp3",
engine="neural"
)
# Save to file
save_speech(
client=client,
text="Hello world",
voice_id="Joanna",
output_path=Path("output.mp3"),
engine="neural"
)
# List voices
voice_manager = VoiceManager(client)
voices = voice_manager.list_voices(engine="neural", language="en")
# Calculate cost
cost = calculate_cost(character_count=5000, engine="neural")
print(f"Estimated cost: ${cost:.4f}")
Commands
synthesize
Convert text to speech with full control over voice, engine, and output.
aws-polly-tts-tool synthesize [TEXT] [OPTIONS]
-s, --stdin Read from stdin
--voice TEXT Voice ID (default: Joanna)
-o, --output PATH Save to file
-f, --format TEXT mp3, ogg_vorbis, pcm
-e, --engine TEXT standard, neural, generative, long-form
--ssml Treat input as SSML
--show-cost Display cost estimate
-r, --region TEXT AWS region override
-V, --verbose Verbosity (-V, -VV, -VVV for progressive detail)
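Scripts and agents that drive the CLI can build the argument list programmatically instead of concatenating shell strings. The sketch below uses only options listed above; `build_synthesize_argv` and `run_synthesize` are hypothetical helpers, and running the command assumes `aws-polly-tts-tool` is installed and on the PATH.

```python
import subprocess
from typing import List, Optional


def build_synthesize_argv(
    text: Optional[str] = None,
    *,
    voice: Optional[str] = None,
    engine: Optional[str] = None,
    output: Optional[str] = None,
    ssml: bool = False,
    show_cost: bool = False,
) -> List[str]:
    """Build an argument list for `aws-polly-tts-tool synthesize`.

    When `text` is None the command is told to read from stdin instead.
    """
    argv = ["aws-polly-tts-tool", "synthesize"]
    argv.append("--stdin" if text is None else text)
    if voice:
        argv += ["--voice", voice]
    if engine:
        argv += ["--engine", engine]
    if output:
        argv += ["--output", output]
    if ssml:
        argv.append("--ssml")
    if show_cost:
        argv.append("--show-cost")
    return argv


def run_synthesize(argv: List[str], stdin_text: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run the command, capturing stdout (data) and stderr (logs) separately."""
    return subprocess.run(argv, input=stdin_text, capture_output=True, text=True, check=False)
```

Passing a list to `subprocess.run` avoids shell quoting problems when the text contains quotes or SSML markup.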
list-voices
List and filter available Polly voices.
aws-polly-tts-tool list-voices [OPTIONS]
-e, --engine TEXT Filter by engine
-l, --language TEXT Filter by language
-g, --gender TEXT Filter by gender
-r, --region TEXT AWS region override
-V, --verbose Verbosity (-V, -VV, -VVV for progressive detail)
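The three filters can be pictured as a simple pass over voice records. The sketch below works on a hand-written sample in the shape of entries from Polly's DescribeVoices response (`Id`, `Gender`, `LanguageCode`, `SupportedEngines`). The sample values are illustrative rather than authoritative, `filter_voices` is a hypothetical function, and the prefix match on the language code is an assumption rather than documented behavior of this tool.

```python
from typing import Dict, List, Optional

# Hand-written sample data; engine support shown here is illustrative only.
SAMPLE_VOICES: List[Dict] = [
    {"Id": "Joanna", "Gender": "Female", "LanguageCode": "en-US", "SupportedEngines": ["neural", "standard"]},
    {"Id": "Matthew", "Gender": "Male", "LanguageCode": "en-US", "SupportedEngines": ["neural", "standard"]},
    {"Id": "Celine", "Gender": "Female", "LanguageCode": "fr-FR", "SupportedEngines": ["standard"]},
]


def filter_voices(
    voices: List[Dict],
    engine: Optional[str] = None,
    language: Optional[str] = None,
    gender: Optional[str] = None,
) -> List[Dict]:
    """Keep only the voices that match every filter that was supplied."""
    result = []
    for voice in voices:
        if engine and engine not in voice["SupportedEngines"]:
            continue
        # Assumption: prefix match, so "en" also selects "en-US".
        if language and not voice["LanguageCode"].lower().startswith(language.lower()):
            continue
        if gender and voice["Gender"].lower() != gender.lower():
            continue
        result.append(voice)
    return result
```

Filters that are left as `None` are skipped, so calling the function with no filters returns every voice.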
list-engines
Display all voice engines with features and pricing.
aws-polly-tts-tool list-engines
billing
Query AWS Cost Explorer for actual Polly usage costs.
aws-polly-tts-tool billing [OPTIONS]
-d, --days INT Number of days (default: 30)
--start-date TEXT Custom start date (YYYY-MM-DD)
--end-date TEXT Custom end date (YYYY-MM-DD)
-r, --region TEXT AWS region override
-V, --verbose Verbosity (-V, -VV, -VVV for progressive detail)
pricing
Show Polly pricing information and examples.
aws-polly-tts-tool pricing
info
Display AWS credentials and tool configuration.
aws-polly-tts-tool info
completion
Generate shell completion scripts for bash, zsh, or fish.
aws-polly-tts-tool completion [bash|zsh|fish]
# Install for bash
eval "$(aws-polly-tts-tool completion bash)"
# Install for zsh
eval "$(aws-polly-tts-tool completion zsh)"
# Install for fish
aws-polly-tts-tool completion fish > ~/.config/fish/completions/aws-polly-tts-tool.fish
See the Shell Completion section for detailed installation instructions.
Known Issues
pydub Python 3.13+ Compatibility
Issue: The pydub library depends on Python's audioop module, which was removed in Python 3.13.
Impact: Audio playback through speakers fails on Python 3.13+. File output (--output) works fine.
Workarounds:
- Use Python 3.12 (recommended)
mise use python@3.12
uv tool install . --python 3.12
- Save to file instead of playback
# This works on any Python version
aws-polly-tts-tool synthesize "Hello" --output speech.mp3
- Future fix: We plan to replace pydub with a Python 3.13+ compatible library (pygame or sounddevice)
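A script can detect up front whether speaker playback is likely to work by checking for the `audioop` module before attempting it. This is a minimal sketch with hypothetical helper names (`playback_supported`, `describe_playback`); it only tests whether the module can be found and does not reflect how the tool itself handles the situation.

```python
import importlib.util
import sys


def playback_supported() -> bool:
    """Return True when the `audioop` module that pydub relies on can be found."""
    return importlib.util.find_spec("audioop") is not None


def describe_playback() -> str:
    """Summarize the playback situation for the running interpreter."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    if playback_supported():
        return f"Python {version}: audioop found, speaker playback should work"
    return f"Python {version}: audioop missing, use --output to save to a file"
```

On an interpreter without `audioop`, falling back to `--output` keeps the workflow usable, as described in the second workaround.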
Development
Setup
# Clone and setup
git clone https://github.com/dnvriend/aws-polly-tts-tool.git
cd aws-polly-tts-tool
# Install with Python 3.12
mise use python@3.12
uv sync
# Run quality checks
make check
Available Commands
make install # Install dependencies
make format # Format with ruff
make lint # Lint with ruff
make typecheck # Type check with mypy
make test # Run tests with pytest
make security-bandit # Run bandit security linter
make security-pip-audit # Run pip-audit for vulnerabilities
make security-gitleaks # Run gitleaks secret scanner
make security # Run all security checks
make check # Run all checks (lint, typecheck, test, security)
make pipeline # Full pipeline (format, lint, typecheck, test, security, build, install)
make build # Build package
make clean # Remove artifacts
Security Checks
The project includes three security tools integrated into the development pipeline:
- bandit - Python security linter that scans for common security issues
- pip-audit - Dependency vulnerability scanner checking for known CVEs
- gitleaks - Secret detection tool that scans git history for leaked credentials
Note: gitleaks requires separate installation via brew install gitleaks (macOS) or from GitHub releases
Architecture
aws-polly-tts-tool/
├── aws_polly_tts_tool/
│   ├── __init__.py              # Public API exports
│   ├── cli.py                   # CLI entry point
│   ├── voices.py                # VoiceManager (dynamic API)
│   ├── engines.py               # Engine metadata & validation
│   ├── billing.py               # Cost calculations
│   ├── utils.py                 # Shared utilities
│   ├── core/                    # Core library (CLI-independent)
│   │   ├── client.py            # AWS client initialization
│   │   ├── synthesize.py        # TTS functions
│   │   └── cost_explorer.py     # Billing queries
│   └── commands/                # CLI command implementations
│       ├── synthesize_commands.py
│       ├── voice_commands.py
│       ├── engine_commands.py
│       ├── billing_commands.py
│       └── info_commands.py
├── tests/
├── pyproject.toml
└── Makefile
Resources
License
MIT License - see LICENSE file for details.
Author
Dennis Vriend
- GitHub: @dnvriend
Built with Claude Code
This project was created using Claude Code, featuring AI-assisted development with human review and testing.
Made with ❤️ and AI • Python 3.12+