
A CLI that provides TTS using Amazon Polly



aws-polly-tts-tool

Python Version | License: MIT | Code style: ruff | Type checked: mypy | Built with Claude Code

Professional AWS Polly TTS CLI and library for text-to-speech synthesis with agent-friendly design.


About

aws-polly-tts-tool is a comprehensive CLI tool and Python library for Amazon Polly text-to-speech synthesis. Built with a CLI-first philosophy, it provides both command-line convenience and programmatic access to AWS Polly's full feature set.

What is Amazon Polly?

Amazon Polly is AWS's fully managed text-to-speech service that converts text into lifelike speech using deep learning. It offers 60+ voices in 30+ languages with multiple quality tiers.

Why This Tool?

  • Agent-Friendly: Designed for Claude Code and AI agents with self-documenting help and structured errors
  • Composable: JSON output goes to stdout and logs go to stderr, so results pipe cleanly into other Unix tools
  • Dual-Mode: Use as CLI or import as Python library
  • Production-Ready: Type-safe, tested, linted with comprehensive error handling
  • Cost-Transparent: Real-time cost estimates and AWS billing integration

Why CLI-First?

This tool prioritizes CLI design to enable:

  • 🤖 AI Agent Integration: Claude Code and other AI tools can use structured commands and parse outputs
  • 🔄 ReAct Loops: Clear error messages help agents self-correct and retry operations
  • 🔗 Composability: Standard Unix patterns (stdin/stdout/stderr) enable piping and automation
  • 🧱 Building Blocks: Commands serve as reusable components for skills, MCP servers, and scripts
  • 📊 Predictability: Type-safe implementation ensures consistent behavior in automated workflows

Features

Voice Engines

  • ✅ Standard - Cost-effective traditional TTS ($4/1M chars)
  • ✅ Neural - Natural, human-like voices ($16/1M chars)
  • ✅ Generative - Most advanced, emotionally engaged ($30/1M chars)
  • ✅ Long-form - Optimized for audiobooks ($100/1M chars)
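With these per-engine rates, a cost estimate is simple arithmetic: characters ÷ 1,000,000 × rate. The sketch below assumes the list prices shown above and ignores the AWS free tier; `estimate_cost` and `PRICE_PER_MILLION_CHARS` are hypothetical names, not the library's own `calculate_cost`.

```python
# Hypothetical sketch: USD list prices per 1 million characters, copied from the
# engine list above. Check current AWS pricing before relying on these numbers.
PRICE_PER_MILLION_CHARS: dict[str, float] = {
    "standard": 4.0,
    "neural": 16.0,
    "generative": 30.0,
    "long-form": 100.0,
}


def estimate_cost(character_count: int, engine: str) -> float:
    """Return the estimated USD cost of synthesizing `character_count` characters."""
    if character_count < 0:
        raise ValueError("character_count must be non-negative")
    try:
        rate = PRICE_PER_MILLION_CHARS[engine]
    except KeyError:
        raise ValueError(f"unknown engine: {engine!r}") from None
    return character_count / 1_000_000 * rate


if __name__ == "__main__":
    # Estimate a 5,000-character article on each engine.
    for name in PRICE_PER_MILLION_CHARS:
        print(f"{name:>10}: ${estimate_cost(5_000, name):.4f}")
```

For a 5,000-character text this prints $0.0200, $0.0800, $0.1500 and $0.5000 for standard, neural, generative and long-form respectively.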

Voice Selection

  • 📢 60+ voices across 30+ languages
  • 🔍 Dynamic fetching from Polly API (always up-to-date)
  • 🎚️ Filter by engine, language, gender
  • 🌍 Multiple accents and speaking styles

Output Options

  • 🎵 mp3 - General purpose (default)
  • 🎶 ogg_vorbis - Open format for web
  • 🎙️ pcm - Raw audio, lowest latency

Advanced Features

  • ๐Ÿ“ Full SSML support (prosody, breaks, emphasis, phonemes)
  • ๐Ÿ’ฐ Dual cost tracking (estimates + AWS Cost Explorer)
  • ๐Ÿ“Š Billing queries with engine breakdown
  • ๐Ÿ” AWS environment variable authentication
  • ๐Ÿ“ค Stdin support for piping

Installation

Prerequisites

  • Python 3.12+ (Python 3.13+ has pydub compatibility issues - see Known Issues)
  • uv package manager (recommended)
  • AWS credentials configured
  • ffmpeg (for audio playback - not required for file output)

Note: For a detailed explanation of how the TTS pipeline works and why these dependencies are needed, see TTS Pipeline Architecture

Install from Source

# Clone repository
git clone https://github.com/dnvriend/aws-polly-tts-tool.git
cd aws-polly-tts-tool

# Install with uv (Python 3.12)
uv tool install . --python 3.12

# Verify installation
aws-polly-tts-tool --version

Install with mise (Development)

cd aws-polly-tts-tool
mise use python@3.12
uv sync
uv tool install .

Configuration

AWS Credentials

Configure AWS credentials using any of these methods:

# Method 1: AWS CLI configuration
aws configure

# Method 2: Environment variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"

# Verify credentials
aws-polly-tts-tool info

IAM Permissions Required

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "polly:DescribeVoices",
        "polly:SynthesizeSpeech"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["ce:GetCostAndUsage"],
      "Resource": "*"
    }
  ]
}
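If you only synthesize speech and never run the billing command, the Cost Explorer statement is presumably unnecessary, since billing is the command that queries Cost Explorer. A small Python sketch (the `build_polly_policy` helper is hypothetical) that emits the policy above with or without that statement:

```python
import json


def build_polly_policy(include_billing: bool = True) -> dict:
    """Build the IAM policy shown above; drop Cost Explorer access if billing is unused."""
    statements = [
        {
            "Effect": "Allow",
            "Action": ["polly:DescribeVoices", "polly:SynthesizeSpeech"],
            "Resource": "*",
        }
    ]
    if include_billing:
        # Needed only for Cost Explorer queries (the billing command).
        statements.append(
            {"Effect": "Allow", "Action": ["ce:GetCostAndUsage"], "Resource": "*"}
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(build_polly_policy(include_billing=False), indent=2))
```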

Usage

Basic Synthesis

# Play text with default voice (Joanna, neural engine)
aws-polly-tts-tool synthesize "Hello world"

# Save to file instead of playing
aws-polly-tts-tool synthesize "Hello world" --output speech.mp3

# Read from stdin
echo "Hello world" | aws-polly-tts-tool synthesize --stdin

# Read from file
cat article.txt | aws-polly-tts-tool synthesize --stdin --output article.mp3

Voice Selection

# List all available voices
aws-polly-tts-tool list-voices

# Filter by language
aws-polly-tts-tool list-voices --language en-US

# Filter by engine and gender
aws-polly-tts-tool list-voices --engine neural --gender Female

# Use specific voice
aws-polly-tts-tool synthesize "Hello" --voice Matthew
aws-polly-tts-tool synthesize "Bonjour" --voice Celine --engine standard  # French (Celine is a standard-engine voice)

Engine Selection

# List all engines with pricing
aws-polly-tts-tool list-engines

# Use standard engine (cheapest)
aws-polly-tts-tool synthesize "Hello" --engine standard

# Use neural engine (recommended)
aws-polly-tts-tool synthesize "Hello" --engine neural

# Use generative engine (highest quality)
aws-polly-tts-tool synthesize "Hello" --engine generative

# Use long-form for audiobooks
aws-polly-tts-tool synthesize "$(cat book.txt)" --engine long-form --voice Danielle --output book.mp3  # long-form supports only a few voices, e.g. Danielle
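Amazon Polly's SynthesizeSpeech API accepts at most 3,000 billed characters per request, and this README does not say whether the tool splits longer input such as book.txt. If you call the library yourself, a sentence-aware splitter along these lines keeps each request under the limit. `chunk_text` is a hypothetical sketch, not the tool's behavior.

```python
import re

# Assumption: one SynthesizeSpeech request accepts up to 3,000 billed characters
# of plain text. This helper is hypothetical, not part of aws-polly-tts-tool.
MAX_CHARS = 3_000


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized in turn, for example with `synthesize_audio` from Library Usage.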

SSML Support

# Basic SSML with pauses
aws-polly-tts-tool synthesize '<speak>Hello <break time="500ms"/> world</speak>' --ssml

# Prosody control (speed, pitch, volume); the standard engine supports all prosody attributes
aws-polly-tts-tool synthesize '<speak><prosody rate="slow" pitch="low">Deep voice</prosody></speak>' --ssml --engine standard

# Emphasis (supported by the standard engine)
aws-polly-tts-tool synthesize '<speak>I <emphasis level="strong">really</emphasis> like this</speak>' --ssml --engine standard

# Newscaster style (select voices only)
aws-polly-tts-tool synthesize '<speak><amazon:domain name="news">Breaking news today</amazon:domain></speak>' --ssml --voice Matthew
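SSML input must be well-formed XML, so characters such as & and < in ordinary text have to be escaped before the text is wrapped in <speak>. A sketch using the standard library's `xml.sax.saxutils.escape`; `to_ssml` is a hypothetical helper, not part of the tool.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences: list[str], pause_ms: int = 500) -> str:
    """Join plain-text sentences into one SSML document with a pause between them."""
    pause = f'<break time="{pause_ms}ms"/>'
    # escape() converts &, < and > into XML entities so the markup stays valid.
    body = f" {pause} ".join(escape(s) for s in sentences)
    return f"<speak>{body}</speak>"
```

`to_ssml(["Hello", "world"])` returns the same markup as the first SSML example in this section, ready to pass to `synthesize ... --ssml`.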

Cost Tracking

# Show cost estimate after synthesis
aws-polly-tts-tool synthesize "Hello world" --show-cost

# View pricing for all engines
aws-polly-tts-tool pricing

# Query AWS billing (last 30 days)
aws-polly-tts-tool billing

# Custom date range
aws-polly-tts-tool billing --start-date 2025-01-01 --end-date 2025-01-31

# Last 7 days
aws-polly-tts-tool billing --days 7

Verbosity and Debugging

Multi-level verbosity for progressive debugging detail:

# Default: No verbose output (errors/warnings only)
aws-polly-tts-tool synthesize "Hello world" --output test.mp3

# -V: INFO level (high-level operations)
aws-polly-tts-tool synthesize "Hello world" -V --output test.mp3
[INFO] Using voice: Joanna (neural engine)
[INFO] Synthesizing audio to file: test.mp3

# -VV: DEBUG level (detailed operations, validation, character counts)
aws-polly-tts-tool synthesize "Hello world" -VV --output test.mp3
[DEBUG] Validating engine: neural
[DEBUG] Validating output format: mp3
[DEBUG] Initializing AWS Polly client
[DEBUG] Resolving voice ID for: Joanna
[INFO] Using voice: Joanna (neural engine)
[INFO] Synthesizing audio to file: test.mp3
[DEBUG] Synthesized 11 characters

# -VVV: TRACE level (full AWS SDK details, API requests/responses)
aws-polly-tts-tool synthesize "Hello world" -VVV --output test.mp3
[DEBUG] Validating engine: neural
[DEBUG] Validating output format: mp3
[DEBUG] Initializing AWS Polly client
[DEBUG] Looking for credentials via: env
[DEBUG] Looking for credentials via: shared-credentials-file
[INFO] Found credentials in shared credentials file: ~/.aws/credentials
[DEBUG] Event creating-client-class.polly: calling handler
[DEBUG] Starting new HTTPS connection (1): polly.eu-central-1.amazonaws.com:443
[DEBUG] https://polly.eu-central-1.amazonaws.com:443 "POST /v1/speech HTTP/1.1" 200 None
[INFO] Using voice: Joanna (neural engine)
[INFO] Synthesizing audio to file: test.mp3
[DEBUG] Synthesized 11 characters

# Works with all commands
aws-polly-tts-tool list-voices -V --engine neural
aws-polly-tts-tool billing -VV --days 7

Verbosity Levels:

  • Default: Errors and warnings only - clean output
  • -V (INFO): High-level operations (voice selection, file operations)
  • -VV (DEBUG): Detailed steps (validation, API calls, character counts)
  • -VVV (TRACE): Full AWS SDK internals (credentials, HTTP requests, boto3 events)

Note: All log output goes to stderr, keeping stdout clean for data/piping.
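One way to implement this split with Python's standard `logging` module: the -V count picks the root level, everything is written to stderr in the [LEVEL] format shown above, and the AWS SDK loggers (boto3, botocore, urllib3) are only opened up at -VVV. This is a hypothetical sketch (`configure_logging`, `emit_result`), not the tool's code.

```python
import json
import logging
import sys

# Hypothetical mapping from the number of -V flags to the root log level.
LEVELS = {0: logging.WARNING, 1: logging.INFO, 2: logging.DEBUG}
SDK_LOGGERS = ("boto3", "botocore", "urllib3")


def configure_logging(verbosity: int) -> None:
    """Send logs to stderr; -V, -VV and -VVV progressively add detail."""
    level = LEVELS[min(max(verbosity, 0), 2)]
    logging.basicConfig(
        level=level,
        stream=sys.stderr,
        format="[%(levelname)s] %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    # AWS SDK internals stay quiet unless -VVV (verbosity >= 3) is given.
    sdk_level = logging.DEBUG if verbosity >= 3 else logging.WARNING
    for name in SDK_LOGGERS:
        logging.getLogger(name).setLevel(sdk_level)


def emit_result(result: dict) -> None:
    """Write the command's data to stdout as JSON, keeping it pipeable."""
    print(json.dumps(result))
```

Because logs and data use different streams, `... 2>/dev/null | jq` sees only the JSON, whatever the verbosity.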

Shell Completion

Enable tab completion for bash, zsh, or fish shells to autocomplete commands, options, and arguments:

# View installation instructions
aws-polly-tts-tool completion --help

# Bash - add to ~/.bashrc for persistent completion
eval "$(aws-polly-tts-tool completion bash)"

# Zsh - add to ~/.zshrc for persistent completion
eval "$(aws-polly-tts-tool completion zsh)"

# Fish - one-time installation
aws-polly-tts-tool completion fish > ~/.config/fish/completions/aws-polly-tts-tool.fish

# File-based installation (recommended for better performance)
aws-polly-tts-tool completion bash > ~/.aws-polly-tts-tool-complete.bash
echo 'source ~/.aws-polly-tts-tool-complete.bash' >> ~/.bashrc

After installation, restart your shell or source the config file:

source ~/.bashrc  # for bash
source ~/.zshrc   # for zsh

Shell completion enables:

  • Command completion: Type aws-polly-tts-tool <TAB> to see all commands
  • Option completion: Type --<TAB> to see available options
  • Value completion: Auto-complete for choices like engines (standard, neural, generative)

Library Usage

Import and use as a Python library:

from pathlib import Path

from aws_polly_tts_tool import (
    get_polly_client,
    synthesize_audio,
    save_speech,
    VoiceManager,
    calculate_cost,
)

# Initialize client
client = get_polly_client(region="us-east-1")

# Synthesize audio
audio_bytes, char_count = synthesize_audio(
    client=client,
    text="Hello world",
    voice_id="Joanna",
    output_format="mp3",
    engine="neural"
)

# Save to file
save_speech(
    client=client,
    text="Hello world",
    voice_id="Joanna",
    output_path=Path("output.mp3"),
    engine="neural"
)

# List voices
voice_manager = VoiceManager(client)
voices = voice_manager.list_voices(engine="neural", language="en")

# Calculate cost
cost = calculate_cost(character_count=5000, engine="neural")
print(f"Estimated cost: ${cost:.4f}")

Commands

synthesize

Convert text to speech with full control over voice, engine, and output.

aws-polly-tts-tool synthesize [TEXT] [OPTIONS]
  -s, --stdin         Read from stdin
  --voice TEXT        Voice ID (default: Joanna)
  -o, --output PATH   Save to file
  -f, --format TEXT   mp3, ogg_vorbis, pcm
  -e, --engine TEXT   standard, neural, generative, long-form
  --ssml              Treat input as SSML
  --show-cost         Display cost estimate
  -r, --region TEXT   AWS region override
  -V, --verbose       Verbosity (-V, -VV, -VVV for progressive detail)

list-voices

List and filter available Polly voices.

aws-polly-tts-tool list-voices [OPTIONS]
  -e, --engine TEXT    Filter by engine
  -l, --language TEXT  Filter by language
  -g, --gender TEXT    Filter by gender
  -r, --region TEXT    AWS region override
  -V, --verbose        Verbosity (-V, -VV, -VVV for progressive detail)

list-engines

Display all voice engines with features and pricing.

aws-polly-tts-tool list-engines

billing

Query AWS Cost Explorer for actual Polly usage costs.

aws-polly-tts-tool billing [OPTIONS]
  -d, --days INT       Number of days (default: 30)
  --start-date TEXT    Custom start date (YYYY-MM-DD)
  --end-date TEXT      Custom end date (YYYY-MM-DD)
  -r, --region TEXT    AWS region override
  -V, --verbose        Verbosity (-V, -VV, -VVV for progressive detail)

pricing

Show Polly pricing information and examples.

aws-polly-tts-tool pricing

info

Display AWS credentials and tool configuration.

aws-polly-tts-tool info

completion

Generate shell completion scripts for bash, zsh, or fish.

aws-polly-tts-tool completion [bash|zsh|fish]

# Install for bash
eval "$(aws-polly-tts-tool completion bash)"

# Install for zsh
eval "$(aws-polly-tts-tool completion zsh)"

# Install for fish
aws-polly-tts-tool completion fish > ~/.config/fish/completions/aws-polly-tts-tool.fish

See the Shell Completion section for detailed installation instructions.

Known Issues

pydub Python 3.13+ Compatibility

Issue: The pydub library depends on Python's audioop module, which was removed in Python 3.13.

Impact: Audio playback through speakers fails on Python 3.13+. File output (--output) works fine.

Workarounds:

  1. Use Python 3.12 (recommended)

    mise use python@3.12
    uv tool install . --python 3.12
    
  2. Save to file instead of playback

    # This works on any Python version
    aws-polly-tts-tool synthesize "Hello" --output speech.mp3
    
  3. Future fix: We plan to replace pydub with a Python 3.13+ compatible library (pygame or sounddevice)

Development

Setup

# Clone and setup
git clone https://github.com/dnvriend/aws-polly-tts-tool.git
cd aws-polly-tts-tool

# Install with Python 3.12
mise use python@3.12
uv sync

# Run quality checks
make check

Available Commands

make install              # Install dependencies
make format               # Format with ruff
make lint                 # Lint with ruff
make typecheck            # Type check with mypy
make test                 # Run tests with pytest
make security-bandit      # Run bandit security linter
make security-pip-audit   # Run pip-audit for vulnerabilities
make security-gitleaks    # Run gitleaks secret scanner
make security             # Run all security checks
make check                # Run all checks (lint, typecheck, test, security)
make pipeline             # Full pipeline (format, lint, typecheck, test, security, build, install)
make build                # Build package
make clean                # Remove artifacts

Security Checks

The project includes three security tools integrated into the development pipeline:

  • bandit - Python security linter that scans for common security issues
  • pip-audit - Dependency vulnerability scanner checking for known CVEs
  • gitleaks - Secret detection tool that scans git history for leaked credentials

Note: gitleaks must be installed separately, via brew install gitleaks (macOS) or from the gitleaks GitHub releases page.

Architecture

aws-polly-tts-tool/
├── aws_polly_tts_tool/
│   ├── __init__.py           # Public API exports
│   ├── cli.py                # CLI entry point
│   ├── voices.py             # VoiceManager (dynamic API)
│   ├── engines.py            # Engine metadata & validation
│   ├── billing.py            # Cost calculations
│   ├── utils.py              # Shared utilities
│   ├── core/                 # Core library (CLI-independent)
│   │   ├── client.py         # AWS client initialization
│   │   ├── synthesize.py     # TTS functions
│   │   └── cost_explorer.py  # Billing queries
│   └── commands/             # CLI command implementations
│       ├── synthesize_commands.py
│       ├── voice_commands.py
│       ├── engine_commands.py
│       ├── billing_commands.py
│       └── info_commands.py
├── tests/
├── pyproject.toml
└── Makefile


License

MIT License - see LICENSE file for details.

Author

Dennis Vriend


Built with Claude Code

This project was created using Claude Code, featuring AI-assisted development with human review and testing.

Made with โค๏ธ and AI โ€ข Python 3.12+


Download files


Source Distribution

aws_polly_tts_tool-0.2.0.tar.gz (283.8 kB, source)

Built Distribution

aws_polly_tts_tool-0.2.0-py3-none-any.whl (39.9 kB, Python 3)

File details

Details for the file aws_polly_tts_tool-0.2.0.tar.gz.

File metadata

  • Download URL: aws_polly_tts_tool-0.2.0.tar.gz
  • Size: 283.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for aws_polly_tts_tool-0.2.0.tar.gz
Algorithm Hash digest
SHA256 8103ac1a6066b4efd39291493aa7158bbbfabadfb6607ab6680510835d2a3c22
MD5 ccb88e194c5e6f7ba5976b5d9501c413
BLAKE2b-256 a9bf13816b1b3430f9e63c1cc362928608daae778984191141716a046787fd57


Provenance

The following attestation bundles were made for aws_polly_tts_tool-0.2.0.tar.gz:

Publisher: publish.yml on dnvriend/aws-polly-tts-tool


File details

Details for the file aws_polly_tts_tool-0.2.0-py3-none-any.whl.

File hashes

Hashes for aws_polly_tts_tool-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2a3ccaf0f12c6c5e60f78fbdc31cd8eb3470b5b6a920cd362fdf2e87573bfa79
MD5 adcd8045aae227d63b1557a0093d876f
BLAKE2b-256 e7eeeef07aa4ac0b34389700ae991b1e0daff5f93f7b897a0a3dcea2f171011b


Provenance

The following attestation bundles were made for aws_polly_tts_tool-0.2.0-py3-none-any.whl:

Publisher: publish.yml on dnvriend/aws-polly-tts-tool

