
Smart TTS library with ElevenLabs and OpenRouter text enhancement


elevenlabs-smart-tts

High-level Python library for expressive text-to-speech with ElevenLabs and LLM-powered text enhancement via OpenRouter.

Pass raw text plus task context (language, style, emotion, use case) — the library picks a voice, enriches the text with Eleven v3 audio tags, and returns synthesized audio.

Features

  • SmartTTS facade — one pipeline from text to audio
  • Voice caching — local diskcache catalog with offline list_voices() / get_voice()
  • Automatic voice selection — by voice_id, description, use case, style, and language
  • LLM text enhancement — audio tags, punctuation, and normalization via OpenRouter
  • Eleven v3 first — expressive tags like [whispers], [excited], [short pause]
  • Typed errors & retries — resilient HTTP clients for ElevenLabs and OpenRouter
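The README does not document the retry policy or the exception classes behind "typed errors & retries". As a rough illustration of the general idea only, the sketch below retries a call with exponential backoff. TransientError and with_retries are hypothetical names and are not part of elevenlabs_smart_tts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for retryable failures (e.g. HTTP 429 or 5xx)."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on TransientError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            sleep(base_delay * 2**attempt)  # waits 0.5 s, 1 s, 2 s, ... by default
    raise AssertionError("unreachable")
```

Injecting sleep keeps the helper testable. A production client would usually also cap the total delay and retry only on status codes that are actually transient.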

Installation

pip install elevenlabs-smart-tts

Or from source:

git clone https://github.com/vpuhoff/elevenlabs-smart-tts.git
cd elevenlabs-smart-tts
uv sync --dev

Quick start

  1. Copy .env.example to .env and fill in your API keys:
cp .env.example .env
  2. Run synthesis:
from pathlib import Path

from elevenlabs_smart_tts import SmartTTS, SynthesisTask

tts = SmartTTS.from_env()
tts.sync_voices()

result = tts.synthesize_to_file(
    SynthesisTask(
        text="Welcome to our customer support service.",
        language="en",
        style="professional",
        emotion="warm",
        use_case="conversational",
        voice_description="warm professional conversational",
    ),
    Path("output.mp3"),
)

print(result.enhanced_text)

See example.py for a full runnable example.

Task parameters

SynthesisTask accepts free-text hints that guide voice selection and LLM text enhancement. After sync_voices(), inspect your cached catalog:

for voice in tts.list_voices():
    print(voice.name, voice.labels.get("use_case"), voice.description)

The examples below come from the ElevenLabs premade voice catalog.

use_case

Matched against each voice's ElevenLabs label labels.use_case; an exact match scores highest.

Value                    Typical voices
conversational           Casual, agentic, podcast-style voices (e.g. Roger, Eric, Juniper)
informative_educational  Clear educators, broadcasters (e.g. Alice, Matilda, Daniel)
narrative_story          Storytellers, audiobook voices (e.g. George, Daria Reels)
advertisement            Promo and ad reads (e.g. Bill)
social_media             Short-form, trendy content
characters_animation     Character and animation voices
entertainment_tv         TV and entertainment narration

customer_support is not an ElevenLabs use_case label. It still gives the LLM useful context, but for voice selection prefer use_case="conversational" or pass voice_description="professional support warm".

from elevenlabs_smart_tts import SynthesisTask

# Support-style call center message
SynthesisTask(
    text="Thanks for calling. How can I help you today?",
    language="en",
    use_case="conversational",
    style="professional",
    emotion="warm",
    voice_description="trustworthy professional",
)

# Audiobook / long-form narration
SynthesisTask(
    text="Chapter one. It was a dark and stormy night.",
    use_case="narrative_story",
    style="warm",
    emotion="calm",
)

# E-learning explainer
SynthesisTask(
    text="Today we'll learn how photosynthesis works.",
    use_case="informative_educational",
    style="professional",
    emotion="neutral",
)
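Beyond "exact match scores highest", the selection algorithm is not described here. A minimal sketch of that kind of scoring, assuming voices are plain dicts with name, description, and labels keys and using arbitrary weights (10 for an exact use_case label, 1 per hint word found in the name or description), could look like this. score_voice and pick_voice are hypothetical helpers, not library functions.

```python
from __future__ import annotations

from typing import Any


def score_voice(voice: dict[str, Any], use_case: str | None, hints: str = "") -> int:
    """Hypothetical scorer: an exact use_case label dominates, hint words add a little."""
    score = 0
    labels = voice.get("labels") or {}
    if use_case and labels.get("use_case") == use_case:
        score += 10  # strong signal: exact label match
    haystack = f"{voice.get('name', '')} {voice.get('description', '')}".lower()
    for word in hints.lower().split():
        if word in haystack:
            score += 1  # weak signal: hint word occurs in the name or description
    return score


def pick_voice(voices: list[dict[str, Any]], use_case: str | None, hints: str = "") -> dict[str, Any]:
    """Return the highest-scoring voice; max() keeps the first one on ties."""
    return max(voices, key=lambda v: score_voice(v, use_case, hints))
```

In this sketch a use_case such as customer_support, which no premade voice carries as a label, earns no bonus, so the choice falls back to the weak hint matches. That mirrors the advice above to prefer conversational or an explicit voice_description.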

style

Free-form delivery hint. It shapes the LLM enhancement prompt and serves as a weak signal in voice matching against the voice name, description, and custom tags.

Common values that match premade voice descriptions:

Value                    Effect
professional             Formal, clear delivery
casual / conversational  Relaxed, everyday tone
warm                     Friendly, inviting tone
neutral                  Balanced, informative
dramatic                 Strong emphasis, expressive pacing
playful                  Light, energetic tone
sympathetic              Soft, empathetic delivery

SynthesisTask(text="...", style="professional")   # business / IVR
SynthesisTask(text="...", style="casual")         # laid-back chat
SynthesisTask(text="...", style="dramatic")       # emotional scene

emotion

Free-form mood hint for LLM text enhancement only (drives audio tags like [excited], [whispers], [sighs]). Does not filter voices.

Value                 Typical audio tag behavior
warm                  Friendly, reassuring tone
calm                  Steady, subdued delivery
excited               Higher energy, [excited] tags
sympathetic           Soft, caring tone
curious               Questioning, engaged tone
appalled / sarcastic  Strong expressive tags
neutral               Minimal emotional markup

SynthesisTask(text="...", emotion="warm")         # customer greeting
SynthesisTask(text="...", emotion="excited")      # product launch
SynthesisTask(text="...", emotion="sympathetic")  # apology or support
SynthesisTask(text="...", emotion="neutral")      # plain narration
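The actual enhancement prompt is internal to the library. Purely to illustrate how free-form style and emotion hints can be folded into an instruction for the enhancement LLM, here is a hypothetical prompt builder; the function name and the wording are invented.

```python
from __future__ import annotations


def build_enhancement_prompt(
    text: str,
    style: str | None = None,
    emotion: str | None = None,
    language: str | None = None,
) -> str:
    """Hypothetical: turn task hints into an instruction for the enhancement LLM."""
    lines = [
        "Rewrite the text below for Eleven v3 text-to-speech.",
        "Insert audio tags in square brackets (e.g. [whispers], [excited]) where they fit.",
        "Keep the meaning of the text unchanged.",
    ]
    if language:
        lines.append(f"Language: {language}")
    if style:
        lines.append(f"Delivery style: {style}")
    if emotion:
        # In this sketch emotion only steers the markup, never voice choice.
        lines.append(f"Emotion: {emotion}")
    lines += ["", f"Text: {text}"]
    return "\n".join(lines)
```

Hints that are not set are simply omitted, so emotion="neutral" and no emotion at all produce different instructions.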

Combining parameters

Scenario               Example values
Customer support (EN)  use_case="conversational", style="professional", emotion="warm"
News / podcast intro   use_case="informative_educational", style="neutral", emotion="calm"
Audiobook chapter      use_case="narrative_story", style="warm", emotion="calm"
Social reel            use_case="social_media", style="playful", emotion="excited"
Ad read                use_case="advertisement", style="confident", emotion="excited"
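One way to reuse these combinations is to keep them as keyword presets and unpack them into SynthesisTask. PRESETS and task_kwargs are hypothetical names; the values are copied from the table above.

```python
# Hypothetical presets; keys are invented, values come from the scenario table.
PRESETS: dict[str, dict[str, str]] = {
    "support_en": {"use_case": "conversational", "style": "professional", "emotion": "warm"},
    "news_intro": {"use_case": "informative_educational", "style": "neutral", "emotion": "calm"},
    "audiobook": {"use_case": "narrative_story", "style": "warm", "emotion": "calm"},
    "social_reel": {"use_case": "social_media", "style": "playful", "emotion": "excited"},
    "ad_read": {"use_case": "advertisement", "style": "confident", "emotion": "excited"},
}


def task_kwargs(preset: str, text: str, **overrides: str) -> dict[str, str]:
    """Merge a preset with the text; later keys (the overrides) win."""
    return {"text": text, **PRESETS[preset], **overrides}
```

For example, SynthesisTask(**task_kwargs("support_en", "Thanks for calling.", language="en")) builds the customer support task, and any keyword passed after the text overrides the preset value.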

Configuration

Required environment variables

Variable                         Description
ELEVENLABS_API_KEY               ElevenLabs API key
OPENROUTER_API_KEY               OpenRouter API key
OPENROUTER_API_TTS_PROMPT_MODEL  LLM for text enhancement (e.g. anthropic/claude-3.5-sonnet)

Optional environment variables

Variable                          Default                        Description
ELEVENLABS_CACHE_DIR              ~/.cache/elevenlabs-smart-tts  Local cache directory
ELEVENLABS_DEFAULT_MODEL          eleven_v3                      Default TTS model
ELEVENLABS_DEFAULT_OUTPUT_FORMAT  mp3_44100_128                  Audio output format
ELEVENLABS_DEFAULT_VOICE_ID       (none)                         Fallback voice when auto-selection fails
OPENROUTER_BASE_URL               https://openrouter.ai/api/v1   OpenRouter API base URL
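SmartTTS.from_env() reads these variables itself. If you want to check the environment up front (for example in a deployment health check), a stdlib-only sketch that fails fast on missing required keys and applies the defaults listed above might look like this. Settings and load_settings() are hypothetical, not library API, and this sketch does not parse a .env file.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("ELEVENLABS_API_KEY", "OPENROUTER_API_KEY", "OPENROUTER_API_TTS_PROMPT_MODEL")


@dataclass(frozen=True)
class Settings:
    elevenlabs_api_key: str
    openrouter_api_key: str
    openrouter_model: str
    cache_dir: str
    default_model: str
    output_format: str
    default_voice_id: str | None
    openrouter_base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail fast on missing required variables; fall back to the documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
        openrouter_api_key=env["OPENROUTER_API_KEY"],
        openrouter_model=env["OPENROUTER_API_TTS_PROMPT_MODEL"],
        cache_dir=os.path.expanduser(env.get("ELEVENLABS_CACHE_DIR", "~/.cache/elevenlabs-smart-tts")),
        default_model=env.get("ELEVENLABS_DEFAULT_MODEL", "eleven_v3"),
        output_format=env.get("ELEVENLABS_DEFAULT_OUTPUT_FORMAT", "mp3_44100_128"),
        default_voice_id=env.get("ELEVENLABS_DEFAULT_VOICE_ID") or None,  # no default
        openrouter_base_url=env.get("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1"),
    )
```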

Programmatic configuration is also supported:

from elevenlabs_smart_tts import SmartTTS, SmartTTSConfig, TTSModel

config = SmartTTSConfig(
    elevenlabs_api_key="...",
    openrouter_api_key="...",
    openrouter_tts_prompt_model="anthropic/claude-3.5-sonnet",
    default_model=TTSModel.ELEVEN_V3,
)
tts = SmartTTS(config)

Usage

Synthesis pipeline

from elevenlabs_smart_tts import SmartTTS, SynthesisTask, TTSModel

tts = SmartTTS.from_env()
tts.sync_voices()

result = tts.synthesize(
    SynthesisTask(
        text="Are you serious? I can't believe you did that!",
        voice_id="your-voice-id",
        model=TTSModel.ELEVEN_V3,
        style="dramatic",
        emotion="appalled",
    )
)

audio_bytes = result.audio
enhanced_text = result.enhanced_text

Preview enhanced text without TTS

enhanced = tts.enhance_text_only(
    SynthesisTask(
        text="Thanks for calling. How can I help?",
        language="en",
        style="sympathetic",
    )
)
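Previewing lets you review the LLM's markup before requesting any audio. Assuming enhance_text_only() returns a plain string and that audio tags are single-level square-bracket spans such as [whispers] or [short pause], these hypothetical helpers list the tags and recover the bare text.

```python
import re

# Matches one bracketed tag such as "[excited]" or "[short pause]" (no nesting).
_TAG_RE = re.compile(r"\[([^\[\]]+)\]")


def audio_tags(enhanced_text: str) -> list[str]:
    """Return the audio tags found in the text, in order of appearance."""
    return [tag.strip() for tag in _TAG_RE.findall(enhanced_text)]


def strip_audio_tags(enhanced_text: str) -> str:
    """Remove the tags and collapse the leftover whitespace."""
    return " ".join(_TAG_RE.sub(" ", enhanced_text).split())
```

For example, audio_tags(enhanced) shows which tags the model added, and comparing strip_audio_tags(enhanced) with the input text is a quick way to spot wording changes.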

One-liner

from elevenlabs_smart_tts import synthesize

result = synthesize(
    "Hello world",
    language="en",
    style="neutral",
)

Async API

import asyncio
from pathlib import Path

from elevenlabs_smart_tts import AsyncSmartTTS, SynthesisTask, asynthesize

async def main() -> None:
    async with AsyncSmartTTS.from_env() as tts:
        await tts.sync_voices()
        result = await tts.synthesize_to_file(
            SynthesisTask(text="Hello world", language="en"),
            Path("output.mp3"),
        )
        print(result.enhanced_text)

asyncio.run(main())

# Or as a one-liner:
result = asyncio.run(asynthesize("Hello world", language="en"))
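Because AsyncSmartTTS exposes coroutines, several tasks can be synthesized concurrently. The sketch below caps the number of in-flight requests with asyncio.Semaphore. gather_limited is a hypothetical helper, and fake_synthesize stands in for a call such as tts.synthesize(task) so the example runs without API keys.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_limited(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int = 3,
) -> list[R]:
    """Run worker(item) for every item, at most `limit` at a time; keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(item: T) -> R:
        async with semaphore:  # blocks while `limit` calls are already running
            return await worker(item)

    return list(await asyncio.gather(*(run(item) for item in items)))


async def fake_synthesize(text: str) -> bytes:
    # Stand-in for a real API call such as `await tts.synthesize(task)`.
    await asyncio.sleep(0.01)
    return text.encode()
```

Results come back in input order because asyncio.gather preserves it. Choose the limit to suit whatever concurrency your API plan allows.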

Voice management

voices = tts.list_voices(language="en", tags=["narration"])
voice = tts.get_voice("voice-id")

tts.sync_voices(force=True)  # refresh cache from ElevenLabs API
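The exact matching rules of list_voices(language=..., tags=...) are not documented here. The sketch below assumes a voice must support the language and carry every requested tag; CachedVoice and filter_voices are hypothetical, and the real method may, for instance, match any tag rather than all of them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CachedVoice:
    """Hypothetical stand-in for one entry of the local voice cache."""

    voice_id: str
    name: str
    languages: set[str] = field(default_factory=set)
    tags: set[str] = field(default_factory=set)


def filter_voices(
    voices: list[CachedVoice],
    language: str | None = None,
    tags: list[str] | None = None,
) -> list[CachedVoice]:
    """Keep voices that support `language` (if given) and carry every tag in `tags`."""
    wanted = set(tags or [])
    return [
        voice
        for voice in voices
        if (language is None or language in voice.languages) and wanted <= voice.tags
    ]
```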

Supported TTS models

Model                   Best for
eleven_v3               Expressive speech, audio tags, emotions
eleven_multilingual_v2  Multilingual, high voice similarity
eleven_flash_v2_5       Low latency, conversational agents
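Read as a rule of thumb, the table above could be encoded like this. choose_model is a hypothetical helper that returns the model ID strings from the table; only TTSModel.ELEVEN_V3 appears in this README, so mapping the other IDs to TTSModel members is left out.

```python
def choose_model(audio_tags: bool = True, low_latency: bool = False) -> str:
    """Hypothetical rule of thumb derived from the model table."""
    if audio_tags:
        return "eleven_v3"  # expressive speech and audio tags; the library default
    if low_latency:
        return "eleven_flash_v2_5"  # conversational agents
    return "eleven_multilingual_v2"  # multilingual, high voice similarity
```

Audio tags take priority here because eleven_v3 is the only model the table lists for them.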

Development

uv sync --dev
uv run pytest
uv run ruff check .

License

MIT — see LICENSE.

Download files

Download the file for your platform.

Source Distribution

elevenlabs_smart_tts-0.1.4.tar.gz (38.6 kB)

Uploaded Source

Built Distribution

elevenlabs_smart_tts-0.1.4-py3-none-any.whl (26.3 kB)

Uploaded Python 3

File details

Details for the file elevenlabs_smart_tts-0.1.4.tar.gz.

File metadata

  • Download URL: elevenlabs_smart_tts-0.1.4.tar.gz
  • Upload date:
  • Size: 38.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for elevenlabs_smart_tts-0.1.4.tar.gz
Algorithm    Hash digest
SHA256       06f9973058da1659a37e18468e48af45c54a6e13bbc9908aea26631f5c8c6d5c
MD5          401c657667d160e54b4eea95ccfb7600
BLAKE2b-256  583ace4c0f65c15ba932d271e567bae649f3b5779fce9f97fc89fffcf81a7710
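To check a downloaded file against the SHA256 digest listed above before installing it, a stdlib-only sketch could look like this; sha256_of and verify are illustrative names.

```python
import hashlib
from pathlib import Path

# SHA256 digest of elevenlabs_smart_tts-0.1.4.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = "06f9973058da1659a37e18468e48af45c54a6e13bbc9908aea26631f5c8c6d5c"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare the file's digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, verify(Path("elevenlabs_smart_tts-0.1.4.tar.gz"), EXPECTED_SDIST_SHA256) should return True for an intact download. pip can enforce the same check during installation in hash-checking mode, by listing the sha256 digests of these files as --hash options in a requirements file.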

Provenance

The following attestation bundles were made for elevenlabs_smart_tts-0.1.4.tar.gz:

Publisher: publish.yml on vpuhoff/elevenlabs-smart-tts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file elevenlabs_smart_tts-0.1.4-py3-none-any.whl.

File metadata

File hashes

Hashes for elevenlabs_smart_tts-0.1.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       73933f5d27ac39c4b9f7843067e6a2fd081f62efb8c18eeba8514110ca87191a
MD5          b6d461c1b38f142601febc8dfc5cf90a
BLAKE2b-256  a2e6bf823303f64db0f0dd1482ac5f09c25d7738b203d7b21f2f390024a27e70

Provenance

The following attestation bundles were made for elevenlabs_smart_tts-0.1.4-py3-none-any.whl:

Publisher: publish.yml on vpuhoff/elevenlabs-smart-tts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
