
Wave Form Provider

A universal TTS (Text-to-Speech) provider interface with unified expressive markup syntax. Write once, synthesize anywhere.

Why Wave Form Provider?

Expressive voice models have reached near-human quality. Over the past year, both open-source and commercial TTS providers have released a wave of models that let creators control emotion, expressiveness, and delivery with precision, including non-verbal sounds such as laughs, sighs, and whispers.

However, each provider has its own interface, API, studio, and markup syntax, which makes it hard to experiment, switch providers, fall back, compare outputs, or mix results between models.

Wave Form Provider solves this by providing a unified interface that works across all the best voice models. Write your script once using a simple, consistent syntax, and let the library handle the provider-specific compilation.

  • Unified Syntax: One markup language works across all TTS providers
  • Provider Agnostic: Switch providers without rewriting your text
  • Expressive Control: Add emotions, actions, speed, and more
  • Type Safe: Full type hints and async support
  • Well Tested: 147+ tests across all providers

Installation

# Clone the repository
git clone https://github.com/phodonou/wave_form_provider.git
cd wave_form_provider

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Example: Cartesia Quick Test

Sign up for a Cartesia Sonic API key at cartesia.ai/sonic before running the example.

Set CARTESIA_API_KEY in your environment (or pass api_key="...") and run the script below:

import asyncio
from wave_form_provider.providers import CartesiaProvider

async def main():
    provider = CartesiaProvider()  # Reads CARTESIA_API_KEY from env, or pass api_key="..."

    response = await provider.synthesize(
        voice_id="6ccbfb76-1fc6-48f7-b71d-91ac6298247b", 
        text="Hello there! [laughter] (excited) This is amazing!"
    )

    with open("output_cartesia.mp3", "wb") as f:
        f.write(response.audio)

asyncio.run(main())

Supported Providers

  • Cartesia
  • Hume
  • Inworld
  • ElevenLabs
  • Google Gemini
  • OpenAI
  • Orpheus

Unified Syntax

The syntax is simple: write what you want to say and how you want to say it in one universal format. Use [] for sounds inserted into the speech, such as actions, and () to control how the following speech is delivered. The library automatically compiles this into the right format for each TTS provider.

Actions (Inserts) - []

Actions that happen during speech:

"Hello! [laugh] How are you? [sigh]"
"That's interesting [pause] tell me more."

Common actions: [laugh], [chuckle], [sigh], [gasp], [pause], [long pause]

Delivery (Style) - ()

Control how the text is spoken:

Emotions

"(excited) I got the job! (sad) But I have to move."

Speed

"(fast) Quick announcement: (slow) Now speaking slowly."

Volume

"(quiet) Whisper this. (shout) Shout this!"

Special

"My name is (spell) Bob."

Combined Example

text = "Hello! [laugh] (excited) I have great news! (fast) Let me tell you more."

Provider-Specific Compilation

The library automatically compiles the unified syntax for each provider:

Cartesia

  • Actions: Passed through as [action]
  • Emotions: Compiled to <emotion value="angry" />
  • Speed: Maps to <speed ratio="X"/> ((slow) → 0.6, (fast) → 1.3, (really fast) → 1.5)
  • Volume: Maps to <volume ratio="X"/> ((quiet) → 0.5, (loud) → 1.5, (shout) → 2.0)
  • Pauses: Maps to <break time="X"/> ((pause) → 1s, (long pause) → 2s)
  • Special: (spell)word → <spell>word</spell>
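
As a rough illustration of these mappings (not the library's literal output), the unified line below would compile to something like the Cartesia-flavored markup shown in the comment:

# Unified input
text = "(excited) I got the job! (fast) Let me tell you everything. [pause] Okay."

# Approximate Cartesia-compiled form, per the mappings above (illustrative):
# <emotion value="excited" /> I got the job! <speed ratio="1.3"/> Let me tell you everything. <break time="1s"/> Okay.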

Hume

  • Actions: Added to description field
  • Emotions: Added to description field
  • Speed: Maps to speed parameter ((slow) → 0.6, (fast) → 1.5, (really fast) → 2.0)
  • Volume: Not supported
  • Pauses: [pause] at the end → trailing_silence: 2, [long pause] → trailing_silence: 4; preserved in the text when in the middle

Inworld

  • Actions: Passed through as [action]
  • Emotions: Prepended as [emotion] to each segment
  • Speed: Maps to speakingRate parameter ((slow) → 0.7, (fast) → 1.3, (really fast) → 1.5)
  • Volume: Not supported
  • Pauses: Not supported

ElevenLabs

  • Actions: Converted [action] → [action] (preserved)
  • Emotions: Converted (emotion) → [emotion]
  • Speed: Converted (speed) → [speed] (provider interprets)
  • Volume: Converted (volume) → [volume] (provider interprets)
  • Pauses: Converted [pause] → [pause] (preserved)
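
As a rough illustration (not the library's literal output), these conversions would rewrite a unified line along these lines:

# Unified input
text = "(excited) I got the job! [laugh] (fast) Let me tell you more."

# Approximate ElevenLabs-compiled form, per the conversions above (illustrative):
# "[excited] I got the job! [laugh] [fast] Let me tell you more."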

Google Gemini

  • Actions: Converted [action] → [action] (preserved)
  • Emotions: Converted (emotion) → [emotion]
  • Speed: Converted (speed) → [speed] (provider interprets)
  • Volume: Converted (volume) → [volume] (provider interprets)
  • Pauses: Converted [pause] → [pause] (preserved)

Orpheus

  • Actions: Converted [action] → <action>
  • Emotions: Stripped (not supported)
  • Speed: Stripped (not supported)
  • Volume: Stripped (not supported)
  • Pauses: Converted [pause] → <pause>

OpenAI

  • Actions: Controlled via style_guidance parameter
  • Emotions: Controlled via style_guidance parameter
  • Speed: Controlled via style_guidance parameter
  • Volume: Controlled via style_guidance parameter
  • Pauses: Controlled via style_guidance parameter
  • Note: All markup is stripped from the text. Use natural language in style_guidance, such as "speak with excitement and laugh occasionally".
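
For example, a minimal sketch of the OpenAI path might look like the following, assuming an OpenAIProvider class exposed alongside the other providers and an illustrative voice_id (check the providers module for the exact names):

from wave_form_provider.providers import OpenAIProvider  # assumed class name, following the other providers

openai_provider = OpenAIProvider()  # reads OPENAI_API_KEY from env

response = await openai_provider.synthesize(
    voice_id="alloy",  # illustrative voice ID
    text="I have great news! Let me tell you more.",
    style_guidance="speak with excitement and laugh occasionally",
)

Because all markup is stripped for OpenAI, the expressive direction lives entirely in style_guidance.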

Using Different Providers

Import any provider directly:

from wave_form_provider.providers import CartesiaProvider, ElevenLabsProvider, HumeProvider

# Use Cartesia
cartesia = CartesiaProvider()
response = await cartesia.synthesize(voice_id="...", text="...")

# Use ElevenLabs
elevenlabs = ElevenLabsProvider(api_key="...")
response = await elevenlabs.synthesize(voice_id="...", text="...")

API Reference

Method: synthesize()

Generate speech from text and return audio bytes.

async def synthesize(
    voice_id: str,                    # Voice ID from provider (get from provider's dashboard/docs)
    text: str,                         # Text to synthesize (supports unified syntax)
    style_guidance: Optional[str] = None,  # Natural language style guidance (provider-specific)
    seed: Optional[float] = None,     # Random seed for reproducibility
    creativity: float = 0.5,          # Creativity/variation (0.0-1.0, default 0.5)
) -> SynthesisResponse

Returns: SynthesisResponse object with:

  • response.audio - bytes: Audio data (MP3, WAV, etc. depending on provider)
  • response.metadata - SynthesisMetadata object containing:
    • voice_id: The voice used
    • model: Model name
    • size_bytes: Audio file size
    • streaming: Always False for synthesize()
    • duration_seconds: Audio duration (if available)
    • sample_rate: Sample rate in Hz (if available)

Example:

response = await provider.synthesize(
    voice_id="voice-123",
    text="Hello! [laugh] (excited) This is amazing!",
    creativity=0.7
)

# Save audio
with open("output.mp3", "wb") as f:
    f.write(response.audio)

# Access metadata
print(f"Generated {response.metadata.size_bytes} bytes")
print(f"Model: {response.metadata.model}")

Method: synthesize_stream()

Generate speech with streaming audio chunks (not yet implemented for most providers).

async def synthesize_stream(
    voice_id: str,
    text: str,
    style_guidance: Optional[str] = None,
    seed: Optional[float] = None,
    creativity: float = 0.5,
) -> SynthesisStreamResponse

Returns: SynthesisStreamResponse with audio as an AsyncIterator[bytes].
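
Once a provider implements streaming, usage might look roughly like this (a sketch, assuming the same provider setup as in the earlier examples and that chunks can be written directly to a file):

stream = await provider.synthesize_stream(
    voice_id="voice-123",
    text="(excited) Streaming hello!",
)

with open("output_stream.mp3", "wb") as f:
    async for chunk in stream.audio:  # AsyncIterator[bytes]
        f.write(chunk)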

Getting Voice IDs

Voice IDs are provider-specific. Get them from each provider's dashboard or voice library documentation.

Error Handling

Providers may raise:

  • ValueError: Invalid parameters (e.g., missing API key, invalid voice_id)
  • RuntimeError: API request failed or synthesis error
  • ImportError: Provider dependencies not installed

try:
    response = await provider.synthesize(voice_id="...", text="...")
except ValueError as e:
    print(f"Invalid input: {e}")
except RuntimeError as e:
    print(f"Synthesis failed: {e}")

Environment Variables

Set API keys via environment variables:

export CARTESIA_API_KEY="your-key"
export HUME_API_KEY="your-key"
export INWORLD_API_KEY="your-key"
export ELEVENLABS_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
export REPLICATE_API_TOKEN="your-key"  # For Orpheus
export GOOGLE_GENERATIVE_AI_API_KEY="your-key"

Testing

# Run all tests
pytest tests/

# Run specific provider tests
pytest tests/test_cartesia_provider.py -v

Roadmap

  • Generate proper documentation
  • Publish to PyPI as installable package
  • Create web playground
  • Streaming support
  • Multi-language support
  • CLI interface
  • Auto chunk and re-stitch based on character limit
  • Multi speaker support
  • Return audio along with timestamps
  • Audio format conversion utilities
  • Cost tracking utilities
  • More OSS providers

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Acknowledgments

Built with love for the voice AI community. Special thanks to all the TTS provider teams for their amazing APIs.
