A simple wrapper for Cartesia Sonic TTS

Cartesia Sonic TTS Wrapper

You need your own API key to use the demo.

About

A simple and powerful wrapper for the Cartesia Sonic Text-to-Speech (TTS) API, providing an easy-to-use interface for generating speech from text in multiple languages, with control over speaking speed, emotion, and custom voices. The package includes:

A Python library for developers.
A Command-Line Interface (CLI) for terminal interaction.
A Gradio web interface for user-friendly interaction.

Note: To use this wrapper, you need a valid API key from Cartesia. A subscription is required to access the Sonic TTS API. Visit Cartesia Sonic for more information.

Features
Installation
Getting Started
- Setting Up the API Key
Usage
Examples
- Generating Speech with Emotions
- Creating and Using a Custom Voice
Notes
TODO
License

Features

Easy-to-use Python Wrapper: Simplifies interaction with the Cartesia Sonic TTS API.
Text-to-Speech Generation:
- Supports multiple languages.
- Speed control from very slow to very fast.
- Emotion control with adjustable intensity.
- Text improvement options for better TTS results.
Voice Management:
- List available voices with filtering options.
- Create custom voices from audio files.
- Get detailed information about voices.
Command-Line Interface (CLI): Interact with the TTS functionality via the terminal.
Gradio Web Interface: User-friendly web application for interactive use.

Installation

Install the sonic-wrapper package via pip:

```
pip install sonic-wrapper
```

Note: The package requires Python 3.9 or higher.

Additional Dependencies for Gradio Interface

If you plan to use the Gradio web interface, install Gradio:

```
pip install "gradio>=5.0.0"
```

Getting Started

Setting Up the API Key

To use the Cartesia Sonic TTS API, you need a valid API key. Obtain an API key by subscribing to the service on the Cartesia Sonic website.

Once you have your API key, you can set it up:

Using the Python Library: Provide the API key when initializing the CartesiaVoiceManager.
Using the CLI: Set the API key using the set-api-key command.
Using the Gradio Interface: Enter the API key in the provided field.

The API key is stored in a .env file for subsequent use.

Usage

As a Python Library

Initializing the Voice Manager

```
from sonic_wrapper import CartesiaVoiceManager

# Initialize the manager with your API key
manager = CartesiaVoiceManager(api_key='your_api_key_here')
```

Alternatively, if you have set the CARTESIA_API_KEY environment variable or stored the API key in a .env file, you can initialize without passing the API key:

```
manager = CartesiaVoiceManager()
```
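If you want to check which key will be picked up before creating the manager, the three sources mentioned above (an explicit argument, the `CARTESIA_API_KEY` environment variable, and a `.env` file) can be inspected with the standard library. This is a minimal sketch, not the wrapper's own loading code: the precedence order, the `.env` location in the current directory, and the helper names `read_dotenv` and `resolve_api_key` are assumptions.

```python
import os
from pathlib import Path
from typing import Optional


def read_dotenv(path: Path) -> dict:
    """Hypothetical helper: parse simple KEY=VALUE lines from a .env file."""
    values = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an '=' sign
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional quotes around the value (no variable expansion)
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(explicit: Optional[str] = None,
                    env_file: Path = Path(".env")) -> Optional[str]:
    """Hypothetical lookup order: argument, then environment, then .env file."""
    if explicit:
        return explicit
    from_env = os.environ.get("CARTESIA_API_KEY")
    if from_env:
        return from_env
    return read_dotenv(env_file).get("CARTESIA_API_KEY")
```

The result can then be passed explicitly, for example `CartesiaVoiceManager(api_key=resolve_api_key())`, which makes the chosen source visible in your own code.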

Voice Management

Listing Available Voices:

```
voices = manager.list_available_voices()
for voice in voices:
    print(f"ID: {voice['id']}, Name: {voice['name']}, Language: {voice['language']}")
```

Filtering Voices by Language and Accessibility:

```
from sonic_wrapper import VoiceAccessibility

voices = manager.list_available_voices(
    languages=['en'],
    accessibility=VoiceAccessibility.ONLY_PUBLIC
)
```
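If you already have a list of voices in memory, you can narrow or sort it further without another API call. The sketch below assumes each voice is a dict with `id`, `name`, and `language` keys, as in the listing loop above; `filter_by_language` and `sort_by_name` are hypothetical helpers, not part of `sonic_wrapper`.

```python
from typing import Any, Dict, Iterable, List


def filter_by_language(voices: Iterable[Dict[str, Any]],
                       languages: Iterable[str]) -> List[Dict[str, Any]]:
    """Keep voices whose 'language' value is in `languages` (case-insensitive)."""
    wanted = {lang.lower() for lang in languages}
    return [v for v in voices if str(v.get("language", "")).lower() in wanted]


def sort_by_name(voices: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return voices sorted alphabetically by their 'name' field."""
    return sorted(voices, key=lambda v: str(v.get("name", "")).lower())
```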

Getting Voice Information:

```
voice_info = manager.get_voice_info('voice_id')
print(voice_info)
```

Creating a Custom Voice:

```
voice_id = manager.create_custom_voice(
    name='My Custom Voice',
    source='path/to/your_voice_sample.wav',
    language='en',
    description='This is a custom voice created from my own sample.'
)
```
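Before uploading a sample to `create_custom_voice`, it can help to confirm that the file opens as WAV and to check its length. A minimal sketch using Python's built-in `wave` module; it assumes an uncompressed PCM WAV file, and `describe_wav` is a hypothetical helper. The wrapper may accept other formats or apply its own length limits, which are not documented here.

```python
import wave
from pathlib import Path


def describe_wav(path: str) -> dict:
    """Hypothetical helper: read basic header info from a PCM WAV file."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such audio file: {p}")
    # wave.open raises wave.Error if the file is not a readable WAV
    with wave.open(str(p), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }
```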

Text-to-Speech Generation

Setting the Voice:

```
manager.set_voice('voice_id')
```

Adjusting Speed and Emotions:

```
# Set speech speed (-1.0 to 1.0)
manager.speed = 0.5  # Faster speech

# Set emotions
emotions = [
    {'name': 'positivity', 'level': 'high'},
    {'name': 'surprise', 'level': 'medium'}
]
manager.set_emotions(emotions)
```
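To catch typos before making an API call, the speed range and emotion values can be checked locally. This sketch takes the valid emotion names and intensity levels from the CLI section below and assumes the Python library accepts the same values; `check_speed` and `check_emotions` are hypothetical helpers.

```python
# Values listed in the CLI section; assumed to apply to the Python library too
VALID_EMOTIONS = {"anger", "positivity", "surprise", "sadness", "curiosity"}
VALID_LEVELS = {"lowest", "low", "medium", "high", "highest"}


def check_speed(speed: float) -> float:
    """Raise ValueError if speed is outside the documented -1.0..1.0 range."""
    if not -1.0 <= speed <= 1.0:
        raise ValueError(f"speed must be between -1.0 and 1.0, got {speed}")
    return speed


def check_emotions(emotions: list[dict[str, str]]) -> list[dict[str, str]]:
    """Raise ValueError if any entry has an unknown emotion name or level."""
    for item in emotions:
        name, level = item.get("name"), item.get("level")
        if name not in VALID_EMOTIONS:
            raise ValueError(f"unknown emotion: {name!r}")
        if level not in VALID_LEVELS:
            raise ValueError(f"unknown level for {name}: {level!r}")
    return emotions
```

With these in place, `manager.set_emotions(check_emotions(emotions))` fails early on a misspelled value instead of at request time.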

Generating Speech:

```
output_file = manager.speak(
    text='Hello, world!',
    output_file='output.wav'
)
print(f"Audio saved to {output_file}")
```

Improving Text Before Synthesis:

```
from sonic_wrapper import improve_tts_text

text = 'Your raw text here.'
improved_text = improve_tts_text(text, language='en')
manager.speak(text=improved_text, output_file='improved_output.wav')
```
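What `improve_tts_text` changes is not described here. If you also want a purely local cleanup step, such as collapsing stray line breaks from pasted text, a minimal sketch follows; `tidy_text` is a hypothetical helper and is not a description of what `improve_tts_text` does.

```python
import re


def tidy_text(text: str) -> str:
    """Hypothetical pre-cleaning: collapse whitespace, ensure final punctuation."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    # Add a period if the text does not already end a sentence
    if cleaned and cleaned[-1] not in ".!?":
        cleaned += "."
    return cleaned
```

It could be chained in front of the library call, for example `improve_tts_text(tidy_text(text), language='en')`.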

Command-Line Interface (CLI)

The package includes a CLI tool for interacting with the TTS functionality directly from the terminal.

Commands and Usage

Set API Key

Set your Cartesia API key:

```
python -m sonic_wrapper.cli set-api-key your_api_key_here
```

List Voices

List all available voices:

```
python -m sonic_wrapper.cli list-voices
```

With filters:

```
python -m sonic_wrapper.cli list-voices --language en --accessibility api
```

Generate Speech

Generate speech from text using a specific voice:

```
python -m sonic_wrapper.cli generate-speech --text "Hello, world!" --voice "Voice Name or ID"
```

Additional options:

Specify Output File:
```
--output output.wav
```

Adjust Speech Speed:

```
--speed 0.5  # Speed ranges from -1.0 (slowest) to 1.0 (fastest)
```

Add Emotions:
```
--emotions "positivity:medium" "surprise:high"
```
Valid emotions: anger, positivity, surprise, sadness, curiosity

Valid intensities: lowest, low, medium, high, highest
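Each `--emotions` argument uses a `name:level` format, while the Python library's `set_emotions` takes a list of dicts with `name` and `level` keys. The sketch below converts the first form into the second and rejects values outside the lists above; `parse_emotion_args` is a hypothetical helper, not the CLI's own parser.

```python
VALID_EMOTIONS = {"anger", "positivity", "surprise", "sadness", "curiosity"}
VALID_LEVELS = {"lowest", "low", "medium", "high", "highest"}


def parse_emotion_args(args: list[str]) -> list[dict[str, str]]:
    """Convert 'name:level' strings into {'name': ..., 'level': ...} dicts."""
    parsed = []
    for arg in args:
        name, sep, level = arg.partition(":")
        name, level = name.strip().lower(), level.strip().lower()
        # Reject specs missing the colon or using unknown names/levels
        if not sep or name not in VALID_EMOTIONS or level not in VALID_LEVELS:
            raise ValueError(f"invalid emotion spec: {arg!r} (expected name:level)")
        parsed.append({"name": name, "level": level})
    return parsed
```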

Create Custom Voice

Create a custom voice from an audio file:

```
python -m sonic_wrapper.cli create-voice --name "My Custom Voice" --source path/to/audio.wav
```

Gradio Web Interface

The Gradio interface provides a user-friendly web application for interacting with the TTS functionality.

Running the Interface

Install Gradio (if not already installed):
```
pip install "gradio>=5.0.0"
```
Run the Application:
```
python app.py
```
Access the Web Interface:

Open the provided local URL in your web browser.

Online Demo

Try the Gradio interface online without installing anything:

Examples

Generating Speech with Emotions

```
python -m sonic_wrapper.cli generate-speech \
  --text "I'm so excited to share this news with you!" \
  --voice "Enthusiastic Voice" \
  --emotions "positivity:high" "surprise:medium" \
  --speed 0.5 \
  --output excited_message.wav
```
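To run the same command from a Python script, passing an argument list to `subprocess.run` avoids shell-quoting problems with text such as `I'm`. A minimal sketch using only the flags documented above; `build_generate_cmd` is a hypothetical helper, and it assumes `sonic_wrapper` is installed in the interpreter given by `sys.executable`.

```python
import sys
from typing import Optional, Sequence


def build_generate_cmd(text: str, voice: str, output: Optional[str] = None,
                       speed: Optional[float] = None,
                       emotions: Sequence[str] = ()) -> list[str]:
    """Hypothetical helper: assemble argv for the generate-speech command."""
    cmd = [sys.executable, "-m", "sonic_wrapper.cli", "generate-speech",
           "--text", text, "--voice", voice]
    if emotions:
        # Each emotion is a separate 'name:level' argument, as in the CLI docs
        cmd += ["--emotions", *emotions]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    if output:
        cmd += ["--output", output]
    return cmd
```

Running it would then be `subprocess.run(cmd, check=True)`, which raises `CalledProcessError` if the CLI exits with a non-zero status.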

Creating and Using a Custom Voice

Step 1: Create a Custom Voice

```
python -m sonic_wrapper.cli create-voice \
  --name "Custom Voice" \
  --source path/to/your_voice_sample.wav \
  --description "A custom voice created from my own audio sample."
```

Step 2: Generate Speech with the Custom Voice

```
python -m sonic_wrapper.cli generate-speech \
  --text "This is my custom voice." \
  --voice "Custom Voice" \
  --output custom_voice_output.wav
```

Notes

API Key: A valid Cartesia API key is required to use this wrapper. Set your API key using the CLI or in your code. Visit Cartesia Sonic to obtain an API key.
Subscription: Access to the Cartesia Sonic TTS API requires a subscription. Please refer to their pricing page for more details.
Voice Mixing: Voice mixing is currently available only in the Python library, not in the CLI or the Gradio interface.
Voice Embeddings: The wrapper handles voice embeddings for you, storing them locally for faster access.
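The storage location and format the wrapper uses for embeddings are not documented here. As an illustration of the caching idea only, a JSON file keyed by voice ID could look like the sketch below; the `EmbeddingCache` class, the file layout, and the `fetch` callback (standing in for whatever call retrieves an embedding) are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


class EmbeddingCache:
    """Hypothetical cache: voice embeddings stored in a JSON file by voice ID."""

    def __init__(self, path: Path):
        self.path = Path(path)
        # Load existing entries, or start empty if the file does not exist yet
        if self.path.is_file():
            self._data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self._data = {}

    def get_or_fetch(self, voice_id: str,
                     fetch: Callable[[str], list[float]]) -> list[float]:
        """Return a cached embedding; on a miss, call `fetch` and persist it."""
        if voice_id not in self._data:
            self._data[voice_id] = fetch(voice_id)
            self.path.write_text(json.dumps(self._data), encoding="utf-8")
        return self._data[voice_id]
```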

TODO

Implement voice mixing functionality in Gradio interface and CLI.
Enhance error handling and logging.
Improve documentation with more examples and use cases.
Add support for additional languages and voices as they become available.

License

This project is licensed under the MIT License.
