Cartesia TTS integration for Vision Agents
Cartesia
Cartesia is a service that provides Speech-to-Text (STT) and Text-to-Speech (TTS) capabilities. It's designed for real-time voice applications, making it ideal for voice AI agents, transcription pipelines, and conversational interfaces.
The Cartesia plugin for the Stream Python AI SDK allows you to add TTS functionality to your project.
Installation
Install the Stream Cartesia plugin with:
uv add "vision-agents[cartesia]"
# or directly
uv add vision-agents-plugins-cartesia
Examples
Read on for some key details and check out our Cartesia examples to see working code samples:
- tts.py shows a simple bot that greets users when they join a call
- narrator-example.py shows a well-prompted STT -> LLM -> TTS flow that uses Cartesia's Sonic 3 model to narrate a creative story from the user's input
Initialisation
The Cartesia plugin for Stream exists in the form of the TTS class:
from vision_agents.plugins import cartesia
tts = cartesia.TTS()
To initialise without passing in the API key, make sure `CARTESIA_API_KEY` is set as an environment variable.
You can do this either by defining it in a `.env` file or by exporting it directly in your terminal.
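The lookup order described above (explicit key first, then the environment) can be sketched with the standard library alone. The helper name `resolve_cartesia_api_key` is hypothetical and not part of the plugin; it only illustrates failing early with a clear message when no key is available.

```python
import os
from typing import Optional


def resolve_cartesia_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else fall back to CARTESIA_API_KEY.

    Hypothetical helper for illustration; the plugin performs its own lookup.
    """
    key = explicit_key or os.environ.get("CARTESIA_API_KEY")
    if not key:
        # Fail before any network call so the cause is obvious.
        raise RuntimeError(
            "No Cartesia API key found: pass one explicitly "
            "or set the CARTESIA_API_KEY environment variable."
        )
    return key
```

With the plugin installed, the result could then be passed along as `cartesia.TTS(api_key=resolve_cartesia_api_key())`.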
Parameters
The Cartesia TTS plugin accepts the following parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` or `None` | `None` | Your Cartesia API key. If not provided, the plugin will look for the `CARTESIA_API_KEY` environment variable. |
| `model_id` | `str` | `"sonic-3"` | ID of the Cartesia STT or TTS model to use. Defaults to the recently released Sonic-3. |
| `voice_id` | `str` or `None` | `"f9836c6e-a0bd-460e-9d3c-f7299fa60f94"` | ID of the voice to use for TTS responses. |
| `sample_rate` | `int` | `16000` | Sample rate (in Hz) used for audio processing. |
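One way to keep these settings in a single place is a small config object whose fields mirror the table. This is a sketch, not part of the plugin: `CartesiaTTSConfig` and `build_tts` are hypothetical names, and the sketch assumes the `TTS` constructor accepts the four parameters above as keyword arguments.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class CartesiaTTSConfig:
    """Hypothetical container; defaults are copied from the parameter table."""

    api_key: Optional[str] = None
    model_id: str = "sonic-3"
    voice_id: Optional[str] = "f9836c6e-a0bd-460e-9d3c-f7299fa60f94"
    sample_rate: int = 16000

    def to_kwargs(self) -> Dict[str, Any]:
        # Plain dict suitable for unpacking into a constructor call.
        return asdict(self)


def build_tts(config: CartesiaTTSConfig):
    # Imported lazily so this module loads even without the plugin installed.
    from vision_agents.plugins import cartesia

    # Assumption: TTS takes the table's parameters as keyword arguments.
    return cartesia.TTS(**config.to_kwargs())
```

Overriding a single field, for example `CartesiaTTSConfig(model_id="sonic-3")`, leaves the other defaults untouched.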
Functionality
Send text to convert to speech
The send() method sends the text passed in for the service to synthesize.
The resulting audio is then played through the configured output track.
tts.send("Demo text you want AI voice to say")
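In an async application it can help to wrap the call so it works whether or not `send()` returns an awaitable, which this README does not specify. The sketch below is an assumption-laden illustration: `speak` and `FakeTTS` are hypothetical names, and `FakeTTS` is only a stand-in that records text so the example runs without an API key or audio track.

```python
import asyncio
import inspect
from typing import List


async def speak(tts, text: str) -> None:
    """Call tts.send(text) and await the result only if it is awaitable."""
    result = tts.send(text)
    if inspect.isawaitable(result):
        await result


class FakeTTS:
    """Stand-in for cartesia.TTS that records what it was asked to say."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    async def send(self, text: str) -> None:
        self.sent.append(text)


def demo() -> List[str]:
    fake = FakeTTS()
    asyncio.run(speak(fake, "Demo text you want AI voice to say"))
    return fake.sent
```

Swapping `FakeTTS()` for a real `cartesia.TTS()` instance would route the same text to the service, provided an output track has been configured.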