Cartesia TTS integration for Vision Agents

Cartesia

Cartesia is a service that provides Speech-to-Text (STT) and Text-to-Speech (TTS) capabilities. It's designed for real-time voice applications, making it ideal for voice AI agents, transcription pipelines, and conversational interfaces.

The Cartesia plugin for the Stream Python AI SDK allows you to add TTS functionality to your project.

Installation

Install the Stream Cartesia plugin with:

uv add "vision-agents[cartesia]"
# or directly
uv add vision-agents-plugins-cartesia

Examples

Read on for some key details and check out our Cartesia examples to see working code samples:

  • tts.py shows a simple bot that greets users when they join a call
  • narrator-example.py combines a carefully prompted STT -> LLM -> TTS flow with Cartesia's Sonic 3 model to narrate a creative story from the user's input

Initialisation

The Cartesia plugin for Stream is exposed as the TTS class:

from vision_agents.plugins import cartesia

tts = cartesia.TTS()

To initialise without passing in the API key, make sure the `CARTESIA_API_KEY` is available as an environment variable. You can do this either by defining it in a `.env` file or exporting it directly in your terminal.
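
The lookup order described here (explicit key first, then the environment variable) can be sketched in plain Python. This is an illustration of the documented behaviour only, not the plugin's source; `resolve_api_key` is a hypothetical helper name introduced for this example.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else CARTESIA_API_KEY from the environment.

    Hypothetical helper: it mirrors the documented lookup order only.
    """
    env = os.environ if env is None else env
    key = explicit or env.get("CARTESIA_API_KEY")
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; export it, add it to your .env file, "
            "or pass api_key= explicitly."
        )
    return key


# Example usage (requires the plugin to be installed):
# from vision_agents.plugins import cartesia
# tts = cartesia.TTS(api_key=resolve_api_key())
```

Failing early with a clear message like this can be easier to debug than an authentication error surfacing on the first synthesis request.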

Parameters

These are the parameters of the TTS class that you can customise:

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `api_key` | `str` or `None` | `None` | Your Cartesia API key. If not provided, the plugin looks for the `CARTESIA_API_KEY` environment variable. |
| `model_id` | `str` | `"sonic-3"` | ID of the Cartesia TTS model to use. Defaults to Sonic 3. |
| `voice_id` | `str` or `None` | `"f9836c6e-a0bd-460e-9d3c-f7299fa60f94"` | ID of the voice to use for TTS responses. |
| `sample_rate` | `int` | `16000` | Sample rate (in Hz) used for audio processing. |
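
To get a feel for what `sample_rate` implies for buffer sizes, the arithmetic below converts between PCM byte counts and durations. It assumes 16-bit mono PCM, which this page does not state, so check the actual audio format before relying on these numbers. The helper names are illustrative and not part of the plugin.

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit PCM
CHANNELS = 1          # assumption: mono


def pcm_bytes_per_second(sample_rate: int = 16000) -> int:
    """Bytes of raw PCM produced per second of audio."""
    return sample_rate * BYTES_PER_SAMPLE * CHANNELS


def pcm_duration_seconds(num_bytes: int, sample_rate: int = 16000) -> float:
    """Playback duration of a raw PCM buffer of the given size."""
    return num_bytes / pcm_bytes_per_second(sample_rate)


print(pcm_bytes_per_second())       # 32000
print(pcm_duration_seconds(64000))  # 2.0
```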

Functionality

Send text to convert to speech

The send_iter() method sends the text passed in for the service to synthesize and yields TTSOutputChunks containing the produced PCM audio.

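# Run this inside an async function.
async for chunk in tts.send_iter("Demo text you want AI voice to say"):
    # Each chunk is a TTSOutputChunk containing PCM audio; hand it to your audio output here.
    pass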
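
One common thing to do with streamed PCM is to save it as a WAV file. The sketch below uses only the standard library and a stand-in stream of silence, so it runs without the plugin. It assumes 16-bit mono PCM at the default 16000 Hz; `save_pcm_to_wav` and `fake_pcm_stream` are hypothetical names. How to extract the raw bytes from a TTSOutputChunk is not described on this page, so that step is deliberately left out.

```python
import asyncio
import wave
from typing import AsyncIterator


async def save_pcm_to_wav(
    pcm_chunks: AsyncIterator[bytes], path: str, sample_rate: int = 16000
) -> int:
    """Write raw PCM byte chunks to a WAV file and return the bytes written."""
    total = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # assumption: mono
        wav.setsampwidth(2)   # assumption: 16-bit samples
        wav.setframerate(sample_rate)
        async for data in pcm_chunks:
            wav.writeframes(data)
            total += len(data)
    return total


async def fake_pcm_stream() -> AsyncIterator[bytes]:
    """Stand-in for a real TTS stream: yields 0.1 s of silence, three times."""
    for _ in range(3):
        yield b"\x00\x00" * 1600


if __name__ == "__main__":
    written = asyncio.run(save_pcm_to_wav(fake_pcm_stream(), "demo.wav"))
    print(written)  # 9600
```

With the real plugin, you would replace `fake_pcm_stream()` with an async generator that pulls the PCM bytes out of each chunk yielded by `tts.send_iter(...)`.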
