LiveKit Agents TTS plugin for the Clevr Labs conversational speech model

Project description

livekit-plugins-clevrlabs

A LiveKit Agents TTS plugin for the Clevr Labs conversational speech model. It sends text to the Clevr Labs voice servers and streams audio back. No ML code runs on your machine; all inference happens on Clevr's servers.

What it does

You drop it into a LiveKit agent in place of any other TTS provider. Your agent sends text, and Clevr sends back audio. The model keeps the same voice across the whole conversation as long as you pass the user's audio and transcript as context after each user turn (see below).

Install

uv add livekit-plugins-clevrlabs

You'll also need the LiveKit Agents SDK and a few plugins for STT/LLM/VAD:

uv add "livekit-agents>=1.4" livekit-plugins-groq livekit-plugins-openai livekit-plugins-silero python-dotenv numpy

Quick start

from livekit.agents import AgentSession
from livekit.plugins import clevrlabs

tts = clevrlabs.TTS(api_key="clevr_...")

# Wire into a LiveKit AgentSession alongside your own stt, llm, and vad:
session = AgentSession(tts=tts)

By default the plugin talks to the hosted Clevr Labs API at https://api.theclevr.com. Pass server_url=... only if you're pointing at a different endpoint.
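
As a minimal sketch of that rule, the helper below builds the constructor arguments from the environment and includes server_url only when an override is configured. The build_tts_kwargs helper and the CLEVR_SERVER_URL variable name are assumptions made for illustration; they are not part of the plugin.

```python
import os
from typing import Mapping


def build_tts_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return keyword arguments for clevrlabs.TTS.

    CLEVR_SERVER_URL is a hypothetical variable name chosen for this sketch;
    the plugin itself does not document it.
    """
    kwargs = {"api_key": env["CLEVR_API_KEY"]}
    # Only override the endpoint when one is explicitly configured, so the
    # plugin's hosted default stays in effect otherwise.
    server_url = env.get("CLEVR_SERVER_URL", "").strip()
    if server_url:
        kwargs["server_url"] = server_url
    return kwargs


# Usage (requires the plugin to be installed):
#   tts = clevrlabs.TTS(**build_tts_kwargs())
```

Leaving server_url out entirely, rather than passing an empty value, keeps the decision about the default endpoint inside the plugin.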

Complete example

Save as agent.py. This is a full, runnable LiveKit voice agent that uses Clevr for TTS:

import asyncio
import os

import numpy as np
from dotenv import load_dotenv
from livekit import rtc
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import clevrlabs, groq, openai as lk_openai, silero

load_dotenv()


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    # 1. Create the TTS plugin
    tts_plugin = clevrlabs.TTS(api_key=os.environ["CLEVR_API_KEY"])
    ctx.add_shutdown_callback(tts_plugin.aclose)

    # 2. Buffer user audio while they're speaking (used for voice context)
    audio_buffer: list[np.ndarray] = []
    user_is_speaking = False

    async def _capture_user_audio(track: rtc.Track):
        stream = rtc.AudioStream.from_track(track=track, sample_rate=48000, num_channels=1)
        async for event in stream:
            if not user_is_speaking:
                continue
            frames = np.array(event.frame.data, dtype=np.int16).astype(np.float32) / 32768.0
            audio_buffer.append(frames)

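    # Keep references to capture tasks so they are not garbage-collected mid-run
    capture_tasks: set[asyncio.Task] = set()

    @ctx.room.on("track_subscribed")
    def on_track_subscribed(track, publication, participant):
        if track.kind == rtc.TrackKind.KIND_AUDIO:
            task = asyncio.create_task(_capture_user_audio(track))
            capture_tasks.add(task)
            task.add_done_callback(capture_tasks.discard)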

    # 3. Wire up the agent session (swap STT/LLM for whatever providers you use)
    session = AgentSession(
        stt=groq.STT(model="whisper-large-v3-turbo", language="en"),
        llm=lk_openai.LLM(model="gpt-4o-mini"),
        tts=tts_plugin,
        vad=silero.VAD.load(),
    )

    # 4. Track when the user is speaking so we know what audio to buffer
    @session.on("user_state_changed")
    def _on_user_state(ev):
        nonlocal user_is_speaking
        user_is_speaking = (ev.new_state == "speaking")

    # 5. After each user turn, send the audio + transcript to the TTS server as context
    @session.on("user_input_transcribed")
    def _on_transcript(ev):
        if ev.is_final and ev.transcript and audio_buffer:
            audio_np = np.concatenate(audio_buffer)
            audio_buffer.clear()
            tts_plugin.add_user_turn(text=ev.transcript, audio=audio_np, sample_rate=48000)

    await session.start(
        agent=Agent(instructions="You are a helpful voice assistant."),
        room=ctx.room,
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
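
The buffering logic in steps 2, 4, and 5 can be isolated into a small helper, which makes it easier to test without a room. The sketch below uses only the standard library instead of NumPy, and the TurnAudioBuffer class and its method names are hypothetical, not part of the plugin. It converts 16-bit PCM to floats in [-1.0, 1.0) with the same division by 32768 as the example, keeps samples only while the user is speaking, and hands back one flat list per turn.

```python
from array import array


class TurnAudioBuffer:
    """Collects normalized samples for one user turn (hypothetical helper)."""

    def __init__(self) -> None:
        self.speaking = False
        self._samples: list[float] = []

    def push(self, pcm16: bytes) -> None:
        """Append one frame of native-endian 16-bit PCM if the user is speaking."""
        if not self.speaking:
            return  # drop audio captured outside a user turn
        ints = array("h")
        ints.frombytes(pcm16)
        # Same scaling as the example above: int16 -> float in [-1.0, 1.0)
        self._samples.extend(s / 32768.0 for s in ints)

    def take(self) -> list[float]:
        """Return everything buffered so far and reset for the next turn."""
        samples, self._samples = self._samples, []
        return samples


buf = TurnAudioBuffer()
buf.push(array("h", [1000]).tobytes())  # ignored: user is not speaking yet
buf.speaking = True
buf.push(array("h", [16384, -32768]).tobytes())
turn = buf.take()
```

Returning the samples and resetting in a single call mirrors the concatenate-then-clear pair in step 5, so a turn's audio cannot leak into the next one.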

Set up your environment

Create a .env file next to agent.py:

# LiveKit Cloud — get from https://cloud.livekit.io
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=APIxxxxxxxxx
LIVEKIT_API_SECRET=your-secret

# STT — Groq has a free tier for Whisper
GROQ_API_KEY=gsk_...

# LLM — OpenAI (swap for any provider supported by livekit-plugins-openai)
OPENAI_API_KEY=sk-...

# Clevr TTS
CLEVR_API_KEY=clevr_...
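
In the example, os.environ["CLEVR_API_KEY"] raises KeyError inside entrypoint if the key is missing, which is later than most people would like to find out. As an optional sketch, a check like the following could run right after load_dotenv(). The missing_env helper is hypothetical, and the variable list simply mirrors the .env file above.

```python
import os
from typing import Iterable, Mapping

REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GROQ_API_KEY",
    "OPENAI_API_KEY",
    "CLEVR_API_KEY",
)


def missing_env(
    required: Iterable[str] = REQUIRED_VARS,
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment:
example = {"LIVEKIT_URL": "wss://your-project.livekit.cloud", "CLEVR_API_KEY": " "}
missing = missing_env(env=example)
```

In agent.py, one option is to call missing_env() before cli.run_app and exit with the list of names if it is not empty.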

Run it

python agent.py dev

Then connect to a LiveKit room from a client such as agents-playground.livekit.io and talk to your agent.

Notes on the example

add_user_turn requires both text and audio. Passing only one will cause the voice to drift over the conversation. The agent's own turns are handled automatically: clevrlabs.TTS appends each synthesized response to its context for you.
Feed clean transcripts into add_user_turn. Because user text is paired with real audio in the model's context, low-quality STT output can cause the voice to drift over a long conversation. If you use a Whisper-family STT, filter its known phantom transcripts (e.g. "Thanks for watching!" on silent audio) before calling add_user_turn.
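
Both notes can be folded into a single guard that runs before add_user_turn. The following is a minimal sketch under stated assumptions: should_add_user_turn is a hypothetical helper, and only "Thanks for watching!" in the phrase list comes from the note above. The other entry is a guess and should be replaced with phrases you actually see in your own STT logs.

```python
import re

# Illustrative list only. "thanks for watching" comes from the note above;
# the second entry is an assumption and should be tuned against real logs.
PHANTOM_TRANSCRIPTS = {
    "thanks for watching",
    "thank you for watching",
}


def _normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def should_add_user_turn(transcript: str, num_samples: int) -> bool:
    """True only when text and audio both exist and the text is not a known phantom."""
    normalized = _normalize(transcript)
    if not normalized or num_samples <= 0:
        return False
    return normalized not in PHANTOM_TRANSCRIPTS
```

In the example's _on_transcript handler, this could be called with ev.transcript and the total number of buffered samples before concatenating. Matching on the whole normalized transcript, rather than on substrings, avoids discarding real turns that merely contain one of the phrases.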

Get an API key

theclevr.com

Release history

0.1.5 (Jun 9, 2026)
0.1.4 (Jun 9, 2026)
0.1.3 (Jun 9, 2026)
0.1.2 (Jun 9, 2026), this version
0.1.1 (Jun 8, 2026)
0.1.0 (Jun 7, 2026)