
stimm

Dual-agent voice orchestration built on livekit-agents.

One agent talks fast. One agent thinks deep. They collaborate in real time.

┌─────────────────────────────────────────────────────────────┐
│  stimm — dual-agent voice orchestration on LiveKit          │
│                                                             │
│  ┌────────────────────┐   ┌─────────────────────────────┐   │
│  │  VoiceAgent        │   │  Supervisor                 │   │
│  │  (livekit Agent)   │◄──│  (any language/runtime)     │   │
│  │                    │──►│                             │   │
│  │  Talks to user     │   │  Watches transcript         │   │
│  │  Fast LLM          │   │  Calls tools                │   │
│  │  VAD→STT→LLM→TTS   │   │  Sends instructions         │   │
│  │  Pre-TTS buffering │   │  Controls flow              │   │
│  └────────────────────┘   └─────────────────────────────┘   │
│           │                         │                       │
│           └──── Data Channel ───────┘                       │
│                 (stimm protocol)                            │
└─────────────────────────────────────────────────────────────┘

Install

# Python (voice agent + supervisor base class)
pip install "stimm[deepgram,openai]"

# TypeScript (supervisor client for Node.js consumers)
npm install @stimm/protocol

Quick Start

Voice Agent (Python)

from stimm import VoiceAgent
from livekit.plugins import silero, deepgram, openai

agent = VoiceAgent(
    stt=deepgram.STT(),
    tts=openai.TTS(),
    vad=silero.VAD.load(),
    fast_llm=openai.LLM(model="gpt-4o-mini"),
    buffering_level="MEDIUM",
    mode="hybrid",
    instructions="You are a helpful voice assistant.",
)

if __name__ == "__main__":
    from livekit.agents import WorkerOptions, cli
    cli.run_app(WorkerOptions(entrypoint_fnc=agent.entrypoint))

Supervisor (Python)

from stimm import Supervisor, TranscriptMessage

class MySupervisor(Supervisor):
    async def on_transcript(self, msg: TranscriptMessage):
        if not msg.partial:
            # Process with your powerful LLM, call tools, etc.
            result = await my_big_llm.process(msg.text)
            await self.instruct(result.text, speak=True)

import asyncio

async def main() -> None:
    supervisor = MySupervisor()
    await supervisor.connect("ws://localhost:7880", token)

asyncio.run(main())

Supervisor (TypeScript)

import { StimmSupervisorClient } from "@stimm/protocol";

const client = new StimmSupervisorClient({
  livekitUrl: "ws://localhost:7880",
  token: supervisorToken,
});

client.on("transcript", async (msg) => {
  if (!msg.partial) {
    const result = await myAgent.process(msg.text);
    await client.instruct({ text: result, speak: true, priority: "normal" });
  }
});

await client.connect();

Concepts

Dual-Agent Architecture

| Agent | Role | LLM | Latency |
|---|---|---|---|
| VoiceAgent | Talks to the user | Fast, small (e.g. GPT-4o-mini) | ~500 ms |
| Supervisor | Thinks, plans, uses tools | Large, capable (e.g. Claude, GPT-4o) | Background |

They communicate via LiveKit data channels using the stimm protocol: structured JSON messages flowing in both directions.
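As a rough sketch of what flows over the data channel (field names here are illustrative; the authoritative schema is in docs/protocol.md), a transcript event and an instruction might look like:

```python
import json

# Illustrative message shapes only -- see docs/protocol.md for the real schema.

# A transcript event flowing VoiceAgent -> Supervisor:
transcript_event = {
    "type": "transcript",
    "text": "what's the weather in Berlin?",
    "partial": False,        # True for interim STT results
}

# An instruction flowing Supervisor -> VoiceAgent:
instruction = {
    "type": "instruct",
    "text": "It's 14 degrees and cloudy in Berlin.",
    "speak": True,           # speak verbatim vs. steer the fast LLM
    "priority": "normal",
}

# Messages are serialized to JSON and sent as data-channel payloads.
payload = json.dumps(instruction).encode("utf-8")
decoded = json.loads(payload)
```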

Modes

| Mode | Behavior |
|---|---|
| `autonomous` | Voice agent uses its own fast LLM independently |
| `relay` | Voice agent speaks exactly what the supervisor sends |
| `hybrid` (default) | Voice agent responds autonomously but incorporates supervisor instructions |

Pre-TTS Buffering

Controls how LLM tokens are batched before TTS:

| Level | Behavior |
|---|---|
| `NONE` | Every token flushed immediately (lowest latency, choppiest audio) |
| `LOW` | Buffer until a word boundary |
| `MEDIUM` | Buffer until 4+ words or punctuation (default) |
| `HIGH` | Buffer until a sentence boundary |
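The flush rules above can be sketched as a small generator over an LLM token stream. This is a minimal illustration of the batching policy, not stimm's actual implementation:

```python
import re

def flush_points(tokens, level: str):
    """Yield text chunks to hand to TTS, batched per buffering level.

    Illustrative only: `tokens` is an iterable of LLM token strings,
    `level` is one of NONE / LOW / MEDIUM / HIGH.
    """
    buf = ""
    for tok in tokens:
        if level == "NONE":
            yield tok                                   # every token immediately
            continue
        buf += tok
        if level == "LOW" and buf.endswith(" "):
            yield buf; buf = ""                         # word boundary
        elif level == "MEDIUM" and (len(buf.split()) >= 4
                                    or re.search(r"[.!?,;:]$", buf)):
            yield buf; buf = ""                         # 4+ words or punctuation
        elif level == "HIGH" and re.search(r"[.!?]$", buf):
            yield buf; buf = ""                         # sentence boundary
    if buf:
        yield buf                                       # flush any remainder
```

Lower levels start audio sooner but hand TTS shorter, choppier fragments; higher levels trade first-audio latency for smoother prosody.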

Development

# Local LiveKit server
docker compose up -d

# Install in dev mode
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/ tests/

Protocol

See docs/protocol.md for the full message specification.

License

MIT
