Skip to main content

End of utterance detection for LiveKit Agents

Project description

LiveKit Plugins – Turn Detector

livekit-plugins-external-turn-detector provides end-of-turn detection for LiveKit Agents using custom models to determine when a user has finished speaking.

This plugin enables accurate conversation flow management by leveraging language models trained specifically for turn detection, offering superior performance compared to traditional VAD-based approaches.

✨ Features

  • 🎯 Built-in Models — English and multilingual models that run locally
  • 🔌 LiveKit plugin integration — plug-and-play support for LiveKit workflows
  • 🤖 Compatible with livekit-agents — seamless integration with agent framework
  • 🚀 External Server Support — use custom models via OpenAI-compatible APIs, vLLM, or NVIDIA Triton
  • Low-latency inference — ~10ms (English) / ~25ms (multilingual) per inference
  • 🌍 Multilingual support — 13+ languages in the multilingual model
  • 🔧 Flexible backends — choose between local inference or remote servers

🔧 Installation

# from PyPI
pip install -U livekit-plugins-external-turn-detector

# from source
pip install git+https://github.com/dangvansam/livekit-plugins-turn-detector.git

🔌 Usage

Built-in Models

English model

The English model is the smaller of the two models. It requires 200MB of RAM and completes inference in ~10ms

from livekit.plugins.turn_detector.english import EnglishModel

session = AgentSession(
    ...
    turn_detection=EnglishModel(),
)

Multilingual model

We've trained a separate multilingual model that supports the following languages: English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Indonesian, Russian, Turkish

The multilingual model requires ~400MB of RAM and completes inferences in ~25ms.

from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    ...
    turn_detection=MultilingualModel(),
)

External Server Models

For custom models or when you need to offload inference to a dedicated server, you can use external backends. The plugin supports flexible model configuration, allowing you to use any compatible language model for turn detection.

Supported Backends:

  • vLLM: High-performance inference with any HuggingFace-compatible model
  • OpenAI API: Direct integration with OpenAI models
  • Triton: Enterprise-grade inference server with custom model support
  • Custom APIs: Any OpenAI-compatible API endpoint

Using vLLM Backend

For high-performance inference with custom models using vLLM:

from livekit.plugins.turn_detector.external import ExternalModel

# Using vLLM with OpenAI-compatible API
turn_detector = ExternalModel(
    provider="openai",  # vLLM uses OpenAI-compatible API
    base_url="http://localhost:8000",  # Your vLLM server endpoint
    model_name="Qwen/Qwen3-0.6B",  # Model name in vLLM (or your custom model)
    api_key="EMPTY",  # Usually "EMPTY" for vLLM or your custom key
    temperature=0.1,
    max_tokens=20,
    system_prompt="You are a speaking turn-ending identifier. Your task is to identify whether the user's speaking turn is complete or not. Respond with 'end' if the user's turn is complete, or 'continue' if it is not."
)

session = AgentSession(
    ...
    turn_detection=turn_detector,
)

Using NVIDIA Triton Inference Server

For high-performance inference with custom models:

from livekit.plugins.turn_detector.external import ExternalModel

turn_detector = ExternalModel(
    provider="triton",
    url="localhost:7001",  # Your Triton server gRPC endpoint
    model_name="ensemble",      # Your model name in Triton
    tokenizer="Qwen/Qwen3-0.6B",
    temperature=0.1,
    max_tokens=20,
)

session = AgentSession(
    ...
    turn_detection=turn_detector,
)

Using OpenAI Backend

Environment Variables (shared across all providers):

See .env.example for a complete configuration template with examples for different use cases.

Core Configuration:

export TURN_DETECTION_PROVIDER="openai"  # Provider: "openai" or "triton"
export TURN_DETECTION_BASE_URL="http://localhost:8000"  # Server URL
export TURN_DETECTION_MODEL="Qwen/Qwen3-0.6B"  # Any compatible model
export TURN_DETECTION_API_KEY="EMPTY"  # API key (EMPTY for vLLM, required for OpenAI)

Optional Tuning Parameters:

export TURN_DETECTION_TEMPERATURE="0.1"  # Lower = more deterministic
export TURN_DETECTION_MAX_TOKENS="20"  # Response length limit
export TURN_DETECTION_SUPPORT_LANGUAGES="en,zh"  # Target languages
export TURN_DETECTION_SYSTEM_PROMPT="Custom instructions..."  # Model behavior
export TURN_DETECTION_TOKENIZER="Qwen/Qwen3-0.6B"  # Triton only: preprocessing

Flexible Model Options:

  • Use any HuggingFace model ID: "microsoft/DialoGPT-medium", "Qwen/Qwen2.5-7B-Instruct"
  • Deploy custom fine-tuned models: "your-org/custom-turn-detector"
  • Point to local model paths with Triton or vLLM
  • Configure multi-language support for your specific use case

You can then use the turn detector with just environment variables:

from livekit.plugins.turn_detector.external import ExternalModel

# Using environment variables only (provider auto-detected from TURN_DETECTION_PROVIDER)
turn_detector = ExternalModel()

session = AgentSession(
    ...
    turn_detection=turn_detector,
)

Easy Provider Switching: With unified environment variables, you can easily switch between providers:

# For vLLM/OpenAI
export TURN_DETECTION_PROVIDER="openai"
export TURN_DETECTION_BASE_URL="http://localhost:8000"
export TURN_DETECTION_MODEL="Qwen/Qwen3-0.6B"

# For Triton (same variables, different values)
export TURN_DETECTION_PROVIDER="triton"
export TURN_DETECTION_BASE_URL="localhost:7001"
export TURN_DETECTION_MODEL="ensemble"
export TURN_DETECTION_TOKENIZER="Qwen/Qwen3-0.6B"

Setting Up vLLM Server

For flexible model deployment with vLLM:

# Install vLLM
pip install vllm

# Option 1: Use Qwen models (recommended for turn detection)
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-0.6B \
    --host 0.0.0.0 \
    --port 8000

# Option 2: Use your custom fine-tuned model
python -m vllm.entrypoints.openai.api_server \
    --model your-username/your-turn-detection-model \
    --host 0.0.0.0 \
    --port 8000

# Option 3: Local model path
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/your/local/model \
    --host 0.0.0.0 \
    --port 8000

Model Flexibility:

  • Any HuggingFace Model: Use any compatible model for turn detection
  • Custom Fine-tuned Models: Deploy your domain-specific turn detection models
  • Multi-language Support: Configure TURN_DETECTION_SUPPORT_LANGUAGES for your target languages
  • Performance Tuning: Adjust temperature and max_tokens based on your model's characteristics

Using NVIDIA Triton Inference Server

from livekit.plugins.turn_detector.external import ExternalModel

turn_detector = ExternalModel(
    provider="triton",
    url="localhost:7001",  # Your Triton server gRPC endpoint
    model_name="ensemble",      # Your model name in Triton
    tokenizer="Qwen/Qwen3-0.6B",
    temperature=0.1,
    max_tokens=20,
)

session = AgentSession(
    ...
    turn_detection=turn_detector,
)

Triton Server Configuration

Your Triton server should have models that accept:

Inputs:

  • text_input (BYTES): Input prompt
  • max_tokens (INT32): Max tokens to generate
  • temperature (FP32): Sampling temperature
  • Additional generation parameters as needed

Outputs:

  • text_output (BYTES): Generated text ("end" or "continue")

Usage with RealtimeModel

The turn detector can be used even with speech-to-speech models such as OpenAI's Realtime API. You'll need to provide a separate STT to ensure our model has access to the text content.

session = AgentSession(
    ...
    stt=deepgram.STT(model="nova-3", language="multi"),
    llm=openai.realtime.RealtimeModel(),
    turn_detection=MultilingualModel(),
)

🚀 Running your agent

This plugin requires model files. Before starting your agent for the first time, or when building Docker images for deployment, run the following command to download the model files:

python my_agent.py download-files

📊 Model system requirements

Built-in Models

The built-in end-of-turn models are optimized to run on CPUs with modest system requirements. They are designed to run on the same server hosting your agents.

  • English model: ~200MB RAM, ~10ms inference time
  • Multilingual model: ~400MB RAM, ~25ms inference time
  • Both models run within a shared inference server, supporting multiple concurrent sessions

External Models

When using external backends, system requirements depend on your chosen configuration:

vLLM Backend

  • Highly optimized for transformer models with GPU acceleration
  • Supports continuous batching for improved throughput
  • Memory-efficient PagedAttention for handling multiple concurrent requests
  • Recommended for production deployments requiring high performance
  • Compatible with most Hugging Face models

Triton Inference Server

  • Server requirements depend on your model size and configuration
  • Supports GPU acceleration for faster inference
  • Can handle high-throughput scenarios with proper scaling
  • Recommended for production deployments with custom models

📚 Documentation

For more information, see the official documentation.

📄 License

The plugin source code is licensed under the Apache-2.0 license.

The end-of-turn model is licensed under the LiveKit Model License.

🙏 Acknowledgments

This plugin leverages language models specifically trained for turn detection, providing more accurate conversation flow management compared to traditional VAD-based approaches.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

File details

Details for the file livekit_plugins_external_turn_detector-1.2.14.tar.gz.

File metadata

File hashes

Hashes for livekit_plugins_external_turn_detector-1.2.14.tar.gz
Algorithm Hash digest
SHA256 47172a2550f8abac284210e52ade45e515038cd64055986da012db5e0d8a36cf
MD5 cb725a91277d0184818017c6218c0625
BLAKE2b-256 75f98f4ec38e2c8a09fd7077ea996d11443173dec2606e14ce742c37357fc17d

See more details on using hashes here.

File details

Details for the file livekit_plugins_external_turn_detector-1.2.14-py3-none-any.whl.

File metadata

File hashes

Hashes for livekit_plugins_external_turn_detector-1.2.14-py3-none-any.whl
Algorithm Hash digest
SHA256 1c5fbb610e1df1fb54cacbdff041d5ddbca7964c1fa4f147753f6e8a2c13edd6
MD5 42ef882ac10d8935c36e9cc219451eeb
BLAKE2b-256 3fa4a2c6403e4f70853c465d80d2b9e11c0b7b17a70777b0b55931eb4d4dd91e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page