Qwen Omni plugin for vision agents

Qwen Realtime Plugin for Vision Agents

Qwen3 Realtime LLM integration for the Vision Agents framework. It provides native audio output and built-in speech recognition over a WebSocket-based realtime connection.

Features

  • Native audio output: No TTS service needed - audio comes directly from the model
  • Built-in STT: Integrated speech-to-text using gummy-realtime-v1 - no external STT service required
  • Server-side VAD: Automatic turn detection with configurable silence thresholds
  • Video understanding: Optional video frame support for multimodal interactions
  • Real-time streaming: WebSocket-based bidirectional communication for low-latency responses
  • Interruption handling: Automatic cancellation when the user starts speaking
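
To make the "configurable silence thresholds" idea concrete, here is a minimal sketch of silence-based turn detection. It is not the algorithm Qwen Realtime runs on the server, which this README does not describe. The function name detect_turn_end, the per-frame energy input, and all default values are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def detect_turn_end(
    frame_energies: Sequence[float],
    frame_ms: int = 20,               # assumed duration of one audio frame
    energy_threshold: float = 0.01,   # assumed cutoff between speech and silence
    silence_ms: int = 500,            # silence required before the turn ends
) -> Optional[int]:
    """Return the index of the frame where the turn ends, or None if it has not ended."""
    needed = max(1, silence_ms // frame_ms)  # consecutive silent frames required
    silent_run = 0
    heard_speech = False
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            # Speech frame: the user is (still) talking, so reset the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Silence only counts once some speech has been heard.
            silent_run += 1
            if silent_run >= needed:
                return i
    return None
```

Lowering silence_ms makes the detector end a turn sooner, at the cost of cutting in during natural pauses; raising it does the opposite.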

Installation

uv add "vision-agents[qwen]"
# or directly
uv add vision-agents-plugins-qwen

Usage

from vision_agents.core import User, Agent
from vision_agents.plugins import getstream, qwen

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Qwen Assistant"),
    instructions="Be helpful and friendly",
    llm=qwen.Realtime(
        model="qwen3-omni-flash-realtime",
        voice="Cherry",
        fps=1,
    ),
    # No STT or TTS needed - Qwen Realtime provides both
)

Configuration

| Parameter | Description | Default | Accepted Values |
|---|---|---|---|
| model | Qwen Realtime model identifier | "qwen3-omni-flash-realtime" | Model name string |
| api_key | DashScope API key | None (from env) | String or None |
| base_url | WebSocket API base URL | "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime" | URL string |
| voice | Voice for audio output | "Cherry" | Voice name string |
| fps | Video frames per second | 1 | Integer |
| include_video | Include video frames in requests | False | Boolean |
| video_width | Video frame width | 1280 | Integer |
| video_height | Video frame height | 720 | Integer |
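
The defaults listed above can be collected in a small settings object, which is handy for validating values before they are passed to qwen.Realtime. This is a sketch only: QwenRealtimeSettings and frame_interval_seconds are hypothetical names, not part of the plugin, and the validation rules (positive fps, positive dimensions, ws:// or wss:// URL) are assumptions rather than documented constraints.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QwenRealtimeSettings:
    """Hypothetical container mirroring the documented defaults; not the plugin's own class."""

    model: str = "qwen3-omni-flash-realtime"
    api_key: Optional[str] = None  # None: fall back to DASHSCOPE_API_KEY in the environment
    base_url: str = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
    voice: str = "Cherry"
    fps: int = 1
    include_video: bool = False
    video_width: int = 1280
    video_height: int = 720

    def __post_init__(self) -> None:
        # Assumed sanity checks; the plugin may accept or reject different values.
        if self.fps < 1:
            raise ValueError("fps must be a positive integer")
        if self.video_width < 1 or self.video_height < 1:
            raise ValueError("video dimensions must be positive")
        if not self.base_url.startswith(("ws://", "wss://")):
            raise ValueError("base_url must be a WebSocket URL")

    @property
    def frame_interval_seconds(self) -> float:
        """Seconds between video frames at the configured fps."""
        return 1.0 / self.fps
```

For example, QwenRealtimeSettings(include_video=True, fps=2) describes a session that would send a frame every 0.5 seconds.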

Environment Variables

Set DASHSCOPE_API_KEY in your environment or .env file:

DASHSCOPE_API_KEY=your_dashscope_api_key_here
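
The "None (from env)" default for api_key suggests a lookup along the following lines. This is a sketch of the pattern, not the plugin's code: resolve_api_key is a hypothetical helper, and the assumption that an explicit argument takes precedence over the environment variable is not confirmed by this README. Note that values in a .env file only become visible to os.environ once something has loaded that file into the process environment.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else DASHSCOPE_API_KEY from the environment."""
    env = os.environ if env is None else env  # env is injectable to ease testing
    key = api_key or env.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("Set DASHSCOPE_API_KEY or pass api_key explicitly")
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised later during the WebSocket handshake.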

Example

See plugins/qwen/example/qwen_realtime_example.py for a complete working example.

Dependencies

  • vision-agents
  • websockets
  • aiortc
  • av
