Qwen Omni plugin for vision agents
Qwen Realtime Plugin for Vision Agents
Qwen3 Realtime LLM integration for the Vision Agents framework. It provides native audio output and built-in speech recognition over a WebSocket-based realtime connection.
Features
- Native audio output: No TTS service needed - audio comes directly from the model
- Built-in STT: Integrated speech-to-text using gummy-realtime-v1, no external STT service required
- Server-side VAD: Automatic turn detection with configurable silence thresholds
- Video understanding: Optional video frame support for multimodal interactions
- Real-time streaming: WebSocket-based bidirectional communication for low-latency responses
- Interruption handling: Automatic cancellation when user starts speaking
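To make "configurable silence thresholds" concrete, here is a minimal sketch of silence-based turn detection. It is purely illustrative: the plugin relies on detection performed on the server, whose algorithm is not described here. The function name `end_of_turn_index`, the 20 ms frame length, and the 800 ms threshold are all assumptions chosen for the example.

```python
from typing import Iterable, Optional


def end_of_turn_index(
    speech_flags: Iterable[bool],
    frame_ms: int = 20,
    silence_threshold_ms: int = 800,
) -> Optional[int]:
    """Return the index of the frame at which a turn is considered finished.

    speech_flags holds one boolean per audio frame (True = speech detected).
    The turn ends once speech has been heard and is followed by an unbroken
    run of silence lasting at least silence_threshold_ms.
    Hypothetical helper, not part of the Qwen plugin.
    """
    if frame_ms <= 0 or silence_threshold_ms <= 0:
        raise ValueError("frame_ms and silence_threshold_ms must be positive")

    heard_speech = False
    silent_ms = 0
    for index, is_speech in enumerate(speech_flags):
        if is_speech:
            # Any speech resets the silence counter.
            heard_speech = True
            silent_ms = 0
        elif heard_speech:
            # Only count silence that follows speech.
            silent_ms += frame_ms
            if silent_ms >= silence_threshold_ms:
                return index
    return None  # No completed turn in this stream.
```

A lower threshold makes the agent respond sooner but risks cutting the user off during short pauses. A higher one does the opposite.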
Installation
uv add "vision-agents[qwen]"
Usage
from vision_agents.core import User, Agent
from vision_agents.plugins import getstream, qwen

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Qwen Assistant"),
    instructions="Be helpful and friendly",
    llm=qwen.Realtime(
        model="qwen3-omni-flash-realtime",
        voice="Cherry",
        fps=1,
    ),
    # No STT or TTS needed - Qwen Realtime provides both
)
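The fps=1 argument above limits video input to one frame per second. The sketch below shows one way such throttling could work on a stream of frame timestamps. It is an assumption for illustration, not the plugin's actual implementation, and `throttle_frames` is a hypothetical name.

```python
from typing import Iterable, Iterator, Optional


def throttle_frames(timestamps: Iterable[float], fps: int = 1) -> Iterator[float]:
    """Yield only the frame timestamps (in seconds) that fit the fps budget.

    A frame is forwarded when at least 1/fps seconds have passed since the
    previously forwarded frame. All other frames are dropped.
    Hypothetical helper, not part of the Qwen plugin.
    """
    if fps <= 0:
        raise ValueError("fps must be a positive integer")

    min_interval = 1.0 / fps
    last_sent: Optional[float] = None
    for ts in timestamps:
        if last_sent is None or ts - last_sent >= min_interval:
            last_sent = ts
            yield ts
```

For example, `list(throttle_frames([0.0, 0.25, 0.5, 1.0, 1.5, 2.0], fps=1))` keeps the frames at 0.0, 1.0 and 2.0 seconds and drops the rest.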
Configuration
| Parameter | Description | Default | Accepted Values |
|---|---|---|---|
| model | Qwen Realtime model identifier | "qwen3-omni-flash-realtime" | Model name string |
| api_key | DashScope API key | None (from env) | String or None |
| base_url | WebSocket API base URL | "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime" | URL string |
| voice | Voice for audio output | "Cherry" | Voice name string |
| fps | Video frames per second | 1 | Integer |
| include_video | Include video frames in requests | False | Boolean |
| video_width | Video frame width | 1280 | Integer |
| video_height | Video frame height | 720 | Integer |
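If you keep these settings in your own application config, a small dataclass mirroring the table can catch mistakes before a connection is attempted. This is a sketch only: `RealtimeSettings` is a hypothetical class, not part of the plugin, and the validation rules (positive integers, a ws:// or wss:// URL) are assumptions inferred from the "Accepted Values" column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RealtimeSettings:
    """Hypothetical container whose defaults mirror the configuration table."""

    model: str = "qwen3-omni-flash-realtime"
    api_key: Optional[str] = None  # None means "read from the environment"
    base_url: str = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
    voice: str = "Cherry"
    fps: int = 1
    include_video: bool = False
    video_width: int = 1280
    video_height: int = 720

    def __post_init__(self) -> None:
        # Assumed rule: the realtime endpoint is a WebSocket URL.
        if not self.base_url.startswith(("ws://", "wss://")):
            raise ValueError("base_url must start with ws:// or wss://")
        # Assumed rule: numeric settings must be positive integers.
        for name in ("fps", "video_width", "video_height"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive integer")


settings = RealtimeSettings(include_video=True, fps=2)
```

Assuming qwen.Realtime accepts the table's parameters as keyword arguments, the validated values could then be passed along with `dataclasses.asdict(settings)`.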
Environment Variables
Set DASHSCOPE_API_KEY in your environment or .env file:
DASHSCOPE_API_KEY=your_dashscope_api_key_here
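The "None (from env)" default for api_key suggests that an explicitly passed key takes precedence over the environment variable. The sketch below shows that lookup order with a clear error when neither is set. `resolve_api_key` is a hypothetical helper, and the precedence is an assumption rather than documented plugin behavior.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> str:
    """Return the DashScope API key, preferring an explicitly passed value.

    Falls back to the DASHSCOPE_API_KEY environment variable and raises
    if neither source provides a non-empty key.
    Hypothetical helper, not part of the Qwen plugin.
    """
    key = explicit or os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: set DASHSCOPE_API_KEY or pass api_key explicitly"
        )
    return key
```

Failing early like this gives a clearer message than a rejected WebSocket handshake. Note that values in a .env file only reach os.environ if something loads the file first.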
Example
See plugins/qwen/example/qwen_realtime_example.py for a complete working example.
Dependencies
- vision-agents
- websockets
- aiortc
- av
File details
Details for the file vision_agents_plugins_qwen-0.3.8.tar.gz.
File metadata
- Download URL: vision_agents_plugins_qwen-0.3.8.tar.gz
- Upload date:
- Size: 8.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b8d869aea55e7ff52d09f53096a0195a7047734a10c911ca6d305e4d11750f6 |
| MD5 | e4e53712305f4b3a71fdfcff88490715 |
| BLAKE2b-256 | bec647405d7373890acf82126c5954e3d5cd08de725d59e6b65caf78a7bcc481 |
File details
Details for the file vision_agents_plugins_qwen-0.3.8-py3-none-any.whl.
File metadata
- Download URL: vision_agents_plugins_qwen-0.3.8-py3-none-any.whl
- Upload date:
- Size: 16.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12eb4cbe42a129110c94369228d443c9cfc64cac787793c29dcf2828b4dfde0d |
| MD5 | 2764ae858ba42324a8ad6ce63c5578b3 |
| BLAKE2b-256 | f949e017cdde4a69a408bca0f82245e14f4012023520decdc7ae6fbe9ae15629 |