Inworld AI integration for Vision Agents (TTS + Realtime WebRTC)
Project description
Inworld AI Plugin
Inworld AI integration for Vision Agents. Provides both text-to-speech and a WebRTC-based Realtime speech-to-speech conversational API.
Installation
uv add "vision-agents[inworld]"
# or directly
uv add vision-agents-plugins-inworld
Get your API key from the Inworld Portal and set
INWORLD_API_KEY in your environment (or pass api_key= explicitly).
TTS
High-quality text-to-speech with streaming support. The plugin now defaults
to Inworld's TTS-2 model (currently in research preview), which adds
natural-language steering, 100+ languages (15 GA, 90+ experimental), and
high-quality instant voice cloning over the previous inworld-tts-1.5-*
generation.
from vision_agents.plugins import inworld
# Defaults to model_id="inworld-tts-2", voice_id="Sarah"
tts = inworld.TTS()
# Or specify explicitly
tts = inworld.TTS(
api_key="your_inworld_api_key",
voice_id="Ashley",
model_id="inworld-tts-2",
temperature=1.1,
)
TTS options
api_key: Inworld AI API key (default: reads fromINWORLD_API_KEY)voice_id: Voice to use (default:"Sarah";"Dennis","Ashley","Olivia","Clive"and custom/cloned voices also supported)model_id:"inworld-tts-2"(default),"inworld-tts-1.5-max","inworld-tts-1.5-mini"."inworld-tts-1"and"inworld-tts-1-max"are deprecated by Inworld — migrate toinworld-tts-2orinworld-tts-1.5-*.temperature: 0–2 (default: 1.1)
The plugin requests LINEAR16 (16-bit PCM WAV) chunks from Inworld so each
streamed chunk is self-contained and decodes cleanly under streaming TTS;
no extra configuration needed.
Steering (TTS-2)
TTS-2 takes natural-language stage directions inline with your text. Place the instruction in square brackets before the segment it should apply to:
text = (
"[whisper in a hushed style] I have to tell you something. "
"[laugh] Just kidding! [say with force] Now let's get to work."
)
async for chunk in await tts.stream_audio(text):
...
Steering covers articulation, intonation, volume, pitch, range, speed, and
vocal style — and supports non-verbal sounds like [laugh], [breathe],
[clear throat], [sigh], [cough], [yawn]. Combining dimensions
([whisper in a hushed style], [say playfully and very fast]) produces
better results than bare single-word tags. See Inworld's
steering docs and
prompting guide
for the full reference.
Agent example
A complete example wiring inworld.TTS() into a Stream-edge agent with
Deepgram STT, Gemini LLM, and smart-turn detection lives at
example/inworld_tts_example.py. The
companion example/inworld-audio-guide.md
is loaded as the agent's system prompt and teaches the LLM how to emit
TTS-2 steering tags so replies sound expressive out of the box.
Realtime (WebRTC)
Low-latency speech-to-speech via Inworld's Realtime API. This transport uses
WebRTC (UDP, native Opus) for lower latency than the WebSocket alternative.
Requires a WebRTC-capable edge transport — pair with getstream.Edge() as
shown below.
from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, inworld, smart_turn
agent = Agent(
edge=getstream.Edge(),
agent_user=User(name="My Agent", id="agent"),
llm=inworld.Realtime(
model="openai/gpt-4o-mini",
voice="Dennis",
instructions="You are a friendly voice assistant.",
),
turn_detection=smart_turn.TurnDetection(),
)
Realtime options
model: provider-prefixed model ID. Examples:"openai/gpt-4o-mini"(default),"google-ai-studio/gemini-2.5-flash","inworld/<router-id>"for an Inworld routervoice: voice for audio responses (default:"Dennis";"Clive","Olivia"and custom voices also supported)api_key: Inworld AI API key (default: reads fromINWORLD_API_KEY)instructions: system promptrealtime_session: advanced — pass a fullRealtimeSessionCreateRequestParamfor session fields not exposed by the primary args (custom turn-detection,tool_choice, etc.)
Registering tools
realtime = inworld.Realtime()
@realtime.register_function(description="Get the current weather for a city.")
async def get_weather(city: str) -> str:
return f"It's sunny in {city}."
Tools follow the OpenAI function-calling schema. Inworld's Realtime API is
protocol-compatible with OpenAI's Realtime API, so registered functions flow
through the same response.function_call_arguments.done path.
Notes
- v1 is WebRTC only; a WebSocket transport may be added later.
- Video input is not currently supported by Inworld's Realtime API.
Requirements
- Python 3.10+
httpx>=0.28,av>=10,aiortc>=1.9,openai[realtime]>=2.26,<3
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file vision_agents_plugins_inworld-0.5.6.tar.gz.
File metadata
- Download URL: vision_agents_plugins_inworld-0.5.6.tar.gz
- Upload date:
- Size: 13.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.10 {"installer":{"name":"uv","version":"0.10.10","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
284e28f3a6dedd290176085c6ca8ee5532934bc717a66b0116ae23347df2ed30
|
|
| MD5 |
4dd309cea8d98d230eac0755073f772f
|
|
| BLAKE2b-256 |
4046686c318712f227425d53c3649db38d1aa7e8647d2dfcb38120165ac5e132
|
File details
Details for the file vision_agents_plugins_inworld-0.5.6-py3-none-any.whl.
File metadata
- Download URL: vision_agents_plugins_inworld-0.5.6-py3-none-any.whl
- Upload date:
- Size: 35.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.10 {"installer":{"name":"uv","version":"0.10.10","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
896e6c3dca0d1a70668d11f76b9184182207512168b6d9f6b156bb8ef60a9bc9
|
|
| MD5 |
ef09e824345a7b82e2aecba322252fe1
|
|
| BLAKE2b-256 |
da2f36d26ef7debc27be8a0cdcdd0c6280d7c1601358d2d933e6e3e3acc98c51
|