OpenAI plugin for vision agents
Project description
OpenAI Plugin for Vision Agents
OpenAI LLM integration for Vision Agents framework with support for both standard and realtime interactions.
It enables features such as:
- Real-time transcription and language processing using OpenAI models
- Easy integration with other Vision Agents plugins and services
- Function calling capabilities for dynamic interactions
Installation
pip install vision-agents[openai]
Usage
Standard LLM
This example shows how to use "gpt-4.1" model with TTS and STT services for audio communication via openai.LLM() API.
The openai.LLM() class uses OpenAI's Responses API under the hood.
To work with models via legacy Chat Completions API, see the Chat Completions models section.
from vision_agents.core import User, Agent
from vision_agents.core.agents import AgentLauncher
from vision_agents.plugins import deepgram, getstream, cartesia, smart_turn, openai
agent = Agent(
edge=getstream.Edge(),
agent_user=User(name="Friendly AI"),
instructions="Be nice to the user",
llm=openai.LLM("gpt-4.1"),
tts=cartesia.TTS(),
stt=deepgram.STT(),
turn_detection=smart_turn.TurnDetection(),
)
Realtime LLM
Realtime audio and video communication is also supported via Realtime class.
In this mode, the model handles audio and video processing directly without the need for TTS and STT services.
from vision_agents.core import User, Agent
from vision_agents.plugins import getstream, openai
agent = Agent(
edge=getstream.Edge(),
agent_user=User(name="Friendly AI"),
instructions="Be nice to the user",
llm=openai.Realtime(),
)
Chat Completions models
The openai.ChatCompletionsLLM and openai.ChatCompletionsVLM classes provide APIs for text and vision models that use the Chat Completions API.
They are compatible with popular inference backends such as vLLM, TGI, and Ollama.
For example, you can use them to interact with Qwen 3 VL visual model hosted on Baseten:
from vision_agents.core import User, Agent
from vision_agents.plugins import deepgram, getstream, elevenlabs, vogent, openai
# Instantiate the visual model wrapper
llm = openai.ChatCompletionsVLM(model="qwen3vl")
# Create an agent with video understanding capabilities
agent = Agent(
edge=getstream.Edge(),
agent_user=User(name="Video Assistant", id="agent"),
instructions="You're a helpful video AI assistant. Analyze the video frames and respond to user questions about what you see.",
llm=llm,
stt=deepgram.STT(),
tts=elevenlabs.TTS(),
turn_detection=vogent.TurnDetection(),
processors=[],
)
For full code, see examples/qwen_vl_example.
Function Calling
The LLM and Realtime APIs support function calling, allowing the assistant to invoke custom functions you define.
This enables dynamic interactions like:
- Database queries
- API calls to external services
- File operations
- Custom business logic
from vision_agents.plugins import openai
llm = openai.LLM("gpt-4.1")
# Or use openai.Realtime() for realtime model
@llm.register_function(
name="get_weather",
description="Get the current weather for a given city"
)
def get_weather(city: str) -> dict:
"""Get weather information for a city."""
return {
"city": city,
"temperature": 72,
"condition": "Sunny"
}
# The function will be automatically called when the model decides to use it
Requirements
- Python 3.10+
- GetStream account for video calls
- Open AI API key
Links
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file vision_agents_plugins_openai-0.3.4.tar.gz.
File metadata
- Download URL: vision_agents_plugins_openai-0.3.4.tar.gz
- Upload date:
- Size: 30.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3b3bac46d91efbd1ea0b4e3f89d5511032886a976db138d4c8d33a0ce4c1be90
|
|
| MD5 |
7c4f144c35de3a450a29365837bf7990
|
|
| BLAKE2b-256 |
52abb3fd899042fafa0bc3481fb8e6fbc5731db41ae2f208a05a01fd96a05cf4
|
File details
Details for the file vision_agents_plugins_openai-0.3.4-py3-none-any.whl.
File metadata
- Download URL: vision_agents_plugins_openai-0.3.4-py3-none-any.whl
- Upload date:
- Size: 45.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e23e7fe400c507561de2cda47ac648a200ce9c62269088442284c789a05a30bf
|
|
| MD5 |
5a3282f036ae0eac92bacfb0324011a0
|
|
| BLAKE2b-256 |
86f10b0fe04a0a41338277b2f5690c76cb07b2d51e6f0063b61f382ab9c64421
|