Skip to main content

OpenAI plugin for vision agents

Project description

OpenAI Plugin for Vision Agents

OpenAI LLM integration for Vision Agents framework with support for both standard and realtime interactions.

It enables features such as:

  • Real-time transcription and language processing using OpenAI models
  • Easy integration with other Vision Agents plugins and services
  • Function calling capabilities for dynamic interactions

Installation

uv add "vision-agents[openai]"
# or directly
uv add vision-agents-plugins-openai

Usage

Standard LLM

This example shows how to use "gpt-4.1" model with TTS and STT services for audio communication via openai.LLM() API.

The openai.LLM() class uses OpenAI's Responses API under the hood.

To work with models via legacy Chat Completions API, see the Chat Completions models section.

from vision_agents.core import User, Agent
from vision_agents.core.agents import AgentLauncher
from vision_agents.plugins import deepgram, getstream, cartesia, smart_turn, openai

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Friendly AI"),
    instructions="Be nice to the user",
    llm=openai.LLM("gpt-4.1"),
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    turn_detection=smart_turn.TurnDetection(),
)

Realtime LLM

Realtime audio and video communication is also supported via Realtime class. In this mode, the model handles audio and video processing directly without the need for TTS and STT services.

from vision_agents.core import User, Agent
from vision_agents.plugins import getstream, openai

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Friendly AI"),
    instructions="Be nice to the user",
    llm=openai.Realtime(),
)

Chat Completions models

The openai.ChatCompletionsLLM and openai.ChatCompletionsVLM classes provide APIs for text and vision models that use the Chat Completions API.

They are compatible with popular inference backends such as vLLM, TGI, and Ollama.

For example, you can use them to interact with Qwen 3 VL visual model hosted on Baseten:

from vision_agents.core import User, Agent
from vision_agents.plugins import deepgram, getstream, elevenlabs, vogent, openai

# Instantiate the visual model wrapper
llm = openai.ChatCompletionsVLM(model="qwen3vl")

# Create an agent with video understanding capabilities
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Video Assistant", id="agent"),
    instructions="You're a helpful video AI assistant. Analyze the video frames and respond to user questions about what you see.",
    llm=llm,
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=vogent.TurnDetection(),
    processors=[],
)

For full code, see examples/qwen_vl_example.

Function Calling

The LLM and Realtime APIs support function calling, allowing the assistant to invoke custom functions you define.

This enables dynamic interactions like:

  • Database queries
  • API calls to external services
  • File operations
  • Custom business logic
from vision_agents.plugins import openai

llm = openai.LLM("gpt-4.1")
# Or use openai.Realtime() for realtime model



@llm.register_function(
    name="get_weather",
    description="Get the current weather for a given city"
)
async def get_weather(city: str) -> dict:
    """Get weather information for a city."""
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny"
    }
# The function will be automatically called when the model decides to use it

Requirements

  • Python 3.10+
  • GetStream account for video calls
  • Open AI API key

Links

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vision_agents_plugins_openai-0.5.8.tar.gz (36.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vision_agents_plugins_openai-0.5.8-py3-none-any.whl (52.9 kB view details)

Uploaded Python 3

File details

Details for the file vision_agents_plugins_openai-0.5.8.tar.gz.

File metadata

  • Download URL: vision_agents_plugins_openai-0.5.8.tar.gz
  • Upload date:
  • Size: 36.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for vision_agents_plugins_openai-0.5.8.tar.gz
Algorithm Hash digest
SHA256 092a2aa6d9c211dca8816fff2b279f16f768c7a333a31a3f7cf73c9ce285f293
MD5 b676024cb817a1f41b6c828f966909a2
BLAKE2b-256 b53f0f7cc6a1f041786fffbedcf98686f0a96fae3c5cc6ec7c2a890978a28853

See more details on using hashes here.

File details

Details for the file vision_agents_plugins_openai-0.5.8-py3-none-any.whl.

File metadata

  • Download URL: vision_agents_plugins_openai-0.5.8-py3-none-any.whl
  • Upload date:
  • Size: 52.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for vision_agents_plugins_openai-0.5.8-py3-none-any.whl
Algorithm Hash digest
SHA256 62060cf4d2bcb422dc3fb48b246ca13df757481a9f33a81f7ae8c1cc477fec5a
MD5 8aeee9d1eeac817534d9ff353a2b0c69
BLAKE2b-256 f85f64f71406f08a1108265ca9826e6282aaf9b6e17404e0229f8b1ad4913ece

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page