
AWS (Bedrock) LLM integration for Vision Agents


AWS Plugin for Vision Agents

AWS (Bedrock) integration for the Vision Agents framework, with support for standard LLMs, realtime speech-to-speech with Nova Sonic (including automatic session resumption), and text-to-speech with AWS Polly.

Installation

uv add "vision-agents[aws]"

Usage

Standard LLM Usage

The AWS plugin supports various Bedrock models including Qwen, Claude, and others. Claude models also support vision/image inputs.

from vision_agents.core import Agent, User
from vision_agents.plugins import aws, getstream, cartesia, deepgram, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Friendly AI"),
    instructions="Be nice to the user",
    llm=aws.LLM(
        model="qwen.qwen3-32b-v1:0",
        region_name="us-east-1"
    ),
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    turn_detection=smart_turn.TurnDetection(buffer_duration=2.0, confidence_threshold=0.5),
)

For vision-capable models like Claude:

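import asyncio
from pathlib import Path

from vision_agents.plugins import aws

llm = aws.LLM(
    model="anthropic.claude-3-haiku-20240307-v1:0",
    region_name="us-east-1"
)

async def main() -> None:
    # Load the image as raw bytes ("image.png" is a placeholder path)
    image_bytes = Path("image.png").read_bytes()

    # Send image with text
    response = await llm.converse(
        messages=[{
            "role": "user",
            "content": [
                {"image": {"format": "png", "source": {"bytes": image_bytes}}},
                {"text": "What do you see in this image?"}
            ]
        }]
    )

# converse() is a coroutine, so it must be awaited inside an event loop
asyncio.run(main())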
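The message structure above is easy to get wrong by hand, so a small helper can assemble it. This is a minimal sketch, not part of the plugin: `image_block` and `user_message` are hypothetical names, and the set of accepted formats is an assumption that should be checked against the Bedrock documentation for the model you use.

```python
from pathlib import Path
from typing import Optional

# Assumed set of image formats accepted in image content blocks.
SUPPORTED_FORMATS = {"png", "jpeg", "gif", "webp"}


def image_block(path: str, data: Optional[bytes] = None) -> dict:
    """Build an image content block, inferring the format from the file suffix."""
    suffix = Path(path).suffix.lower().lstrip(".")
    fmt = "jpeg" if suffix == "jpg" else suffix
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image format: {suffix!r}")
    if data is None:
        # Read from disk only when the caller did not pass the bytes directly.
        data = Path(path).read_bytes()
    return {"image": {"format": fmt, "source": {"bytes": data}}}


def user_message(text: str, *images: dict) -> dict:
    """Build a user message with the images first and the text prompt last."""
    return {"role": "user", "content": [*images, {"text": text}]}
```

The result of `user_message("What do you see in this image?", image_block("image.png"))` has the same shape as the message passed to `llm.converse` above.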

Realtime Audio Usage

AWS Nova 2 Sonic provides realtime speech-to-speech capabilities. The default model is amazon.nova-2-sonic-v1:0.

from vision_agents.core import Agent, User
from vision_agents.plugins import aws, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Story Teller AI"),
    instructions="Tell a story suitable for a 7 year old about a dragon and a princess",
    llm=aws.Realtime(
        model="amazon.nova-2-sonic-v1:0",
        region_name="us-east-1",
        voice_id="matthew"  # See available voices in AWS Nova documentation
    ),
)

The Realtime implementation includes automatic reconnection logic that reconnects after periods of silence or when approaching connection time limits.
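To illustrate the two triggers described above, here is a minimal sketch of a reconnect decision. It is not the plugin's implementation: `ReconnectPolicy` is a hypothetical class, and all three thresholds are illustrative values rather than the plugin's or AWS's actual limits.

```python
from dataclasses import dataclass


@dataclass
class ReconnectPolicy:
    # All values are illustrative assumptions, in seconds.
    max_silence_s: float = 30.0      # reconnect after this much silence
    max_connection_s: float = 480.0  # assumed hard limit on connection age
    safety_margin_s: float = 30.0    # reconnect this long before the limit

    def should_reconnect(
        self, now: float, connected_at: float, last_audio_at: float
    ) -> bool:
        """Return True when either trigger (silence or connection age) fires."""
        silent_for = now - last_audio_at
        connected_for = now - connected_at
        near_limit = connected_for >= self.max_connection_s - self.safety_margin_s
        return silent_for >= self.max_silence_s or near_limit
```

A session loop could evaluate `should_reconnect` periodically with a monotonic clock and open a fresh connection when it returns `True`.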

See example/aws_realtime_nova_example.py for a complete example.

Text-to-Speech (TTS)

AWS Polly TTS is available for converting text to speech:

from vision_agents.core import Agent
from vision_agents.plugins import aws

tts = aws.TTS(
    region_name="us-east-1",
    voice_id="Joanna",  # AWS Polly voice ID
    engine="neural",  # 'standard' or 'neural'
    text_type="text",  # 'text' or 'ssml'
    language_code="en-US"
)

# Use in agent
agent = Agent(
    llm=aws.LLM(model="qwen.qwen3-32b-v1:0"),
    tts=tts,
    # ... other components
)
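When `text_type="ssml"` is used, the input has to be valid SSML, which means escaping characters such as `&` and `<` in the spoken text. The helper below is a minimal sketch under that assumption. `to_ssml` is a hypothetical name and is not provided by the plugin.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str, pause_ms: int = 0) -> str:
    """Wrap plain text in <speak> tags, escaping XML special characters."""
    body = escape(text)  # escapes &, < and >
    if pause_ms > 0:
        # Optional trailing pause using the SSML <break> tag.
        body += f'<break time="{pause_ms}ms"/>'
    return f"<speak>{body}</speak>"
```

For example, `to_ssml("Tom & Jerry")` yields `<speak>Tom &amp; Jerry</speak>`, which is safe to pass to a TTS instance configured for SSML.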

Function Calling

Standard LLM (aws.LLM)

The standard LLM implementation fully supports function calling. Register functions using the @llm.register_function decorator:

from vision_agents.plugins import aws

llm = aws.LLM(
    model="qwen.qwen3-32b-v1:0",
    region_name="us-east-1"
)

@llm.register_function(
    name="get_weather",
    description="Get the current weather for a given city"
)
def get_weather(city: str) -> dict:
    """Get weather information for a city."""
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny"
    }
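For function calling, Bedrock's Converse API expects each tool to be described by a `toolSpec` with a JSON schema for its inputs. How the plugin builds that description from a registered function is not documented here; the sketch below only shows one plausible way to derive it from type hints. `tool_spec` and the type mapping are assumptions for illustration.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Assumed mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {
    str: "string", int: "integer", float: "number",
    bool: "boolean", dict: "object", list: "array",
}


def tool_spec(func: Callable[..., Any], name: str, description: str) -> dict:
    """Describe a function as a Converse-style toolSpec entry."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for param in inspect.signature(func).parameters.values():
        # Unannotated or unknown types fall back to "string".
        json_type = _JSON_TYPES.get(hints.get(param.name), "string")
        properties[param.name] = {"type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(param.name)
    schema = {"type": "object", "properties": properties, "required": required}
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            "inputSchema": {"json": schema},
        }
    }


def get_weather(city: str, units: str = "metric") -> dict:
    """Example function; `units` is added here only to show an optional parameter."""
    return {"city": city, "temperature": 72, "condition": "Sunny"}


spec = tool_spec(get_weather, "get_weather", "Get the current weather for a given city")
```

Parameters without a default end up in `required`, so the model is told that `city` must be supplied while `units` may be omitted.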

Realtime (aws.Realtime)

The Realtime implementation fully supports function calling with AWS Nova 2 Sonic. Register functions using the @llm.register_function decorator:

from vision_agents.plugins import aws

llm = aws.Realtime(
    model="amazon.nova-2-sonic-v1:0",
    region_name="us-east-1",
    voice_id="matthew"
)

@llm.register_function(
    name="get_weather",
    description="Get the current weather for a given city"
)
def get_weather(city: str) -> dict:
    """Get weather information for a city."""
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny"
    }

# The function will be automatically called when the model decides to use it

See example/aws_realtime_function_calling_example.py for a complete example.
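Conceptually, automatic calling means looking up the function by the name the model requested, invoking it with the decoded arguments, and returning the result. The following is a minimal sketch of that pattern, not the plugin's internals: `REGISTRY`, `register`, and `dispatch` are hypothetical names, and the assumption that arguments arrive as a JSON string is for illustration only.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to Python callables.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records a function under the given tool name."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = func
        return func
    return decorator


def dispatch(tool_name: str, raw_arguments: str) -> dict:
    """Call the registered function and wrap its result or error."""
    func = REGISTRY.get(tool_name)
    if func is None:
        return {"error": f"unknown tool: {tool_name}"}
    try:
        return {"result": func(**json.loads(raw_arguments))}
    except Exception as exc:  # report failures to the model instead of crashing
        return {"error": str(exc)}


@register("get_weather")
def get_weather(city: str) -> dict:
    return {"city": city, "temperature": 72, "condition": "Sunny"}
```

Returning errors as data rather than raising lets the conversation continue when the model requests an unknown tool or passes bad arguments.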

Configuration

Environment Variables

Create a .env file with the following variables:

STREAM_API_KEY=your_stream_api_key_here
STREAM_API_SECRET=your_stream_api_secret_here

AWS_BEDROCK_API_KEY=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=us-east-1

CARTESIA_API_KEY=
DEEPGRAM_API_KEY=

Make sure your .env file is configured before running the examples.
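A quick check at startup can catch variables that were left blank. This is a minimal sketch: `missing_settings` is a hypothetical helper, it assumes the .env file has already been loaded into the process environment (for instance with python-dotenv), and which variables are actually required depends on the components you use.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_settings(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


# Assumed minimum for the examples above; adjust to the components you use.
REQUIRED = ("STREAM_API_KEY", "STREAM_API_SECRET", "AWS_REGION")

if __name__ == "__main__":
    missing = missing_settings(REQUIRED)
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```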
