# AWS Plugin for Vision Agents

AWS (Bedrock) integration for the Vision Agents framework, with support for standard LLMs, realtime speech-to-speech via Nova Sonic (including automatic session resumption), and text-to-speech via AWS Polly.
## Installation

```bash
uv add "vision-agents[aws]"
```
## Usage

### Standard LLM Usage

The AWS plugin supports a range of Bedrock models, including Qwen and Claude. Claude models also accept vision (image) inputs.
```python
from vision_agents.core import Agent, User
from vision_agents.plugins import aws, getstream, cartesia, deepgram, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Friendly AI"),
    instructions="Be nice to the user",
    llm=aws.LLM(
        model="qwen.qwen3-32b-v1:0",
        region_name="us-east-1",
    ),
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    turn_detection=smart_turn.TurnDetection(buffer_duration=2.0, confidence_threshold=0.5),
)
```
For vision-capable models like Claude:

```python
llm = aws.LLM(
    model="anthropic.claude-3-haiku-20240307-v1:0",
    region_name="us-east-1",
)

# Send an image together with a text prompt
response = await llm.converse(
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            {"text": "What do you see in this image?"},
        ],
    }]
)
```
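If the image lives on disk, the payload above can be assembled from a file path. This small helper is hypothetical (not part of the plugin); it only builds the Converse-style message dict shown above:

```python
from pathlib import Path

def image_message(path: str, prompt: str) -> dict:
    """Build a Converse-style user message pairing an image file with a text prompt."""
    data = Path(path).read_bytes()
    # Derive the image format from the file extension; default to PNG.
    fmt = Path(path).suffix.lstrip(".").lower() or "png"
    return {
        "role": "user",
        "content": [
            {"image": {"format": fmt, "source": {"bytes": data}}},
            {"text": prompt},
        ],
    }
```

It would then be used as `await llm.converse(messages=[image_message("photo.png", "What do you see?")])`.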
### Realtime Audio Usage

AWS Nova 2 Sonic provides realtime speech-to-speech capabilities with automatic reconnection logic. The default model is `amazon.nova-2-sonic-v1:0`.
```python
from vision_agents.core import Agent, User
from vision_agents.plugins import aws, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Story Teller AI"),
    instructions="Tell a story suitable for a 7 year old about a dragon and a princess",
    llm=aws.Realtime(
        model="amazon.nova-2-sonic-v1:0",
        region_name="us-east-1",
        voice_id="matthew",  # See available voices in the AWS Nova documentation
    ),
)
```
The Realtime implementation automatically reconnects after periods of silence or when approaching connection time limits.

See `example/aws_realtime_nova_example.py` for a complete example.
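The plugin's actual thresholds are internal, but the reconnection policy can be illustrated with a sketch. The limits below are assumed values for illustration, not the plugin's real configuration:

```python
# Sketch of a silence/session-age reconnection policy. Both constants are
# illustrative assumptions, not the plugin's actual thresholds.
SILENCE_LIMIT_S = 30.0      # reconnect after this much silence
SESSION_LIMIT_S = 8 * 60.0  # hard cap on a single connection's lifetime

def should_reconnect(now: float, last_audio_at: float, session_started_at: float) -> bool:
    """True when the silence window is exceeded or the session nears its time cap."""
    silent_for = now - last_audio_at
    session_age = now - session_started_at
    # Reconnect slightly before the hard cap (at 90%) to avoid a mid-utterance drop.
    return silent_for >= SILENCE_LIMIT_S or session_age >= SESSION_LIMIT_S * 0.9
```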
### Text-to-Speech (TTS)

AWS Polly TTS is available for converting text to speech:

```python
from vision_agents.plugins import aws

tts = aws.TTS(
    region_name="us-east-1",
    voice_id="Joanna",    # AWS Polly voice ID
    engine="neural",      # 'standard' or 'neural'
    text_type="text",     # 'text' or 'ssml'
    language_code="en-US",
)

# Use in an agent
agent = Agent(
    llm=aws.LLM(model="qwen.qwen3-32b-v1:0"),
    tts=tts,
    # ... other components
)
```
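With `text_type="ssml"`, Polly expects a `<speak>` document rather than plain text. A minimal, hypothetical helper (not part of the plugin) that wraps and XML-escapes input:

```python
from xml.sax.saxutils import escape

def to_ssml(text: str, rate: str = "medium") -> str:
    """Wrap plain text in a minimal SSML document with a prosody rate."""
    # escape() handles &, <, > so user text cannot break the SSML markup.
    return f'<speak><prosody rate="{rate}">{escape(text)}</prosody></speak>'
```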
## Function Calling

### Standard LLM (`aws.LLM`)

The standard LLM implementation fully supports function calling. Register functions with the `@llm.register_function` decorator:

```python
from vision_agents.plugins import aws

llm = aws.LLM(
    model="qwen.qwen3-32b-v1:0",
    region_name="us-east-1",
)

@llm.register_function(
    name="get_weather",
    description="Get the current weather for a given city",
)
def get_weather(city: str) -> dict:
    """Get weather information for a city."""
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny",
    }
```
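As in the example above, tool results are plain dicts. A second, hypothetical tool shows taking multiple typed parameters and returning an error payload instead of raising; the decorator is commented out so the snippet runs without AWS credentials:

```python
# Hypothetical tool, for illustration only:
# @llm.register_function(
#     name="convert_temperature",
#     description="Convert a temperature between Fahrenheit and Celsius",
# )
def convert_temperature(value: float, to_unit: str) -> dict:
    """Convert a temperature; returns a JSON-serializable dict rather than raising."""
    if to_unit == "celsius":
        converted = (value - 32) * 5 / 9
    elif to_unit == "fahrenheit":
        converted = value * 9 / 5 + 32
    else:
        # Returning the error lets the model see what went wrong and retry.
        return {"error": f"unsupported unit: {to_unit}"}
    return {"value": round(converted, 1), "unit": to_unit}
```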
### Realtime (`aws.Realtime`)

The Realtime implementation fully supports function calling with AWS Nova 2 Sonic, using the same `@llm.register_function` decorator:

```python
from vision_agents.plugins import aws

llm = aws.Realtime(
    model="amazon.nova-2-sonic-v1:0",
    region_name="us-east-1",
    voice_id="matthew",
)

@llm.register_function(
    name="get_weather",
    description="Get the current weather for a given city",
)
def get_weather(city: str) -> dict:
    """Get weather information for a city."""
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny",
    }

# The function is called automatically when the model decides to use it
```

See `example/aws_realtime_function_calling_example.py` for a complete example.
## Configuration

### Environment Variables

Create a `.env` file with the following variables:

```bash
STREAM_API_KEY=your_stream_api_key_here
STREAM_API_SECRET=your_stream_api_secret_here
AWS_BEDROCK_API_KEY=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=us-east-1
CARTESIA_API_KEY=
DEEPGRAM_API_KEY=
```

Make sure your `.env` file is configured before running the examples.
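Before starting an agent it can help to fail fast on missing configuration. A minimal stdlib sketch follows; in practice you might use a library such as python-dotenv instead:

```python
import os

def load_env_file(path: str = ".env") -> dict[str, str]:
    """Minimal .env loader: KEY=VALUE lines and '#' comments; no quoting or expansion."""
    values: dict[str, str] = {}
    try:
        with open(path) as fh:
            for raw in fh:
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # fall back to whatever is already in the environment
    os.environ.update(values)
    return values

def require_env(*names: str) -> None:
    """Raise with a clear message if any required variable is unset or empty."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```

For example, `load_env_file()` followed by `require_env("STREAM_API_KEY", "STREAM_API_SECRET", "AWS_REGION")` surfaces a misconfigured `.env` before any AWS call is made.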