AWS (Bedrock LLM, Transcribe STT, Polly TTS) integration for Vision Agents

AWS Plugin for Vision Agents

AWS integration for the Vision Agents framework, with support for standard LLMs (Bedrock), realtime speech-to-speech with Nova Sonic, text-to-speech (Polly), and streaming speech-to-text (Transcribe).

Installation

uv add "vision-agents[aws]"
# or directly
uv add vision-agents-plugins-aws

Usage

Standard LLM Usage

The AWS plugin supports various Bedrock models including Qwen, Claude, and others. Claude models also support vision/image inputs.

from vision_agents.core import Agent, User
from vision_agents.plugins import aws, getstream, cartesia, deepgram, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Friendly AI"),
    instructions="Be nice to the user",
    llm=aws.LLM(
        model="qwen.qwen3-32b-v1:0",
        region_name="us-east-1"
    ),
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    turn_detection=smart_turn.TurnDetection(buffer_duration=2.0, confidence_threshold=0.5),
)

For vision-capable models like Claude:

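from vision_agents.plugins import aws

llm = aws.LLM(
    model="anthropic.claude-3-haiku-20240307-v1:0",
    region_name="us-east-1"
)

# image_bytes holds the raw bytes of a PNG image ("image.png" is an example path)
with open("image.png", "rb") as f:
    image_bytes = f.read()

# Send image with text (await must run inside an async function)
response = await llm.converse(
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            {"text": "What do you see in this image?"}
        ]
    }]
)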
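
The message layout used above can be wrapped in a small helper. This is a minimal sketch, not part of the plugin: `build_image_message` and `SUPPORTED_FORMATS` are hypothetical names, and the set of formats is an assumption based on the image formats the Bedrock Converse API documents, so check it against the model you use.

```python
from typing import Any

# Assumed set of image formats; verify against the Bedrock docs for your model.
SUPPORTED_FORMATS = {"png", "jpeg", "gif", "webp"}


def build_image_message(
    image_bytes: bytes, prompt: str, image_format: str = "png"
) -> dict[str, Any]:
    """Build one user message containing an image block and a text block."""
    if image_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image format: {image_format!r}")
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    return {
        "role": "user",
        "content": [
            {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
            {"text": prompt},
        ],
    }
```

The returned dictionary can be used as one element of the `messages` list passed to `llm.converse`, with the bytes read from disk, for instance via `pathlib.Path(...).read_bytes()`.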

Realtime Audio Usage

AWS Nova 2 Sonic provides realtime speech-to-speech capabilities. The default model is amazon.nova-2-sonic-v1:0.

from vision_agents.core import Agent, User
from vision_agents.plugins import aws, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Story Teller AI"),
    instructions="Tell a story suitable for a 7 year old about a dragon and a princess",
    llm=aws.Realtime(
        model="amazon.nova-2-sonic-v1:0",
        region_name="us-east-1",
        voice_id="matthew"  # See available voices in AWS Nova documentation
    ),
)

The Realtime implementation includes automatic reconnection logic that reconnects after periods of silence or when approaching connection time limits.

See example/aws_realtime_nova_example.py for a complete example.
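
The two reconnection triggers described above (prolonged silence, and nearing a connection time limit) can be illustrated with a small decision function. This is a sketch of the idea only: `ReconnectPolicy` is a hypothetical class, and all three threshold values are placeholders rather than the values the plugin or AWS actually use.

```python
from dataclasses import dataclass


@dataclass
class ReconnectPolicy:
    # All three values are placeholders chosen for illustration only.
    max_silence_s: float = 30.0
    connection_limit_s: float = 480.0
    safety_margin_s: float = 30.0

    def should_reconnect(self, silence_s: float, connection_age_s: float) -> bool:
        """Return True when either of the two triggers fires."""
        silent_too_long = silence_s >= self.max_silence_s
        near_limit = connection_age_s >= self.connection_limit_s - self.safety_margin_s
        return silent_too_long or near_limit
```

Reconnecting slightly before the limit, rather than at it, leaves time to open a new session before the old one is closed by the server.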

Text-to-Speech (TTS)

AWS Polly synthesises speech from text and streams the resulting audio. It supports both standard and neural engines, plain-text or SSML input, and Polly lexicons for pronunciation overrides.

from vision_agents.core import Agent
from vision_agents.plugins import aws

tts = aws.TTS(
    region_name="us-east-1",
    voice_id="Joanna",       # any Polly voice ID
    engine="neural",         # "standard" | "neural"
    text_type="text",        # "text" | "ssml"
    language_code="en-US",
    lexicon_names=None,      # optional list of Polly lexicons
)

# Use in agent
agent = Agent(
    llm=aws.LLM(model="qwen.qwen3-32b-v1:0"),
    tts=tts,
    # ... other components
)
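
When `text_type="ssml"` is used, the input has to be valid SSML, so plain text needs its XML special characters escaped before it is wrapped in a `<speak>` element. A minimal sketch using only the standard library follows; `to_ssml` is a hypothetical helper, not something the plugin provides.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str, pause_ms: int = 0) -> str:
    """Wrap plain text in a <speak> element, escaping &, < and >."""
    body = escape(text)
    if pause_ms > 0:
        # <break> is a standard SSML tag; the pause is appended after the text.
        body += f'<break time="{pause_ms}ms"/>'
    return f"<speak>{body}</speak>"
```

For example, `to_ssml("Tom & Jerry")` produces `<speak>Tom &amp; Jerry</speak>`, which avoids a parse error that the raw ampersand would otherwise cause.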

Credentials follow the standard boto3 chain (environment variables, ~/.aws/credentials, SSO, instance profile, and so on). To override it, pass either aws_profile or aws_access_key_id together with aws_secret_access_key (both are required; add aws_session_token for temporary credentials from STS, SSO, or assumed roles). You may also inject a pre-built boto3 Polly client via the client=... argument. If region_name is not given, it falls back to AWS_REGION / AWS_DEFAULT_REGION and finally to us-east-1.
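
The region fallback and the "both keys together" rule described above can be expressed in a few lines. This is an illustrative sketch under stated assumptions, not the plugin's code: `resolve_region` and `check_static_credentials` are hypothetical names, and the lookup order simply follows the order in which the variables are listed above.

```python
import os
from typing import Mapping, Optional


def resolve_region(
    region_name: Optional[str] = None, env: Mapping[str, str] = os.environ
) -> str:
    """Explicit argument, then AWS_REGION, then AWS_DEFAULT_REGION, then us-east-1."""
    return (
        region_name
        or env.get("AWS_REGION")
        or env.get("AWS_DEFAULT_REGION")
        or "us-east-1"
    )


def check_static_credentials(
    access_key_id: Optional[str], secret_access_key: Optional[str]
) -> bool:
    """Return True if an explicit key pair was given, False to use the boto3 chain."""
    if bool(access_key_id) != bool(secret_access_key):
        raise ValueError(
            "aws_access_key_id and aws_secret_access_key must be passed together"
        )
    return bool(access_key_id)
```

Failing early on a half-specified key pair gives a clearer error than letting the request fail later with an authentication problem.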

Speech-to-Text (STT)

AWS Transcribe streaming STT converts audio to text in realtime. The connection auto-reconnects with exponential backoff on idle timeouts, audio-length limits, and transient errors.

from vision_agents.core import Agent
from vision_agents.plugins import aws

stt = aws.STT(
    language_code="en-US",
    region_name="us-east-1",
    show_speaker_label=False,
    enable_partial_results_stabilization=False,
    partial_results_stability=None,  # "high" | "medium" | "low"
)

# Use in agent
agent = Agent(
    llm=aws.LLM(model="qwen.qwen3-32b-v1:0"),
    stt=stt,
    # ... other components
)

See example/aws_pipeline_example.py for a complete STT → LLM → TTS pipeline using only AWS components.
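
The exponential backoff mentioned for the STT reconnection can be sketched as a delay generator. The base delay, growth factor, cap, and attempt count below are assumptions picked for illustration, and `backoff_delays` is a hypothetical helper; the plugin's actual settings are not documented here.

```python
import random
from typing import Iterator


def backoff_delays(
    base: float = 0.5,
    factor: float = 2.0,
    cap: float = 30.0,
    attempts: int = 6,
    jitter: float = 0.0,
) -> Iterator[float]:
    """Yield one delay per retry: base * factor**n, capped, plus optional jitter."""
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        # jitter=0.0 keeps the sequence deterministic; a positive value adds
        # up to jitter * delay extra seconds to spread out simultaneous retries.
        yield delay + random.uniform(0, jitter * delay)
```

With the defaults, the delays double from 0.5 seconds up to 16 seconds; the cap keeps a long outage from producing unbounded waits.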

Function Calling

Standard LLM (aws.LLM)

The standard LLM implementation fully supports function calling. Register functions using the @llm.register_function decorator:

from vision_agents.plugins import aws

llm = aws.LLM(
    model="qwen.qwen3-32b-v1:0",
    region_name="us-east-1"
)


@llm.register_function(
    name="get_weather",
    description="Get the current weather for a given city"
)
async def get_weather(city: str) -> dict:
    """Get weather information for a city."""
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny"
    }
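
For a model to call a registered function, the function has to be described to it as a tool with a name, a description, and a JSON Schema for its arguments. How the plugin builds that description internally is not covered here; the sketch below only shows the general idea, using the `toolSpec` layout of the Bedrock Converse API. `tool_spec` and `_JSON_TYPES` are hypothetical names.

```python
import inspect
from typing import Any, Callable

# Illustrative subset of Python annotations mapped to JSON Schema type names.
_JSON_TYPES = {
    str: "string", int: "integer", float: "number",
    bool: "boolean", dict: "object", list: "array",
}


def tool_spec(fn: Callable[..., Any], name: str, description: str) -> dict[str, Any]:
    """Describe `fn` as a tool, deriving the argument schema from its signature."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for param in inspect.signature(fn).parameters.values():
        # Unknown or missing annotations fall back to "string".
        properties[param.name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(param.name)
    schema = {"type": "object", "properties": properties, "required": required}
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            "inputSchema": {"json": schema},
        }
    }
```

Applied to the `get_weather` function above, this would yield a schema with a single required string property, `city`.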

Realtime (aws.Realtime)

The Realtime implementation fully supports function calling with AWS Nova 2 Sonic. Register functions using the @llm.register_function decorator:

from vision_agents.plugins import aws

llm = aws.Realtime(
    model="amazon.nova-2-sonic-v1:0",
    region_name="us-east-1",
    voice_id="matthew"
)


@llm.register_function(
    name="get_weather",
    description="Get the current weather for a given city"
)
async def get_weather(city: str) -> dict:
    """Get weather information for a city."""
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny"
    }

# The function will be automatically called when the model decides to use it

See example/aws_realtime_function_calling_example.py for a complete example.

Configuration

Environment Variables

Create a .env file with the following variables:

STREAM_API_KEY=your_stream_api_key_here
STREAM_API_SECRET=your_stream_api_secret_here

AWS_BEDROCK_API_KEY=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=us-east-1

CARTESIA_API_KEY=
DEEPGRAM_API_KEY=

Make sure your .env file is configured before running the examples.
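
A quick check before running an example can report which variables are still empty. This is a minimal sketch: `missing_env` and `EXAMPLE_VARS` are hypothetical names, the list is copied from the template above and should be trimmed to the components you actually use, and it assumes the .env file has already been loaded into the process environment (for example by a tool such as python-dotenv).

```python
import os
from typing import Iterable, Mapping

# Names taken from the .env template; adjust to the components in use.
EXAMPLE_VARS = (
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
)


def missing_env(
    names: Iterable[str] = EXAMPLE_VARS, env: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the names that are unset or blank, in the order given."""
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```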
