Voice Agent SDK (Python)

Python SDK for building voice agents on the Voice Gateway.

Installation

pip install voice-agent

Quick Start

import asyncio
from voice_agent import VoiceAgent, VoiceAgentConfig, ConnectionMode

config = VoiceAgentConfig(
    api_key="sk-voice-xxx",
    gateway_url="wss://gateway.example.com",
    mode=ConnectionMode.WEBSOCKET,
)

agent = VoiceAgent(config)

@agent.on_utterance
async def handle_utterance(ctx):
    print(f"User said: {ctx.text}")

    # Stream response
    ctx.send_delta("Hello ")
    ctx.send_delta("World!")
    ctx.done()

@agent.on_interrupt
def handle_interrupt(session_id: str, reason: str):
    print(f"Interrupted: {reason}")

@agent.on_error
def handle_error(error: Exception):
    print(f"Error: {error}")

async def main():
    await agent.connect()
    # Keep running
    while agent.is_connected():
        await asyncio.sleep(1)

asyncio.run(main())

Streaming with LLM

import asyncio
import os
from openai import AsyncOpenAI
from voice_agent import VoiceAgent, VoiceAgentConfig

openai = AsyncOpenAI()

config = VoiceAgentConfig(
    api_key=os.environ["VOICE_API_KEY"],
    gateway_url=os.environ["GATEWAY_URL"],
)

agent = VoiceAgent(config)

@agent.on_utterance
async def handle(ctx):
    stream = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": ctx.text}],
        stream=True,
    )

    async for chunk in stream:
        if ctx.is_aborted:
            # User interrupted: stop streaming, as in Handling Interrupts
            return

        delta = chunk.choices[0].delta.content
        if delta:
            ctx.send_delta(delta)

    ctx.done()

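async def main():
    await agent.connect()
    # Keep the process alive while connected, as in the Quick Start
    while agent.is_connected():
        await asyncio.sleep(1)

asyncio.run(main())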

Using Vision (Video Frames)

Agents with vision scope can request video frames:

from voice_agent import FrameRequestOptions

@agent.on_utterance
async def handle(ctx):
    # Check if vision context is available
    if ctx.vision and ctx.vision.available:
        print(f"Auto-analyzed: {ctx.vision.description}")

    # Request raw frames for custom analysis
    frames = await ctx.request_frames(FrameRequestOptions(
        limit=5,
        raw_base64=True,
    ))

    if frames.frames:
        for frame in frames.frames:
            # frame.base64 contains the image data
            # frame.timestamp is when it was captured
            pass

    # Or get pre-analyzed descriptions
    analyzed = await ctx.request_frames(FrameRequestOptions(limit=3))
    if analyzed.descriptions:
        print(f"Frame descriptions: {analyzed.descriptions}")

    ctx.done("I can see what you're showing me!")
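
The comment above says `frame.base64` contains the image data. A minimal sketch of decoding it for custom analysis follows. The `Frame` dataclass is a stand-in, since the SDK's real frame type is not shown here. The field types, and the idea that the string is plain base64 with no `data:` URL prefix, are assumptions.

```python
import base64
from dataclasses import dataclass


@dataclass
class Frame:
    """Stand-in for the SDK's frame object (hypothetical; only the two
    fields mentioned in the example above are modeled)."""
    base64: str
    timestamp: float  # assumed type; the SDK may use datetime instead


def frame_to_bytes(frame: Frame) -> bytes:
    """Decode the base64 payload of a frame into raw image bytes."""
    # validate=True raises on characters outside the base64 alphabet
    return base64.b64decode(frame.base64, validate=True)


# Round-trip a fake payload to show the helper in use
fake = Frame(base64=base64.b64encode(b"\x89PNG fake").decode("ascii"), timestamp=0.0)
image_bytes = frame_to_bytes(fake)
```

The resulting bytes can then be handed to whatever image library or vision model you use.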

Using Memory

Agents with memory scope can query stored facts:

from voice_agent import MemoryQueryOptions

@agent.on_utterance
async def handle(ctx):
    # Query relevant memories
    memories = await ctx.query_memory(MemoryQueryOptions(
        query=ctx.text,
        top_k=5,
        threshold=0.7,
        types=["preference", "fact"],
    ))

    if memories.facts:
        context = "\n".join(f.content for f in memories.facts)
        # generate_with_context is your own LLM helper, not part of the SDK
        response = await generate_with_context(ctx.text, context)
        ctx.done(response)
    else:
        ctx.done("I don't have any relevant memories about that.")
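
One way to use the retrieved facts is to fold them into a prompt before calling your model. The sketch below is one possible shape for that step, not something the SDK provides. `Fact` is a stand-in exposing only the `content` attribute used above, and `build_prompt` and its prompt wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    """Stand-in for a memory fact; only `content` is modeled (hypothetical)."""
    content: str


def build_prompt(question: str, facts: list[Fact]) -> str:
    """Combine retrieved facts and the user's question into one prompt."""
    context = "\n".join(f"- {fact.content}" for fact in facts)
    return (
        "Known facts about the user:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "What should I drink?",
    [Fact("Prefers tea"), Fact("Allergic to milk")],
)
```

The resulting string can be sent as the user message to the model of your choice.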

Handling Interrupts

When the user starts speaking, the gateway sends an interrupt. Check ctx.is_aborted before each send and stop streaming once it is set:

@agent.on_utterance
async def handle(ctx):
    # stream_response is your own async generator (e.g. an LLM stream), not part of the SDK
    async for chunk in stream_response(ctx.text):
        # Check before each operation
        if ctx.is_aborted:
            print("User interrupted, stopping")
            return
        ctx.send_delta(chunk)
    ctx.done()

@agent.on_interrupt
def on_interrupt(session_id: str, reason: str):
    # reason: "new_user_speech" | "lost_arbitration" | "supersede"
    print(f"Session {session_id} interrupted: {reason}")
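
The abort check can be exercised locally without a gateway. The sketch below simulates an interrupt arriving mid-stream. `FakeContext`, `stream_response`, and `demo` are hypothetical stand-ins written for this illustration; only `is_aborted`, `send_delta`, and `done` mirror names from the SDK.

```python
import asyncio


class FakeContext:
    """Stand-in for UtteranceContext, for local testing only (hypothetical)."""

    def __init__(self) -> None:
        self.is_aborted = False
        self.sent: list[str] = []
        self.finished = False

    def send_delta(self, delta: str) -> None:
        self.sent.append(delta)

    def done(self) -> None:
        self.finished = True


async def stream_response(chunks):
    """Yield chunks one at a time, letting other tasks run in between."""
    for chunk in chunks:
        await asyncio.sleep(0)
        yield chunk


async def handle(ctx, chunks) -> None:
    async for chunk in stream_response(chunks):
        if ctx.is_aborted:
            return  # stop without calling done()
        ctx.send_delta(chunk)
    ctx.done()


async def demo() -> FakeContext:
    ctx = FakeContext()

    async def interrupt_after_two() -> None:
        # Simulate the gateway flagging an interrupt once two chunks are out
        while len(ctx.sent) < 2:
            await asyncio.sleep(0)
        ctx.is_aborted = True

    await asyncio.gather(handle(ctx, ["a", "b", "c", "d"]), interrupt_after_two())
    return ctx


result = asyncio.run(demo())
```

After the run, `result.sent` holds only the chunks sent before the interrupt and `result.finished` stays `False`.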

Configuration

from voice_agent import VoiceAgentConfig, ConnectionMode

config = VoiceAgentConfig(
    api_key="sk-voice-xxx",          # Your API key
    gateway_url="wss://...",          # Gateway WebSocket/HTTP URL
    mode=ConnectionMode.WEBSOCKET,    # WEBSOCKET (default) or SSE
    reconnect=True,                   # Auto-reconnect (default: True)
    reconnect_interval=1.0,           # Base reconnect delay in seconds
    max_reconnect_attempts=None,      # Max attempts (None = unlimited)
)
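
The phrase "base reconnect delay" suggests the wait grows between attempts, but the actual schedule is not documented here. The sketch below shows what such a schedule could look like, assuming exponential doubling with an upper cap. The doubling, the cap, and the `reconnect_delays` helper are all assumptions for illustration.

```python
def reconnect_delays(base: float, attempts: int, cap: float = 30.0) -> list[float]:
    """Return the delay before each reconnect attempt.

    Doubles from `base` on every attempt and never exceeds `cap`.
    Both the doubling and the cap are assumptions, not SDK behavior.
    """
    return [min(base * (2 ** n), cap) for n in range(attempts)]


delays = reconnect_delays(base=1.0, attempts=6)
# delays == [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

If the SDK uses a fixed interval instead, every entry would simply equal `reconnect_interval`.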

Context API

The UtteranceContext provides:

Property	Type	Description
`text`	`str`	The user's utterance text
`is_final`	`bool`	Whether this is a final transcript
`user`	`UserInfo | None`	User info (requires profile, email, or location scope)
`vision`	`VisionContext | None`	Vision context (requires vision scope)
`session_id`	`str`	Current session ID
`request_id`	`str`	Current request ID
`user_id`	`str | None`	User ID
`timestamp`	`datetime`	When the utterance was received
`is_aborted`	`bool`	Whether the context was interrupted

Method	Description
`send_delta(delta)`	Stream a text chunk to the user
`done(final_text=None)`	Complete the response
`request_frames(options=None)`	Request video frames (async)
`query_memory(options)`	Query user memories (async)
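
For unit-testing handlers without a gateway, the members listed above can be described as a structural type and satisfied by a small test double. This is a sketch only. `UtteranceContextLike`, `RecordingContext`, and `echo` are hypothetical names, the protocol covers just a subset of the tables, and the `None` return types are assumptions.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class UtteranceContextLike(Protocol):
    """Hypothetical structural type covering a subset of the tables above."""

    text: str
    is_aborted: bool

    def send_delta(self, delta: str) -> None: ...
    def done(self, final_text: Optional[str] = None) -> None: ...


class RecordingContext:
    """Test double that records what a handler sends."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.is_aborted = False
        self.deltas: list[str] = []
        self.final_text: Optional[str] = None
        self.completed = False

    def send_delta(self, delta: str) -> None:
        self.deltas.append(delta)

    def done(self, final_text: Optional[str] = None) -> None:
        self.final_text = final_text
        self.completed = True


def echo(ctx: UtteranceContextLike) -> None:
    """Example handler body: stream one chunk, then complete."""
    ctx.send_delta(f"You said: {ctx.text}")
    ctx.done()


ctx = RecordingContext("hello")
echo(ctx)
```

Handlers written against the protocol accept both the test double and any real context that exposes the same members.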

Connection Modes

WebSocket (recommended)

Full-duplex communication, lower latency:

config = VoiceAgentConfig(
    mode=ConnectionMode.WEBSOCKET,
    # ...
)

Server-Sent Events (SSE)

One-way server push with HTTP POST for sending:

config = VoiceAgentConfig(
    mode=ConnectionMode.SSE,
    # ...
)
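
If the mode needs to differ between deployments, it can be picked from an environment variable. The sketch below validates the setting and returns the member name. The `VOICE_MODE` variable and the `mode_name_from_env` helper are hypothetical; the only names taken from the SDK are the two `ConnectionMode` members.

```python
from collections.abc import Mapping


def mode_name_from_env(env: Mapping[str, str]) -> str:
    """Map a hypothetical VOICE_MODE variable to 'WEBSOCKET' or 'SSE'.

    Falls back to WEBSOCKET, which the configuration section lists as the default.
    """
    raw = env.get("VOICE_MODE", "websocket").strip().upper()
    if raw not in {"WEBSOCKET", "SSE"}:
        raise ValueError(f"Unsupported connection mode: {raw!r}")
    return raw


name = mode_name_from_env({"VOICE_MODE": "sse"})
# With the SDK installed, the member can then be looked up by name:
#   mode = getattr(ConnectionMode, name)
```

In a real program, pass `os.environ` as the mapping.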

License

MIT

Release history

0.0.6 (Feb 3, 2026)
0.0.5 (Jan 27, 2026)
0.0.4 (Jan 27, 2026)
0.0.3 (Jan 27, 2026)
0.0.2 (Jan 27, 2026)
0.0.1 (Jan 26, 2026; this version)