HU SDK (Python)

Python SDK for building voice agents on the Voice Gateway.

Installation

pip install hu-sdk

Quick Start

import asyncio
from voice_agent import VoiceAgent, VoiceAgentConfig, ConnectionMode

config = VoiceAgentConfig(
    api_key="sk-voice-xxx",
    gateway_url="wss://gateway.example.com",
    mode=ConnectionMode.WEBSOCKET,
)

agent = VoiceAgent(config)

@agent.on_utterance
async def handle_utterance(ctx):
    print(f"User said: {ctx.text}")

    # Stream response
    ctx.send_delta("Hello ")
    ctx.send_delta("World!")
    ctx.done()

@agent.on_interrupt
def handle_interrupt(session_id: str, reason: str):
    print(f"Interrupted: {reason}")

@agent.on_error
def handle_error(error: Exception):
    print(f"Error: {error}")

async def main():
    await agent.connect()
    # Keep running
    while agent.is_connected():
        await asyncio.sleep(1)

asyncio.run(main())

Streaming with LLM

import asyncio
import os

from openai import AsyncOpenAI
from voice_agent import VoiceAgent, VoiceAgentConfig

openai = AsyncOpenAI()

config = VoiceAgentConfig(
    api_key=os.environ["VOICE_API_KEY"],
    gateway_url=os.environ["GATEWAY_URL"],
)

agent = VoiceAgent(config)

@agent.on_utterance
async def handle(ctx):
    stream = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": ctx.text}],
        stream=True,
    )

    async for chunk in stream:
        if ctx.is_aborted:
            break

        delta = chunk.choices[0].delta.content
        if delta:
            ctx.send_delta(delta)

    ctx.done()

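async def main():
    await agent.connect()
    # Keep the process alive while the connection is open
    while agent.is_connected():
        await asyncio.sleep(1)

asyncio.run(main())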

Using Vision (Video Frames)

Agents with vision scope can request video frames:

from voice_agent import FrameRequestOptions

@agent.on_utterance
async def handle(ctx):
    # Check if vision context is available
    if ctx.vision and ctx.vision.available:
        print(f"Auto-analyzed: {ctx.vision.description}")

    # Request raw frames for custom analysis
    frames = await ctx.request_frames(FrameRequestOptions(
        limit=5,
        raw_base64=True,
    ))

    if frames.frames:
        for frame in frames.frames:
            # frame.base64 contains the image data
            # frame.timestamp is when it was captured
            pass

    # Or get pre-analyzed descriptions
    analyzed = await ctx.request_frames(FrameRequestOptions(limit=3))
    if analyzed.descriptions:
        print(f"Frame descriptions: {analyzed.descriptions}")

    ctx.done("I can see what you're showing me!")
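
The comments in the example above say that frame.base64 holds the image data and frame.timestamp the capture time. A minimal sketch of turning such frames into raw bytes follows. It assumes the payload is plain base64 (no "data:image/..." prefix) and that timestamps are sortable; the Frame class is a hypothetical stand-in for the SDK's frame objects so the snippet runs without a gateway.

```python
import base64
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical stand-in for an SDK frame (only the two fields named above)."""
    base64: str
    timestamp: float  # assumption: the real type may be a datetime


def decode_frames(frames):
    """Return (timestamp, raw_bytes) pairs, oldest first.

    Assumes each payload is plain base64 without a data-URL prefix.
    """
    decoded = [(f.timestamp, base64.b64decode(f.base64)) for f in frames]
    return sorted(decoded, key=lambda pair: pair[0])


# Two fake frames, deliberately out of order
sample = [
    Frame(base64=base64.b64encode(b"second").decode("ascii"), timestamp=2.0),
    Frame(base64=base64.b64encode(b"first").decode("ascii"), timestamp=1.0),
]
result = decode_frames(sample)
```

The decoded bytes can then be handed to whatever image library or vision model you use for custom analysis.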

Using Memory

Agents with memory scope can query stored facts:

from voice_agent import MemoryQueryOptions

@agent.on_utterance
async def handle(ctx):
    # Query relevant memories
    memories = await ctx.query_memory(MemoryQueryOptions(
        query=ctx.text,
        top_k=5,
        threshold=0.7,
        types=["preference", "fact"],
    ))

    if memories.facts:
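        context = "\n".join(f.content for f in memories.facts)
        # generate_with_context is your own helper (not part of the SDK),
        # e.g. a call to your LLM with the facts added to the prompt
        response = await generate_with_context(ctx.text, context)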
        ctx.done(response)
    else:
        ctx.done("I don't have any relevant memories about that.")
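
The example above calls generate_with_context, which the SDK does not appear to provide. One possible shape for it is sketched below. Everything here is an assumption for illustration: build_prompt, echo_model and the complete parameter are hypothetical names, and echo_model only stands in for a real LLM call.

```python
import asyncio


def build_prompt(text, context):
    """Prepend remembered facts to the utterance; pass it through if there are none."""
    if not context.strip():
        return text
    return f"Known facts about the user:\n{context}\n\nUser: {text}"


async def echo_model(prompt):
    """Placeholder model: returns the last line of the prompt unchanged."""
    return prompt.splitlines()[-1]


async def generate_with_context(text, context, complete=echo_model):
    """`complete` is any async callable mapping a prompt string to a reply."""
    return await complete(build_prompt(text, context))


reply = asyncio.run(
    generate_with_context("What should I drink?", "Prefers tea over coffee")
)
```

In a real agent you would pass your own async LLM call as complete and keep the two-argument call site from the handler unchanged.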

Routing Filters

Agents can register filters to control which utterances are routed to them. Filters are evaluated server-side for efficient routing in multi-agent setups:

await agent.connect()

# Register filters after connecting
agent.register_filters(
    # Match utterances containing these entity types or values
    entities=["PERSON", "John"],
    # Match utterances about these topics
    topics=["weather", "travel"],
    # Match utterances containing these keywords
    keywords=["urgent", "help"],
    # Match specific speakers
    speakers=["user"],
    # Number of previous utterances to include for context (used with "filtered" tier)
    include_context=5,
    # Data access tier - controls what data the agent receives:
    # - "full": everything (whole conversation stream)
    # - "filtered": matching messages + context window (default)
    # - "summary": just {entities, topics} - no text
    tier="filtered",
)

Filters can be updated at any time while connected. The gateway will apply the new filters to subsequent utterances.
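
Because filtering happens on the gateway, it can help to reason about a filter set locally before registering it. The sketch below is not the gateway's algorithm. It assumes that speakers acts as a hard restriction and that an utterance then needs to hit at least one content filter (entities, topics or keywords); the Utterance class and matches function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """Hypothetical, simplified view of an utterance as a router might see it."""
    text: str
    speaker: str = "user"
    entities: list = field(default_factory=list)
    topics: list = field(default_factory=list)


def matches(utt, *, entities=(), topics=(), keywords=(), speakers=()):
    """Return True if the utterance passes the filters.

    Assumed semantics (not confirmed by the gateway): `speakers` must match
    when given, and at least one content filter must hit. With no content
    filters, every utterance from an allowed speaker matches.
    """
    if speakers and utt.speaker not in speakers:
        return False
    if not (entities or topics or keywords):
        return True
    text = utt.text.lower()
    return (
        any(e in utt.entities for e in entities)
        or any(t in utt.topics for t in topics)
        or any(k.lower() in text for k in keywords)
    )


needs_help = Utterance("I need HELP now")
small_talk = Utterance("Nice day", topics=["weather"])
from_agent = Utterance("help", speaker="agent")
```

If the gateway turns out to combine filters differently (for example, requiring all categories to match), only the final return expression needs to change.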

Handling Interrupts

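The gateway sends an interrupt when the user starts speaking, or when the agent loses arbitration or is superseded. Handlers should check ctx.is_aborted and stop streaming: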

@agent.on_utterance
async def handle(ctx):
    # stream_response is your own async generator (not part of the SDK)
    async for chunk in stream_response(ctx.text):
        # Check before each operation
        if ctx.is_aborted:
            print("User interrupted, stopping")
            return
        ctx.send_delta(chunk)
    ctx.done()

@agent.on_interrupt
def on_interrupt(session_id: str, reason: str):
    # reason: "new_user_speech" | "lost_arbitration" | "supersede"
    print(f"Session {session_id} interrupted: {reason}")
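
The cooperative check on is_aborted can be exercised without a gateway. The sketch below uses a hypothetical FakeContext that exposes only is_aborted, send_delta and done, and flips the flag mid-stream the way an interrupt presumably would. It illustrates the pattern only and says nothing about how the SDK sets the flag internally.

```python
import asyncio


class FakeContext:
    """Hypothetical stand-in exposing only is_aborted, send_delta and done."""

    def __init__(self):
        self.is_aborted = False
        self.sent = []
        self.finished = False

    def send_delta(self, delta):
        self.sent.append(delta)

    def done(self):
        self.finished = True


async def stream_response(words):
    for word in words:
        await asyncio.sleep(0)  # yield to the event loop, as a real stream would
        yield word


async def handle(ctx, words):
    async for chunk in stream_response(words):
        if ctx.is_aborted:
            return  # stop streaming; done() is not called
        ctx.send_delta(chunk)
    ctx.done()


async def simulate():
    ctx = FakeContext()
    task = asyncio.create_task(handle(ctx, ["a", "b", "c", "d"]))
    # Let two chunks through, then flip the flag as an interrupt would
    while len(ctx.sent) < 2:
        await asyncio.sleep(0)
    ctx.is_aborted = True
    await task
    return ctx


ctx = asyncio.run(simulate())
```

After the run, the handler has stopped before the stream was exhausted and never called done(), which is the behaviour the handler above aims for.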

Configuration

from voice_agent import VoiceAgentConfig, ConnectionMode

config = VoiceAgentConfig(
    api_key="sk-voice-xxx",          # Your API key
    gateway_url="wss://...",          # Gateway WebSocket/HTTP URL
    mode=ConnectionMode.WEBSOCKET,    # WEBSOCKET (default) or SSE
    reconnect=True,                   # Auto-reconnect (default: True)
    reconnect_interval=1.0,           # Base reconnect delay in seconds
    max_reconnect_attempts=None,      # Max attempts (None = unlimited)
)
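
The comment on reconnect_interval calls it a base delay, which suggests the wait grows between attempts, but the growth rule is not documented here. The sketch below shows one common choice, capped exponential backoff. The factor and cap parameters are hypothetical and are not options of VoiceAgentConfig.

```python
import itertools


def reconnect_delays(base=1.0, max_attempts=None, factor=2.0, cap=30.0):
    """Yield the wait (in seconds) before each reconnect attempt.

    `base` and `max_attempts` mirror reconnect_interval and
    max_reconnect_attempts; `factor` and `cap` are assumptions.
    """
    delay = min(base, cap)
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        yield delay
        delay = min(cap, delay * factor)
        attempt += 1


# Unlimited attempts: take only the first six delays
first_six = list(itertools.islice(reconnect_delays(base=1.0), 6))
# Bounded attempts: the generator stops on its own
bounded = list(reconnect_delays(base=0.5, max_attempts=3))
```

With max_attempts=None the generator never ends, matching the "None = unlimited" setting, so callers should consume it lazily.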

Context API

The UtteranceContext provides:

Property     Type                     Description
text         str                      The user's utterance text
is_final     bool                     Whether this is a final transcript
user         UserInfo | None          User info (if profile/email/location scope)
vision       VisionContext | None     Vision context (if vision scope)
entities     list[EntityInfo]         Entities extracted from the utterance (NER)
topics       list[str]                Topics detected in the utterance
context      list[ContextUtterance]   Previous utterances (if include_context filter set)
session_id   str                      Current session ID
request_id   str                      Current request ID
user_id      str | None               User ID
timestamp    datetime                 When the utterance was received
is_aborted   bool                     Whether the context was interrupted

Method                         Description
send_delta(delta)              Stream a text chunk to the user
done(final_text=None)          Complete the response
request_frames(options=None)   Request video frames (async)
query_memory(options)          Query user memories (async)

Connection Modes

WebSocket (recommended)

Full-duplex communication, lower latency:

config = VoiceAgentConfig(
    mode=ConnectionMode.WEBSOCKET,
    # ...
)

Server-Sent Events (SSE)

One-way server push with HTTP POST for sending:

config = VoiceAgentConfig(
    mode=ConnectionMode.SSE,
    # ...
)
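
Since the WebSocket mode uses a WebSocket URL and the SSE mode works over HTTP, one way to keep mode and gateway_url consistent is to derive the mode from the URL scheme. This is a sketch under that assumption: mode_for_url is a hypothetical helper, and the ConnectionMode enum below is a local stand-in (with made-up values) so the snippet runs without the SDK.

```python
from enum import Enum
from urllib.parse import urlparse


class ConnectionMode(Enum):
    """Local stand-in for voice_agent.ConnectionMode; the values are assumptions."""
    WEBSOCKET = "websocket"
    SSE = "sse"


def mode_for_url(gateway_url):
    """Pick a connection mode from the gateway URL scheme."""
    scheme = urlparse(gateway_url).scheme.lower()
    if scheme in ("ws", "wss"):
        return ConnectionMode.WEBSOCKET
    if scheme in ("http", "https"):
        return ConnectionMode.SSE
    raise ValueError(f"Unsupported gateway URL scheme: {scheme!r}")


ws_mode = mode_for_url("wss://gateway.example.com")
sse_mode = mode_for_url("https://gateway.example.com")
```

In a real agent you would import ConnectionMode from voice_agent instead and pass the result as mode= to VoiceAgentConfig.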

License

MIT
