Python SDK for building voice agents on the Voice Gateway
HU SDK (Python)
Installation
pip install hu-sdk
Quick Start
import asyncio
from voice_agent import VoiceAgent, VoiceAgentConfig, ConnectionMode

config = VoiceAgentConfig(
    api_key="sk-voice-xxx",
    gateway_url="wss://gateway.example.com",
    mode=ConnectionMode.WEBSOCKET,
)
agent = VoiceAgent(config)

@agent.on_utterance
async def handle_utterance(ctx):
    print(f"User said: {ctx.text}")
    # Stream response
    ctx.send_delta("Hello ")
    ctx.send_delta("World!")
    ctx.done()

@agent.on_interrupt
def handle_interrupt(session_id: str, reason: str):
    print(f"Interrupted: {reason}")

@agent.on_error
def handle_error(error: Exception):
    print(f"Error: {error}")

async def main():
    await agent.connect()
    # Keep running
    while agent.is_connected():
        await asyncio.sleep(1)

asyncio.run(main())
Streaming with LLM
import asyncio
import os

from openai import AsyncOpenAI
from voice_agent import VoiceAgent, VoiceAgentConfig

client = AsyncOpenAI()
config = VoiceAgentConfig(
    api_key=os.environ["VOICE_API_KEY"],
    gateway_url=os.environ["GATEWAY_URL"],
)
agent = VoiceAgent(config)

@agent.on_utterance
async def handle(ctx):
    stream = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": ctx.text}],
        stream=True,
    )
    async for chunk in stream:
        # Stop streaming as soon as the user interrupts
        if ctx.is_aborted:
            return
        delta = chunk.choices[0].delta.content
        if delta:
            ctx.send_delta(delta)
    ctx.done()

async def main():
    await agent.connect()
    # Keep the process alive, as in Quick Start
    while agent.is_connected():
        await asyncio.sleep(1)

asyncio.run(main())
Using Vision (Video Frames)
Agents with vision scope can request video frames:
from voice_agent import FrameRequestOptions

@agent.on_utterance
async def handle(ctx):
    # Check if vision context is available
    if ctx.vision and ctx.vision.available:
        print(f"Auto-analyzed: {ctx.vision.description}")

    # Request raw frames for custom analysis
    frames = await ctx.request_frames(FrameRequestOptions(
        limit=5,
        raw_base64=True,
    ))
    if frames.frames:
        for frame in frames.frames:
            # frame.base64 contains the image data
            # frame.timestamp is when it was captured
            pass

    # Or get pre-analyzed descriptions
    analyzed = await ctx.request_frames(FrameRequestOptions(limit=3))
    if analyzed.descriptions:
        print(f"Frame descriptions: {analyzed.descriptions}")

    ctx.done("I can see what you're showing me!")
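The comments above note that `frame.base64` holds the image data and `frame.timestamp` the capture time. As a minimal sketch of turning such frames into raw bytes, the following assumes the field is a plain base64 string (no data-URL prefix) and that the timestamp is numeric. `Frame` here is a hypothetical stand-in, not the SDK's own type, and `decode_frames` is an illustrative helper name.

```python
import base64
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical stand-in for an SDK frame (assumed field types)."""
    base64: str
    timestamp: float


def decode_frames(frames):
    """Return (timestamp, image_bytes) pairs, oldest first."""
    decoded = [(f.timestamp, base64.b64decode(f.base64)) for f in frames]
    return sorted(decoded, key=lambda pair: pair[0])


# Fake payloads stand in for real image data.
sample = [
    Frame(base64=base64.b64encode(b"second").decode("ascii"), timestamp=2.0),
    Frame(base64=base64.b64encode(b"first").decode("ascii"), timestamp=1.0),
]
result = decode_frames(sample)
print(result)
```

The decoded bytes could then be passed to an image library or a vision model of your choice.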
Using Memory
Agents with memory scope can query stored facts:
from voice_agent import MemoryQueryOptions

@agent.on_utterance
async def handle(ctx):
    # Query relevant memories
    memories = await ctx.query_memory(MemoryQueryOptions(
        query=ctx.text,
        top_k=5,
        threshold=0.7,
        types=["preference", "fact"],
    ))
    if memories.facts:
        context = "\n".join(f.content for f in memories.facts)
        # generate_with_context is your own helper (not part of the SDK)
        response = await generate_with_context(ctx.text, context)
        ctx.done(response)
    else:
        ctx.done("I don't have any relevant memories about that.")
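The handler above calls `generate_with_context`, which the snippet leaves undefined and which is presumably supplied by the application. One possible first step for such a helper is to fold the retrieved facts into a chat-style message list like the one used in Streaming with LLM. The function name `build_messages` and the prompt wording below are assumptions for illustration only.

```python
def build_messages(user_text, memory_context):
    """Combine retrieved facts and the user's utterance into chat messages."""
    system = (
        "Answer using the stored facts below when they are relevant.\n"
        f"Facts:\n{memory_context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]


messages = build_messages("What coffee do I like?", "Prefers oat-milk lattes")
for message in messages:
    print(message["role"], "->", message["content"])
```

A `generate_with_context` implementation could pass these messages to whichever chat model the agent uses and return the reply text.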
Handling Interrupts
When the user starts speaking, the gateway sends an interrupt:
@agent.on_utterance
async def handle(ctx):
    async for chunk in stream_response(ctx.text):
        # Check before each operation
        if ctx.is_aborted:
            print("User interrupted, stopping")
            return
        ctx.send_delta(chunk)
    ctx.done()

@agent.on_interrupt
def on_interrupt(session_id: str, reason: str):
    # reason: "new_user_speech" | "lost_arbitration" | "supersede"
    print(f"Session {session_id} interrupted: {reason}")
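The abort check above can be exercised without a gateway connection. The sketch below uses a hypothetical `FakeContext` test double that mimics only `is_aborted`, `send_delta`, and `done`, plus a placeholder `stream_response` generator. Neither is part of the SDK, and the "abort after N chunks" rule is invented purely to simulate an interrupt.

```python
import asyncio


class FakeContext:
    """Hypothetical test double; mimics only is_aborted, send_delta and done."""

    def __init__(self, abort_after):
        self.sent = []
        self.finished = False
        self._abort_after = abort_after

    @property
    def is_aborted(self):
        # Pretend the user interrupts once `abort_after` chunks were sent.
        return len(self.sent) >= self._abort_after

    def send_delta(self, delta):
        self.sent.append(delta)

    def done(self, final_text=None):
        self.finished = True


async def stream_response(text):
    """Placeholder generator standing in for an LLM stream."""
    for word in text.split():
        await asyncio.sleep(0)  # yield control, as a real stream would
        yield word


async def handle(ctx, text):
    async for chunk in stream_response(text):
        if ctx.is_aborted:
            return  # stop without calling done()
        ctx.send_delta(chunk)
    ctx.done()


interrupted = FakeContext(abort_after=2)
asyncio.run(handle(interrupted, "one two three four"))

completed = FakeContext(abort_after=100)
asyncio.run(handle(completed, "one two"))

print(interrupted.sent, interrupted.finished)
print(completed.sent, completed.finished)
```

Checking `is_aborted` before every `send_delta` keeps the handler from emitting text after the interrupt takes effect.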
Configuration
from voice_agent import VoiceAgentConfig, ConnectionMode

config = VoiceAgentConfig(
    api_key="sk-voice-xxx",          # Your API key
    gateway_url="wss://...",         # Gateway WebSocket/HTTP URL
    mode=ConnectionMode.WEBSOCKET,   # WEBSOCKET (default) or SSE
    reconnect=True,                  # Auto-reconnect (default: True)
    reconnect_interval=1.0,          # Base reconnect delay in seconds
    max_reconnect_attempts=None,     # Max attempts (None = unlimited)
)
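The comment on `reconnect_interval` calls it a base delay, which suggests the wait grows between attempts, though the exact schedule is not documented here. As a rough illustration only, the sketch below assumes exponential backoff with a cap. The `factor` and `cap` parameters and the `reconnect_delays` helper are hypothetical and may not match what the SDK actually does.

```python
from itertools import islice


def reconnect_delays(base, max_attempts, factor=2.0, cap=30.0):
    """Yield the delay in seconds before each reconnect attempt.

    Assumed schedule: base * factor**attempt, limited to `cap`.
    max_attempts=None yields delays indefinitely.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        yield min(base * factor ** attempt, cap)
        attempt += 1


bounded = list(reconnect_delays(base=1.0, max_attempts=6))
unbounded_head = list(islice(reconnect_delays(base=0.5, max_attempts=None), 3))
print(bounded)
print(unbounded_head)
```

Under these assumptions, `reconnect_interval=1.0` would give waits of 1, 2, 4, 8, and 16 seconds before hitting the cap.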
Context API
The UtteranceContext provides:
| Property | Type | Description |
|---|---|---|
| `text` | `str` | The user's utterance text |
| `is_final` | `bool` | Whether this is a final transcript |
| `user` | `UserInfo \| None` | User info (if profile/email/location scope) |
| `vision` | `VisionContext \| None` | Vision context (if vision scope) |
| `session_id` | `str` | Current session ID |
| `request_id` | `str` | Current request ID |
| `user_id` | `str \| None` | User ID |
| `timestamp` | `datetime` | When the utterance was received |
| `is_aborted` | `bool` | Whether the context was interrupted |

| Method | Description |
|---|---|
| `send_delta(delta)` | Stream a text chunk to the user |
| `done(final_text=None)` | Complete the response |
| `request_frames(options=None)` | Request video frames (async) |
| `query_memory(options)` | Query user memories (async) |
Connection Modes
WebSocket (recommended)
Full-duplex communication, lower latency:
config = VoiceAgentConfig(
    mode=ConnectionMode.WEBSOCKET,
    # ...
)
Server-Sent Events (SSE)
One-way server push with HTTP POST for sending:
config = VoiceAgentConfig(
    mode=ConnectionMode.SSE,
    # ...
)
License
MIT