Python SDK for avatar WebSocket services with audio streaming and animation frame reception
Avatar SDK Python
A Python SDK for connecting to avatar services via WebSocket, supporting audio streaming and receiving animation frames.
Installation
pip install avatarkit
To enable the built-in PCM-to-Ogg-Opus encoder, install the optional opus extra:
pip install "avatarkit[opus]"
The optional encoder uses opuslib, which requires a working libopus runtime on the
host system.
Quick Start
import asyncio
from datetime import datetime, timedelta, timezone
from avatarkit import AudioFormat, new_avatar_session

async def main():
    # Create session
    session = new_avatar_session(
        api_key="your-api-key",
        app_id="your-app-id",
        console_endpoint_url="https://console.us-west.spatialwalk.cloud/v1/console",
        ingress_endpoint_url="wss://api.us-west.spatialwalk.cloud/v2/driveningress",
        avatar_id="your-avatar-id",
        expire_at=datetime.now(timezone.utc) + timedelta(minutes=5),
        transport_frames=lambda frame, last: print(f"Received frame: {len(frame)} bytes"),
        on_error=lambda err: print(f"Error: {err}"),
        on_close=lambda: print("Session closed"),
    )
    # Initialize and connect
    await session.init()
    connection_id = await session.start()
    print(f"Connected: {connection_id}")
    # Send audio
    audio_data = b"..."  # Your PCM or Ogg Opus audio data
    request_id = await session.send_audio(audio_data, end=True)
    print(f"Sent audio: {request_id}")
    # Wait for frames...
    await asyncio.sleep(10)
    # Close
    await session.close()

if __name__ == "__main__":
    asyncio.run(main())
Detailed Usage
Session Configuration
The SDK provides two ways to configure a session:
Option 1: Using new_avatar_session() (Recommended)
from datetime import datetime, timedelta, timezone
from avatarkit import AudioFormat, new_avatar_session
# on_frame_received, on_error, and on_close are the callbacks described under Callbacks.
session = new_avatar_session(
    avatar_id="avatar-123",
    api_key="your-api-key",
    app_id="your-app-id",
    # For web-style auth, set use_query_auth=True to put (appId, sessionKey)
    # in the websocket URL query params instead of headers.
    use_query_auth=False,
    expire_at=datetime.now(timezone.utc) + timedelta(minutes=5),
    console_endpoint_url="https://console.us-west.spatialwalk.cloud/v1/console",
    ingress_endpoint_url="wss://api.us-west.spatialwalk.cloud/v2/driveningress",
    sample_rate=16000,  # Default: 16000 Hz
    audio_format=AudioFormat.PCM_S16LE,
    transport_frames=on_frame_received,
    on_error=on_error,
    on_close=on_close,
)
Option 2: Using Configuration Builder
from datetime import datetime, timedelta, timezone
from avatarkit import SessionConfigBuilder, AvatarSession
config = (
    SessionConfigBuilder()
    .with_avatar_id("avatar-123")
    .with_api_key("your-api-key")
    .with_app_id("your-app-id")
    .with_console_endpoint_url("https://console.us-west.spatialwalk.cloud/v1/console")
    .with_ingress_endpoint_url("wss://api.us-west.spatialwalk.cloud/v2/driveningress")
    .with_expire_at(datetime.now(timezone.utc) + timedelta(minutes=5))
    .with_transport_frames(on_frame_received)
    .build()
)
session = AvatarSession(config)
Session Lifecycle
# 1. Initialize (get session token)
await session.init()
# 2. Start WebSocket connection
connection_id = await session.start()
# 3. Send audio data
request_id = await session.send_audio(audio_bytes, end=True)
# 4. Receive frames via callback
# (automatically handled in background)
# 5. Close session
await session.close()
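The lifecycle above can be wrapped so that close() always runs, even when streaming fails midway. The following is a minimal sketch, not part of avatarkit: running_session and FakeSession are hypothetical names, and the helper assumes close() is safe to call after a failed start(), which the text above does not confirm.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def running_session(session):
    """Run init() and start(), then always close() on exit.

    `session` is any object exposing the async init/start/close methods
    described above. This helper is hypothetical, not an SDK API.
    """
    await session.init()
    try:
        connection_id = await session.start()
        yield connection_id
    finally:
        # Assumption: close() may be called even if start() raised.
        await session.close()


class FakeSession:
    """Stand-in used only to exercise the helper without a network."""

    def __init__(self):
        self.calls = []

    async def init(self):
        self.calls.append("init")

    async def start(self):
        self.calls.append("start")
        return "conn-1"

    async def close(self):
        self.calls.append("close")


async def demo_lifecycle():
    fake = FakeSession()
    try:
        async with running_session(fake) as connection_id:
            fake.calls.append(f"using {connection_id}")
            raise RuntimeError("simulated failure while streaming")
    except RuntimeError:
        pass
    return fake.calls
```

With a real session, the body of the async with block would hold the send_audio() calls.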
Audio Format
The SDK supports two session-level input formats:
- AudioFormat.PCM_S16LE - mono 16-bit PCM bytes
- AudioFormat.OGG_OPUS - one continuous Ogg Opus stream per request ID
PCM input
- Sample Rate: one of [8000, 16000, 22050, 24000, 32000, 44100, 48000]
- Channels: 1 (mono)
- Bit Depth: 16-bit
- Format: Raw PCM bytes
from avatarkit import AudioFormat, new_avatar_session
session = new_avatar_session(
    ...,
    sample_rate=16000,
    audio_format=AudioFormat.PCM_S16LE,
)
with open("audio.pcm", "rb") as f:
    audio_data = f.read()
await session.send_audio(audio_data, end=True)
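If no PCM fixture is at hand, a test tone in the layout listed above (mono, 16-bit, little-endian) can be produced with the standard library. This is an illustrative sketch; sine_pcm_s16le and pcm_duration_seconds are hypothetical helper names, not SDK functions.

```python
import math
import struct


def sine_pcm_s16le(frequency_hz: float, duration_s: float, sample_rate: int = 16000) -> bytes:
    """Return a mono sine tone as raw 16-bit signed little-endian PCM."""
    total_samples = int(duration_s * sample_rate)
    amplitude = 0.5 * 32767  # half of full scale to leave headroom
    samples = (
        int(amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate))
        for n in range(total_samples)
    )
    # "<h" packs one signed 16-bit integer in little-endian byte order.
    return b"".join(struct.pack("<h", s) for s in samples)


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 16000) -> float:
    """Duration of mono 16-bit PCM: 2 bytes per sample, one channel."""
    return len(pcm) / (2 * sample_rate)


tone = sine_pcm_s16le(440.0, 0.5)  # half a second of A4 at 16 kHz
```

The resulting bytes can be passed to send_audio() in place of the file contents, provided the session's sample_rate matches the one used here.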
Ogg Opus input
- Sample Rate: one of [8000, 12000, 16000, 24000, 48000]
- Channels: 1 (mono)
- Format: Ogg Opus pages/chunks
- Request contract: each request ID must carry one continuous Ogg Opus stream across one or more send_audio() calls, and the final chunk must use end=True
from avatarkit import AudioFormat, new_avatar_session
session = new_avatar_session(
    ...,
    sample_rate=24000,
    bitrate=32000,
    audio_format=AudioFormat.OGG_OPUS,
)
with open("audio.ogg", "rb") as f:
    while chunk := f.read(4096):
        await session.send_audio(chunk, end=False)
# An empty final chunk closes the stream for this request ID.
await session.send_audio(b"", end=True)
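As an alternative to sending an empty terminating chunk, the reader can look one chunk ahead and flag the last non-empty chunk itself. The sketch below shows that technique with a hypothetical helper, iter_chunks_with_end. It assumes the service treats a non-empty final chunk with end=True the same as an empty one, which the request contract above suggests but does not spell out.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_chunks_with_end(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[Tuple[bytes, bool]]:
    """Yield (chunk, is_final) pairs, reading one chunk ahead to spot the end."""
    current = stream.read(chunk_size)
    while current:
        upcoming = stream.read(chunk_size)
        # The current chunk is final exactly when nothing follows it.
        yield current, not upcoming
        current = upcoming


# Demonstration on in-memory bytes instead of a real Ogg Opus file.
pairs = list(iter_chunks_with_end(io.BytesIO(b"\x00" * 10000)))
sizes_and_flags = [(len(chunk), is_final) for chunk, is_final in pairs]
```

In a session this would be used as: for chunk, is_final in iter_chunks_with_end(f): await session.send_audio(chunk, end=is_final). An empty file yields nothing, so the caller still needs to handle that case.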
Built-in PCM to Ogg Opus encoder
If you want the session to negotiate AudioFormat.OGG_OPUS but still provide raw PCM
bytes to send_audio(), enable the optional internal encoder.
from avatarkit import AudioFormat, OggOpusEncoderConfig, new_avatar_session
encoded_outputs = []
session = new_avatar_session(
    ...,
    sample_rate=24000,
    bitrate=32000,
    audio_format=AudioFormat.OGG_OPUS,
    ogg_opus_encoder=OggOpusEncoderConfig(frame_duration_ms=20),
    on_encoded_audio=lambda req_id, payload: encoded_outputs.append((req_id, payload)),
)
with open("audio_24000.pcm", "rb") as f:
    pcm_audio = f.read()
await session.send_audio(pcm_audio, end=True)
Notes:
- The internal encoder is optional; if you do not install avatarkit[opus], keep using PCM or provide pre-encoded Ogg Opus bytes yourself.
- on_encoded_audio fires when internal encoding completes for a request and receives (req_id, encoded_audio_bytes).
- Advanced usage still works: if audio_format=AudioFormat.OGG_OPUS and ogg_opus_encoder is unset, send_audio() forwards your pre-encoded Ogg Opus bytes unchanged.
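For sizing PCM buffers against frame_duration_ms, the arithmetic is straightforward: a frame covers sample_rate * frame_duration_ms / 1000 samples, at 2 bytes each for mono 16-bit audio. The helpers below are a hypothetical sketch of that calculation only; how the SDK's encoder treats a trailing partial frame is not documented above, so padding_needed is shown purely as arithmetic.

```python
def pcm_bytes_per_frame(sample_rate: int, frame_duration_ms: int) -> int:
    """Bytes of mono 16-bit PCM covering one encoder frame."""
    samples, remainder = divmod(sample_rate * frame_duration_ms, 1000)
    if remainder:
        raise ValueError("frame duration does not map to a whole number of samples")
    return samples * 2  # 2 bytes per 16-bit sample, 1 channel


def padding_needed(pcm_length: int, frame_bytes: int) -> int:
    """Bytes of silence that would fill the last partial frame."""
    return (-pcm_length) % frame_bytes


frame_bytes = pcm_bytes_per_frame(24000, 20)  # 480 samples per 20 ms frame
```

At 24000 Hz with 20 ms frames, each frame is 480 samples, or 960 bytes.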
LiveKit Egress Mode
When configured with livekit_egress, audio and animation data are streamed to a LiveKit room via the egress service instead of being returned through the WebSocket connection.
from datetime import datetime, timedelta, timezone
from avatarkit import new_avatar_session, LiveKitEgressConfig
session = new_avatar_session(
    avatar_id="avatar-123",
    api_key="your-api-key",
    app_id="your-app-id",
    console_endpoint_url="https://console.us-west.spatialwalk.cloud/v1/console",
    ingress_endpoint_url="wss://api.us-west.spatialwalk.cloud/v2/driveningress",
    expire_at=datetime.now(timezone.utc) + timedelta(minutes=5),
    livekit_egress=LiveKitEgressConfig(
        url="wss://livekit.example.com",
        api_key="livekit-api-key",
        api_secret="livekit-api-secret",
        room_name="my-room",
        publisher_id="avatar-publisher",
    ),
)
When LiveKit egress is enabled:
- The server streams output to the specified LiveKit room
- The transport_frames callback will not be invoked
- Audio and animation data are published to the room under the specified publisher ID
Interrupt (LiveKit Egress Only)
The interrupt() method sends an interrupt signal to stop current audio processing. This is only available when using LiveKit egress mode.
# Send audio
request_id = await session.send_audio(audio_data, end=True)
# Later, if you need to interrupt (e.g., user wants to stop playback)
interrupted_id = await session.interrupt()
print(f"Interrupted request: {interrupted_id}")
The interrupt uses the most recent request ID, even after end=True was sent. This allows interrupting requests that have finished sending audio but are still being processed by the server.
Callbacks
Transport Frames Callback
Receives animation frames from the server:
def on_frame_received(frame_data: bytes, is_last: bool):
    print(f"Received frame: {len(frame_data)} bytes")
    if is_last:
        print("This is the last frame")
    # Process frame_data (contains serialized Message protobuf)
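Rather than sleeping for a fixed time, a caller can wait until the callback reports the last frame. The sketch below uses a hypothetical FrameCollector class whose on_frame method has the same (bytes, bool) signature as the callback above. It assumes the SDK invokes the callback on the same event loop; if callbacks arrive on another thread, the event would need to be set through loop.call_soon_threadsafe instead.

```python
import asyncio
from typing import List


class FrameCollector:
    """Collects frames and signals when the final one has arrived."""

    def __init__(self) -> None:
        self.frames: List[bytes] = []
        self._done = asyncio.Event()

    def on_frame(self, frame_data: bytes, is_last: bool) -> None:
        # Same signature as the transport_frames callback shown above.
        self.frames.append(frame_data)
        if is_last:
            self._done.set()

    async def wait_for_last(self, timeout: float) -> List[bytes]:
        # Raises asyncio.TimeoutError if no final frame arrives in time.
        await asyncio.wait_for(self._done.wait(), timeout)
        return self.frames


async def demo_collector() -> int:
    collector = FrameCollector()
    loop = asyncio.get_running_loop()
    # Simulate two frames being delivered from the event loop.
    loop.call_soon(collector.on_frame, b"\x01\x02", False)
    loop.call_soon(collector.on_frame, b"\x03", True)
    frames = await collector.wait_for_last(timeout=1.0)
    return len(frames)
```

In a real session, collector.on_frame would be passed as transport_frames, and wait_for_last() would replace the asyncio.sleep(10) call from the Quick Start.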
Error Callback
Handles errors from the session:
from avatarkit import AvatarSDKError
def on_error(error: Exception):
    print(f"Session error: {error}")
    if isinstance(error, AvatarSDKError):
        print(" code:", error.code.value)
        print(" phase:", error.phase)
        print(" http_status:", error.http_status)
        print(" server_code:", error.server_code)
        print(" server_detail:", error.server_detail)
The SDK reports structured AvatarSDKError instances for token creation failures,
WebSocket upgrade rejections, handshake failures, runtime ServerError messages,
and unexpected connection drops.
Error Handling
Use SessionTokenError for token creation failures and AvatarSDKError for all
other structured SDK errors:
from avatarkit import AvatarSDKError, SessionTokenError
try:
    await session.init()
    await session.start()
except SessionTokenError as error:
    print("token failed", error.code.value, error.server_detail)
except AvatarSDKError as error:
    print("sdk error", error.code.value, error.phase, error.server_detail)
AvatarSDKError and SessionTokenError expose these fields:
- code - Stable SDK error code
- message - Human-readable message
- phase - Failure phase such as session_token, websocket_connect, websocket_handshake, websocket_runtime, or websocket_send
- http_status - HTTP status for token or WebSocket upgrade rejections
- server_code - Server-provided error code, including runtime protobuf ServerError.code
- server_title / server_detail - Parsed server error details when available
- connection_id / req_id - Server correlation identifiers when available
- raw_body - Raw HTTP rejection body for token or WebSocket upgrade failures
- close_code / close_reason - WebSocket close details for unexpected disconnects
Common AvatarSDKErrorCode values:
- sessionTokenExpired - Session token expired or unauthorized
- sessionTokenInvalid - Invalid or empty session token
- appIDUnrecognized - App ID is not recognized by the server
- appIDMismatch - Session token belongs to a different app
- avatarNotFound - Avatar does not exist
- billingRequired - Session denied by billing checks
- creditsExhausted - Runtime or connect-time credits exhausted
- sessionDurationExceeded - Billing-enforced session timeout reached
- unsupportedSampleRate - Handshake rejected unsupported audio sample rate
- invalidEgressConfig - LiveKit or Agora egress config is invalid
- egressUnavailable - Egress service is unavailable or not configured
- idleTimeout - Server closed the session after input inactivity
- upstreamError - Internal upstream service failed
- protocolError - Invalid protobuf or unexpected message sequence
- connectionFailed - Transport-level connection failure
- connectionClosed - Unexpected WebSocket close
- serverError - Server-side failure that did not match a more specific mapping
- invalidRequest - Other client-side request validation errors
- unknown - Fallback when the SDK cannot classify the failure
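One way to act on these codes is to separate failures that may succeed on a retry from those that will not. The grouping below is an assumption made for illustration; the SDK does not define which codes are transient, and should_retry and backoff_seconds are hypothetical helpers. The strings are compared against error.code.value as used in the examples above.

```python
# Assumption: these codes describe transient conditions worth retrying.
# This grouping is illustrative and is not defined by the SDK.
TRANSIENT_CODES = frozenset(
    {"connectionFailed", "connectionClosed", "upstreamError", "egressUnavailable"}
)


def should_retry(code: str, attempt: int, max_attempts: int = 3) -> bool:
    """Retry only transient codes, and only while attempts remain."""
    return code in TRANSIENT_CODES and attempt < max_attempts


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff: base * 2**attempt, limited to `cap` seconds."""
    return min(cap, base * (2 ** attempt))
```

Codes such as avatarNotFound or appIDMismatch point at configuration problems, so retrying them unchanged is unlikely to help.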
Close Callback
Called when the session closes:
def on_close():
    print("Session has been closed")
API Reference
AvatarSession
Main class for managing avatar sessions.
Methods
- async init() - Initialize session and obtain token
- async start() -> str - Start WebSocket connection, returns connection ID
- async send_audio(audio: bytes, end: bool = False) -> str - Send audio data, returns request ID
- async interrupt() -> str - Interrupt current audio processing (LiveKit egress mode only), returns interrupted request ID
- async close() - Close the session and clean up resources
- config -> SessionConfig - Get session configuration (property)
SessionConfig
Configuration dataclass for avatar sessions.
Fields
- avatar_id: str - Avatar identifier
- api_key: str - API key for authentication
- app_id: str - Application identifier
- use_query_auth: bool - Send websocket auth via query params (web) instead of headers (mobile)
- expire_at: datetime - Session expiration time
- sample_rate: int - Audio sample rate (default: 16000)
- bitrate: int - Audio bitrate (default: 0; PCM typically uses 0)
- transport_frames: Callable[[bytes, bool], None] - Frame callback
- on_error: Callable[[Exception], None] - Error callback
- on_close: Callable[[], None] - Close callback
- console_endpoint_url: str - Console API URL
- ingress_endpoint_url: str - Ingress WebSocket URL
- livekit_egress: Optional[LiveKitEgressConfig] - LiveKit egress configuration
LiveKitEgressConfig
Configuration for streaming to a LiveKit room.
Fields
- url: str - LiveKit server URL (e.g., wss://livekit.example.com)
- api_key: str - LiveKit API key
- api_secret: str - LiveKit API secret
- room_name: str - LiveKit room name to join
- publisher_id: str - Publisher identity in the room
- extra_attributes: dict[str, str] - Extra LiveKit participant attributes
- idle_timeout: int - Idle timeout in seconds (0 uses server defaults)
SessionConfigBuilder
Builder for constructing SessionConfig with fluent interface.
Methods
All methods return self for chaining:
- with_avatar_id(avatar_id: str)
- with_api_key(api_key: str)
- with_app_id(app_id: str)
- with_use_query_auth(use_query_auth: bool)
- with_expire_at(expire_at: datetime)
- with_sample_rate(sample_rate: int)
- with_bitrate(bitrate: int)
- with_transport_frames(handler: Callable)
- with_on_error(handler: Callable)
- with_on_close(handler: Callable)
- with_console_endpoint_url(url: str)
- with_ingress_endpoint_url(url: str)
- with_livekit_egress(config: LiveKitEgressConfig)
- build() -> SessionConfig - Build the configuration
Utility Functions
- generate_log_id() -> str - Generate unique log ID in format "YYYYMMDDHHMMSS_<nanoid>"
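To make the log ID format concrete, the sketch below builds and validates a string of the same shape using only the standard library. It is not the SDK's implementation: example_log_id is a hypothetical name, and the suffix alphabet, the suffix length of 12, and the use of UTC are assumptions.

```python
import re
import secrets
from datetime import datetime, timezone
from typing import Optional

# Assumption: a URL-safe alphabet and 12 characters; the SDK's real
# nanoid alphabet and length are not documented above.
_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"

# 14 digits of timestamp, an underscore, then a non-empty suffix.
LOG_ID_PATTERN = re.compile(r"^\d{14}_[0-9A-Za-z_-]+$")


def example_log_id(now: Optional[datetime] = None, suffix_length: int = 12) -> str:
    """Build an ID shaped like YYYYMMDDHHMMSS_<suffix>."""
    now = now or datetime.now(timezone.utc)  # assumption: UTC timestamps
    suffix = "".join(secrets.choice(_ALPHABET) for _ in range(suffix_length))
    return f"{now:%Y%m%d%H%M%S}_{suffix}"


sample_id = example_log_id(datetime(2024, 1, 2, 3, 4, 5))
```

A pattern like LOG_ID_PATTERN can be handy when searching logs for IDs of this shape.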
Exceptions
- AvatarSDKError - Structured SDK error with stable code and context fields
- SessionTokenError - Subclass of AvatarSDKError raised when session token request fails
Examples
See the examples directory for complete working examples:
- single_audio_clip - Basic usage with a single audio file
- http_service - Simple HTTP API that returns PCM audio (by sample rate) and generated animation Message binaries
Protocol Buffers
The SDK uses Protocol Buffers for efficient serialization. The proto definitions are in proto/message.proto.
Generating Proto Code
Proto code is generated using buf:
cd proto
buf generate
The generated Python code is placed in src/avatarkit/proto/generated/.
Message Types
- MESSAGE_CLIENT_CONFIGURE_SESSION (1) - Client session negotiation parameters
- MESSAGE_SERVER_CONFIRM_SESSION (2) - Server confirms and returns connection_id
- MESSAGE_CLIENT_AUDIO_INPUT (3) - Client audio input
- MESSAGE_SERVER_ERROR (4) - Server-side error message
- MESSAGE_SERVER_RESPONSE_ANIMATION (5) - Server animation response (end indicates final)
- MESSAGE_CLIENT_INTERRUPT (7) - Client interrupt signal to stop processing
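For logging or debugging raw traffic, the numeric values listed above can be mirrored in a small enum. This is a hand-written sketch; in practice the generated protobuf code is the authoritative source, and MessageType and describe are hypothetical names. Only the values listed above are included, so any other number is reported as unknown.

```python
from enum import IntEnum


class MessageType(IntEnum):
    """Mirror of the message type numbers listed above (not generated code)."""

    MESSAGE_CLIENT_CONFIGURE_SESSION = 1
    MESSAGE_SERVER_CONFIRM_SESSION = 2
    MESSAGE_CLIENT_AUDIO_INPUT = 3
    MESSAGE_SERVER_ERROR = 4
    MESSAGE_SERVER_RESPONSE_ANIMATION = 5
    MESSAGE_CLIENT_INTERRUPT = 7


def describe(type_number: int) -> str:
    """Return the symbolic name, or a placeholder for unlisted numbers."""
    try:
        return MessageType(type_number).name
    except ValueError:
        return f"UNKNOWN({type_number})"


def is_client_message(message_type: MessageType) -> bool:
    """Direction is encoded in the name prefix."""
    return message_type.name.startswith("MESSAGE_CLIENT_")
```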
Development
Setup
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and setup
git clone <repository-url>
cd avatar-sdk-python
uv sync --all-extras
Running Tests
# Unit tests
uv run pytest
End-to-End Tests
The repository includes opt-in network tests in tests/test_e2e_errors.py and
tests/test_e2e_request.py. They are skipped by default and only run when
AVATARKIT_RUN_E2E=1 is set.
AVATARKIT_RUN_E2E=1 uv run pytest tests/test_e2e_errors.py tests/test_e2e_request.py
Available e2e cases:
- invalid WebSocket credentials -> expects sessionTokenInvalid
- valid credentials + missing avatar -> expects avatarNotFound
- valid credentials + real avatar + real audio -> sends a request and waits for the final animation frame
Environment variables:
- AVATARKIT_RUN_E2E=1 - Enables e2e tests
- AVATARKIT_E2E_API_KEY - Required for the real avatarNotFound test
- AVATARKIT_E2E_APP_ID - Required for the real avatarNotFound test
- AVATARKIT_E2E_CONSOLE_ENDPOINT - Required for the real avatarNotFound test
- AVATARKIT_E2E_INGRESS_ENDPOINT - Required for the real avatarNotFound test unless you use the default public ingress endpoint for the invalid-token test only
- AVATARKIT_E2E_MISSING_AVATAR_ID - Optional avatar id that should not exist; defaults to avatarkit-e2e-missing-avatar-404
- AVATARKIT_E2E_AVATAR_ID - Required for the real request test
- AVATARKIT_E2E_AUDIO_FORMAT - Optional, pcm_s16le or ogg_opus; defaults to pcm_s16le
- AVATARKIT_E2E_USE_INTERNAL_OGG_OPUS_ENCODER - Optional, set to 1 to test the SDK's built-in PCM-to-Ogg-Opus encoder
- AVATARKIT_E2E_AUDIO_PATH - Optional audio file path; defaults to audio_16000.pcm for PCM and for Ogg Opus when the internal encoder is enabled, otherwise audio.ogg
- AVATARKIT_E2E_SAMPLE_RATE - Optional sample rate; defaults to 16000 for PCM and for Ogg Opus when the internal encoder is enabled, otherwise 24000
- AVATARKIT_E2E_BITRATE - Optional bitrate; defaults to 32000
- AVATARKIT_E2E_CHUNK_SIZE - Optional chunk size for streaming Ogg Opus; defaults to 4096
- AVATARKIT_E2E_TIMEOUT_SECONDS - Optional request timeout; defaults to 45
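The audio-related defaults above depend on each other, so it can help to see them resolved in one place. The following sketch restates those rules in code; it is not the test suite's actual implementation, and E2EAudioSettings and load_audio_settings are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class E2EAudioSettings:
    audio_format: str
    use_internal_encoder: bool
    audio_path: str
    sample_rate: int
    bitrate: int
    chunk_size: int
    timeout_seconds: float


def load_audio_settings(env: Mapping[str, str]) -> E2EAudioSettings:
    """Resolve the audio settings using the defaults documented above."""
    audio_format = env.get("AVATARKIT_E2E_AUDIO_FORMAT", "pcm_s16le")
    use_encoder = env.get("AVATARKIT_E2E_USE_INTERNAL_OGG_OPUS_ENCODER") == "1"
    # Raw PCM input is used for PCM sessions and for Ogg Opus sessions that
    # rely on the internal encoder; otherwise a pre-encoded file is expected.
    raw_pcm_input = audio_format == "pcm_s16le" or use_encoder
    default_path = "audio_16000.pcm" if raw_pcm_input else "audio.ogg"
    default_rate = "16000" if raw_pcm_input else "24000"
    return E2EAudioSettings(
        audio_format=audio_format,
        use_internal_encoder=use_encoder,
        audio_path=env.get("AVATARKIT_E2E_AUDIO_PATH", default_path),
        sample_rate=int(env.get("AVATARKIT_E2E_SAMPLE_RATE", default_rate)),
        bitrate=int(env.get("AVATARKIT_E2E_BITRATE", "32000")),
        chunk_size=int(env.get("AVATARKIT_E2E_CHUNK_SIZE", "4096")),
        timeout_seconds=float(env.get("AVATARKIT_E2E_TIMEOUT_SECONDS", "45")),
    )
```

Passing os.environ as env would resolve the settings for the current shell.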
Example:
export AVATARKIT_RUN_E2E=1
export AVATARKIT_E2E_API_KEY="your-api-key"
export AVATARKIT_E2E_APP_ID="your-app-id"
export AVATARKIT_E2E_CONSOLE_ENDPOINT="https://console.us-west.spatialwalk.cloud/v1/console"
export AVATARKIT_E2E_INGRESS_ENDPOINT="wss://api.us-west.spatialwalk.cloud/v2/driveningress"
export AVATARKIT_E2E_MISSING_AVATAR_ID="avatarkit-e2e-missing-avatar-404"
export AVATARKIT_E2E_AVATAR_ID="your-real-avatar-id"
export AVATARKIT_E2E_AUDIO_FORMAT="pcm_s16le"
export AVATARKIT_E2E_AUDIO_PATH="audio_16000.pcm"
uv run pytest tests/test_e2e_errors.py tests/test_e2e_request.py
To test the SDK's built-in Ogg Opus encoder with a raw PCM fixture:
export AVATARKIT_RUN_E2E=1
export AVATARKIT_E2E_API_KEY="your-api-key"
export AVATARKIT_E2E_APP_ID="your-app-id"
export AVATARKIT_E2E_CONSOLE_ENDPOINT="https://console.us-west.spatialwalk.cloud/v1/console"
export AVATARKIT_E2E_INGRESS_ENDPOINT="wss://api.us-west.spatialwalk.cloud/v2/driveningress"
export AVATARKIT_E2E_AVATAR_ID="your-real-avatar-id"
export AVATARKIT_E2E_AUDIO_FORMAT="ogg_opus"
export AVATARKIT_E2E_USE_INTERNAL_OGG_OPUS_ENCODER="1"
export AVATARKIT_E2E_AUDIO_PATH="audio_16000.pcm"
export AVATARKIT_E2E_SAMPLE_RATE="16000"
uv run pytest tests/test_e2e_request.py -k send_audio_receives_animation_frames -s
If the credentialed variables are missing, the invalid-token e2e test still runs, and the
real avatarNotFound test is skipped automatically.
To run only the real request smoke test:
AVATARKIT_RUN_E2E=1 uv run pytest tests/test_e2e_request.py -k send_audio_receives_animation_frames -s
License
See LICENSE for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.