Python SDK for AttentionLabs real-time attention detection.
attenlabs-sas
Python SDK for Attention Labs real-time selective auditory attention.
Every voice pipeline has the same problem: the microphone hears everything, but your ASR should only process speech directed at the device. Wake words solve this with a rigid trigger phrase. SAS solves it without one — classifying every audio frame as silent, human-directed, or device-directed and routing only what matters.
attenlabs-sas streams mic and webcam data to the SAS inference server over WebSocket and emits typed events: attention predictions, voice activity, conversation state, and ready-to-forward speech audio. LLM routing is left to you.
Sign up
Get your API token at attentionlabs.ai/dashboard.
Install
pip install attenlabs-sas
Requires Python 3.10+. sounddevice and opencv-python are pulled in automatically for mic and camera access.
Quickstart
import time

from sas import AttentionClient

client = AttentionClient(token="your-token")

@client.on_prediction
def _(event):
    label = {0: "silent", 1: "human", 2: "device"}.get(event.cls, "?")
    print(f"{label} {event.confidence:.0%} faces={event.num_faces} src={event.source}")

@client.on_speech_ready
def _(event):
    # event.audio_base64: base64 PCM16 @ 16 kHz mono, ready for OpenAI Realtime / any LLM
    # event.audio_pcm16: same audio as np.int16 array
    print(f"speech ready ({event.duration_sec:.2f}s)")

@client.on_error
def _(event):
    print(f"ERROR: {event.title}: {event.message}")

client.start()
try:
    while True:
        time.sleep(0.1)
except KeyboardInterrupt:
    client.stop()
A full CLI demo wiring SAS + OpenAI Realtime lives at sas-py-demo.
API
AttentionClient
from sas import AttentionClient, CameraConfig, MicConfig

client = AttentionClient(
    token="...",              # Auth token, sent as WS subprotocol
    url=None,                 # Server URL (default: wss://server.attentionlabs.ai/ws)
    video=CameraConfig(),     # Webcam config
    audio=MicConfig(),        # Mic config
    initial_threshold=0.7,    # Device-class confidence threshold (0..1)
    enable_audio=True,        # Set False to skip mic capture
    enable_video=True,        # Set False to skip webcam capture
)
Configuration
MicConfig
| field | type | default | notes |
|---|---|---|---|
| `device` | `int \| str \| None` | `None` | Device index, name, or `None` for system default |
| `channels` | `int` | `1` | Number of input channels |
CameraConfig
| field | type | default | notes |
|---|---|---|---|
| `device_index` | `int` | `0` | Webcam device index |
| `width` | `int` | `1920` | Capture width |
| `height` | `int` | `1080` | Capture height |
| `jpeg_quality` | `int` | `60` | JPEG compression quality 0–100 |
Methods
| method | description |
|---|---|
| `start()` | Opens WebSocket, acquires mic + camera, starts capture threads. Non-blocking. Raises on handshake failure. |
| `stop()` | Tears down capture, joins threads, closes WebSocket. |
| `mute()` | Pauses upstream audio and signals server to stop VAD. |
| `unmute()` | Resumes upstream audio. |
| `mark_responding(bool)` | Tells the server an LLM response is in flight. Server stops emitting predictions while `True`. |
| `set_threshold(value: float)` | Updates the device-class confidence threshold (0..1). Server acks via config event. |
Events
Register handlers with decorators. All callbacks fire on internal threads — keep them fast or hand work off to your own thread.
@client.on_prediction
def handle(event):
    ...
| decorator | payload | fires when |
|---|---|---|
| `@on_connected` | none | WebSocket opens |
| `@on_started` | none | Server-side warmup complete |
| `@on_warmup_complete` | none | First non-zero-confidence prediction |
| `@on_prediction` | `PredictionEvent` | Each attention prediction |
| `@on_vad` | `VadEvent` | Voice activity update |
| `@on_state` | `StateEvent` | Conversation state transition |
| `@on_speech_ready` | `SpeechReadyEvent` | Complete speech segment ready to forward |
| `@on_config` | `ConfigEvent` | Server acks a threshold change |
| `@on_stats` | `StatsEvent` | Every ~10s with connection health |
| `@on_error` | `AttentionErrorEvent` | Connection, auth, or server error |
| `@on_disconnected` | `DisconnectedEvent` | WebSocket closes |
Event types
PredictionEvent
cls: int # 0 = silent, 1 = human-directed, 2 = device-directed
confidence: float # 0..1
source: str # "video" or "audio"
num_faces: int # faces detected in frame
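As a minimal sketch of consuming these fields, the helper below maps `cls` to a readable label and applies an extra client-side confidence gate on top of whatever the server already does. The `Prediction` dataclass is a hypothetical stand-in that mirrors the documented fields so the snippet runs without the SDK; `label_for`, `is_device_directed`, and `min_confidence` are illustrative names, not part of `sas`.

```python
from dataclasses import dataclass

# Class ids as documented for PredictionEvent.cls
LABELS = {0: "silent", 1: "human", 2: "device"}


@dataclass
class Prediction:
    """Hypothetical stand-in mirroring PredictionEvent's documented fields."""
    cls: int
    confidence: float
    source: str
    num_faces: int


def label_for(event) -> str:
    """Return a readable label, or "?" for an unknown class id."""
    return LABELS.get(event.cls, "?")


def is_device_directed(event, min_confidence: float = 0.7) -> bool:
    """Local gate (an assumption, not SDK behavior): class 2 with enough confidence."""
    return event.cls == 2 and event.confidence >= min_confidence


sample = Prediction(cls=2, confidence=0.91, source="audio", num_faces=1)
print(label_for(sample), is_device_directed(sample))
```

In a real handler you would pass the `event` received by `@client.on_prediction` instead of the stand-in.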
VadEvent
probability: float # VAD probability 0..1
is_speech: bool # whether speech was detected
StateEvent
state: ConversationState # "listening" | "sending" | "cancelled" | "idle"
SpeechReadyEvent
audio_pcm16: np.ndarray # int16 array @ 16 kHz mono
audio_base64: str # same audio as base64 — ready for OpenAI Realtime, etc.
duration_sec: float # duration in seconds
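To make the relationship between the two audio fields concrete, here is a small sketch that decodes a base64 PCM16 string into signed 16-bit samples using only the standard library and derives the duration from the documented 16 kHz rate. It assumes the bytes are little-endian, which this README does not state; `decode_pcm16` and `duration_from_samples` are hypothetical helper names.

```python
import base64
import sys
from array import array

SAMPLE_RATE_HZ = 16_000  # documented: PCM16 @ 16 kHz mono


def decode_pcm16(audio_base64: str) -> array:
    """Decode base64 PCM16 into an array of signed 16-bit samples.

    Assumption: the payload is little-endian.
    """
    samples = array("h")
    samples.frombytes(base64.b64decode(audio_base64))
    if sys.byteorder == "big":
        samples.byteswap()  # convert little-endian payload to host order
    return samples


def duration_from_samples(samples: array) -> float:
    """Mono audio: one sample per frame, so duration is count / rate."""
    return len(samples) / SAMPLE_RATE_HZ


# Half a second of silence: 8000 samples * 2 bytes each.
silence_b64 = base64.b64encode(bytes(16_000)).decode("ascii")
print(duration_from_samples(decode_pcm16(silence_b64)))
```

If you already depend on NumPy, `event.audio_pcm16` gives you the same samples without decoding.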
ConfigEvent
model_class2_threshold: float # server-confirmed threshold
StatsEvent
rtt_ms: float | None # round-trip latency in ms
sent_video: int # total video frames sent
skipped_video: int # total video frames skipped
sent_audio: int # total audio chunks sent
uptime_s: float # connection uptime in seconds
AttentionErrorEvent
title: str # error category ("Auth Failed", "Connection Stalled", etc.)
message: str # human-readable message
detail: str | None = None # technical detail
code: int | None = None # WebSocket close code, if applicable
DisconnectedEvent
code: int # WebSocket close code
reason: str # close reason
was_clean: bool # True if code == 1000
LLM integration
LLM routing is intentionally not part of the SDK. The speech_ready event hands you PCM16 audio — both as a NumPy array and as base64 — forward it wherever you like.
When your LLM starts generating, call mute() + mark_responding(True) to suppress predictions during playback. When it finishes, unmute() + mark_responding(False).
from sas import AttentionClient

client = AttentionClient(token="...")

@client.on_speech_ready
def _(event):
    # Forward to your LLM of choice (your_llm is a placeholder for your own client)
    your_llm.send(event.audio_base64)

def on_llm_speaking():
    client.mute()
    client.mark_responding(True)

def on_llm_done():
    client.unmute()
    client.mark_responding(False)
See sas-py-demo for a full working example with OpenAI Realtime.
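One way to keep the mute/unmute pairing described above from drifting apart is a context manager that always restores the client, even if playback raises. This is a sketch, not something the SDK provides: `responding` is a hypothetical helper, and `FakeClient` is a stand-in that only records calls so the example runs without a server.

```python
from contextlib import contextmanager


class FakeClient:
    """Hypothetical stand-in that records calls, so the sketch runs without the SDK."""

    def __init__(self):
        self.calls = []

    def mute(self):
        self.calls.append("mute")

    def unmute(self):
        self.calls.append("unmute")

    def mark_responding(self, value):
        self.calls.append(f"mark_responding({value})")


@contextmanager
def responding(client):
    """Mute and flag the response on entry; always undo both on exit."""
    client.mute()
    client.mark_responding(True)
    try:
        yield
    finally:
        client.unmute()
        client.mark_responding(False)


client = FakeClient()
with responding(client):
    pass  # play the LLM's audio here
print(client.calls)
```

With a real `AttentionClient`, the same `with responding(client):` block would wrap your playback code.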
Threading model
The SDK manages four threads internally:
| thread | purpose |
|---|---|
| `sas-ws` | WebSocket send/receive |
| `sas-heartbeat` | JSON pings every 5s, stats every 10s |
| `sas-camera` | JPEG capture at 4 fps (250 ms) |
| (sounddevice) | Audio callback at native sample rate, resampled to 16 kHz |
All event callbacks fire on sas-ws or sas-heartbeat. Don't block them — offload heavy work to your own thread.
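A common way to offload work is a queue drained by your own worker thread: the handler only enqueues and returns. The sketch below uses the standard library only and simulates the callbacks with plain strings; `on_speech_ready`, `worker`, and the `None` shutdown sentinel are illustrative choices, not SDK conventions.

```python
import queue
import threading

work = queue.Queue()
results = []


def on_speech_ready(event):
    """Handler body: enqueue and return immediately."""
    work.put(event)


def worker():
    """Drain the queue on our own thread; None is the shutdown signal."""
    while True:
        event = work.get()
        if event is None:
            break
        results.append(f"processed {event}")  # heavy work goes here


thread = threading.Thread(target=worker, name="speech-worker", daemon=True)
thread.start()

# Simulate two callbacks arriving from an SDK thread, then shut down.
on_speech_ready("segment-1")
on_speech_ready("segment-2")
work.put(None)
thread.join()
print(results)
```

In practice you would register the enqueueing function with `@client.on_speech_ready` and send the sentinel after `client.stop()`.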
License
MIT