Python SDK for Attention Labs real-time attention detection.

attenlabs-sas

Python SDK for Attention Labs real-time selective auditory attention.

Every voice pipeline has the same problem: the microphone hears everything, but your ASR should only process speech directed at the device. Wake words solve this with a rigid trigger phrase. SAS solves it without one — classifying every audio frame as silent, human-directed, or device-directed and routing only what matters.

attenlabs-sas streams mic and webcam data to the SAS inference server over WebSocket and emits typed events: attention predictions, voice activity, conversation state, and ready-to-forward speech audio. LLM routing is left to you.

Sign up

Get your API token at attentionlabs.ai/dashboard.

Install

pip install attenlabs-sas

Requires Python 3.10+. sounddevice and opencv-python are pulled in automatically for mic and camera access.

Quickstart

import time
from sas import AttentionClient

client = AttentionClient(token="your-token")

@client.on_prediction
def _(event):
    label = {0: "silent", 1: "human", 2: "device"}.get(event.cls, "?")
    print(f"{label}  {event.confidence:.0%}  faces={event.num_faces}  src={event.source}")

@client.on_speech_ready
def _(event):
    # event.audio_base64 — base64 PCM16 @ 16 kHz mono, ready for OpenAI Realtime / any LLM
    # event.audio_pcm16  — same audio as np.int16 array
    print(f"speech ready ({event.duration_sec:.2f}s)")

@client.on_error
def _(event):
    print(f"ERROR: {event.title}: {event.message}")

client.start()
try:
    while True:
        time.sleep(0.1)
except KeyboardInterrupt:
    client.stop()

A full CLI demo wiring SAS + OpenAI Realtime lives at sas-py-demo.


API

AttentionClient

from sas import AttentionClient, CameraConfig, MicConfig

client = AttentionClient(
    token="...",                    # Auth token — sent as WS subprotocol
    url=None,                      # Server URL (default: wss://server.attentionlabs.ai/ws)
    video=CameraConfig(),          # Webcam config
    audio=MicConfig(),             # Mic config
    initial_threshold=0.7,         # Device-class confidence threshold (0..1)
    enable_audio=True,             # Set False to skip mic capture
    enable_video=True,             # Set False to skip webcam capture
)

Configuration

MicConfig

field     type              default  notes
device    int | str | None  None     Device index, name, or None for system default
channels  int               1        Number of input channels

CameraConfig

field         type  default  notes
device_index  int   0        Webcam device index
width         int   1920     Capture width
height        int   1080     Capture height
jpeg_quality  int   60       JPEG compression quality 0–100

Methods

method                       description
start()                      Opens WebSocket, acquires mic + camera, starts capture threads. Non-blocking. Raises on handshake failure.
stop()                       Tears down capture, joins threads, closes WebSocket.
mute()                       Pauses upstream audio and signals server to stop VAD.
unmute()                     Resumes upstream audio.
mark_responding(bool)        Tell the server an LLM response is in flight. Server stops emitting predictions while True.
set_threshold(value: float)  Update device-class confidence threshold (0..1). Server acks via config event.

Events

Register handlers with decorators. All callbacks fire on internal threads — keep them fast or hand work off to your own thread.

@client.on_prediction
def handle(event):
    ...

decorator            payload              fires when
@on_connected        (none)               WebSocket opens
@on_started          (none)               Server-side warmup complete
@on_warmup_complete  (none)               First non-zero-confidence prediction
@on_prediction       PredictionEvent      Each attention prediction
@on_vad              VadEvent             Voice activity update
@on_state            StateEvent           Conversation state transition
@on_speech_ready     SpeechReadyEvent     Complete speech segment ready to forward
@on_config           ConfigEvent          Server acks a threshold change
@on_stats            StatsEvent           Every ~10s with connection health
@on_error            AttentionErrorEvent  Connection, auth, or server error
@on_disconnected     DisconnectedEvent    WebSocket closes

Event types

PredictionEvent

cls: int            # 0 = silent, 1 = human-directed, 2 = device-directed
confidence: float   # 0..1
source: str         # "video" or "audio"
num_faces: int      # faces detected in frame

VadEvent

probability: float  # VAD probability 0..1
is_speech: bool     # whether speech was detected

StateEvent

state: ConversationState  # "listening" | "sending" | "cancelled" | "idle"

SpeechReadyEvent

audio_pcm16: np.ndarray   # int16 array @ 16 kHz mono
audio_base64: str          # same audio as base64 — ready for OpenAI Realtime, etc.
duration_sec: float        # duration in seconds
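
For debugging it can help to dump a segment to disk. The sketch below decodes audio_base64 and wraps it in a WAV container using only the standard library. It assumes the payload is raw, headerless, little-endian PCM16; the page only states "PCM16 @ 16 kHz mono", so the byte order is an assumption. The function names are hypothetical.

```python
import base64
import io
import wave

SAMPLE_RATE = 16_000  # documented: PCM16 @ 16 kHz mono

def speech_to_wav_bytes(audio_base64):
    """Decode base64 PCM16 and wrap it in a WAV container (assumes little-endian)."""
    pcm = base64.b64decode(audio_base64)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()

def duration_from_base64(audio_base64):
    """Duration in seconds implied by the payload size (2 bytes per sample)."""
    n_bytes = len(base64.b64decode(audio_base64))
    return n_bytes / 2 / SAMPLE_RATE
```

Inside an on_speech_ready handler, speech_to_wav_bytes(event.audio_base64) could be written to a file, and duration_from_base64 gives a cross-check against event.duration_sec.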

ConfigEvent

model_class2_threshold: float  # server-confirmed threshold
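
Because set_threshold() is acknowledged asynchronously through this event, calling code may want to wait for the confirmed value. The helper below is a minimal sketch of that pattern using threading.Event; ThresholdAck is a hypothetical name, and the only SDK detail it relies on is the documented model_class2_threshold field.

```python
import threading

class ThresholdAck:
    """Records the server-confirmed threshold and lets another thread wait for it."""

    def __init__(self):
        self._event = threading.Event()
        self.confirmed = None

    def on_config(self, event):
        # Intended to be registered as an on_config handler.
        self.confirmed = event.model_class2_threshold
        self._event.set()

    def wait(self, timeout=2.0):
        """Return the confirmed value, or None if no ack arrived within timeout."""
        if self._event.wait(timeout):
            return self.confirmed
        return None
```

A possible wiring would be ack = ThresholdAck(), then client.on_config(ack.on_config), client.set_threshold(0.8), and ack.wait() from your own thread. Avoid calling wait() inside an SDK callback, since callbacks run on the SDK's internal threads and blocking there could delay the ack itself.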

StatsEvent

rtt_ms: float | None  # round-trip latency in ms
sent_video: int        # total video frames sent
skipped_video: int     # total video frames skipped
sent_audio: int        # total audio chunks sent
uptime_s: float        # connection uptime in seconds
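
One way to turn these counters into a health line is sketched below. It assumes skipped frames are not included in sent_video, which this page does not state, so treat the ratio as approximate. summarize_stats is a hypothetical helper that takes the documented fields as plain arguments.

```python
def summarize_stats(rtt_ms, sent_video, skipped_video, sent_audio, uptime_s):
    """Format StatsEvent fields as a single log line."""
    total_video = sent_video + skipped_video
    # Guard against division by zero before any frame has been captured.
    skip_ratio = skipped_video / total_video if total_video else 0.0
    rtt = "n/a" if rtt_ms is None else f"{rtt_ms:.0f} ms"
    return (
        f"rtt={rtt} video_skip={skip_ratio:.1%} "
        f"audio_chunks={sent_audio} uptime={uptime_s:.0f}s"
    )
```

In an on_stats handler this could be called as summarize_stats(event.rtt_ms, event.sent_video, event.skipped_video, event.sent_audio, event.uptime_s).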

AttentionErrorEvent

title: str                  # error category ("Auth Failed", "Connection Stalled", etc.)
message: str                # human-readable message
detail: str | None = None   # technical detail
code: int | None = None     # WebSocket close code, if applicable

DisconnectedEvent

code: int        # WebSocket close code
reason: str      # close reason
was_clean: bool  # True if code == 1000

LLM integration

LLM routing is intentionally not part of the SDK. The speech_ready event hands you PCM16 audio, both as a NumPy array and as base64; forward it wherever you like.

When your LLM starts generating, call mute() and mark_responding(True) to suppress predictions during playback. When it finishes, call unmute() and mark_responding(False).

from sas import AttentionClient

client = AttentionClient(token="...")

@client.on_speech_ready
def _(event):
    # Forward to your LLM of choice (your_llm is a placeholder for your own client)
    your_llm.send(event.audio_base64)

def on_llm_speaking():
    client.mute()
    client.mark_responding(True)

def on_llm_done():
    client.unmute()
    client.mark_responding(False)

See sas-py-demo for a full working example with OpenAI Realtime.
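
The mute/unmute pairing described above can also be wrapped in a context manager so that listening is restored even if playback raises. This is a sketch rather than an SDK feature: responding is a hypothetical helper, and it only calls the four documented client methods.

```python
from contextlib import contextmanager

@contextmanager
def responding(client):
    """Mute SAS and flag an in-flight LLM response for the duration of the block."""
    client.mute()
    client.mark_responding(True)
    try:
        yield
    finally:
        # Always restore listening, even if the body raises.
        client.unmute()
        client.mark_responding(False)
```

Usage would look like `with responding(client): play_llm_audio()`, where play_llm_audio stands for whatever your application does while the LLM speaks.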

Threading model

The SDK manages four threads internally:

thread         purpose
sas-ws         WebSocket send/receive
sas-heartbeat  JSON pings every 5s, stats every 10s
sas-camera     JPEG capture at 4 fps (250 ms)
(sounddevice)  Audio callback at native sample rate, resampled to 16 kHz

All event callbacks fire on sas-ws or sas-heartbeat. Don't block them — offload heavy work to your own thread.
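
A common way to keep callbacks fast is to enqueue the event and process it on a thread you own. The sketch below shows that hand-off with the standard library; Offloader and the "app-worker" thread name are hypothetical, and nothing here depends on SDK internals.

```python
import queue
import threading

class Offloader:
    """Moves event handling off the calling thread onto a dedicated worker."""

    def __init__(self, handle):
        self._queue = queue.Queue()
        self._handle = handle
        self._thread = threading.Thread(
            target=self._run, name="app-worker", daemon=True
        )
        self._thread.start()

    def __call__(self, event):
        # Runs on the caller's thread: enqueue and return immediately.
        self._queue.put(event)

    def _run(self):
        while True:
            event = self._queue.get()
            if event is None:  # sentinel pushed by close()
                break
            self._handle(event)

    def close(self):
        """Drain queued events, then stop the worker."""
        self._queue.put(None)
        self._thread.join()
```

Inside a decorated handler you would call something like offload(event), where offload = Offloader(send_to_llm) and send_to_llm is your own slow function; call offload.close() after client.stop().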

License

MIT
