AssemblyAI streaming STT integration for Vision Agents

AssemblyAI Plugin

Streaming Speech-to-Text (STT) plugin for Vision Agents using AssemblyAI's Universal-3 Pro model.

Features

  • Real-time streaming transcription via async WebSocket
  • Built-in punctuation-based turn detection with configurable silence thresholds
  • Streaming diarization: identify speakers in real time
  • Native SpeechStarted event support
  • Custom prompt and keyterms boosting support
  • Sub-300 ms latency to a complete transcript
  • Built-in reconnection with exponential backoff

Installation

uv add "vision-agents[assemblyai]"
# or directly
uv add vision-agents-plugins-assemblyai

Usage

from vision_agents.plugins import assemblyai

stt = assemblyai.STT(
    speech_model="u3-rt-pro",  # Default model
    sample_rate=16000,
)

With streaming diarization

Enable speaker_labels to identify speakers in a mixed audio stream. Each transcript event is attributed to a distinct participant per speaker, and the raw speaker label is available in response.other["speaker_label"].

stt = assemblyai.STT(
    speaker_labels=True,
    max_speakers=2,  # optional hint (1-10)
)
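
As a rough sketch of how the raw label could be consumed downstream, the snippet below groups transcript text by response.other["speaker_label"]. The TranscriptResponse class is a hypothetical stand-in that mirrors only the other dictionary mentioned above, not the plugin's real response type, and the label values "A" and "B" are assumed for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TranscriptResponse:
    """Hypothetical stand-in for the plugin's response object (not its real class)."""

    other: dict = field(default_factory=dict)


def group_by_speaker(events):
    """Group (text, response) pairs by raw speaker label, keeping arrival order."""
    grouped = defaultdict(list)
    for text, response in events:
        # Assumption: fall back to "unknown" when no label is attached.
        label = response.other.get("speaker_label", "unknown")
        grouped[label].append(text)
    return dict(grouped)


# Illustrative events; the label values are assumed, not taken from the API.
events = [
    ("Hi, thanks for joining.", TranscriptResponse(other={"speaker_label": "A"})),
    ("Happy to be here.", TranscriptResponse(other={"speaker_label": "B"})),
    ("Let's get started.", TranscriptResponse(other={"speaker_label": "A"})),
]
by_speaker = group_by_speaker(events)
```

In a real agent the pairs would come from the plugin's transcript events rather than a hand-built list.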

With keyterms boosting

stt = assemblyai.STT(
    keyterms_prompt=["AssemblyAI", "Vision Agents"],
)
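
The Configuration table notes that keyterms_prompt cannot be combined with prompt, and that max_speakers (1-10) requires speaker_labels=True. A pre-flight check along these lines can catch a bad combination before the STT object is constructed. check_stt_options is a hypothetical helper, not part of the plugin, and whether the plugin itself raises on these combinations is not stated here.

```python
def check_stt_options(prompt=None, keyterms_prompt=None,
                      speaker_labels=False, max_speakers=None):
    """Raise ValueError for option combinations the Configuration table rules out.

    Hypothetical helper for illustration; not part of the plugin.
    """
    if prompt is not None and keyterms_prompt is not None:
        raise ValueError("prompt and keyterms_prompt cannot be combined")
    if max_speakers is not None:
        if not speaker_labels:
            raise ValueError("max_speakers requires speaker_labels=True")
        if not 1 <= max_speakers <= 10:
            raise ValueError("max_speakers must be between 1 and 10")


# A valid combination passes silently.
check_stt_options(keyterms_prompt=["AssemblyAI", "Vision Agents"])
```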

With custom turn silence thresholds

stt = assemblyai.STT(
    min_turn_silence=100,   # ms of silence before the speculative end-of-turn check
    max_turn_silence=1200,  # ms of silence before the turn is forced to end
)
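
One way to picture how the two thresholds might interact with punctuation-based turn detection is the toy rule below. It is an illustrative reading only: the real decision is made by the AssemblyAI service and is not described in this document, and turn_decision is a hypothetical function.

```python
def turn_decision(silence_ms, transcript,
                  min_turn_silence=100, max_turn_silence=1200):
    """Toy two-threshold end-of-turn rule (assumed, not AssemblyAI's actual logic)."""
    if silence_ms >= max_turn_silence:
        # Long silence: end the turn regardless of punctuation.
        return "end_turn"
    if silence_ms >= min_turn_silence and transcript.rstrip().endswith((".", "?", "!")):
        # Short silence plus terminal punctuation: speculative end of turn.
        return "end_turn"
    return "keep_listening"
```

Under this reading, lowering min_turn_silence makes the agent respond sooner after a finished sentence, while max_turn_silence bounds how long it waits on a trailing, unpunctuated utterance.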

Configuration

Parameter | Description | Default
api_key | AssemblyAI API key (falls back to ASSEMBLYAI_API_KEY env var) | None
speech_model | Model identifier | "u3-rt-pro"
sample_rate | Audio sample rate in Hz | 16000
min_turn_silence | Silence (ms) before speculative end-of-turn check | API default
max_turn_silence | Maximum silence (ms) before forcing turn end | API default
prompt | Custom transcription prompt (cannot be combined with keyterms_prompt) | None
keyterms_prompt | List of terms to boost recognition for (cannot be combined with prompt) | None
speaker_labels | Enable streaming diarization for multi-speaker identification | False
max_speakers | Hint for expected number of speakers, 1-10 (requires speaker_labels=True) | None
max_reconnect_attempts | Maximum reconnect attempts on transient failures | 3
reconnect_backoff_initial_s | Initial backoff delay in seconds | 0.5
reconnect_backoff_max_s | Maximum backoff delay in seconds | 4.0
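
To get a feel for the three reconnect parameters, the sketch below computes a capped exponential backoff schedule from them. The doubling factor and the absence of jitter are assumptions; the plugin's actual growth factor is not stated here, and backoff_delays is a hypothetical helper.

```python
def backoff_delays(max_reconnect_attempts=3,
                   reconnect_backoff_initial_s=0.5,
                   reconnect_backoff_max_s=4.0,
                   factor=2.0):
    """Return the wait (in seconds) before each reconnect attempt.

    Assumes the delay is multiplied by `factor` after every attempt and
    capped at reconnect_backoff_max_s; the factor itself is an assumption.
    """
    delays = []
    delay = reconnect_backoff_initial_s
    for _ in range(max_reconnect_attempts):
        delays.append(min(delay, reconnect_backoff_max_s))
        delay *= factor
    return delays


print(backoff_delays())                          # [0.5, 1.0, 2.0]
print(backoff_delays(max_reconnect_attempts=6))  # [0.5, 1.0, 2.0, 4.0, 4.0, 4.0]
```

With the defaults and a doubling factor, the cap of 4.0 seconds only comes into play if max_reconnect_attempts is raised above 3.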

Environment Variables

Set ASSEMBLYAI_API_KEY in your environment or pass api_key to the constructor.
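
The fallback described above can be sketched as follows. resolve_api_key is a hypothetical helper shown only to illustrate the lookup order (explicit argument first, then the environment); raising when neither is set is an assumption about reasonable behavior, not a description of the plugin.

```python
import os


def resolve_api_key(api_key=None, env=None):
    """Return the explicit api_key if given, else ASSEMBLYAI_API_KEY from env."""
    env = os.environ if env is None else env
    key = api_key or env.get("ASSEMBLYAI_API_KEY")
    if not key:
        # Assumed behavior: fail early rather than connect without credentials.
        raise ValueError("Set ASSEMBLYAI_API_KEY or pass api_key")
    return key
```

Passing env explicitly, for example resolve_api_key(env={"ASSEMBLYAI_API_KEY": "test-key"}), keeps the helper easy to exercise in tests without touching the real environment.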

Dependencies

  • aiohttp>=3.9.0
  • vision-agents
