Live speech-to-text streaming on Apple Silicon

TextStream

Live speech-to-text on Apple Silicon. Streams microphone audio through Qwen3-ASR and shows the results in a browser, refreshing the finalized and draft text every 2.5 seconds by default.

Built for real-time transcription during calls, meetings, or recording sessions. Silero VAD filters non-speech audio (music, background noise) before it reaches the model, which largely eliminates the hallucination problem where ASR models regurgitate their system prompt on noise input.

What it does

  • Captures mic audio at 16kHz, runs Silero voice activity detection, feeds speech chunks to Qwen3-ASR on MLX
  • Streams finalized + draft text to a browser UI via SSE at localhost:7890
  • Saves timestamped transcripts to ~/Documents/textstream/transcripts/YYYY-MM-DD/
  • Optionally pushes Grafana annotations for each finalized text segment
  • Two model sizes: 0.6B (default, fast) and 1.7B (more accurate), hot-swappable from the browser
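The dated transcript layout can be sketched as below. Only the `~/Documents/textstream/transcripts/YYYY-MM-DD/` pattern comes from this README; the helper name `transcript_dir` is hypothetical and the package's own code may differ.

```python
from datetime import date
from pathlib import Path

# Base directory as documented above; the helper itself is an illustration.
BASE = Path.home() / "Documents" / "textstream" / "transcripts"

def transcript_dir(day: date, base: Path = BASE) -> Path:
    """Return the per-day transcript directory (named YYYY-MM-DD)."""
    return base / day.isoformat()
```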

Install

pip install textstream-asr

Requires Apple Silicon (M1 or later) and Python 3.10+. Intel Macs and Linux are not supported.

Usage

textstream                            # Qwen3-ASR 0.6B, opens browser
textstream --engine qwen-1.7b         # larger model, lower word error rate
textstream --vad-threshold 0.5        # stricter voice detection (default 0.4)
textstream --interval 2.0             # faster updates
textstream --no-browser --no-grafana  # headless
textstream --port 8080                # custom port

The browser UI shows live text with a dark theme. Finalized text appears in white, draft predictions in grey. Switch models from the dropdown without restarting.

How it works

Every --interval seconds, TextStream drains the mic buffer and runs Silero VAD on the chunk. If speech is detected (probability >= --vad-threshold), the chunk is fed to Qwen3-ASR's streaming decoder. The model returns stable (finalized) text and speculative (draft) text. Stable text is persisted to disk and broadcast to all connected browsers via server-sent events.
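The gating step described above might look roughly like this. It is a minimal sketch, not TextStream's actual code: `gate_chunks`, `speech_prob`, and `transcribe` are hypothetical names standing in for the mic buffer drain, the Silero VAD call, and the Qwen3-ASR streaming decoder.

```python
from typing import Callable, Iterable, Iterator, Tuple

def gate_chunks(
    chunks: Iterable[bytes],
    speech_prob: Callable[[bytes], float],
    transcribe: Callable[[bytes], Tuple[str, str]],
    vad_threshold: float = 0.4,
) -> Iterator[Tuple[str, str]]:
    """Yield (finalized, draft) pairs for chunks that pass the VAD gate."""
    for chunk in chunks:
        # Chunks below the speech-probability threshold never reach the model.
        if speech_prob(chunk) < vad_threshold:
            continue
        yield transcribe(chunk)

# Usage with stand-in callables (no audio or model involved):
fake_probs = {b"noise": 0.1, b"hello": 0.9}
results = list(gate_chunks(fake_probs, fake_probs.get, lambda c: (c.decode(), "")))
```

Keeping the VAD and the decoder as injected callables makes the gate easy to test without a microphone or a loaded model.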

If the model hallucinates (outputs its chat system prompt on noise that slips past VAD), a pattern filter catches it and resets the stream. This is a safety net — with VAD active, it almost never fires.
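A pattern filter of this kind could be as small as the sketch below. The patterns are assumptions chosen for illustration (typical chat system-prompt phrasing and chat-template markers); the patterns TextStream actually uses are not listed in this README.

```python
import re

# Hypothetical patterns; the real filter's patterns are not documented here.
HALLUCINATION_PATTERNS = [
    re.compile(r"you are a helpful assistant", re.IGNORECASE),
    re.compile(r"<\|im_start\|>|<\|im_end\|>"),
]

def looks_hallucinated(text: str) -> bool:
    """Return True if the text matches any system-prompt pattern."""
    return any(p.search(text) for p in HALLUCINATION_PATTERNS)
```

When the check returns True, the caller would discard the text and reset the stream, as described above.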

Qwen3-ASR handles its own 30-second sliding context window internally, so there's no manual drift reset needed.

API

GET /          → browser UI
GET /stream    → SSE event stream (data: {"type":"stream","finalized":"...","draft":"..."})
GET /engine    → {"engine":"qwen"} or {"engine":"qwen-1.7b"}
GET /switch?engine=qwen-1.7b → hot-swap model
GET /pause     → pause mic capture
GET /resume    → resume mic capture
GET /stop      → shutdown server
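A client for `/stream` only needs to pick out the `data:` lines and decode their JSON. The sketch below assumes one JSON object per `data:` line, as in the example payload above; the function name `parse_sse` is hypothetical.

```python
import json
from typing import Iterable, Iterator

def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for line in lines:
        line = line.strip()
        # Blank lines separate events; lines starting with ':' are comments.
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

sample = ['data: {"type":"stream","finalized":"hi","draft":"th"}', "", ": keepalive"]
events = list(parse_sse(sample))
```

Against a running server, the same function can be fed the decoded lines of the response from `urllib.request.urlopen("http://localhost:7890/stream")`.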

Configuration

Flag             Default  Description
--port           7890     HTTP server port
--engine         qwen     qwen (0.6B) or qwen-1.7b
--interval       2.5      Seconds between transcription updates
--vad-threshold  0.4      Silero VAD speech probability threshold
--no-browser     false    Don't open browser on start
--no-grafana     false    Disable Grafana annotation push
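For reference, the flags and defaults in the table map onto an `argparse` parser along these lines. This is an illustrative sketch; TextStream's real command-line parsing may be implemented differently.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="textstream")
    parser.add_argument("--port", type=int, default=7890, help="HTTP server port")
    parser.add_argument("--engine", choices=["qwen", "qwen-1.7b"], default="qwen")
    parser.add_argument("--interval", type=float, default=2.5)
    parser.add_argument("--vad-threshold", type=float, default=0.4)
    parser.add_argument("--no-browser", action="store_true")
    parser.add_argument("--no-grafana", action="store_true")
    return parser

args = build_parser().parse_args(["--engine", "qwen-1.7b", "--no-browser"])
```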

Grafana integration reads GRAFANA_URL and GRAFANA_SERVICE_ACCOUNT_TOKEN from environment variables. If the token is empty, Grafana push is skipped automatically.
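The environment check and the annotation request can be sketched as follows. The request targets Grafana's `/api/annotations` HTTP endpoint with a bearer token. The function name, the `textstream` tag, and skipping when the URL is empty are assumptions for illustration, not confirmed details of the package.

```python
import json
import os
import time
import urllib.request
from typing import Mapping, Optional

def build_annotation_request(
    text: str, env: Mapping[str, str] = os.environ
) -> Optional[urllib.request.Request]:
    """Return a POST request for a Grafana annotation, or None to skip."""
    url = env.get("GRAFANA_URL", "").rstrip("/")
    token = env.get("GRAFANA_SERVICE_ACCOUNT_TOKEN", "")
    # An empty token means Grafana push is skipped (empty URL: assumption).
    if not url or not token:
        return None
    body = json.dumps(
        {"time": int(time.time() * 1000), "text": text, "tags": ["textstream"]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{url}/api/annotations",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; building and sending are kept separate here so the skip logic can be checked without a Grafana instance.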

Dependencies

Author

Boris Djordjevic — 199 Biotechnologies

License

MIT

Download files

Source Distribution

textstream_asr-0.1.0.tar.gz (128.1 kB)

Uploaded Source

Built Distribution

textstream_asr-0.1.0-py3-none-any.whl (11.4 kB)

Uploaded Python 3

File details

Details for the file textstream_asr-0.1.0.tar.gz.

File metadata

  • Download URL: textstream_asr-0.1.0.tar.gz
  • Upload date:
  • Size: 128.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for textstream_asr-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       aacc20a1f57abcc37132fba64907b330a56d91d086db485af55be7b38060856d
MD5          cab68b6d4d0ad4d78206b4a8e3f43923
BLAKE2b-256  287cc20c5c390410412f30d2664163fa37bdf5b43ab0b43921fdedfa2456e6c2

File details

Details for the file textstream_asr-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: textstream_asr-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 11.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for textstream_asr-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       483d7df8cbea5b8dfb28c8f77b41309512d542e0f01bbc61a15b44ad1b49ebd1
MD5          558453068bb72a8d0c5b77de101348b4
BLAKE2b-256  d80032d540da46f5b122b7e946a39b16d8ad7c5c2eb391967856323c1d75ee60
