Live speech-to-text streaming on Apple Silicon

TextStream

Live speech-to-text on Apple Silicon. Streams microphone audio through Qwen3-ASR and shows the results in a browser, refreshing the finalized and draft text every 2.5 seconds by default.

Built for real-time transcription during calls, meetings, or recording sessions. Silero VAD filters non-speech audio (music, background noise) before it reaches the model, which largely prevents the hallucination problem where ASR models regurgitate their system prompt on noise input.

What it does

Captures mic audio at 16 kHz, runs Silero voice activity detection, and feeds speech chunks to Qwen3-ASR on MLX
Streams finalized + draft text to a browser UI via SSE at localhost:7890
Saves timestamped transcripts to ~/Documents/textstream/transcripts/YYYY-MM-DD/
Optionally pushes Grafana annotations for each finalized text segment
Two model sizes: 0.6B (default, fast) and 1.7B (more accurate), hot-swappable from the browser
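
The per-day transcript layout listed above can be sketched as follows. This is a minimal illustration, not TextStream's own code: the function name `transcript_dir` is hypothetical, and the file names inside each day's directory are not documented here, so only the directory is built.

```python
from datetime import date
from pathlib import Path

# Root directory taken from the README; one subdirectory per calendar day.
TRANSCRIPT_ROOT = Path.home() / "Documents" / "textstream" / "transcripts"


def transcript_dir(day: date, root: Path = TRANSCRIPT_ROOT) -> Path:
    """Return the YYYY-MM-DD directory for the given day (hypothetical helper)."""
    # date.isoformat() yields the YYYY-MM-DD form used in the README path.
    return root / day.isoformat()


if __name__ == "__main__":
    print(transcript_dir(date.today()))
```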

Install

pip install textstream-asr

Requires Apple Silicon (M1 or later) and Python 3.10+. Intel Macs and Linux are not supported, because TextStream runs the model through MLX on Apple Silicon.

Usage

textstream                            # Qwen3-ASR 0.6B, opens browser
textstream --engine qwen-1.7b         # larger model, lower word error rate
textstream --vad-threshold 0.5        # stricter voice detection (default 0.4)
textstream --interval 2.0             # faster updates
textstream --no-browser --no-grafana  # headless
textstream --port 8080                # custom port

The browser UI shows live text with a dark theme. Finalized text appears in white and draft predictions in grey. Switch models from the dropdown without restarting.

How it works

Every --interval seconds, TextStream drains the mic buffer and runs Silero VAD on the chunk. If speech is detected (probability >= --vad-threshold), the chunk is fed to Qwen3-ASR's streaming decoder. The model returns stable (finalized) text and speculative (draft) text. Stable text is persisted to disk and broadcast to all connected browsers via server-sent events.
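
One update step as described above might look like the sketch below. It assumes the VAD and the ASR decoder are passed in as plain callables; `speech_prob` and `transcribe` are hypothetical stand-ins, not the real Silero or mlx-qwen3-asr APIs, and `process_interval` is an illustrative name.

```python
from typing import Callable, Optional, Sequence, Tuple

Chunk = Sequence[float]


def process_interval(
    chunk: Chunk,
    speech_prob: Callable[[Chunk], float],            # stand-in for Silero VAD
    transcribe: Callable[[Chunk], Tuple[str, str]],   # stand-in for the ASR decoder
    vad_threshold: float = 0.4,                       # README default
) -> Optional[Tuple[str, str]]:
    """Gate one drained chunk on VAD; return (finalized, draft) or None."""
    if not chunk:
        return None  # nothing was captured during this interval
    if speech_prob(chunk) < vad_threshold:
        return None  # non-speech audio never reaches the model
    return transcribe(chunk)


if __name__ == "__main__":
    result = process_interval([0.0] * 4, lambda c: 0.9, lambda c: ("hello", "wor"))
    print(result)
```

Returning `None` for rejected chunks keeps the caller simple: it only persists and broadcasts when a result comes back.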

If the model hallucinates (outputs its chat system prompt on noise that slips past VAD), a pattern filter catches it and resets the stream. This is a safety net; with VAD active, it almost never fires.
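
A pattern filter of this kind could be sketched as below. The patterns are assumptions chosen to resemble typical chat system prompts; the patterns TextStream actually matches are not listed in this README, and `looks_like_hallucination` is a hypothetical name.

```python
import re
from typing import Iterable

# Hypothetical patterns: phrases and markers typical of chat system prompts.
SYSTEM_PROMPT_PATTERNS = (
    r"you are a helpful assistant",
    r"<\|im_start\|>",
    r"^\s*system\s*$",
)


def looks_like_hallucination(
    text: str, patterns: Iterable[str] = SYSTEM_PROMPT_PATTERNS
) -> bool:
    """Return True if the decoded text matches any system-prompt pattern."""
    flags = re.IGNORECASE | re.MULTILINE
    return any(re.search(p, text, flags=flags) for p in patterns)


if __name__ == "__main__":
    print(looks_like_hallucination("You are a helpful assistant."))
```

When the check returns True, the caller would drop the segment and reset the decoder stream instead of persisting the text.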

Qwen3-ASR handles its own 30-second sliding context window internally, so no manual drift reset is needed.

API

GET /          → browser UI
GET /stream    → SSE event stream (data: {"type":"stream","finalized":"...","draft":"..."})
GET /engine    → {"engine":"qwen"} or {"engine":"qwen-1.7b"}
GET /switch?engine=qwen-1.7b → hot-swap model
GET /pause     → pause mic capture
GET /resume    → resume mic capture
GET /stop      → shutdown server
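
A client for the `/stream` endpoint could be sketched with the standard library alone. This assumes each event arrives as a single `data:` line carrying the JSON shape shown above; `parse_sse_line` and `stream_events` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterator, Optional


def parse_sse_line(line: str) -> Optional[dict]:
    """Decode one SSE line; return the JSON object of a data field, else None."""
    line = line.rstrip("\r\n")
    if not line.startswith("data:"):
        return None  # blank separators, comments, and other fields are skipped
    payload = line[len("data:"):].strip()
    try:
        event = json.loads(payload)
    except json.JSONDecodeError:
        return None
    return event if isinstance(event, dict) else None


def stream_events(url: str = "http://localhost:7890/stream") -> Iterator[dict]:
    """Yield decoded events from a running TextStream server."""
    with urllib.request.urlopen(url) as resp:
        for raw in resp:
            event = parse_sse_line(raw.decode("utf-8"))
            if event is not None:
                yield event
```

With the server running, `for event in stream_events(): print(event["finalized"], event["draft"])` would print each update as it arrives.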

Configuration

Flag	Default	Description
`--port`	7890	HTTP server port
`--engine`	qwen	`qwen` (0.6B) or `qwen-1.7b`
`--interval`	2.5	Seconds between transcription updates
`--vad-threshold`	0.4	Silero VAD speech probability threshold
`--no-browser`	false	Don't open browser on start
`--no-grafana`	false	Disable Grafana annotation push
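
The flags and defaults in the table can be mirrored with `argparse`, as in the sketch below. It illustrates the documented interface only and is not TextStream's actual argument parser; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags and defaults from the table above."""
    parser = argparse.ArgumentParser(prog="textstream")
    parser.add_argument("--port", type=int, default=7890, help="HTTP server port")
    parser.add_argument("--engine", choices=["qwen", "qwen-1.7b"], default="qwen",
                        help="qwen (0.6B) or qwen-1.7b")
    parser.add_argument("--interval", type=float, default=2.5,
                        help="Seconds between transcription updates")
    parser.add_argument("--vad-threshold", type=float, default=0.4,
                        help="Silero VAD speech probability threshold")
    parser.add_argument("--no-browser", action="store_true",
                        help="Don't open browser on start")
    parser.add_argument("--no-grafana", action="store_true",
                        help="Disable Grafana annotation push")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```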

Grafana integration reads GRAFANA_URL and GRAFANA_SERVICE_ACCOUNT_TOKEN from environment variables. If the token is empty, Grafana push is skipped automatically.
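
The skip-when-unconfigured behaviour can be sketched as follows. The request targets Grafana's `POST /api/annotations` HTTP endpoint with a bearer token. The `textstream` tag, the function name `build_annotation_request`, and skipping when `GRAFANA_URL` is empty are assumptions; the payload TextStream really sends is not documented here.

```python
import json
import os
import time
import urllib.request
from typing import Mapping, Optional


def build_annotation_request(
    text: str,
    env: Mapping[str, str] = os.environ,
    now: Optional[float] = None,
) -> Optional[urllib.request.Request]:
    """Build a Grafana annotation request, or None when push should be skipped."""
    token = env.get("GRAFANA_SERVICE_ACCOUNT_TOKEN", "")
    base_url = env.get("GRAFANA_URL", "")
    if not token or not base_url:
        return None  # empty token (or, by assumption, empty URL): skip the push
    body = {
        "time": int((now if now is not None else time.time()) * 1000),  # epoch ms
        "tags": ["textstream"],  # hypothetical tag
        "text": text,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/annotations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A caller would pass a non-`None` result to `urllib.request.urlopen` and treat any network error as non-fatal, so that transcription continues if Grafana is unreachable.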

Dependencies

MLX — Apple Silicon ML framework
mlx-qwen3-asr — Qwen3-ASR for MLX
silero-vad-lite — Voice activity detection (~2MB, bundles ONNX runtime)
sounddevice — PortAudio bindings for mic capture
NumPy

Author

Boris Djordjevic — 199 Biotechnologies

License

MIT

Release history

0.2.0 (Feb 28, 2026)
0.1.0 (Feb 28, 2026, this version)

