Live speech-to-text streaming on Apple Silicon

TextStream

Local real-time speech-to-text for Apple Silicon. One pip install. No API keys. No cloud. No cost.

Run pip install textstream-asr, then textstream.

TextStream turns your Mac's microphone into a live transcription server. It runs Qwen3-ASR (~2% word error rate) on-device through MLX, filters noise with Silero VAD, and streams text over SSE at localhost:7890/stream. Any app, script, or frontend can subscribe and get words as they're spoken.

Build voice-controlled tools. Add live captions to your app. Record meeting notes that write themselves. Pipe speech into your IDE. Whatever needs ears — point it at the stream.

Why this exists

Cloud speech APIs charge per minute and add latency. Whisper runs offline but isn't real-time. TextStream gives you a live, local transcription endpoint that any process on your machine can read from, for free, at roughly 2% word error rate on clean speech (see Benchmarks).

Benchmarks

These numbers come from published evaluations. Your actual real-time factor (RTF) will depend on model size and what else is running.

Accuracy (Word Error Rate)

Model	LibriSpeech clean	LibriSpeech other	Params
Qwen3-ASR 0.6B (default)	2.11%	4.55%	600M
Qwen3-ASR 1.7B	1.63%	3.38%	1.7B
Whisper-large-v3	1.51%	3.97%	1.5B
GPT-4o-Transcribe	1.39%	3.75%	—

Source: Qwen3-ASR Technical Report
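
For readers new to the metric, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration; the function name `word_error_rate` is ours, and it skips the text normalization (casing, punctuation) that published evaluations usually apply, so it will not reproduce the table figures exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 2.11% therefore means roughly two word-level errors per hundred reference words.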

Speed (Apple Silicon via MLX)

Metric	Value
Real-time factor (RTF)	~0.06 (16x faster than real-time)
MLX vs PyTorch	~4x faster on Apple Silicon
VAD latency	<1ms per 32ms audio chunk
Time to first token	~92ms

Source: mlx-qwen3-asr benchmarks, Silero VAD performance metrics
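
Real-time factor is processing time divided by audio duration, so the RTF above can be turned into a rough per-chunk budget. The helper names below are illustrative, and the calculation assumes the quoted RTF also holds for chunks as short as the default 2.5 second interval, which the cited benchmarks may not guarantee.

```python
def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time for a chunk: RTF = processing time / audio duration."""
    return audio_seconds * rtf


def speedup(rtf: float) -> float:
    """How many times faster than real-time a given RTF is."""
    return 1.0 / rtf


# Rough budget for one default-sized chunk at the quoted RTF.
chunk_cost = processing_seconds(2.5, 0.06)
```

Under those assumptions, a 2.5 second chunk would need about 0.15 seconds of compute, leaving most of each interval idle.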

Resource usage

RAM: ~1.2GB for 0.6B model, ~3GB for 1.7B
CPU/GPU: Runs on the Apple GPU via the MLX Metal backend. Minimal CPU overhead: the transcription loop sleeps between intervals
Disk: Models are cached by Hugging Face Hub (~1.2GB for the 0.6B model, ~3.4GB for the 1.7B model on first download)
Battery: Comparable to background music playback. MLX is designed for Apple Silicon power efficiency

Requirements

Requirement	Supported
macOS on Apple Silicon (M1/M2/M3/M4)	Yes
macOS on Intel	No — MLX requires Apple Silicon
Linux / Windows	Not yet — MLX is macOS-only. PyTorch backend planned
Python	3.10+

Install

pip install textstream-asr

Quick start

textstream                            # start transcribing, opens browser UI
textstream --no-browser               # headless — just the SSE server
textstream --engine qwen-1.7b         # larger model, lower word error rate
textstream --vad-threshold 0.5        # stricter voice detection (default 0.4)

Connect from your app

import json
import urllib.request

# Subscribe to the live transcript stream (blocks until the server closes it)
req = urllib.request.Request("http://localhost:7890/stream")
with urllib.request.urlopen(req) as resp:
    for line in resp:
        line = line.decode().strip()
        # SSE payload lines start with "data: "; skip blanks and other fields
        if line.startswith("data: "):
            event = json.loads(line[6:])
            if event["type"] == "stream":
                print(event["finalized"], event["draft"])

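// Browser SSE (Node may need an EventSource polyfill, e.g. the eventsource package)
const src = new EventSource("http://localhost:7890/stream");
src.onmessage = (e) => {
  const { type, finalized, draft } = JSON.parse(e.data);
  if (type !== "stream") return; // ignore any other event types
  console.log(finalized, draft);
};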
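
If you want the parsing logic separate from the network call, for example to unit-test it, something like the following may help. It is a sketch, not part of TextStream: `iter_stream_events` is a hypothetical helper, and the sample lines are made up to match the event shape documented under API.

```python
import json
from typing import Iterable, Iterator


def iter_stream_events(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield decoded "stream" events from raw SSE lines, skipping everything else."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore malformed payloads
        if isinstance(event, dict) and event.get("type") == "stream":
            yield event


# Made-up sample input in the documented event shape.
sample = [
    b'data: {"type":"stream","finalized":"hello","draft":"wor"}\n',
    b"\n",
    b": keep-alive\n",
    b'data: {"type":"stream","finalized":"hello world","draft":""}\n',
]
events = list(iter_stream_events(sample))
```

With a running server, the same function can be fed the response object returned by `urllib.request.urlopen("http://localhost:7890/stream")`, which iterates line by line.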

How it works

Every --interval seconds (default 2.5), TextStream drains the mic buffer and runs Silero VAD on the chunk. If speech is detected, the chunk is fed to Qwen3-ASR's streaming decoder. The model returns stable (finalized) text and speculative (draft) text. Stable text gets persisted to disk and broadcast to all SSE subscribers.
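
The per-chunk decision described above can be sketched as a small gate. This is an illustration under stated assumptions, not TextStream's actual code: `process_chunk`, `speech_probability`, and `decode` are hypothetical names, the VAD and decoder are injected as plain callables, and the exact comparison against the threshold is a guess.

```python
from typing import Callable, Optional, Sequence, Tuple

Chunk = Sequence[float]


def process_chunk(
    chunk: Chunk,
    speech_probability: Callable[[Chunk], float],  # stand-in for Silero VAD
    decode: Callable[[Chunk], Tuple[str, str]],    # stand-in for the ASR decoder
    vad_threshold: float = 0.4,
) -> Optional[Tuple[str, str]]:
    """Return (finalized, draft) if the chunk passes VAD, else None."""
    if not chunk:
        return None
    if speech_probability(chunk) < vad_threshold:
        return None  # treated as silence or noise; the decoder is never called
    return decode(chunk)


# Toy stand-ins so the sketch runs without audio hardware or models.
def fake_vad(chunk: Chunk) -> float:
    return max(abs(sample) for sample in chunk)


def fake_decode(chunk: Chunk) -> Tuple[str, str]:
    return ("hello", "wor")


quiet = process_chunk([0.0, 0.01], fake_vad, fake_decode)
loud = process_chunk([0.2, 0.9], fake_vad, fake_decode)
```

Skipping the decoder for chunks that fail VAD is what keeps compute low during silence and reduces the chance of the model transcribing noise.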

If the model hallucinates on noise that slips past VAD, a pattern filter catches it and resets the stream. The filter is a safety net: with VAD active, it almost never fires.
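
The patterns TextStream checks are not documented here. As one plausible heuristic, the sketch below flags text in which a short phrase repeats back to back several times, a common failure mode when speech models decode noise. The function name and both parameters are hypothetical.

```python
def looks_like_hallucination(text: str, max_ngram: int = 4, min_repeats: int = 4) -> bool:
    """True if any phrase of up to max_ngram words repeats min_repeats times in a row."""
    words = text.lower().split()
    for n in range(1, max_ngram + 1):
        # Only start positions that leave room for min_repeats copies of the phrase.
        for start in range(len(words) - n * min_repeats + 1):
            gram = words[start:start + n]
            if all(
                words[start + k * n:start + (k + 1) * n] == gram
                for k in range(1, min_repeats)
            ):
                return True
    return False
```

A heuristic like this will also flag genuine repetition such as "no no no no", which is one reason to treat it as a backstop behind VAD rather than a primary filter.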

API

GET /stream    → SSE stream: {"type":"stream","finalized":"...","draft":"..."}
GET /engine    → {"engine":"qwen"}
GET /switch?engine=qwen-1.7b → hot-swap model without restart
GET /pause     → pause mic capture
GET /resume    → resume
GET /stop      → shutdown
GET /          → built-in browser UI
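
The control endpoints can be called from any HTTP client. Below is a minimal sketch using only the standard library; `control_url` and `call` are hypothetical helpers, the default port is taken from the Configuration section, and since the response bodies for /pause, /resume, and /stop are not documented here, the body is returned as raw text.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:7890"


def control_url(path: str, base_url: str = BASE_URL, **params: str) -> str:
    """Build the URL for a control endpoint, with optional query parameters."""
    url = f"{base_url}/{path.lstrip('/')}"
    return f"{url}?{urlencode(params)}" if params else url


def call(path: str, **params: str) -> str:
    """Send a GET to a running TextStream server and return the response body."""
    with urllib.request.urlopen(control_url(path, **params), timeout=5) as resp:
        return resp.read().decode()
```

For example, `call("switch", engine="qwen-1.7b")` would request the larger model, and `call("pause")` would pause mic capture, assuming the server is running on the default port.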

Configuration

Flag	Default	Description
`--port`	7890	HTTP server port
`--engine`	qwen	`qwen` (0.6B) or `qwen-1.7b`
`--interval`	2.5	Seconds between transcription updates
`--vad-threshold`	0.4	Silero VAD speech probability threshold
`--no-browser`	—	Don't open browser on start

Transcripts are saved to ~/Documents/textstream/transcripts/YYYY-MM-DD/.
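
To locate a given day's transcripts from a script, the directory can be derived from the pattern above. `transcript_dir` is a hypothetical helper; the file names inside each dated folder are not documented here, so the sketch stops at the directory.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def transcript_dir(day: Optional[date] = None) -> Path:
    """Return ~/Documents/textstream/transcripts/YYYY-MM-DD for the given day (default: today)."""
    day = day or date.today()
    return Path.home() / "Documents" / "textstream" / "transcripts" / day.isoformat()
```

Calling `sorted(transcript_dir().iterdir())` would then list whatever TextStream has written today, provided the directory exists.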

Dependencies

MLX — Apple Silicon ML framework
mlx-qwen3-asr — Qwen3-ASR for MLX
silero-vad-lite — Voice activity detection (~2MB, bundles ONNX runtime)
sounddevice — PortAudio bindings
NumPy

Author

Boris Djordjevic — 199 Biotechnologies

License

MIT

Release history

0.2.0 (this version): Feb 28, 2026
0.1.0: Feb 28, 2026
