Live speech-to-text streaming on Apple Silicon

TextStream

Local real-time speech-to-text for Apple Silicon. One pip install. No API keys. No cloud. No cost.

pip install textstream-asr   then   textstream


TextStream turns your Mac's microphone into a live transcription server. It runs Qwen3-ASR (~2% word error rate) on-device through MLX, filters noise with Silero VAD, and streams text over SSE at localhost:7890/stream. Any app, script, or frontend can subscribe and get words as they're spoken.

Build voice-controlled tools. Add live captions to your app. Record meeting notes that write themselves. Pipe speech into your IDE. Whatever needs ears — point it at the stream.

Why this exists

Cloud speech APIs charge per minute and add latency. Whisper runs offline but isn't real-time. TextStream gives you a live, local transcription endpoint that any process on your machine can read from, for free, at roughly 2% word error rate on LibriSpeech clean (see Benchmarks).

Benchmarks

Numbers from published evaluations. Your actual RTF will depend on model size and what else is running.

Accuracy (Word Error Rate)

Model                      LibriSpeech clean   LibriSpeech other   Params
Qwen3-ASR 0.6B (default)   2.11%               4.55%               600M
Qwen3-ASR 1.7B             1.63%               3.38%               1.7B
Whisper-large-v3           1.51%               3.97%               1.5B
GPT-4o-Transcribe          1.39%               3.75%

Source: Qwen3-ASR Technical Report

Speed (Apple Silicon via MLX)

Metric                   Value
Real-time factor (RTF)   ~0.06 (about 16x faster than real-time)
MLX vs PyTorch           ~4x faster on Apple Silicon
VAD latency              <1ms per 32ms audio chunk
Time to first token      ~92ms

Source: mlx-qwen3-asr benchmarks, Silero VAD performance metrics
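
To make the RTF figure concrete, here is a small worked calculation. It assumes the usual definition of real-time factor (processing time divided by audio duration) and takes the ~0.06 value from the table above as given rather than measured; the function names are illustrative only.

```python
def processing_seconds(audio_seconds: float, rtf: float = 0.06) -> float:
    """Estimated compute time for a stretch of audio at a given RTF.

    Assumes RTF = processing time / audio duration.
    """
    return audio_seconds * rtf


def speedup(rtf: float = 0.06) -> float:
    """How many times faster than real-time a given RTF is."""
    return 1.0 / rtf


# One default 2.5-second interval of audio at RTF 0.06
print(f"{processing_seconds(2.5):.2f} s of compute per 2.5 s of audio")
print(f"about {speedup():.1f}x faster than real-time")
```

Under these assumptions, a default 2.5-second interval would need roughly 0.15 seconds of compute, which leaves most of each interval idle. Actual numbers will vary with model size and system load.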

Resource usage

  • RAM: ~1.2GB for 0.6B model, ~3GB for 1.7B
  • CPU/GPU: Runs on the GPU via the MLX Metal backend (MLX does not target the Neural Engine). Minimal CPU overhead, because the transcription loop sleeps between intervals
  • Disk: Models are cached by HuggingFace Hub (about 1.2GB for the 0.6B model and 3.4GB for the 1.7B model, downloaded on first use)
  • Battery: Comparable to background music playback. MLX is designed for Apple Silicon power efficiency

Requirements

Requirement                            Supported
macOS on Apple Silicon (M1/M2/M3/M4)   Yes
macOS on Intel                         No (MLX requires Apple Silicon)
Linux / Windows                        Not yet (MLX is macOS-only; PyTorch backend planned)
Python                                 3.10+

Install

pip install textstream-asr

Quick start

textstream                            # start transcribing, opens browser UI
textstream --no-browser               # headless — just the SSE server
textstream --engine qwen-1.7b         # larger model, lower word error rate
textstream --vad-threshold 0.5        # stricter voice detection (default 0.4)

Connect from your app

import json, urllib.request

# Subscribe to the live transcript stream
req = urllib.request.Request("http://localhost:7890/stream")
with urllib.request.urlopen(req) as resp:
    for line in resp:
        line = line.decode().strip()
        if line.startswith("data: "):
            event = json.loads(line[6:])
            if event["type"] == "stream":
                print(event["finalized"], event["draft"])

// Browser / Node SSE (Node may need an EventSource implementation, such as the eventsource package)
const src = new EventSource("http://localhost:7890/stream");
src.onmessage = (e) => {
  const { finalized, draft } = JSON.parse(e.data);
  console.log(finalized, draft);
};
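
A slightly more defensive Python client can be sketched as follows. It treats a blank line as the end of an SSE event, skips comment lines, and ignores payloads that are not valid JSON. Only the URL and the "type", "finalized", and "draft" fields come from this README; the helper names iter_sse_events and follow_stream are hypothetical, and whether the server ever sends keep-alive comments or multi-line data is an assumption.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE event.

    An event may span several "data:" lines and ends at a blank line.
    """
    data_parts = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # the spec allows one optional space
            data_parts.append(value)
        elif line == "" and data_parts:
            payload = "\n".join(data_parts)
            data_parts = []
            try:
                event = json.loads(payload)
            except json.JSONDecodeError:
                continue  # skip payloads that are not JSON
            yield event


def follow_stream(url: str = "http://localhost:7890/stream") -> None:
    """Print finalized and draft text until the connection closes."""
    with urllib.request.urlopen(url) as resp:
        for event in iter_sse_events(resp):
            if event.get("type") == "stream":
                print(event.get("finalized", ""), event.get("draft", ""))
```

With the server running, calling follow_stream() blocks and prints updates as they arrive. Keeping the parser separate from the network call makes it possible to test it against canned byte lines.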

How it works

Every --interval seconds (default 2.5), TextStream drains the mic buffer and runs Silero VAD on the chunk. If speech is detected, the chunk is fed to Qwen3-ASR's streaming decoder. The model returns stable (finalized) text and speculative (draft) text. Stable text gets persisted to disk and broadcast to all SSE subscribers.

If the model hallucinates on noise that slips past VAD, a pattern filter catches it and resets the stream. This filter is a safety net: with VAD active, it almost never fires.
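
The per-interval step described above can be sketched roughly as follows. This is not TextStream's actual code: the VAD and the decoder are passed in as plain callables, the decoder is assumed to return newly stable text plus the current draft, "reset" is modeled as discarding the draft, and the repeated-word regex is only one guess at what a hallucination pattern might look like. All names here are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical heuristic: the same word five or more times in a row.
_REPEAT = re.compile(r"\b(\w+)(?:\W+\1\b){4,}")


def looks_like_hallucination(text: str) -> bool:
    """Return True if the text contains a long run of one repeated word."""
    return bool(_REPEAT.search(text))


@dataclass
class StreamState:
    finalized: str = ""   # stable text accumulated so far
    draft: str = ""       # speculative text from the latest chunk
    resets: int = 0       # how often the filter discarded output


def process_chunk(
    state: StreamState,
    chunk: bytes,
    speech_prob: Callable[[bytes], float],
    decode: Callable[[bytes], Tuple[str, str]],
    vad_threshold: float = 0.4,
) -> StreamState:
    """Run one interval: VAD gate, decode, then the pattern filter."""
    if speech_prob(chunk) < vad_threshold:
        return state  # no speech detected, so the decoder is never called
    stable, draft = decode(chunk)
    if looks_like_hallucination(f"{stable} {draft}"):
        state.draft = ""      # drop the suspicious output
        state.resets += 1
        return state
    state.finalized = f"{state.finalized} {stable}".strip()
    state.draft = draft
    return state
```

The VAD check comes first so that silent intervals never reach the decoder; the filter only sees text produced from chunks that already passed the --vad-threshold gate.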

API

GET /stream    → SSE stream: {"type":"stream","finalized":"...","draft":"..."}
GET /engine    → {"engine":"qwen"}
GET /switch?engine=qwen-1.7b → hot-swap model without restart
GET /pause     → pause mic capture
GET /resume    → resume
GET /stop      → shutdown
GET /          → built-in browser UI
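
The control endpoints can be called from a script with the standard library alone. The sketch below assumes the default port and returns the raw response body as text, because only the shape of the /engine response is documented above; control_url and call are hypothetical helper names.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7890"  # default --port


def control_url(action: str, base_url: str = BASE_URL, **params: str) -> str:
    """Build the URL for an endpoint such as "pause" or "switch"."""
    url = f"{base_url}/{action.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def call(action: str, timeout: float = 5.0, **params: str) -> str:
    """Send a GET request to a control endpoint and return the body as text."""
    with urllib.request.urlopen(control_url(action, **params), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, call("pause") and call("resume") toggle mic capture, and call("switch", engine="qwen-1.7b") requests the larger model.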

Configuration

Flag              Default   Description
--port            7890      HTTP server port
--engine          qwen      qwen (0.6B) or qwen-1.7b
--interval        2.5       Seconds between transcription updates
--vad-threshold   0.4       Silero VAD speech probability threshold
--no-browser                Don't open browser on start

Transcripts are saved to ~/Documents/textstream/transcripts/YYYY-MM-DD/.
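
To locate a given day's transcripts from a script, something like the following may help. Only the directory layout comes from the line above; the names and formats of the files inside each dated folder are not documented here, so the sketch simply lists whatever files it finds. The helper names are hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional

TRANSCRIPT_ROOT = Path.home() / "Documents" / "textstream" / "transcripts"


def transcript_dir(day: Optional[date] = None, root: Path = TRANSCRIPT_ROOT) -> Path:
    """Return the folder for a given day (today if none is given)."""
    day = day or date.today()
    return root / day.isoformat()  # isoformat() gives YYYY-MM-DD


def list_transcripts(day: Optional[date] = None, root: Path = TRANSCRIPT_ROOT) -> List[Path]:
    """List files saved for a given day, or an empty list if none exist."""
    folder = transcript_dir(day, root)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.is_file())
```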

Author

Boris Djordjevic — 199 Biotechnologies

License

MIT
