Hardware-agnostic middleware connecting sEMG silent-speech interfaces to LLM agents

Subvocal SDK: Physiological Silent Speech Interface Middleware

The Subvocal SDK is an open-source, hardware-agnostic middleware platform that connects surface electromyography (sEMG) interfaces to LLM-driven AI agents.

Rather than locking developers to a proprietary neckband or a closed whole-word vocabulary, the Subvocal SDK provides the software rails—signal conditioning, deep learning training skeletons, articulatory phonetic shorthand simulators, and context-aware decoders—to enable high-accuracy, low-latency, and open-vocabulary silent speech control.


🛠️ Installation

pip install subvocal

The base install is lightweight (pydantic + numpy) and covers the pipeline, hardware drivers, shorthand decoding, context, and the MCP server. Optional extras pull in heavier subsystems:

| Extra | Enables | Installs |
|---|---|---|
| subvocal[ml] | Classifier training, inference, calibration (subvocal.emg_core) | scipy, scikit-learn, joblib, torch |
| subvocal[hardware] | Public-dataset drivers (Ninapro, PutEMG, CSL-HDEMG) | scipy, h5py |
| subvocal[tts] | Audio feedback outside macOS | pyttsx3 |
| subvocal[export] | ONNX model export | onnx |
| subvocal[all] | Everything above | |

🚀 Quickstart

A complete pipeline—synthetic sEMG source through intent reconstruction to action execution—runs offline in a few lines:

from subvocal import SubvocalPipeline
from subvocal.core.testing import MockActionExecutor, MockContextProvider, MockLLMProvider
from subvocal.hardware.drivers import SyntheticSignalGenerator
from subvocal.core.models import CommandToken
import time

hardware = SyntheticSignalGenerator(fs=1000.0, num_channels=8)

def classify(frame):
    """Replace with subvocal.emg_core.ml.infer.InferenceEngine for real models."""
    arr = frame.to_numpy()
    if abs(arr).max() > 1.0:  # a command burst is present
        return CommandToken(text="gt", confidence=0.95, timestamp=time.time())
    return None

pipeline = SubvocalPipeline(
    hardware=hardware,
    classify_fn=classify,
    llm_provider=MockLLMProvider(),       # or resolve_provider() / ClaudeProvider() ...
    context_provider=MockContextProvider(),
    executor=MockActionExecutor(),
    phrase_timeout_seconds=0.5,
    on_action=lambda action, status: print("observed:", action.action_type, status),
)

hardware.start()
hardware.trigger_command("gt", duration_ms=120)
for _ in range(30):
    action = pipeline.step(window_ms=50)
    if action:
        print("Executed:", action.action_type, action.params)
        # -> Executed: goto {'arguments': ['google.com'], 'resolved_text': 'GOTO google.com', ...}
        break
    time.sleep(0.05)  # real-time pacing: the phrase ends after 0.5 s of silence

Swap in a real LLM provider (subvocal.core.llm_providers.ClaudeProvider, OpenAIProvider, GeminiProvider, LlamaProvider), a real driver (OpenBCICytonDriver, DelsysTrignoDriver, FileReplayDriver), and a trained classifier (subvocal.emg_core.ml.infer.InferenceEngine) without changing the pipeline code. subvocal.resolve_provider() picks the best provider for the environment automatically — a real LLM when an API key is present, the offline HeuristicProvider otherwise.
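
The selection rule described for subvocal.resolve_provider() can be pictured with a small standalone sketch. It is not the SDK's implementation: the environment-variable names, the priority order, and the pick_provider_name helper are assumptions made for illustration, and the function returns provider names as strings rather than provider objects.

```python
import os

# Hypothetical mapping from API-key environment variables to provider names.
# Both the variable names and the priority order are assumptions.
KEY_TO_PROVIDER = [
    ("ANTHROPIC_API_KEY", "ClaudeProvider"),
    ("OPENAI_API_KEY", "OpenAIProvider"),
    ("GEMINI_API_KEY", "GeminiProvider"),
]


def pick_provider_name(env=None):
    """Return the first provider whose API key is set, else the offline fallback."""
    env = os.environ if env is None else env
    for key, provider in KEY_TO_PROVIDER:
        if env.get(key):  # a missing or empty key counts as absent
            return provider
    return "HeuristicProvider"
```

Passing the environment as an argument keeps the rule easy to test without touching the real process environment.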

Production behavior

  • Typed errors: everything the SDK raises derives from subvocal.SubvocalError (HardwareError, ProviderError, ConfigurationError, PolicyViolationError, ...), each compatible with the builtin exception type it replaces.
  • Resilient providers: configurable per-request timeouts and exponential-backoff retries for transient failures (connection errors, HTTP 408/429/5xx); non-retryable statuses fail fast.
  • Observability: pipeline.stats exposes running counters (frames, tokens, intents, executed/blocked actions, errors, uptime), and on_token / on_intent / on_action / on_error observer callbacks stream pipeline lifecycle events without ever breaking the pipeline. Every phrase is JSONL-traced for audit.
  • Safety: pluggable policy engine with dry-run mode; set raise_on_policy_violation=True to turn rejections into PolicyViolationError.
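
The retry behavior in the "Resilient providers" bullet can be sketched as follows. This is a generic illustration under stated assumptions, not the SDK's code: HTTPStatusError, is_retryable, call_with_retries, and the default attempt count and delays are all hypothetical.

```python
import random
import time

# Statuses the bullet lists as transient: 408, 429, and any 5xx.
RETRYABLE_STATUSES = {408, 429} | set(range(500, 600))


class HTTPStatusError(Exception):
    """Hypothetical error that carries the HTTP status of a failed request."""

    def __init__(self, status):
        super().__init__(f"request failed with HTTP {status}")
        self.status = status


def is_retryable(exc):
    """Connection errors and transient HTTP statuses are worth retrying."""
    if isinstance(exc, ConnectionError):
        return True
    return isinstance(exc, HTTPStatusError) and exc.status in RETRYABLE_STATUSES


def call_with_retries(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Non-retryable errors fail fast; so does the final attempt.
            if not is_retryable(exc) or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 0.5 s, 1 s, 2 s, ...
            sleep(delay + random.uniform(0, delay / 10))  # up to 10% jitter
```

The sleep parameter is injectable so the backoff schedule can be checked in tests without actually waiting.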

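One common way to make a typed error "compatible with the builtin exception type it replaces" is multiple inheritance. The sketch below defines local stand-in classes to show the idea; pairing ConfigurationError with ValueError is an assumption, and load_sample_rate is a hypothetical helper, so the SDK's actual hierarchy may differ.

```python
class SubvocalError(Exception):
    """Local stand-in for the SDK's base error class."""


class ConfigurationError(SubvocalError, ValueError):
    """Assumed pairing: a configuration error that is also a ValueError."""


def load_sample_rate(config):
    """Hypothetical helper that validates a sampling rate in Hz."""
    fs = config.get("fs", 0)
    if fs <= 0:
        raise ConfigurationError(f"fs must be positive, got {fs}")
    return fs


# Existing code that catches the builtin type keeps working.
try:
    load_sample_rate({"fs": -1})
except ValueError as exc:
    caught_as_builtin = type(exc).__name__

# New code can catch every SDK error through the shared base class.
try:
    load_sample_rate({})
except SubvocalError as exc:
    caught_as_base = type(exc).__name__
```
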
MCP server

The SDK ships a stdio Model Context Protocol server so Claude Desktop (or any MCP client) can ingest subvocal commands as tools:

subvocal-mcp

Claude Desktop config:

{
  "mcpServers": {
    "subvocal": { "command": "subvocal-mcp" }
  }
}
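
For readers writing their own MCP client, the sketch below shows the shape of a tool invocation on the stdio transport: one JSON-RPC 2.0 message per line with the method tools/call. The tool name inject_token and its text argument are hypothetical; list the server's real tools with a tools/list request, and note that a real session begins with the MCP initialize handshake, which is omitted here.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps without indent never emits raw newlines, so the message
    # stays on a single line as the stdio transport requires.
    return json.dumps(message) + "\n"


# Hypothetical tool name and argument, for illustration only.
line = make_tool_call(1, "inject_token", {"text": "gt"})
```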

📂 Repository Structure

subvocal/
├── src/subvocal/           # The installable package
│   ├── core/               # Data models, interfaces, pipeline, security policies, LLM providers
│   ├── hardware/           # HAL drivers (file replay, synthetic, OpenBCI, Delsys) + dataset loaders
│   ├── emg_core/           # DSP filters, TD10 features, classifiers (RF/CNN/GRU/Transformer)
│   ├── shorthand/          # Phonetic shorthand vocabulary, simulator, hybrid decoder
│   ├── context/            # User context schemas and phonetic context matching
│   ├── mcp/                # Model Context Protocol stdio server
│   └── tts/                # Multi-backend TTS feedback engine
├── tests/                  # Pytest suite
├── benchmarks/             # 50-case intent-reconstruction eval harnesses
├── tools/                  # Site/API-page builders, license audit, benchmark runner
└── docs/                   # GitHub Pages site (landing, docs, platform corpus, API reference)
    └── content/            # Markdown sources for the platform corpus and walkthrough

🚀 Core Features

  1. Articulatory Shorthand Decoder: Overcomes the whole-word sEMG vocabulary ceiling. Decodes compressed phonetic consonant shorthand inputs (e.g. g gl -> Google) under heavy muscle-movement noise.
  2. Asymmetric Levenshtein Distance: A dynamic programming string alignment cost matrix configured with physiological sEMG confusion clusters (Glottal, Labial, Alveolar, Velar, Rhotic) to discount vowel/consonant omissions in silent speech.
  3. Command-Aware Context Prioritization: Dynamic target matching against active user contacts (TYPE), calendar events (SEARCH), browser URLs (GOTO), and active application screen elements (CLICK).
  4. Physiological Signal Conditioning: Preprocessing filter configurations defaulting to AlterEgo's 1.3–50.0 Hz bandpass filter (designed for low-velocity articulatory gestures) with configuration support for standard 20.0–450.0 Hz EMG.
  5. Classifiers (RF + Deep Learning): Custom pipelines to train scikit-learn Random Forest, PyTorch 1D CNN, GRU, and Transformer architectures on raw multi-channel sEMG traces.
  6. Asynchronous Execution (V2 Architecture): Low-latency, thread-safe asynchronous pipeline orchestration built on LiveKit's OpsQueue and IncrementalDispatcher design.
  7. Physiological Signal Monitoring: Real-time EMA-smoothed signal level activity detection and MOS-like connection quality scoring (evaluating saturation, drift, and dropouts).
  8. Prometheus Telemetry: Integrated Prometheus metric exporter and pre-built Grafana monitoring dashboards for tracking SDK errors, session lifecycles, and action execution statistics.
  9. HMAC-Signed Capability Grants: Secure token-based credentials (ActionGrants) specifying allowed command scopes, confidence thresholds, and dry-run policies, verified dynamically via the GrantsPolicy middleware.
  10. MCP Integration: A zero-dependency stdio JSON-RPC server exposing pipeline status, token injection, phrase processing, and calibration as MCP tools.
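
The asymmetric edit distance in feature 2 can be illustrated with a minimal dynamic-programming sketch. It is not the SDK's decoder: the letter-level cluster membership, the cost values, and the function names are assumptions chosen to show the idea that omitting a target character is cheap, inserting a spurious one is expensive, and substitutions inside a confusion cluster are discounted.

```python
# Illustrative letter-level clusters; real membership would come from sEMG data.
CLUSTERS = {
    "glottal": "h",
    "labial": "pbmfv",
    "alveolar": "tdnszl",
    "velar": "kg",
    "rhotic": "r",
}
CLUSTER_OF = {ch: name for name, chars in CLUSTERS.items() for ch in chars}
VOWELS = set("aeiou")
SPURIOUS_COST = 1.0  # observed character with no counterpart in the target


def omission_cost(ch):
    """Cost of a target character missing from the shorthand (assumed values)."""
    return 0.2 if ch in VOWELS else 0.6


def substitution_cost(a, b):
    """Free on a match, discounted inside a confusion cluster, full otherwise."""
    if a == b:
        return 0.0
    cluster = CLUSTER_OF.get(a)
    return 0.4 if cluster is not None and cluster == CLUSTER_OF.get(b) else 1.0


def asymmetric_distance(observed, target):
    """Weighted edit distance from an observed shorthand to a candidate target."""
    rows, cols = len(observed) + 1, len(target) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + SPURIOUS_COST
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + omission_cost(target[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = min(
                dp[i - 1][j] + SPURIOUS_COST,
                dp[i][j - 1] + omission_cost(target[j - 1]),
                dp[i - 1][j - 1] + substitution_cost(observed[i - 1], target[j - 1]),
            )
    return dp[-1][-1]


candidates = ["google", "github", "gmail"]
best = min(candidates, key=lambda word: asymmetric_distance("ggl", word))
```

Because omissions and insertions are priced differently, the distance from "ggl" to "google" is much smaller than the distance from "google" to "ggl", which is what lets a compressed shorthand match a longer target.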

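The HMAC-signed grants in feature 9 follow a familiar pattern: serialize the grant deterministically, sign it with a shared secret, and verify the signature before honoring the scopes. The sketch below uses only the Python standard library; the field names scopes and min_confidence and all function names are hypothetical, and it does not reproduce the SDK's ActionGrants or GrantsPolicy classes.

```python
import hashlib
import hmac
import json


def sign_grant(grant, secret):
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding."""
    payload = json.dumps(grant, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_grant(grant, signature, secret):
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_grant(grant, secret), signature)


def is_allowed(grant, signature, secret, command, confidence):
    """Allow a command only under a valid grant that covers it."""
    if not verify_grant(grant, signature, secret):
        return False
    return command in grant["scopes"] and confidence >= grant["min_confidence"]


secret = b"demo-secret"  # illustrative only; load real keys from secure storage
grant = {"scopes": ["GOTO", "SEARCH"], "min_confidence": 0.8}
signature = sign_grant(grant, secret)
```

Sorting the keys before signing means two grants with the same content always produce the same signature, regardless of dictionary ordering.
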
🧪 Development

git clone https://github.com/PranavKalkunte/subvocal.git
cd subvocal
pip install -e ".[all,dev]"

pytest                      # test suite
ruff check src tests       # lint
pyright                     # type check
python benchmarks/eval_runner.py   # 50-case heuristic benchmark

Runtime artifacts (traces, trained models) are written to the per-user data directory; override with SUBVOCAL_DATA_DIR / SUBVOCAL_MODELS_DIR.
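
A rough sketch of how such environment overrides are typically resolved is shown below. Only the variable names SUBVOCAL_DATA_DIR and SUBVOCAL_MODELS_DIR come from this README; the fallback locations (~/.subvocal and a models subdirectory) and the function names are assumptions, and the SDK's real per-user directory may differ by platform.

```python
import os
from pathlib import Path


def resolve_data_dir(env=None):
    """Return SUBVOCAL_DATA_DIR if set, else an assumed per-user default."""
    env = os.environ if env is None else env
    override = env.get("SUBVOCAL_DATA_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".subvocal"  # assumed default, not confirmed by the SDK


def resolve_models_dir(env=None):
    """Return SUBVOCAL_MODELS_DIR if set, else a models folder in the data dir."""
    env = os.environ if env is None else env
    override = env.get("SUBVOCAL_MODELS_DIR")
    if override:
        return Path(override).expanduser()
    return resolve_data_dir(env) / "models"  # assumed layout
```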

Contributions are welcome — see CONTRIBUTING.md for the workflow and quality gates, and SECURITY.md for vulnerability reporting.


📄 License

This repository is open-sourced under the MIT License. See LICENSE for details.
