CLI for Omi Med STT v1 medical speech-to-text

Omi Med STT Runtime

Runtime CLI for Omi Med STT v1, an English medical speech-to-text model.

This repository contains runtime code only. It does not contain model weights, private benchmark data, or training data.

Safety And Intended Use

Omi Med STT v1 is speech-to-text only. It is not a diagnostic, triage, prescribing, or clinical decision model, and it is not clinically validated. Transcripts must be reviewed before any clinical use.

Runtimes

omi-med-stt supports three runtime paths:

| Runtime | Best for | Artifact |
|---------|----------|----------|
| mlx | Apple Silicon Macs | omi-health/omi-med-stt-v1-mlx-q8 |
| cpp | Linux and Windows CPU fallback | omi-health/omi-med-stt-v1-gguf / omi-med-stt-v1-q8_0.gguf |
| nemo | NVIDIA CUDA servers and canonical NeMo checkpoint use | omi-health/omi-med-stt-v1 |

The source-of-truth model is the NeMo checkpoint. MLX and GGUF are runtime exports.

Default quantization:

  • Apple Silicon MLX defaults to the q8 MLX export.
  • Linux/Windows CPU defaults to the q8_0 GGUF export through parakeet.cpp.
  • NVIDIA CUDA/NeMo defaults to the canonical full NeMo checkpoint. A NeMo q8 checkpoint is not provided because the current NeMo runtime path does not have a validated drop-in q8 .nemo inference artifact.
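
The platform-to-runtime mapping above can be sketched in a few lines of Python. This is only an illustration: `guess_runtime` and `DEFAULT_ARTIFACTS` are hypothetical names, the `has_cuda` flag is supplied by the caller, and the CLI's own auto-selection logic is not documented here and may differ.

```python
import platform

# Default artifact per runtime, copied from the runtime table above.
DEFAULT_ARTIFACTS = {
    "mlx": "omi-health/omi-med-stt-v1-mlx-q8",
    "cpp": "omi-health/omi-med-stt-v1-gguf",
    "nemo": "omi-health/omi-med-stt-v1",
}


def guess_runtime(system=None, machine=None, has_cuda=False):
    """Hypothetical helper: pick a runtime name from basic host facts."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system == "Darwin" and machine in ("arm64", "aarch64"):
        return "mlx"  # Apple Silicon Mac
    if has_cuda:
        return "nemo"  # NVIDIA CUDA server
    # Assumption of this sketch: every other host uses the CPU fallback.
    return "cpp"


if __name__ == "__main__":
    runtime = guess_runtime()
    print(runtime, DEFAULT_ARTIFACTS[runtime])
```

Passing the result to `--runtime` keeps the choice explicit instead of relying on whatever the CLI picks by default.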

Install

From PyPI:

pip install -U omi-med-stt

For Apple Silicon / MLX:

pip install -U "omi-med-stt[mlx]"

For CUDA/Linux NeMo:

pip install -U "omi-med-stt[nemo]"

From this repository, for runtime development:

pip install -e ".[dev]"

Models

| Platform | Default artifact | Eval (locked private benchmark) |
|----------|------------------|---------------------------------|
| Apple Silicon | omi-med-stt-v1-mlx-q8 (0.94 GB) | WER 8.61% · M-WER 2.75% |
| NVIDIA CUDA | omi-med-stt-v1 (.nemo, 2.5 GB) | WER 8.30% · M-WER 2.37% |
| Linux/Windows CPU | omi-med-stt-v1-gguf (q8_0, 0.93 GB) | WER 9.12% · M-WER 3.20% |

Full evaluation and methodology, including the aggregate summary: https://github.com/Omi-Health/omi-med-stt-runtime/blob/main/docs/evaluation-summary.md (docs/evaluation-summary.md in this repository).
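
For readers unfamiliar with the metric, here is a minimal sketch of the textbook word error rate: word-level edit distance divided by the number of reference words. It is not the benchmark's scoring code. The text normalization used for the figures above and the definition of M-WER are not given on this page, so neither is reproduced here.

```python
def word_error_rate(reference, hypothesis):
    """Textbook WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("take two tablets daily", "take two tablet daily"))  # 0.25
```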

CUDA Driver / Torch Compatibility

PyPI's default torch wheels currently target CUDA 13 and require NVIDIA driver >= 580. On hosts with an older driver (for example cloud GPU images with driver 570.x / CUDA 12.8), --runtime nemo fails with "The NVIDIA driver on your system is too old". Install a torch build that matches your driver before installing the nemo extra:

pip install torch --index-url https://download.pytorch.org/whl/cu128
pip install -U "omi-med-stt[nemo]"

Check your driver's supported CUDA version with nvidia-smi.
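
The decision above can be written down as a small helper. This is a sketch that covers only the two cases named in this section (driver >= 580, and driver 570.x with CUDA 12.8). `torch_index_for_driver` is a hypothetical name, and older drivers are deliberately rejected rather than guessed at.

```python
CU128_INDEX = "https://download.pytorch.org/whl/cu128"


def torch_index_for_driver(driver_version):
    """Hypothetical helper: map an NVIDIA driver version string to a pip index URL.

    Returns None when the default PyPI torch wheel should work.
    """
    major = int(driver_version.strip().split(".")[0])
    if major >= 580:
        return None  # default PyPI wheels (CUDA 13) are fine
    if major >= 570:
        return CU128_INDEX  # the driver 570.x / CUDA 12.8 case described above
    raise ValueError(
        f"driver {driver_version} is older than the cases covered here; "
        "check nvidia-smi and pick a matching torch build manually"
    )


if __name__ == "__main__":
    print(torch_index_for_driver("570.124"))
```

The driver version string is the one reported in the header of the nvidia-smi output.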

Basic Usage

Simple path:

omi-med-stt audio.wav

Explicit MLX:

omi-med-stt audio.wav --runtime mlx

Explicit NeMo:

omi-med-stt audio.wav --runtime nemo

Explicit parakeet.cpp / GGUF:

omi-med-stt audio.wav --runtime cpp

JSON output:

omi-med-stt audio.wav --json

Dependency/runtime check:

omi-med-stt check
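
To call the CLI from a script, the commands above can be assembled and run with subprocess. This is a minimal sketch: `build_command` and `transcribe_json` are hypothetical helpers, and it assumes that `--json` prints a single JSON document to stdout. The JSON field names are not documented here, so the parsed object is returned as-is.

```python
import json
import subprocess

KNOWN_RUNTIMES = ("mlx", "nemo", "cpp")  # runtime names shown above


def build_command(audio_path, runtime=None, json_output=False):
    """Assemble an omi-med-stt command line from the flags shown above."""
    cmd = ["omi-med-stt", str(audio_path)]
    if runtime is not None:
        if runtime not in KNOWN_RUNTIMES:
            raise ValueError(f"unknown runtime: {runtime}")
        cmd += ["--runtime", runtime]
    if json_output:
        cmd.append("--json")
    return cmd


def transcribe_json(audio_path, runtime=None):
    """Run the CLI and parse stdout as JSON (assumption: --json writes to stdout)."""
    result = subprocess.run(
        build_command(audio_path, runtime, json_output=True),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```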

parakeet.cpp / GGUF Runtime

The cpp runtime is powered by parakeet.cpp, a C++/ggml inference engine for NVIDIA Parakeet ASR models.

Omi Med STT v1 includes a post-Conformer medical adapter. Until this adapter extension is upstreamed, omi-med-stt builds parakeet.cpp with the adapter patch included in this repository and caches the resulting libparakeet shared library plus parakeet-cli.

The default cpp path uses the parakeet.cpp C API directly. For long audio it keeps the model loaded once and sends in-memory 16 kHz PCM chunks to libparakeet, avoiding repeated model setup and temporary chunk WAV reads. parakeet-cli remains available as a fallback and for developer debugging.

Normal use:

omi-med-stt audio.wav --runtime cpp

Windows CPU is a first-class target. Install once, then transcribe:

pip install -U omi-med-stt
omi-med-stt install-cpp --cpp-backend cpu
omi-med-stt audio.wav --runtime cpp --cpp-backend cpu

On Windows CPU, the runtime defaults to a capped thread count, min(os.cpu_count(), 8), rather than the upstream engine default. The cap is more stable on small 4-vCPU machines and can be overridden with the --cpp-threads flag or the OMI_MED_STT_CPP_THREADS environment variable:

# override with a flag
omi-med-stt audio.wav --runtime cpp --cpp-backend cpu --cpp-threads 4

# or with an environment variable (PowerShell)
$Env:OMI_MED_STT_CPP_THREADS = "4"
omi-med-stt audio.wav --runtime cpp --cpp-backend cpu
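
The thread-count rule can be sketched as follows. The min(os.cpu_count(), 8) default and the two override mechanisms come from the text above. The precedence (flag first, then environment variable, then the capped default) is an assumption of this sketch, and `resolve_cpp_threads` is a hypothetical name.

```python
import os

THREAD_CAP = 8  # cap stated above for Windows CPU


def resolve_cpp_threads(cli_threads=None, env=None):
    """Hypothetical helper: decide how many threads the cpp runtime should use."""
    env = os.environ if env is None else env
    if cli_threads is not None:
        return int(cli_threads)  # assumed: --cpp-threads wins
    env_value = env.get("OMI_MED_STT_CPP_THREADS")
    if env_value:
        return int(env_value)  # assumed: the environment variable comes next
    # os.cpu_count() can return None, so fall back to a single thread.
    return min(os.cpu_count() or 1, THREAD_CAP)


if __name__ == "__main__":
    print(resolve_cpp_threads())
```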

Pre-install the C++ runtime explicitly:

omi-med-stt install-cpp

Choose a backend:

omi-med-stt install-cpp --cpp-backend cpu

Manual override remains available for developers:

omi-med-stt audio.wav --runtime cpp --parakeet-cli /path/to/parakeet-cli

Runtime toggles:

# force the old subprocess path
OMI_MED_STT_CPP_DISABLE_CAPI=1 omi-med-stt audio.wav --runtime cpp

# require C API and fail instead of falling back
OMI_MED_STT_CPP_REQUIRE_CAPI=1 omi-med-stt audio.wav --runtime cpp

# portable build instead of native CPU flags
OMI_MED_STT_GGML_NATIVE=OFF omi-med-stt install-cpp --force --cpp-backend cpu
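
The first two toggles above interact with the C API fallback described earlier. Here is a sketch of that decision, under stated assumptions: only the value "1" counts as set, setting both variables is treated as an error, and `choose_cpp_path` is a hypothetical name. The runtime's actual handling of these cases is not documented here.

```python
def choose_cpp_path(env, capi_available):
    """Return "capi" or "subprocess", or raise if the C API is required but unusable."""
    disable = env.get("OMI_MED_STT_CPP_DISABLE_CAPI") == "1"
    require = env.get("OMI_MED_STT_CPP_REQUIRE_CAPI") == "1"
    if disable and require:
        # Assumption: conflicting toggles are rejected rather than silently resolved.
        raise ValueError("DISABLE_CAPI and REQUIRE_CAPI are both set")
    if disable:
        return "subprocess"  # forced parakeet-cli subprocess path
    if capi_available:
        return "capi"  # default: call libparakeet directly
    if require:
        raise RuntimeError("C API required but libparakeet could not be loaded")
    return "subprocess"  # fall back to parakeet-cli
```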

For Linux and Windows CPU, install-cpp first downloads a pinned prebuilt runtime bundle from the Omi Med STT runtime GitHub release. If no bundle is available for the host, it falls back to building from parakeet.cpp with parakeet-cpp-omi-adapter.patch. The source-build fallback requires git and cmake. On Windows, normal use requires only the shared parakeet.dll runtime; it does not require a separate parakeet-cli.exe.
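
Before relying on the source-build fallback, it can help to confirm that git and cmake are on PATH. This is a small sketch using the standard library; `missing_build_tools` is a hypothetical helper, not part of the package.

```python
import shutil


def missing_build_tools(which=shutil.which):
    """List the source-build tools named above that are not found on PATH."""
    return [tool for tool in ("git", "cmake") if which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("install before building from source:", ", ".join(missing))
    else:
        print("git and cmake found")
```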

The cpp runtime downloads only the selected GGUF file. It does not download the NeMo .nemo checkpoint or the MLX model.safetensors.

By default, the cpp runtime is pinned to a specific Hugging Face repository revision and verifies the SHA256 checksum for the official f16/q8_0 GGUF files. You can override this for experiments:

omi-med-stt audio.wav --runtime cpp --revision main
omi-med-stt audio.wav --runtime cpp --no-verify-checksum
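
If you skip the built-in verification, a downloaded GGUF file can still be checked by hand. The sketch below is a generic SHA256 check using hashlib, not the runtime's own verification code. The expected digest must come from a source you trust, and none is listed on this page.

```python
import hashlib


def sha256_of(path, block_size=1 << 20):
    """Stream a file through SHA256 so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Raise ValueError if the file's SHA256 digest differs from expected_hex."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return True
```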

Long Audio

Omi Med STT v1 is based on Parakeet and can handle long-form audio. Start with the simple path:

omi-med-stt consult.wav

An explicit chunked path is still available for constrained environments:

omi-med-stt transcribe-long consult.wav --chunk-seconds 25 --overlap 3
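
To make the two parameters concrete, the sketch below computes chunk boundaries in samples for 16 kHz audio with 25-second chunks and a 3-second overlap. It illustrates overlapping windows only. `chunk_spans` is a hypothetical name, and how the runtime stitches overlapping transcripts together is not described here.

```python
def chunk_spans(total_samples, sample_rate=16000, chunk_seconds=25, overlap_seconds=3):
    """Return (start, end) sample indices for overlapping chunks covering the audio."""
    if overlap_seconds >= chunk_seconds:
        raise ValueError("overlap must be shorter than the chunk length")
    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)  # each chunk starts this far after the last
    spans = []
    start = 0
    while start < total_samples:
        end = min(start + chunk, total_samples)
        spans.append((start, end))
        if end == total_samples:
            break  # the last chunk reached the end of the audio
        start += step
    return spans


if __name__ == "__main__":
    # 60 seconds of 16 kHz audio
    print(chunk_spans(60 * 16000))
    # [(0, 400000), (352000, 752000), (704000, 960000)]
```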

Model Access

The runtime defaults to these Hugging Face model repositories:

  • omi-health/omi-med-stt-v1
  • omi-health/omi-med-stt-v1-mlx-q8
  • omi-health/omi-med-stt-v1-mlx for the full MLX export if explicitly selected
  • omi-health/omi-med-stt-v1-gguf

If the model repositories are private before launch, authenticate first:

huggingface-cli login

Pre-Publish Checks

Use these before pushing runtime changes to GitHub or publishing a new PyPI version. The checked-in unit tests do not download models and do not require audio files:

pip install -e ".[dev]"
python scripts/prepublish_check.py --skip-build

For real model smoke tests, put a small permitted audio file under local_smoke/ or pass any private local path. local_smoke/ and audio/model artifacts are git-ignored.

Apple Silicon Mac:

python scripts/prepublish_check.py --audio local_smoke/sample.wav --runtime mlx --skip-build

Linux with NVIDIA GPU:

python scripts/prepublish_check.py --audio local_smoke/sample.wav --runtime nemo --skip-build
python scripts/prepublish_check.py --audio local_smoke/sample.wav --runtime cpp:cuda --skip-build

Linux CPU:

python scripts/prepublish_check.py --audio local_smoke/sample.wav --runtime cpp:cpu --skip-build

Windows CPU, from PowerShell:

py -m scripts.prepublish_check --audio local_smoke\sample.wav --runtime cpp:cpu --skip-build
py -m scripts.prepublish_check --audio local_smoke\consult.wav --runtime cpp:cpu --long --skip-build

Long-audio smoke, using the runtime chunk path:

python scripts/prepublish_check.py --audio local_smoke/consult.wav --runtime cpp:cpu --long --skip-build

The Windows NPU check is currently skipped by design, because there is no NPU backend for Omi Med STT yet. Use cpp CPU on Windows, MLX on Apple Silicon, or NeMo on a CUDA Linux machine.

The script writes the latest local report to local_smoke/prepublish_last.json.

Documentation

Additional public-safe maintainer documentation lives under docs/.

Agent/maintainer context is in AGENTS.md.

Attribution

This runtime uses or interoperates with:

  • NVIDIA NeMo / Parakeet, for the base ASR architecture.
  • parakeet-mlx, for Apple Silicon MLX inference.
  • parakeet.cpp, for GGUF / C++ / ggml inference.

See NOTICE.md.

License

Runtime code in this repository is MIT licensed.

Model weights are governed separately by the model repositories. Omi Med STT v1 is a derivative of nvidia/parakeet-tdt-0.6b-v2, whose model weights are licensed under CC-BY-4.0.

Omi Med STT v1 links

  • Full evaluation & methodology: https://github.com/Omi-Health/omi-med-stt-runtime/blob/main/docs/evaluation-summary.md
  • Canonical model (NeMo, source of truth): https://huggingface.co/omi-health/omi-med-stt-v1
  • PyPI: https://pypi.org/project/omi-med-stt/
  • Related, SOAP summarization (sum-small): https://huggingface.co/omi-health/sum-small

Built by Omi Health · Scribe product page · Contact: hello@omi.health
