CLI for Omi Med STT v1 medical speech-to-text
Omi Med STT Runtime
Runtime CLI for Omi Med STT v1, an English medical speech-to-text model.
This repository contains runtime code only. It does not contain model weights, private benchmark data, or training data.
Safety And Intended Use
Omi Med STT v1 is speech-to-text only. It is not a diagnostic, triage, prescribing, or clinical decision model, and it is not clinically validated. Transcripts must be reviewed before any clinical use.
Runtimes
omi-med-stt supports three runtime paths:
| Runtime | Best for | Artifact |
|---|---|---|
| mlx | Apple Silicon Macs | omi-health/omi-med-stt-v1-mlx-q8 |
| cpp | Linux and Windows CPU fallback | omi-health/omi-med-stt-v1-gguf / omi-med-stt-v1-q8_0.gguf |
| nemo | NVIDIA CUDA servers and canonical NeMo checkpoint use | omi-health/omi-med-stt-v1 |
The source-of-truth model is the NeMo checkpoint. MLX and GGUF are runtime exports.
Default quantization:
- Apple Silicon MLX defaults to the q8 MLX export.
- Linux/Windows CPU defaults to the q8_0 GGUF export through parakeet.cpp.
- NVIDIA CUDA/NeMo defaults to the canonical full NeMo checkpoint. A NeMo q8
checkpoint is not provided because the current NeMo runtime path does not have
a validated drop-in q8 .nemo inference artifact.
Install
From PyPI:
pip install -U omi-med-stt
For Apple Silicon / MLX:
pip install -U "omi-med-stt[mlx]"
For CUDA/Linux NeMo:
pip install -U "omi-med-stt[nemo]"
From this repository, for runtime development:
pip install -e ".[dev]"
Models
| Platform | Default artifact | Eval (locked private benchmark) |
|---|---|---|
| Apple Silicon | omi-med-stt-v1-mlx-q8 (0.94 GB) | WER 8.61% · M-WER 2.75% |
| NVIDIA CUDA | omi-med-stt-v1 (.nemo, 2.5 GB) | WER 8.30% · M-WER 2.37% |
| Linux/Windows CPU | omi-med-stt-v1-gguf (q8_0, 0.93 GB) | WER 9.12% · M-WER 3.20% |
Full evaluation and methodology, including the aggregate summary: docs/evaluation-summary.md (https://github.com/Omi-Health/omi-med-stt-runtime/blob/main/docs/evaluation-summary.md).
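WER in the table is word error rate. As a rough illustration of the standard definition, (substitutions + deletions + insertions) / number of reference words, computed with a word-level Levenshtein alignment, here is a minimal Python sketch. It assumes plain lowercasing and whitespace tokenization; the text normalization used for the locked benchmark, and how M-WER is computed, are not reproduced here (see the evaluation summary). The function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed so far
    # and the first j hypothesis words (row 0: empty reference).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "patient denies chest pain and shortness of breath"
    hyp = "patient denies chest pains and shortness of breath"
    print(f"WER: {word_error_rate(ref, hyp):.2%}")  # one substitution in 8 words
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.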
CUDA Driver / Torch Compatibility
PyPI's default torch wheels currently target CUDA 13 and require NVIDIA
driver >= 580. On hosts with an older driver (for example cloud GPU images
with driver 570.x / CUDA 12.8), --runtime nemo fails with
"The NVIDIA driver on your system is too old". Install a torch build that
matches your driver before installing the nemo extra:
pip install torch --index-url https://download.pytorch.org/whl/cu128
pip install -U "omi-med-stt[nemo]"
Check your driver's supported CUDA version with nvidia-smi.
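If you provision GPU hosts automatically, the driver check can be scripted. The following is a minimal Python sketch, assuming nvidia-smi is on PATH and using only the driver >= 580 threshold stated above. It flags hosts that need a non-default torch build but does not choose the wheel index for you; the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import Optional

# Threshold from the note above: PyPI's default (CUDA 13) torch wheels need driver >= 580.
MIN_DRIVER_FOR_DEFAULT_TORCH = 580


def driver_major(version: str) -> int:
    """Major component of a driver string such as '570.133.20'."""
    return int(version.strip().split(".")[0])


def needs_older_cuda_torch(version: str) -> bool:
    """True if the default PyPI torch wheels are expected to fail on this driver."""
    return driver_major(version) < MIN_DRIVER_FOR_DEFAULT_TORCH


def query_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the first GPU's driver version, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("nvidia-smi not found or reported no GPU")
    elif needs_older_cuda_torch(version):
        print(f"Driver {version}: install a torch build matching this driver "
              "(for example the cu128 index) before the nemo extra")
    else:
        print(f"Driver {version}: the default PyPI torch wheels should be compatible")
```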
Basic Usage
Simple path:
omi-med-stt audio.wav
Explicit MLX:
omi-med-stt audio.wav --runtime mlx
Explicit NeMo:
omi-med-stt audio.wav --runtime nemo
Explicit parakeet.cpp / GGUF:
omi-med-stt audio.wav --runtime cpp
JSON output:
omi-med-stt audio.wav --json
Dependency/runtime check:
omi-med-stt check
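To call the CLI from another Python program, one option is to wrap it with subprocess. This is a minimal sketch using only the arguments shown above (audio path, --runtime, --json). It assumes --json prints a single JSON document to stdout and that --runtime and --json can be combined; the JSON schema is not documented here, so the result is returned unparsed beyond json.loads. The helper names are hypothetical.

```python
import json
import subprocess
from typing import Any, List, Optional

VALID_RUNTIMES = {"mlx", "cpp", "nemo"}


def build_command(audio: str, runtime: Optional[str] = None, as_json: bool = False) -> List[str]:
    """Assemble an omi-med-stt argument list from the documented flags."""
    cmd = ["omi-med-stt", audio]
    if runtime is not None:
        if runtime not in VALID_RUNTIMES:
            raise ValueError(f"unknown runtime {runtime!r}; expected one of {sorted(VALID_RUNTIMES)}")
        cmd += ["--runtime", runtime]
    if as_json:
        cmd.append("--json")
    return cmd


def transcribe_json(audio: str, runtime: Optional[str] = None) -> Any:
    """Run the CLI with --json and parse stdout (assumed to be one JSON document)."""
    completed = subprocess.run(
        build_command(audio, runtime, as_json=True),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    print(build_command("audio.wav", runtime="cpp", as_json=True))
```

Passing an argument list rather than a shell string avoids quoting problems with audio paths that contain spaces.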
parakeet.cpp / GGUF Runtime
The cpp runtime is powered by
parakeet.cpp, a C++/ggml inference
engine for NVIDIA Parakeet ASR models.
Omi Med STT v1 includes a post-Conformer medical adapter. Until this adapter
extension is upstreamed, omi-med-stt builds parakeet.cpp with the adapter
patch included in this repository and caches the resulting libparakeet shared
library plus parakeet-cli.
The default cpp path uses the parakeet.cpp C API directly. For long audio it
keeps the model loaded once and sends in-memory 16 kHz PCM chunks to
libparakeet, avoiding repeated model setup and temporary chunk WAV reads.
parakeet-cli remains available as a fallback and for developer debugging.
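To illustrate the "no temporary chunk WAV" idea, the following minimal Python sketch decodes a 16 kHz mono 16-bit WAV once into an in-memory list of float samples using only the standard library. It is not the runtime's code: the helper name and the normalization to [-1.0, 1.0) are assumptions, and the sample format the parakeet.cpp C API actually expects is not shown here.

```python
import array
import wave
from typing import BinaryIO, List, Union

TARGET_RATE = 16_000  # the runtime sends 16 kHz PCM to libparakeet


def load_pcm16_mono(source: Union[str, BinaryIO]) -> List[float]:
    """Decode a 16-bit mono 16 kHz WAV into floats in [-1.0, 1.0) without temp files."""
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != TARGET_RATE:
            raise ValueError(f"expected {TARGET_RATE} Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        # wave returns frames in native byte order, so array("h") can read them directly.
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    return [s / 32768.0 for s in samples]
```

Chunks for the model can then be taken as slices of this list, so the file is read and decoded only once.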
Normal use:
omi-med-stt audio.wav --runtime cpp
Windows CPU is a first-class target. Install once, then transcribe:
pip install -U omi-med-stt
omi-med-stt install-cpp --cpp-backend cpu
omi-med-stt audio.wav --runtime cpp --cpp-backend cpu
On Windows CPU the runtime defaults to a capped thread count,
min(os.cpu_count(), 8), instead of using the upstream engine default unchanged.
This is more stable on small 4-vCPU machines. Override it per run with
--cpp-threads, or for the PowerShell session with OMI_MED_STT_CPP_THREADS:
omi-med-stt audio.wav --runtime cpp --cpp-backend cpu --cpp-threads 4
$Env:OMI_MED_STT_CPP_THREADS = "4"
omi-med-stt audio.wav --runtime cpp --cpp-backend cpu
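The thread-count rule can be expressed as a small resolver. This Python sketch assumes the explicit --cpp-threads value takes precedence over OMI_MED_STT_CPP_THREADS, and that a single thread is used if os.cpu_count() returns None; neither precedence nor that fallback is stated above. It also ignores the platform check (the README describes the cap for Windows CPU). The function name is hypothetical.

```python
import os
from typing import Mapping, Optional

DEFAULT_THREAD_CAP = 8  # documented default: min(os.cpu_count(), 8)
ENV_VAR = "OMI_MED_STT_CPP_THREADS"


def resolve_cpp_threads(
    cli_threads: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
    cpu_count: Optional[int] = None,
) -> int:
    """Pick a thread count: explicit flag, then env var, then the capped default."""
    if cli_threads is not None:
        value = cli_threads
    else:
        env = os.environ if env is None else env
        raw = env.get(ENV_VAR, "").strip()
        if raw:
            value = int(raw)
        else:
            cores = cpu_count if cpu_count is not None else os.cpu_count()
            # os.cpu_count() may return None; assume a single thread in that case.
            value = min(cores or 1, DEFAULT_THREAD_CAP)
    if value < 1:
        raise ValueError(f"thread count must be >= 1, got {value}")
    return value
```

On a 16-core machine with no override this yields 8; on a 4-vCPU machine it yields 4.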
Pre-install the C++ runtime explicitly:
omi-med-stt install-cpp
Choose a backend:
omi-med-stt install-cpp --cpp-backend cpu
Manual override remains available for developers:
omi-med-stt audio.wav --runtime cpp --parakeet-cli /path/to/parakeet-cli
Runtime toggles:
# force the parakeet-cli subprocess path instead of the C API
OMI_MED_STT_CPP_DISABLE_CAPI=1 omi-med-stt audio.wav --runtime cpp
# require C API and fail instead of falling back
OMI_MED_STT_CPP_REQUIRE_CAPI=1 omi-med-stt audio.wav --runtime cpp
# portable build instead of native CPU flags
OMI_MED_STT_GGML_NATIVE=OFF omi-med-stt install-cpp --force --cpp-backend cpu
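The C API toggles above amount to a small decision procedure. This is a Python sketch of that logic as described, not the runtime's implementation. It assumes values such as "1" enable a toggle (the README only shows "=1") and treats setting both toggles as an error, which is also an assumption. The function names are hypothetical.

```python
import os
from typing import Callable, Mapping, Optional

DISABLE_VAR = "OMI_MED_STT_CPP_DISABLE_CAPI"
REQUIRE_VAR = "OMI_MED_STT_CPP_REQUIRE_CAPI"


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Assumption: "1", "true" or "yes" enable a toggle.
    return env.get(name, "").strip().lower() in {"1", "true", "yes"}


def choose_cpp_path(capi_loads: Callable[[], bool], env: Optional[Mapping[str, str]] = None) -> str:
    """Return "capi" or "cli" following the documented toggles."""
    env = os.environ if env is None else env
    disable, require = _flag(env, DISABLE_VAR), _flag(env, REQUIRE_VAR)
    if disable and require:
        raise ValueError(f"{DISABLE_VAR} and {REQUIRE_VAR} conflict")
    if disable:
        return "cli"  # forced parakeet-cli subprocess path
    if capi_loads():
        return "capi"  # default: libparakeet C API
    if require:
        raise RuntimeError("libparakeet C API unavailable and REQUIRE_CAPI is set")
    return "cli"  # fall back to parakeet-cli
```

Passing the C API probe as a callable keeps the decision logic testable without loading the shared library.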
For Linux and Windows CPU, install-cpp first downloads a pinned prebuilt
runtime bundle from the Omi Med STT runtime GitHub release. If no bundle is
available for the host, it falls back to building from parakeet.cpp with
parakeet-cpp-omi-adapter.patch. The source-build fallback requires git and
cmake. On Windows, normal use requires only the shared parakeet.dll runtime;
it does not require a separate parakeet-cli.exe.
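Because a prebuilt bundle may not exist for every host, it can help to confirm the source-build prerequisites before running install-cpp. This small Python sketch uses shutil.which and only checks that git and cmake are on PATH; it does not check their versions or whether a working C++ compiler is installed. The function name is hypothetical.

```python
import shutil
from typing import List

SOURCE_BUILD_TOOLS = ("git", "cmake")  # required by the source-build fallback


def missing_build_tools() -> List[str]:
    """Return the tools from SOURCE_BUILD_TOOLS that are not found on PATH."""
    return [tool for tool in SOURCE_BUILD_TOOLS if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Source-build fallback would fail; missing: " + ", ".join(missing))
    else:
        print("git and cmake found on PATH")
```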
The cpp runtime downloads only the selected GGUF file. It does not download
the NeMo .nemo checkpoint or the MLX model.safetensors.
By default, the cpp runtime is pinned to a specific Hugging Face repository
revision and verifies the SHA256 checksum for the official f16/q8_0 GGUF files.
You can override this for experiments:
omi-med-stt audio.wav --runtime cpp --revision main
omi-med-stt audio.wav --runtime cpp --no-verify-checksum
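If you skip the built-in check, you can still verify a downloaded GGUF yourself. This Python sketch streams the file through hashlib.sha256 so a roughly 1 GB model is not loaded into memory at once. The expected digest must come from a source you trust, such as the file's Hugging Face page; the runtime's pinned values are not listed here. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_sha256(path: Union[str, Path], expected: str) -> None:
    """Raise ValueError if the file's SHA256 does not match the expected hex digest."""
    actual = sha256_of(path)
    if actual != expected.strip().lower():
        raise ValueError(f"checksum mismatch for {path}: expected {expected}, got {actual}")
```

The same helper works for the SHA256 values listed for the PyPI distribution files.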
Long Audio
Omi Med STT v1 is based on Parakeet and can handle long-form audio. Start with the simple path:
omi-med-stt consult.wav
An explicit chunked path is still available for constrained environments:
omi-med-stt transcribe-long consult.wav --chunk-seconds 25 --overlap 3
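--chunk-seconds and --overlap describe fixed windows that share a few seconds at each boundary. Assuming --overlap is in seconds and the windows use a plain fixed stride over 16 kHz audio, the window arithmetic looks like the following Python sketch. How the runtime merges the overlapping transcripts is not described here, and chunk_spans is a hypothetical helper.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000  # the runtime works on 16 kHz PCM


def chunk_spans(
    total_samples: int,
    chunk_seconds: float = 25.0,
    overlap_seconds: float = 3.0,
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices for overlapping fixed-length windows."""
    if total_samples <= 0:
        return []
    chunk = int(round(chunk_seconds * sample_rate))
    overlap = int(round(overlap_seconds * sample_rate))
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk > 0 and 0 <= overlap < chunk")
    step = chunk - overlap  # each window starts this many samples after the previous one
    spans = []
    start = 0
    while True:
        end = min(start + chunk, total_samples)
        spans.append((start, end))
        if end >= total_samples:
            break
        start += step
    return spans
```

For 60 s of audio with the values above (25 s windows, 3 s overlap), the windows cover 0 to 25 s, 22 to 47 s, and 44 to 60 s.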
Model Access
The runtime defaults to these Hugging Face model repositories:
- omi-health/omi-med-stt-v1
- omi-health/omi-med-stt-v1-mlx-q8
- omi-health/omi-med-stt-v1-mlx, for the full MLX export if explicitly selected
- omi-health/omi-med-stt-v1-gguf
If the model repositories are private before launch, authenticate first:
huggingface-cli login
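To prefetch a model file outside the CLI, for example on a machine that will later run offline, huggingface_hub's hf_hub_download can fetch a single file and reuses the token stored by the login above. This Python sketch downloads only the q8_0 GGUF named in the runtime table. Whether omi-med-stt reads from the same Hugging Face cache is not stated here. Note that revision=None means the repository's default branch, not the runtime's pinned revision. The names below are hypothetical.

```python
from typing import Dict, Optional

# Default repositories listed above, keyed by runtime.
DEFAULT_REPOS: Dict[str, str] = {
    "nemo": "omi-health/omi-med-stt-v1",
    "mlx": "omi-health/omi-med-stt-v1-mlx-q8",
    "cpp": "omi-health/omi-med-stt-v1-gguf",
}
GGUF_Q8_FILENAME = "omi-med-stt-v1-q8_0.gguf"


def prefetch_gguf(revision: Optional[str] = None) -> str:
    """Download only the q8_0 GGUF into the Hugging Face cache; return its local path."""
    # Imported lazily so the mapping above works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=DEFAULT_REPOS["cpp"],
        filename=GGUF_Q8_FILENAME,
        revision=revision,
    )
```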
Pre-Publish Checks
Use these before pushing runtime changes to GitHub or publishing a new PyPI version. The checked-in unit tests do not download models and do not require audio files:
pip install -e ".[dev]"
python scripts/prepublish_check.py --skip-build
For real model smoke tests, put a small permitted audio file under
local_smoke/ or pass any private local path. local_smoke/ and audio/model
artifacts are git-ignored.
Apple Silicon Mac:
python scripts/prepublish_check.py --audio local_smoke/sample.wav --runtime mlx --skip-build
Linux with NVIDIA GPU:
python scripts/prepublish_check.py --audio local_smoke/sample.wav --runtime nemo --skip-build
python scripts/prepublish_check.py --audio local_smoke/sample.wav --runtime cpp:cuda --skip-build
Linux CPU:
python scripts/prepublish_check.py --audio local_smoke/sample.wav --runtime cpp:cpu --skip-build
Windows CPU, from PowerShell:
py -m scripts.prepublish_check --audio local_smoke\sample.wav --runtime cpp:cpu --skip-build
py -m scripts.prepublish_check --audio local_smoke\consult.wav --runtime cpp:cpu --long --skip-build
Long-audio smoke, using the runtime chunk path:
python scripts/prepublish_check.py --audio local_smoke/consult.wav --runtime cpp:cpu --long --skip-build
Windows NPU is currently an explicit expected skip. There is no NPU backend for
Omi Med STT yet; use cpp CPU on Windows, MLX on Apple Silicon, or NeMo on a
CUDA Linux machine.
The script writes the latest local report to
local_smoke/prepublish_last.json.
Documentation
Additional public-safe maintainer documentation lives under docs/:
- docs/evaluation-summary.md: aggregate benchmark and runtime results.
- docs/release-log.md: runtime release history and launch readiness.
- docs/runtime-qa.md: test matrix and smoke checks.
- docs/repo-hygiene.md: what stays public vs private.
Agent/maintainer context is in AGENTS.md.
Attribution
This runtime uses or interoperates with:
- NVIDIA NeMo / Parakeet, for the base ASR architecture.
- parakeet-mlx, for Apple Silicon MLX inference.
- parakeet.cpp, for GGUF / C++ / ggml inference.
See NOTICE.md.
License
Runtime code in this repository is MIT licensed.
Model weights are governed separately by the model repositories. Omi Med STT v1
is a derivative of nvidia/parakeet-tdt-0.6b-v2, whose model weights are
licensed under CC-BY-4.0.
Omi Med STT v1 links
| Resource | Link |
|---|---|
| Full evaluation & methodology | https://github.com/Omi-Health/omi-med-stt-runtime/blob/main/docs/evaluation-summary.md |
| Canonical model (NeMo, source of truth) | https://huggingface.co/omi-health/omi-med-stt-v1 |
| PyPI | https://pypi.org/project/omi-med-stt/ |
| Related: SOAP summarization (sum-small) | https://huggingface.co/omi-health/sum-small |
Built by Omi Health · Scribe product page · Contact: hello@omi.health
Download files
File details
Details for the file omi_med_stt-0.1.18.tar.gz.
File metadata
- Download URL: omi_med_stt-0.1.18.tar.gz
- Upload date:
- Size: 42.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cae1f4ff8fb17e7c784569e39ad582898b79ffbd50dd66b39f1c7fa7c0a570a5 |
| MD5 | 87d17abcde6c3de5dfe1892dea0dd656 |
| BLAKE2b-256 | eb0e2a1a529e669eddc1e22c5bfe08874ec449205a92c38a458cb025e7f31e25 |
File details
Details for the file omi_med_stt-0.1.18-py3-none-any.whl.
File metadata
- Download URL: omi_med_stt-0.1.18-py3-none-any.whl
- Upload date:
- Size: 27.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 00ad9fe8999a69ad3940f05bcdcc93b78e2447aa8c8dbecb96f49939a864e06c |
| MD5 | 040a5312fdefb416daf3379aeaf5cb62 |
| BLAKE2b-256 | 08e6fc75168f20a783de9b82d8bb0bc5df2fa0d0195abe35850bb4e212f64cf2 |