VoiceTyper speech recognition server based on FunASR ONNX and Tornado.

These details have not been verified by PyPI

Project links

Project description

VoiceTyper Server

voice-typer-server 是 VoiceTyper 的语音识别服务端。它负责接收客户端上传的音频，完成识别、标点恢复，并可选调用 LLM 做二次纠错。

亮点

本地运行，默认不依赖云端 ASR
流式识别（默认）：WebSocket 双通道——录音时 HUD 实时预览（跟嘴），松手后离线整段复识别产出准确最终结果
非流式识别（兼容）：HTTP POST，支持热词，供 Linux 客户端及非流式场景使用
内置中文识别和标点恢复默认模型
可选启用 API Key
可选接入 OpenAI 兼容 LLM 做纠错
支持 python -m、命令行和脚本三种启动方式

适合谁

如果你只是想把 VoiceTyper 跑起来，这个 README 已经够用。

如果你要改代码、打包发布或二次开发，文末有开发者入口。

Python 版本

最低支持：Python 3.10
推荐版本：Python 3.12+

快速开始

最常见的用法是：

安装服务端
启动服务端
让客户端连接 127.0.0.1:6008

安装与启动

推荐方式：使用脚本

适合 Linux 和 macOS 用户。

cd server
./scripts/voice_typer_server.sh setup
./scripts/voice_typer_server.sh run

脚本会：

创建虚拟环境 ~/.venvs/voice-typer
安装 voice-typer-server
用一组默认参数启动服务

默认启动参数：

--host 127.0.0.1
--port 6008
--device cpu

命令行覆盖示例：

./scripts/voice_typer_server.sh run --host 0.0.0.0 --onnx-threads 2

直接使用 Python 包

如果你已经安装了 voice-typer-server，可以直接运行：

python -m voice_typer_server --host 127.0.0.1 --port 6008

或：

voice-typer-server --host 127.0.0.1 --port 6008

查看帮助：

voice-typer-server --help

Docker

如果你更喜欢容器方式：

docker build -t voice-typer-server:latest .
docker run -d -p 6008:6008 --name voice-typer voice-typer-server:latest

Windows 服务

在 Windows 上可将 VoiceTyper Server 注册为系统服务，实现开机自启和后台运行。

安装与注册

REM 1. 安装环境（自动安装 pywin32 依赖）
scripts\voice_typer_server.bat setup --local

REM 2. 注册为 Windows 服务（需管理员权限，默认开机自启、默认流式模式）
REM    Windows 原生客户端支持流式，无需额外参数；若连接的是 Linux 等非流式客户端，请追加 --no-streaming
scripts\voice_typer_server.bat install -- --host 127.0.0.1 --port 6008 --device cpu

REM 启用 LLM 校对（推荐，可显著提升识别准确率）
scripts\voice_typer_server.bat install -- --host 127.0.0.1 --port 6008 --device cpu ^
    --llm-base-url https://api.openai.com/v1 ^
    --llm-api-key sk-xxx ^
    --llm-model gpt-4o-mini

REM 手动启动模式（不随系统启动）
scripts\voice_typer_server.bat install --startup manual -- --host 127.0.0.1 --port 6008

管理服务

REM 启动服务
scripts\voice_typer_server.bat start

REM 停止服务
scripts\voice_typer_server.bat stop

REM 卸载服务
scripts\voice_typer_server.bat uninstall

也可以通过 services.msc（服务管理器）图形化操作，服务名为 VoiceTyper 语音识别服务。

服务日志

服务模式下日志写入文件：%USERPROFILE%\.voice-typer\server.log，最大 10MB，保留 3 个备份。

注意事项

安装、卸载、启停服务均需要管理员权限
服务默认以 LocalSystem 账户运行。如果模型已缓存在当前用户目录下，首次启动可能需要重新下载
修改运行参数需先卸载再重新安装服务

常用启动参数

--host：监听地址，默认 127.0.0.1
--port：监听端口，默认 6008
--streaming / --no-streaming：识别模式，默认流式（WebSocket）；--no-streaming 切换为非流式（HTTP）
--device：cpu / cuda / cuda:N
--model：流式预览模型（默认 paraformer-zh-streaming）或非流式识别模型（默认 paraformer-zh）
--offline-model：仅流式模式，松手后用于整段复识别的离线模型，默认 paraformer-zh
--chunk-size：流式 chunk 大小，格式 left,current,right（单位 60ms 帧），默认 0,10,5
--punc-model：标点模型，默认 ct-punc，设为 none 可禁用
--onnx-threads：ONNX Runtime 线程数，默认 4
--api-keys：API Key 列表，逗号分隔
--llm-base-url、--llm-api-key、--llm-model：启用 LLM 纠错

示例：

# 流式模式（默认）
voice-typer-server --host 0.0.0.0 --device cpu --api-keys akey

# 非流式兼容模式（支持热词）
voice-typer-server --no-streaming --host 0.0.0.0 --device cpu --api-keys akey

常见使用场景

仅本机使用

这是默认场景：

voice-typer-server --host 127.0.0.1 --port 6008

此时本机客户端可直接访问，一般不需要额外配置鉴权。

局域网远程使用

如果客户端和服务端不在同一台机器上，建议启用 API Key：

voice-typer-server --host 0.0.0.0 --api-keys your_key

然后在客户端配置中填入：

服务端 IP
对应端口
api_key

启用 LLM 纠错

voice-typer-server \
  --llm-base-url https://api.openai.com/v1 \
  --llm-api-key sk-xxx \
  --llm-model gpt-4o-mini

客户端再启用 llm_recorrect 即可。

接口

`/health`（GET）

通用健康检查，返回 {"status":"ok","ready":bool,"streaming":bool,"llm_enabled":bool}。

流式模式（默认）：`/recognize/stream`（WebSocket）

WebSocket 端点，客户端与服务端保持长连接，边发音频边获取识别片段。

协议概要：

连接后发送 {"type":"start","hotwords":"","sample_rate":16000}
录音期间持续发送 binary 帧（float32 PCM，每帧约 600ms = 9600 samples）
松开热键后发送 {"type":"finalize"}
服务端返回若干 {"type":"partial","text":"...","seq":N}（逐字预览，来自流式模型）和最终 {"type":"final","text":"...","asrElapsed":0.82}（准确结果，来自对完整音频的离线整段复识别）

两通道说明

消息类型	识别模型	用途	是否插入目标程序
`partial`	流式模型（`paraformer-zh-streaming`）	HUD 实时预览，跟嘴显示	否
`final`	离线整段模型（`paraformer-zh`）	准确最终结果，含标点和 LLM 纠错	是

注意：热词（hotwords）仅对离线整段模型（final）生效；流式模型本身不支持热词。

非流式模式（`--no-streaming`）：`/recognize`（HTTP POST）

提交整段音频，返回完整识别结果。支持热词。

推荐方式：

Content-Type: application/octet-stream
请求体直接放 16kHz float32 原始音频字节

可选参数：

请求头 X-Hotwords：URL-encoded 热词（空格分隔）
查询参数 llm_recorrect=true|false

同时也兼容旧版 multipart/form-data 上传。

示例：

curl -X POST "http://127.0.0.1:6008/recognize?llm_recorrect=false" \
     -H "Content-Type: application/octet-stream" \
     --data-binary @test.float32

带 API Key：

curl -X POST http://127.0.0.1:6008/recognize \
     -H "Authorization: Bearer your-api-key" \
     -F "audio=@test.wav"

模型与运行说明

服务端使用 onnxruntime
流式模式同时加载两个模型：
- paraformer-zh-streaming（--model）：产出 partial 预览
- paraformer-zh（--offline-model）：松手后对完整音频复识别，产出 final，支持热词，含标点
非流式模式仅加载一个模型：
- paraformer-zh（--model）：整段识别，支持热词，含标点
默认标点模型：ct-punc（仅挂在最终识别模型上，不重复加载）

短名会自动映射到官方 ONNX 模型，首次使用会从 ModelScope 自动下载。

如果模型目录中只有 model_quant.onnx，服务端会自动使用量化模型。

性能优化

NVIDIA GPU 加速

使用 CUDA 加速识别：

voice-typer-server --device cuda
# 多卡指定：
voice-typer-server --device cuda:1

内存优化

流式模式同时加载流式预览模型和离线识别模型，内存占用约比非流式多 220MB。如果内存紧张，可以：

切换到非流式模式（--no-streaming），仅加载一个模型
关闭标点模型，可降低部分资源占用：

voice-typer-server --punc-model none

常见问题

服务启动了，但客户端连不上

检查服务端实际监听地址
检查客户端配置中的 host 和 port
本机部署时，应优先使用 127.0.0.1:6008

远程调用返回 401

检查是否配置了 --api-keys
检查客户端是否正确带上 Authorization: Bearer ...

首次启动较慢

首次运行可能会下载模型，这是正常现象。

Apple Silicon 为什么没有 MPS

当前服务端只支持：

cpu
cuda
cuda:N

在 Apple Silicon 上建议直接使用 cpu。

开发者说明

如果你要修改代码或发布包，请查看：

主要代码位置：

voice_typer_server/cli.py
voice_typer_server/app.py
voice_typer_server/recognizer.py
voice_typer_server/llm_client.py
voice_typer_server/auth.py
voice_typer_server/win_service.py — Windows 服务包装（仅 Windows）

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.4.1

Jul 7, 2026

1.4.0

Jul 5, 2026

1.3.0

May 28, 2026

1.2.0

May 16, 2026

1.1.0

May 15, 2026

1.0.0

Mar 15, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

voice_typer_server-1.4.1.tar.gz (29.0 kB view details)

Uploaded Jul 7, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

voice_typer_server-1.4.1-py3-none-any.whl (28.5 kB view details)

Uploaded Jul 7, 2026 Python 3

File details

Details for the file voice_typer_server-1.4.1.tar.gz.

File metadata

Download URL: voice_typer_server-1.4.1.tar.gz
Upload date: Jul 7, 2026
Size: 29.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for voice_typer_server-1.4.1.tar.gz
Algorithm	Hash digest
SHA256	`5bf2f56dc64a88fd8ffe12672f6a8d8f8f628d64d90c2b71bc063a90fb2abf61`
MD5	`240f7732883b8834d24887edf0d70c9b`
BLAKE2b-256	`17516aac17f085a16f63c74d2ebc34f3a4c86c93f1386e9231e5e763f10c6e41`

See more details on using hashes here.

File details

Details for the file voice_typer_server-1.4.1-py3-none-any.whl.

File metadata

Download URL: voice_typer_server-1.4.1-py3-none-any.whl
Upload date: Jul 7, 2026
Size: 28.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for voice_typer_server-1.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5a98d239e7eca29007c67c794fee6ede578cb63a169f76caa35c4aa9314190ed`
MD5	`2f527e74437eb96ab91f7bc7c37e5736`
BLAKE2b-256	`040c0755085958e2e7750c791cd8c401dd65803e6c0c60477a3a5cb995772a92`

See more details on using hashes here.

voice-typer-server 1.4.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

VoiceTyper Server

亮点

适合谁

Python 版本

快速开始

安装与启动

推荐方式：使用脚本

直接使用 Python 包

Docker

Windows 服务

安装与注册

管理服务

服务日志

注意事项

常用启动参数

常见使用场景

仅本机使用

局域网远程使用

启用 LLM 纠错

接口

/health（GET）

流式模式（默认）：/recognize/stream（WebSocket）

非流式模式（--no-streaming）：/recognize（HTTP POST）

模型与运行说明

性能优化

NVIDIA GPU 加速

内存优化

常见问题

服务启动了，但客户端连不上

远程调用返回 401

首次启动较慢

Apple Silicon 为什么没有 MPS

开发者说明

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`/health`（GET）

流式模式（默认）：`/recognize/stream`（WebSocket）

非流式模式（`--no-streaming`）：`/recognize`（HTTP POST）